pull/1/head
Oliver Kennedy 2021-02-18 13:28:10 -05:00
parent 41f986b1b4
commit be815d9adb
Signed by: okennedy
GPG Key ID: 3E5F9B3ABD3FDB60
2 changed files with 41 additions and 13 deletions


@ -19,6 +19,15 @@ textbook: "Ch. 15.1-15.5, 16.7"
Might help to tighten up the time spent a little too. I had to cut out before introducing Sort-Merge Joins
-->
<section>
<h3>News</h3>
<ul>
<li>Homework 1 assigned last night, due Weds night.</li>
<li>Checkpoint 1 posted Sunday. Submissions open tonight.</li>
</ul>
</section>
<section>
<section>
@ -40,9 +49,9 @@ textbook: "Ch. 15.1-15.5, 16.7"
<h3>Analyzing Volcano Operators</h3>
<ul>
<li class="fragment highlight-grey" data-fragment-index="1">CPU Used</li>
<li>Memory Bounds</li>
<li>Disk IO Used</li>
<li class="fragment highlight-grey" data-fragment-index="1">CPU Used</li>
</ul>
<p class="fragment" data-fragment-index="1" style="margin-top: 30px;"><u>Data</u>bases are usually IO- or Memory-bound</p>
@ -81,7 +90,7 @@ textbook: "Ch. 15.1-15.5, 16.7"
<section>
<h3>Note</h3>
<p>We'll be discussing the "default" algorithm for each operator.</p>
<p>So far, we've been pretending that each operator has one algorithm.</p>
<p class="fragment">Often, there are many algorithms, some of which cover multiple operators.</p>
@ -218,8 +227,8 @@ textbook: "Ch. 15.1-15.5, 16.7"
<p>How many IOs do we need to compute $Q := R \times S$?</p>
<ol>
<li class="fragment">Getting an Iterator on $R$: 100 tuples</li>
<li class="fragment">Getting an Iterator on $S$: 20 tuples</li>
<li class="fragment">Getting an Iterator on $R$: 20 tuples</li>
<li class="fragment">Getting an Iterator on $S$: 100 tuples</li>
<li class="fragment">Getting an Iterator on $R \times S$ using the above iterators: </li>
</ol>
</section>
@ -231,7 +240,7 @@ textbook: "Ch. 15.1-15.5, 16.7"
<li class="fragment"><b>Cache</b>: $|R| \times |S| = 20 \times 100 = 2000$ extra tuples</li>
</ul>
<p class="fragment"><b>Best Total Cost</b> $100 + 20 + 1900 = 2020$</p>
<p class="fragment"><b>Best Total Cost</b> $100 + 20 + 1900 = 2010$</p>
</section>
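As a rough illustration of the replay strategy for $R \times S$, the Python sketch below re-opens the inner iterator once per outer tuple and counts tuples read. `CountingTable` and `cross_product_replay` are hypothetical names that do not come from the course code, tuple reads stand in for IOs, and the sizes $|R| = 20$, $|S| = 100$ are taken from the slide.

```python
class CountingTable:
    """Toy table that counts every tuple read from it (a stand-in for IO)."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.reads = 0

    def __iter__(self):
        # Each call to iter() starts a fresh scan, i.e. a "replay" of the table.
        for row in self.rows:
            self.reads += 1
            yield row


def cross_product_replay(outer, inner):
    """Nested-loop cross product that rescans `inner` for every outer tuple."""
    for r in outer:
        for s in inner:
            yield (r, s)


R = CountingTable(range(20))   # |R| = 20
S = CountingTable(range(100))  # |S| = 100
output = list(cross_product_replay(R, S))

extra_reads = S.reads - len(S.rows)  # reads beyond the first pass over S
print(len(output), R.reads, S.reads, extra_reads)
```

Under these assumptions the sketch reads 20 tuples of $R$ and 2000 tuples of $S$: 100 for the first pass plus $(|R| - 1) \times |S| = 1900$ replayed, for 2020 tuple reads in total.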
<section>
@ -241,7 +250,7 @@ textbook: "Ch. 15.1-15.5, 16.7"
<p>How many IOs do we need to compute $Q := R \times \sigma_c(R \times S)$?</p>
<ol>
<li class="fragment">Getting an Iterator on $\sigma_c(R \times S)$: 2020 tuples</li>
<li class="fragment">Getting an Iterator on $\sigma_c(R \times S)$: 2010 tuples</li>
<li class="fragment">Getting an Iterator on $R$: 20 tuples</li>
<li class="fragment">Getting an Iterator on $R \times \sigma_c(R \times S)$ using the above iterators: </li>
</ol>
@ -250,15 +259,15 @@ textbook: "Ch. 15.1-15.5, 16.7"
<section>
<ul>
<li><b>Memory</b>: 0 extra tuples</li>
<li class="fragment"><b>Replay</b>: $(|R|-1) \times \texttt{cost}(\sigma_c(R \times S)) = 19 \times 2020 = 38380$ extra tuples</li>
<li class="fragment"><b>Cache</b>: $|R| \times |S| = 20 \times 200 = 4000$ extra tuples</li>
<li class="fragment"><b>Replay</b>: $(|R|-1) \times \texttt{cost}(\sigma_c(R \times S)) = 19 \times 2010 = 38190$ extra tuples</li>
<li class="fragment"><b>Cache</b>: $|R| \times (0.1 \times (|R| \times |S|)) = 20 \times 200 = 4000$ extra tuples</li>
</ul>
<p class="fragment"><b>Best Total Cost</b> $2020 + 20 + 4000 = 6040$</p>
<p class="fragment"><b>Best Total Cost</b> $2010 + 20 + 4000 = 6030$</p>
</section>
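The replay and cache formulas on this slide can be checked with a small calculator. This is a minimal sketch: `nested_product_costs` is a hypothetical helper, the selectivity of 0.1 is read off the cache formula on the slide, and the cost of the inner query is passed in as a parameter because it depends on the total computed on the earlier slide.

```python
def nested_product_costs(r_size, s_size, input_cost, selectivity=0.1):
    """Return (replay, cache, best_total) for R x sigma_c(R x S), in tuples.

    input_cost is the cost of producing sigma_c(R x S) once.
    """
    # Replay: recompute the inner query once for every remaining tuple of R.
    replay = (r_size - 1) * input_cost
    # Cache: following the slide, |R| times the size of the filtered inner result.
    cache = r_size * round(selectivity * r_size * s_size)
    # Pay for both inputs once, then add the cheaper of the two strategies.
    best_total = input_cost + r_size + min(replay, cache)
    return replay, cache, best_total


print(nested_product_costs(20, 100, input_cost=2010))
print(nested_product_costs(20, 100, input_cost=2020))
```

With an input cost of 2010 this yields 38190 tuples for replay, 4000 for cache, and a best total of 6030. With 2020 the same terms are 38380, 4000, and 6040.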
<section>
<p>Is there a middle ground?</p>
<p>Can we do better with Cartesian product<br/>(and joins)?</p>
</section>
</section>
@ -312,14 +321,15 @@ textbook: "Ch. 15.1-15.5, 16.7"
<dd class="fragment">$|S|$ tuples written.</dd>
<dd class="fragment">$(\frac{|R|}{\mathcal B} - 1) \cdot |S|$ tuples read.</dd>
</dl>
<p style="font-size: 70%;" class="fragment">In-memory caching is a special case of block-nested loop with $\mathcal B = |S|$</p>
<p style="font-size: 70%;" class="fragment">Does the block size for $R$ matter?</p>
<p style="font-size: 70%;" class="fragment">In-memory caching is a special case of block-nested loop with $\mathcal B = |R|$</p>
<p style="font-size: 70%;" class="fragment">Does the block size for $S$ matter?</p>
</section>
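To see how the block size $\mathcal B$ affects the formulas above, here is a minimal Python sketch. `bnl_extra_io` is a hypothetical helper, and rounding $\frac{|R|}{\mathcal B}$ up when the last block is only partly full is an assumption that the slide does not state.

```python
import math


def bnl_extra_io(r_size, s_size, block_size):
    """Return (tuples_written, tuples_reread) for a block-nested-loop product."""
    # Assumption: a partially filled last block of R still costs a full pass over S.
    num_blocks = math.ceil(r_size / block_size)
    tuples_written = s_size                    # S is written out once
    tuples_reread = (num_blocks - 1) * s_size  # one extra scan of S per later block
    return tuples_written, tuples_reread


for block_size in (1, 5, 20):
    print(block_size, bnl_extra_io(20, 100, block_size))
```

With $|R| = 20$ and $|S| = 100$, a block size of 1 rereads 1900 tuples, a block size of 5 rereads 300, and $\mathcal B = |R| = 20$ rereads none, which is the in-memory caching case.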
<section>
<p>How big should the blocks be?</p>
<aside class="notes">As big as possible! Leads to the question of distributing available memory between multiple joins: A simple linear optimization problem.</aside>
<p class="fragment">As big as possible!</p>
<p class="fragment">... but more on that later.</p>
</section>
</section>
@ -479,6 +489,21 @@ textbook: "Ch. 15.1-15.5, 16.7"
<dd>No added IO! (not counting sort).</dd>
</dl>
</section>
<section>
<h3>Recap: Joins</h3>
<dl>
<dt>Block-Nested Loop Join</dt>
<dd>Moderate Memory, Moderate IO, High CPU</dd>
<dt>In-Memory Index Join (e.g., 1-Pass Hash)</dt>
<dd>High Memory, Low IO</dd>
<dt>Partition Join (e.g., 2-Pass Hash)</dt>
<dd>High IO, Low Memory</dd>
<dt>Sort/Merge Join</dt>
<dd>Low IO, Low Memory (But need sorted data)</dd>
</dl>
</section>
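For the Sort/Merge entry in the recap, the following is a minimal sketch of the merge step over inputs that are already sorted on the join key. `merge_join` and its `key` parameter are hypothetical names, and the inputs are plain in-memory lists for brevity. An iterator-based version would only need to buffer the current run of equal keys.

```python
def merge_join(left, right, key=lambda t: t[0]):
    """Equi-join two lists that are already sorted by `key`."""
    i = j = 0
    while i < len(left) and j < len(right):
        k_left, k_right = key(left[i]), key(right[j])
        if k_left < k_right:
            i += 1          # left tuple has no partner, advance left
        elif k_left > k_right:
            j += 1          # right tuple has no partner, advance right
        else:
            # Find the run of right tuples that share this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == k_left:
                j_end += 1
            # Pair every left tuple with this key against the whole run.
            while i < len(left) and key(left[i]) == k_left:
                for jj in range(j, j_end):
                    yield (left[i], right[jj])
                i += 1
            j = j_end


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
pairs = list(merge_join(left, right))
print(pairs)
```

Each input is scanned once from front to back, which matches the "no added IO, not counting the sort" observation.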
</section>
<section>


@ -116,6 +116,8 @@ class_name = "CSE-4/562 Spring 2021"
{ src: '../../../../slides/reveal.js-3.7.0/plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../../../../slides/reveal.js-3.7.0/plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../../../../slides/reveal.js-3.7.0/plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'pre code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: '../../../../slides/reveal.js-3.7.0/plugin/zoom-js/zoom.js', async: true },
{ src: '../../../../slides/reveal.js-3.7.0/plugin/notes/notes.js', async: true },
// Chart.min.js
{ src: '../../../../slides/reveal.js-3.7.0/plugin/chart/Chart.min.js'},
// the plugin
@ -127,5 +129,6 @@ class_name = "CSE-4/562 Spring 2021"
</script>
</body>
</html>