<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>CSE 4/562 - Spring 2018</title>
<meta name="description" content="CSE 4/562 - Spring 2018">
<meta name="author" content="Oliver Kennedy">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="../reveal.js-3.6.0/css/reveal.css">
<link rel="stylesheet" href="ubodin.css" id="theme">
<!-- Code syntax highlighting -->
<link rel="stylesheet" href="../reveal.js-3.6.0/lib/css/zenburn.css">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? '../reveal.js-3.6.0/css/print/pdf.css' : '../reveal.js-3.6.0/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<script src="../reveal.js-3.6.0/lib/js/head.min.js"></script>
<!--[if lt IE 9]>
<script src="../reveal.js-3.6.0/lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="header">
<!-- Any Talk-Specific Header Content Goes Here -->
CSE 4/562 - Database Systems
</div>
<div class="slides">
<section>
<h1>Index Scans, Aggregates</h1>
<h3>CSE 4/562 Database Systems</h3>
<h5>February 26, 2018</h5>
</section>
<!-- ============================================ -->
<section>
<section>
<h3>General Query Optimizers</h3>
<ol style="font-size: 60%">
<li class="fragment" data-fragment-index="1">Apply blind heuristics (e.g., push down selections)</li>
<li class="fragment" data-fragment-index="2">Enumerate all possible <i>execution plans</i> (or a reasonable subset of them) by varying:
<ul>
<li>Join/Union Evaluation Order (commutativity, associativity, distributivity)</li>
<li class="fragment" data-fragment-index="3">Algorithms for Joins, Aggregates, Sort, Distinct, and others</li>
<li class="fragment" data-fragment-index="3">Data Access Paths</li>
</ul>
</li>
<li class="fragment" data-fragment-index="4">Estimate the cost of each execution plan</li>
<li class="fragment" data-fragment-index="5">Pick the execution plan with the lowest cost</li>
</ol>
</section>
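<section data-markdown>
<script type="text/template">
Steps 2 to 4 above can be sketched in a few lines of Python. This is only a toy: the table sizes, the "1 in 10" selectivity, and the `cost` function are assumptions made up for illustration, not a real cost model.

```python
from itertools import permutations

# Hypothetical row counts; a real optimizer reads these from table statistics.
ROWS = {"R": 1000, "S": 10, "T": 100}

def cost(order):
    """Toy cost: total size of all intermediate results for a left-deep
    join order, assuming every join keeps 1 in 10 candidate pairs."""
    size, total = ROWS[order[0]], 0
    for table in order[1:]:
        size = size * ROWS[table] // 10
        total += size
    return total

plans = list(permutations(ROWS))   # step 2: enumerate join orders
best = min(plans, key=cost)        # steps 3 and 4: estimate costs, keep the cheapest
```

With these made-up numbers, the cheapest orders join the two small tables first.
</script>
</section>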
</section>
<section>
<!-- 2018 by OK:
Need to get across:
- Define access paths
- Optimization in action
- Define INLJ
- Hint at cost optimization
-->
<section>
<h3>Data Access Paths</h3>
<p>Original Query: $\pi_A\left(\sigma_{B = 1 \wedge C < 3}(R)\right)$</p>
<p>Possible Implementations:</p>
<ul>
<li>Full Table Scan</li>
<li>Index Scan on Tree/Hash Index over $B$</li>
<li>Index Scan on Tree Index over $C$</li>
<li>Index Scan on Tree Index over $B,C$</li>
</ul>
</section>
<section>
<h3>Full Table Scan</h3>
<ol>
<li>Project down to $A$, reading from...</li>
<li>Filter on $B = 1 \wedge C < 3$, reading from...</li>
<li>All of the rows in $R$</li>
</ol>
</section>
<section>
<h3>Index Scan on Tree/Hash Index over $B$</h3>
<ol>
<li>Project down to $A$, reading from...</li>
<li>Filter on $C < 3$, reading from...</li>
<li>The index which provides all rows where $R.B = 1$</li>
</ol>
</section>
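<section data-markdown>
<script type="text/template">
A minimal sketch of the three steps above, assuming a Python dictionary stands in for the hash index over B. The rows of `R` and the helper name `index_scan_b` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of R as (A, B, C) tuples.
R = [(10, 1, 2), (11, 1, 5), (12, 2, 1), (13, 1, 0)]

# Hash index over B: value of B -> positions of the matching rows.
index_on_b = defaultdict(list)
for pos, (_, b, _) in enumerate(R):
    index_on_b[b].append(pos)

def index_scan_b(b_value, c_bound):
    """Evaluate pi_A(sigma_{B = b_value and C < c_bound}(R)) via the index on B."""
    for pos in index_on_b.get(b_value, []):  # 3. index provides rows with B = b_value
        a, _, c = R[pos]
        if c < c_bound:                      # 2. remaining filter on C
            yield a                          # 1. project down to A

result = list(index_scan_b(1, 3))
```

Only the rows with B = 1 are ever touched; the condition on C is still checked row by row.
</script>
</section>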
<section>
<h3>Index Scan on Tree Index over $C$</h3>
<ol>
<li>Project down to $A$, reading from...</li>
<li>Filter on $B = 1$, reading from...</li>
<li>The index which provides all rows where $R.C < 3$</li>
</ol>
</section>
<section>
<h3>Index Scan on Tree Index over $B, C$</h3>
<p>Lexicographic Sort: First sort by $B$, with $C$ as a tiebreaker.</p>
<ol>
<li>Project down to $A$, reading from...</li>
<li>The index which provides all rows between <br/>$\left\lt 1, -\infty\right\gt \lt \left\lt R.B, R.C\right\gt \lt \left\lt 1, 3\right\gt$ </li>
</ol>
</section>
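<section data-markdown>
<script type="text/template">
The range lookup above can be sketched with a sorted list standing in for the tree index over (B, C). The sample rows and the name `index_scan_bc` are assumptions; the one-element tuple `(b_value,)` plays the role of the lower bound with C at negative infinity, because Python orders a shorter tuple before any longer tuple that extends it.

```python
from bisect import bisect_left

# Hypothetical rows of R as (A, B, C) tuples.
R = [(10, 1, 2), (11, 1, 5), (12, 2, 1), (13, 1, 0), (14, 0, 1)]

# Index entries are (B, C, A), kept in lexicographic order.
index_bc = sorted((b, c, a) for a, b, c in R)

def index_scan_bc(b_value, c_bound):
    """Return A for every row with B = b_value and C < c_bound."""
    lo = bisect_left(index_bc, (b_value,))          # first entry with B >= b_value
    hi = bisect_left(index_bc, (b_value, c_bound))  # first entry at or past (b_value, c_bound)
    return [a for _, _, a in index_bc[lo:hi]]       # project down to A
```

Every entry between `lo` and `hi` satisfies both conditions, so no separate filter step is needed.
</script>
</section>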
<section>
<p>Which index to use (if several are available)?</p>
</section>
</section>
<section>
<section>
<h3>Strategies for Implementing $(\ldots \bowtie_{c} S)$</h3>
<dl>
<dt>Sort/Merge Join</dt>
<dd>Sort all of the data upfront, then scan over both sides.</dd>
<dt>In-Memory Index Join (1-pass Hash; Hash Join)</dt>
<dd>Build an in-memory index on one table, scan the other.</dd>
<dt>Partition Join (2-pass Hash; External Hash Join)</dt>
<dd>Partition both sides so that tuples don't join across partitions.</dd>
<dt class="fragment" data-fragment-index="1">Index Nested Loop Join</dt>
<dd class="fragment" data-fragment-index="1">Use an <i>existing</i> index instead of building one.</dd>
</dl>
</section>
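<section data-markdown>
<script type="text/template">
As one concrete instance, the in-memory index join can be sketched as follows, assuming rows are Python tuples and the join condition is an equality (a hash index cannot serve other conditions). The function name `hash_join` and its parameters are hypothetical.

```python
from collections import defaultdict

def hash_join(r_rows, s_rows, r_key, s_key):
    """Equi-join on r[r_key] == s[s_key]: index r_rows in memory, scan s_rows once."""
    index = defaultdict(list)
    for r in r_rows:                       # build phase
        index[r[r_key]].append(r)
    for s in s_rows:                       # probe phase
        for r in index.get(s[s_key], []):
            yield r + s

R = [(1, "a"), (2, "b")]
S = [(2, "x"), (2, "y"), (3, "z")]
joined = list(hash_join(R, S, 0, 0))
```

The indexed side must fit in memory, which is why the smaller input is usually chosen for the build phase.
</script>
</section>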
<section>
<h3>Index Nested Loop Join</h3>
<p>To compute $R \bowtie_{R.A < S.B} S$ with an index on $S.B$:</p>
<ol>
<li>Read One Row of $R$</li>
<li>Get the value of $R.A$</li>
<li>Start index scan on $S.B > [R.A]$</li>
<li>Return each matching row of $S$ joined with the current row of $R$, then repeat from step 1</li>
</ol>
</section>
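<section data-markdown>
<script type="text/template">
A minimal sketch of the four steps above, assuming the existing index on S.B is modeled as a sorted list searched with `bisect`. The sample rows and the function name are hypothetical.

```python
from bisect import bisect_right

R = [(1,), (4,)]                 # rows of R as (A,)
S = [(5,), (2,), (4,), (7,)]     # rows of S as (B,)

# Stand-in for a pre-existing tree index on S.B.
s_sorted = sorted(S)
s_keys = [b for (b,) in s_sorted]

def index_nested_loop_join():
    for r in R:                            # 1. read one row of R
        a = r[0]                           # 2. get the value of R.A
        start = bisect_right(s_keys, a)    # 3. start the index scan at S.B > a
        for s in s_sorted[start:]:         # 4. return rows as normal
            yield r + s

pairs = list(index_nested_loop_join())
```

Each row of R costs one index lookup plus a scan over only the matching part of S.
</script>
</section>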
</section>
<section>
<section>
Aggregation
</section>
<section>
<h3>Aggregation</h3>
<dl>
<div class="fragment">
<dt>Normal Aggregates</dt>
<dd><code class="sql">SELECT COUNT(*) FROM R</code></dd>
<dd><code class="sql">SELECT SUM(A) FROM R</code></dd>
</div>
<div class="fragment">
<dt>Group-By Aggregates</dt>
<dd><code class="sql">SELECT A, SUM(B) FROM R GROUP BY A</code></dd>
</div>
<div class="fragment">
<dt>Distinct</dt>
<dd><code class="sql">SELECT DISTINCT A FROM R</code></dd>
<dd class="fragment"><code class="sql">SELECT A FROM R GROUP BY A</code></dd>
</div>
</dl>
</section>
<section>
<h3>Normal Aggregates</h3>
<table style="font-size: 80%;" class="fragment">
<tr><th>TREE_ID</th><th>SPC_COMMON</th><th>BORONAME</th><th>TREE_DBH</th></tr>
<tr><td>180683</td><td>'red maple'</td><td>'Queens'</td><td>3</td></tr>
<tr><td>315986</td><td>'pin oak'</td><td>'Queens'</td><td>21</td></tr>
<tr><td>204026</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>3</td></tr>
<tr><td>204337</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>10</td></tr>
<tr><td>189565</td><td>'American linden'</td><td>'Brooklyn'</td><td>21</td></tr>
<tr><td style="font-weight: bold;" colspan="3">... and 683783 more</td></tr>
</table>
<div style="margin-top: 60px">
<code class="sql fragment">SELECT COUNT(*) FROM TREES</code>
</div>
</section>
<section>
<table style="font-size: 70%;">
<tr><th>TREE_ID</th><th>SPC_COMMON</th><th>BORONAME</th><th>TREE_DBH</th></tr>
<tr class="fragment" data-fragment-index="1"><td colspan="4">COUNT = 0</td></tr>
<tr class="fragment" data-fragment-index="2" style="font-size: smaller;"><td>180683</td><td>'red maple'</td><td>'Queens'</td><td>3</td></tr>
<tr class="fragment" data-fragment-index="3"><td colspan="4">COUNT = 1</td></tr>
<tr class="fragment" data-fragment-index="4" style="font-size: smaller;"><td>315986</td><td>'pin oak'</td><td>'Queens'</td><td>21</td></tr>
<tr class="fragment" data-fragment-index="5"><td colspan="4">COUNT = 2</td></tr>
<tr class="fragment" data-fragment-index="6" style="font-size: smaller;"><td>204026</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>3</td></tr>
<tr class="fragment" data-fragment-index="6"><td colspan="4">COUNT = 3</td></tr>
<tr class="fragment" data-fragment-index="7" style="font-size: smaller;"><td>204337</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>10</td></tr>
<tr class="fragment" data-fragment-index="7"><td colspan="4">COUNT = 4</td></tr>
<tr class="fragment" data-fragment-index="8" style="font-size: smaller;"><td>189565</td><td>'American linden'</td><td>'Brooklyn'</td><td>21</td></tr>
<tr class="fragment" data-fragment-index="8"><td colspan="4">COUNT = 5</td></tr>
<tr class="fragment" data-fragment-index="9"><td style="font-weight: bold;" colspan="3">... and 683783 more</td></tr>
<tr class="fragment" data-fragment-index="9"><td colspan="4">COUNT = 683788</td></tr>
</table>
</section>
<section>
<div style="margin-bottom: 60px">
<code class="sql">SELECT SUM(TREE_DBH) FROM TREES</code>
</div>
<table style="font-size: 70%;">
<tr><th>TREE_ID</th><th>SPC_COMMON</th><th>BORONAME</th><th>TREE_DBH</th></tr>
<tr class="fragment" data-fragment-index="1"><td colspan="4">SUM = 0</td></tr>
<tr class="fragment" data-fragment-index="2" style="font-size: smaller;"><td>180683</td><td>'red maple'</td><td>'Queens'</td><td>3</td></tr>
<tr class="fragment" data-fragment-index="3"><td colspan="4">SUM = 3</td></tr>
<tr class="fragment" data-fragment-index="4" style="font-size: smaller;"><td>315986</td><td>'pin oak'</td><td>'Queens'</td><td>21</td></tr>
<tr class="fragment" data-fragment-index="5"><td colspan="4">SUM = 24</td></tr>
<tr class="fragment" data-fragment-index="6" style="font-size: smaller;"><td>204026</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>3</td></tr>
<tr class="fragment" data-fragment-index="6"><td colspan="4">SUM = 27</td></tr>
<tr class="fragment" data-fragment-index="7" style="font-size: smaller;"><td>204337</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>10</td></tr>
<tr class="fragment" data-fragment-index="7"><td colspan="4">SUM = 37</td></tr>
<tr class="fragment" data-fragment-index="8" style="font-size: smaller;"><td>189565</td><td>'American linden'</td><td>'Brooklyn'</td><td>21</td></tr>
<tr class="fragment" data-fragment-index="8"><td colspan="4">SUM = 58</td></tr>
<tr class="fragment" data-fragment-index="9"><td style="font-weight: bold;" colspan="3">... and 683783 more</td></tr>
</table>
</section>
<section>
<h3>Basic Aggregate Pattern</h3>
<p class="fragment" style="font-size: 70%">This is also sometimes called a "fold"</p>
<dl>
<dt>Init</dt>
<dd>Define a starting value for the accumulator</dd>
<dt>Fold(Accum, New)</dt>
<dd>Merge a new value into the accumulator</dd>
</dl>
</section>
<section>
<h3>COUNT(*)</h3>
<dl>
<dt>Init</dt>
<dd class="fragment">$0$</dd>
<dt>Fold(Accum, New)</dt>
<dd class="fragment">$Accum + 1$</dd>
</dl>
</section>
<section>
<h3>SUM(A)</h3>
<dl>
<dt>Init</dt>
<dd class="fragment">$0$</dd>
<dt>Fold(Accum, New)</dt>
<dd class="fragment">$Accum + New$</dd>
</dl>
</section>
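<section data-markdown>
<script type="text/template">
The Init / Fold pattern maps directly onto `functools.reduce`. The sketch below replays COUNT and SUM over the `TREE_DBH` values of the five sample rows; the helper name `aggregate` is an assumption.

```python
from functools import reduce

# TREE_DBH values of the five sample rows.
tree_dbh = [3, 21, 3, 10, 21]

def aggregate(init, fold, values):
    """Start from init and merge each value into the accumulator with fold."""
    return reduce(fold, values, init)

count = aggregate(0, lambda accum, new: accum + 1, tree_dbh)    # COUNT(*)
total = aggregate(0, lambda accum, new: accum + new, tree_dbh)  # SUM(TREE_DBH)
```

Here `count` ends at 5 and `total` at 58, matching the running values in the earlier tables.
</script>
</section>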
<section>
<h3>AVG(A)</h3>
<dl>
<dt>Init</dt>
<dd class="fragment">$\{ sum = 0, count = 0 \}$</dd>
<dt>Fold(Accum, New)</dt>
<dd class="fragment">$\{ sum = Accum.sum + New, \\\;count = Accum.count + 1\}$</dd>
<dt class="fragment">Finalize(Accum)</dt>
<dd class="fragment">$\frac{Accum.sum}{Accum.count}$</dd>
</dl>
</section>
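<section data-markdown>
<script type="text/template">
The three AVG functions above, written out as a sketch. The function names are hypothetical, and returning `None` for an empty input is an assumption chosen to mirror SQL's NULL.

```python
def avg_init():
    return {"sum": 0, "count": 0}

def avg_fold(accum, new):
    return {"sum": accum["sum"] + new, "count": accum["count"] + 1}

def avg_finalize(accum):
    # Assumption: AVG over zero rows yields None, standing in for NULL.
    if accum["count"] == 0:
        return None
    return accum["sum"] / accum["count"]

accum = avg_init()
for dbh in [3, 21, 3, 10, 21]:   # TREE_DBH of the five sample rows
    accum = avg_fold(accum, dbh)
average = avg_finalize(accum)
```

For the five sample rows the accumulator ends at sum 58 and count 5, so `average` is 11.6.
</script>
</section>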
<section>
<h3>Basic Aggregate Pattern</h3>
<dl>
<dt>Init</dt>
<dd>Define a starting value for the accumulator</dd>
<dt>Fold(Accum, New)</dt>
<dd>Merge a new value into the accumulator</dd>
<dt>Finalize(Accum)</dt>
<dd>Extract the aggregate from the accumulator.</dd>
</dl>
</section>
<section>
<h3>Basic Aggregate Types</h3>
<p class="fragment" style="font-size: 60%">Gray et al., "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals"</p>
<dl>
<dt>Distributive</dt>
<dd>Constant-size accumulator that needs no finalize step (COUNT, SUM)</dd>
<dt>Algebraic</dt>
<dd>Constant-size accumulator that needs a finalize step (AVG)</dd>
<dt>Holistic</dt>
<dd>Accumulator size is unbounded: it can grow with the input (MEDIAN)</dd>
</dl>
</section>
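<section data-markdown>
<script type="text/template">
One way to see the difference between the three types is to split the input into two parts and ask what must be kept from each part. The split of the sample `TREE_DBH` values below is an assumption for illustration.

```python
from statistics import median

part1, part2 = [3, 21, 3], [10, 21]

# Distributive (SUM): one number per part, combined directly.
total = sum(part1) + sum(part2)

# Algebraic (AVG): a fixed-size (sum, count) pair per part, then finalize.
s = sum(part1) + sum(part2)
n = len(part1) + len(part2)
average = s / n

# Holistic (MEDIAN): every value has to be kept until the end.
seen = part1 + part2
middle = median(seen)
```

The first two cases keep a constant amount of state per part, while `seen` grows with the input.
</script>
</section>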
</section>
<section>
<section>
<h3>Group-By Aggregates</h3>
<div style="margin-top: 60px">
<code class="sql">SELECT SPC_COMMON, COUNT(*) FROM TREES GROUP BY SPC_COMMON</code>
</div>
</section>
<section>
<p><b>Naive Idea:</b> Keep a separate accumulator for each group</p>
</section>
<section>
<table style="font-size: 70%;">
<tr><th>TREE_ID</th><th>SPC_COMMON</th><th>BORONAME</th><th>TREE_DBH</th></tr>
<tr class="fragment" data-fragment-index="1"><td colspan="4">{}</td></tr>
<tr class="fragment" data-fragment-index="2" style="font-size: smaller;"><td>180683</td><td>'red maple'</td><td>'Queens'</td><td>3</td></tr>
<tr class="fragment" data-fragment-index="3"><td colspan="4">{ 'red maple' = 1 }</td></tr>
<tr class="fragment" data-fragment-index="4" style="font-size: smaller;"><td>204337</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>10</td></tr>
<tr class="fragment" data-fragment-index="5"><td colspan="4">{ 'red maple' = 1, 'honeylocust' = 1 }</td></tr>
<tr class="fragment" data-fragment-index="6" style="font-size: smaller;"><td>315986</td><td>'pin oak'</td><td>'Queens'</td><td>21</td></tr>
<tr class="fragment" data-fragment-index="7"><td colspan="4">{ 'red maple' = 1, 'honeylocust' = 1, 'pin oak' = 1 }</td></tr>
<tr class="fragment" data-fragment-index="8" style="font-size: smaller;"><td>204026</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>3</td></tr>
<tr class="fragment" data-fragment-index="9"><td colspan="4">{ 'red maple' = 1, 'honeylocust' = 2, 'pin oak' = 1 }</td></tr>
</table>
</section>
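<section data-markdown>
<script type="text/template">
The naive idea can be sketched with a Python dictionary that maps each group to its accumulator. The rows are the four from the trace above; the function name `group_by_count` is hypothetical.

```python
rows = [
    (180683, "red maple", "Queens", 3),
    (204337, "honeylocust", "Brooklyn", 10),
    (315986, "pin oak", "Queens", 21),
    (204026, "honeylocust", "Brooklyn", 3),
]

def group_by_count(rows, key_col):
    accumulators = {}                    # one accumulator per group
    for row in rows:
        key = row[key_col]
        # Init = 0 for a group seen for the first time, Fold = accumulator + 1.
        accumulators[key] = accumulators.get(key, 0) + 1
    return accumulators

counts = group_by_count(rows, 1)         # GROUP BY SPC_COMMON
```

The result has 'honeylocust' at 2 and the other two species at 1, as in the last row of the trace.
</script>
</section>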
<section>
<p>What could go wrong?</p>
</section>
<section>
<h3>Alternative Grouping Algorithms</h3>
<dl>
<dt>2-pass Hash Aggregate</dt>
<dd>Like 2-pass Hash Join: Distribute groups across buckets, then do an in-memory aggregate for each bucket.</dd>
<dt>Sort-Aggregate</dt>
<dd>Like Sort-Merge Join: Sort the data by the group-by attributes so that all rows of a group are adjacent.</dd>
</dl>
</section>
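<section data-markdown>
<script type="text/template">
A rough sketch of the 2-pass hash aggregate for COUNT, assuming in-memory lists stand in for the on-disk buckets a real system would write. The function name and the `num_buckets` parameter are hypothetical.

```python
rows = [
    (180683, "red maple", "Queens", 3),
    (204337, "honeylocust", "Brooklyn", 10),
    (315986, "pin oak", "Queens", 21),
    (204026, "honeylocust", "Brooklyn", 3),
]

def two_pass_hash_count(rows, key_col, num_buckets=4):
    # Pass 1: partition rows so that each group lands in exactly one bucket.
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[hash(row[key_col]) % num_buckets].append(row)

    # Pass 2: aggregate each bucket on its own with a small in-memory table.
    for bucket in buckets:
        counts = {}
        for row in bucket:
            counts[row[key_col]] = counts.get(row[key_col], 0) + 1
        yield from counts.items()

result = dict(two_pass_hash_count(rows, 1))
```

Because a group never spans two buckets, the per-bucket results can be emitted without any merging.
</script>
</section>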
<section>
<table style="font-size: 70%;">
<tr><th>TREE_ID</th><th>SPC_COMMON</th><th>BORONAME</th><th>TREE_DBH</th></tr>
<tr class="fragment"><td colspan="4">{}</td></tr>
<tr class="fragment" style="font-size: smaller;"><td>204337</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>10</td></tr>
<tr class="fragment"><td colspan="4">{ 'honeylocust' = 1 }</td></tr>
<tr class="fragment" style="font-size: smaller;"><td>204026</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>3</td></tr>
<tr class="fragment"><td colspan="4">{ 'honeylocust' = 2 }</td></tr>
<tr class="fragment"><td colspan="4" style="font-weight: bold">... and more</td></tr>
<tr class="fragment" style="font-size: smaller;"><td>315986</td><td>'pin oak'</td><td>'Queens'</td><td>21</td></tr>
<tr class="fragment"><td colspan="4">{ <span class="fragment highlight-grey">'honeylocust' = 3206,</span> 'pin oak' = 1 }</td></tr>
<tr class="fragment"><td colspan="4" style="font-weight: bold">... and more</td></tr>
<tr class="fragment" style="font-size: smaller;"><td>180683</td><td>'red maple'</td><td>'Queens'</td><td>3</td></tr>
<tr class="fragment"><td colspan="4">{ <span class="fragment highlight-grey">'pin oak' = 53814,</span> 'red maple' = 1 }</td></tr>
</table>
</section>
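<section data-markdown>
<script type="text/template">
The trace above corresponds to a sort-aggregate, sketched here with `itertools.groupby`. Using `sorted` in place of an external sort is an assumption of this sketch, and the function name `sort_count` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    (180683, "red maple", "Queens", 3),
    (204337, "honeylocust", "Brooklyn", 10),
    (315986, "pin oak", "Queens", 21),
    (204026, "honeylocust", "Brooklyn", 3),
]

def sort_count(rows, key_col):
    key = itemgetter(key_col)
    # After sorting, all rows of a group are adjacent.
    for group, members in groupby(sorted(rows, key=key), key=key):
        count = 0                  # only one accumulator is live at a time
        for _ in members:
            count += 1
        yield group, count         # the group is finished; emit and forget it

result = list(sort_count(rows, 1))
```

Once a group's last row has passed, its accumulator can be discarded, which is what the greyed-out entries in the trace indicate.
</script>
</section>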
</section>
<!-- ============================================ -->
</div></div>
<script src="../reveal.js-3.6.0/js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: true,
progress: true,
history: true,
center: true,
slideNumber: true,
transition: 'fade', // none/fade/slide/convex/concave/zoom
chart: {
defaults: {
global: {
title: { fontColor: "#333", fontSize: 24 },
legend: {
labels: { fontColor: "#333", fontSize: 20 },
},
responsiveness: true
},
scale: {
scaleLabel: { fontColor: "#333", fontSize: 20 },
gridLines: { color: "#333", zeroLineColor: "#333" },
ticks: { fontColor: "#333", fontSize: 16 },
}
},
line: { borderColor: [ "rgba(20,220,220,.8)" , "rgba(220,120,120,.8)", "rgba(20,120,220,.8)" ], "borderDash": [ [5,10], [0,0] ]},
bar: { backgroundColor: [
"rgba(220,220,220,0.8)",
"rgba(151,187,205,0.8)",
"rgba(205,151,187,0.8)",
"rgba(187,205,151,0.8)"
]
},
pie: { backgroundColor: [ ["rgba(0,0,0,.8)" , "rgba(220,20,20,.8)", "rgba(20,220,20,.8)", "rgba(220,220,20,.8)", "rgba(20,20,220,.8)"] ]},
radar: { borderColor: [ "rgba(20,220,220,.8)" , "rgba(220,120,120,.8)", "rgba(20,120,220,.8)" ]},
},
// Optional reveal.js plugins
dependencies: [
{ src: '../reveal.js-3.6.0/lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: '../reveal.js-3.6.0/plugin/math/math.js',
condition: function() { return true; },
mathjax: '../reveal.js-3.6.0/js/MathJax.js'
},
{ src: '../reveal.js-3.6.0/plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../reveal.js-3.6.0/plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../reveal.js-3.6.0/plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'pre code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: '../reveal.js-3.6.0/plugin/zoom-js/zoom.js', async: true },
{ src: '../reveal.js-3.6.0/plugin/notes/notes.js', async: true },
// Chart.min.js
{ src: '../reveal.js-3.6.0/plugin/chart/Chart.min.js'},
// the plugin
{ src: '../reveal.js-3.6.0/plugin/chart/csv2chart.js'},
{ src: '../reveal.js-3.6.0/plugin/svginline/es6-promise.auto.js', async: false },
{ src: '../reveal.js-3.6.0/plugin/svginline/data-src-svg.js', async: false }
]
});
</script>
</body>
</html>