<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>CSE 4/562 - Spring 2018</title>
<meta name="description" content="CSE 4/562 - Spring 2018">
<meta name="author" content="Oliver Kennedy">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="../reveal.js-3.6.0/css/reveal.css">
<link rel="stylesheet" href="ubodin.css" id="theme">
<!-- Code syntax highlighting -->
<link rel="stylesheet" href="../reveal.js-3.6.0/lib/css/zenburn.css">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? '../reveal.js-3.6.0/css/print/pdf.css' : '../reveal.js-3.6.0/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<script src="../reveal.js-3.6.0/lib/js/head.min.js"></script>
<!--[if lt IE 9]>
<script src="../reveal.js-3.6.0/lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="header">
<!-- Any Talk-Specific Header Content Goes Here -->
CSE 4/562 - Database Systems
</div>
<div class="slides">
<section>
<h1>Index Scans, Aggregates</h1>
<h3>CSE 4/562 Database Systems</h3>
<h5>February 26, 2018</h5>
</section>
<!-- ============================================ -->
<section>
<section>
<h3>General Query Optimizers</h3>
<ol style="font-size: 60%">
<li class="fragment" data-fragment-index="1">Apply blind heuristics (e.g., push down selections)</li>
<li class="fragment" data-fragment-index="2">Enumerate all possible <i>execution plans</i> (or a reasonable subset of them) by varying:
<ul>
<li>Join/Union Evaluation Order (commutativity, associativity, distributivity)</li>
<li class="fragment" data-fragment-index="3">Algorithms for Joins, Aggregates, Sort, Distinct, and others</li>
<li class="fragment" data-fragment-index="3">Data Access Paths</li>
</ul>
</li>
<li class="fragment" data-fragment-index="4">Estimate the cost of each execution plan</li>
<li class="fragment" data-fragment-index="5">Pick the execution plan with the lowest cost</li>
</ol>
</section>
</section>
<section>
<!-- 2018 by OK:
Need to get across:
- Define access paths
- Optimization in action
- Define INLJ
- Hint at cost optimization
-->
<section>
<h3>Data Access Paths</h3>
<p>Original Query: $\pi_A\left(\sigma_{B = 1 \wedge C < 3}(R)\right)$</p>
<p>Possible Implementations:<ul>
<li>Full Table Scan</li>
<li>Index Scan on Tree/Hash Index over $B$</li>
<li>Index Scan on Tree Index over $C$</li>
<li>Index Scan on Tree Index over $B,C$</li>
</ul></p>
</section>
<section>
<h3>Full Table Scan</h3>
<ol>
<li>Project down to $A$, reading from...</li>
<li>Filter on $B = 1 \wedge C < 3$, reading from...</li>
<li>All of the rows in $R$</li>
</ol>
</section>
<section>
<h3>Index Scan on Tree/Hash Index over $B$</h3>
<ol>
<li>Project down to $A$, reading from...</li>
<li>Filter on $C < 3$, reading from...</li>
<li>The index which provides all rows where $R.B = 1$</li>
</ol>
</section>
<section>
<h3>Index Scan on Tree Index over $C$</h3>
<ol>
<li>Project down to $A$, reading from...</li>
<li>Filter on $B = 1$, reading from...</li>
<li>The index which provides all rows where $R.C < 3$</li>
</ol>
</section>
<section>
<h3>Index Scan on Tree Index over $B, C$</h3>
<p>Lexical Sort: First sort by B, with C as a tiebreaker.</p>
<ol>
<li>Project down to $A$, reading from...</li>
<li>The index which provides all rows between <br/>$\left\lt 1, -\infty\right\gt \lt \left\lt R.B, R.C\right\gt \lt \left\lt 1, 3\right\gt$ </li>
</ol>
</section>
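<section data-markdown>
<script type="text/template">
A minimal Python sketch of the last access path, assuming a sorted list can stand in for a tree index over B, C. The rows in `R` and the helpers `index_scan` and `full_table_scan` are hypothetical.

```python
import bisect

# Hypothetical table R with columns (A, B, C); the rows are made up for illustration.
R = [
    ("a1", 1, 0),
    ("a2", 1, 5),
    ("a3", 2, 1),
    ("a4", 1, 2),
    ("a5", 0, 1),
]

# A sorted list of ((B, C), row position) pairs stands in for the tree index.
index_bc = sorted(((b, c), pos) for pos, (_, b, c) in enumerate(R))
keys = [key for key, _ in index_bc]

def index_scan(b, c_upper):
    """Return A for every row with B = b and C < c_upper, using the index."""
    lo = bisect.bisect_left(keys, (b, float("-inf")))  # first key >= (b, -inf)
    hi = bisect.bisect_left(keys, (b, c_upper))        # first key >= (b, c_upper)
    return [R[pos][0] for _, pos in index_bc[lo:hi]]

def full_table_scan(b, c_upper):
    """Baseline: read every row, filter, then project down to A."""
    return [a for (a, rb, rc) in R if rb == b and rc < c_upper]
```

Both functions return the same rows; the index scan only touches the slice of keys inside the range.
</script>
</section>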
<section>
<p>Which index to use (if several are available)?</p>
</section>
</section>
<section>
<section>
<h3>Strategies for Implementing $(\ldots \bowtie_{c} S)$</h3>
<dl>
<dt>Sort/Merge Join</dt>
<dd>Sort all of the data upfront, then scan over both sides.</dd>
<dt>In-Memory Index Join (1-pass Hash; Hash Join)</dt>
<dd>Build an in-memory index on one table, scan the other.</dd>
<dt>Partition Join (2-pass Hash; External Hash Join)</dt>
<dd>Partition both sides so that tuples don't join across partitions.</dd>
<dt class="fragment" data-fragment-index="1">Index Nested Loop Join</dt>
<dd class="fragment" data-fragment-index="1">Use an <i>existing</i> index instead of building one.</dd>
</dl>
</section>
<section>
<h3>Index Nested Loop Join</h3>
To compute $R \bowtie_{R.A < S.B} S$ with an index on $S.B$
<ol>
<li>Read One Row of $R$</li>
<li>Get the value of $R.A$</li>
<li>Start index scan on $S.B > [R.A]$</li>
<li>Return rows as normal</li>
</ol>
</section>
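<section data-markdown>
<script type="text/template">
The four steps above could be sketched in Python as follows, assuming a sorted list of row positions stands in for the existing index on S.B. The relations and the name `index_nested_loop_join` are hypothetical.

```python
import bisect

# Hypothetical relations: R has a column A, S has a column B (values are made up).
R = [{"A": 2}, {"A": 5}]
S = [{"B": 1}, {"B": 3}, {"B": 6}, {"B": 4}]

# Stand-in for an existing tree index on S.B: row positions ordered by B.
s_index = sorted(range(len(S)), key=lambda pos: S[pos]["B"])
s_keys = [S[pos]["B"] for pos in s_index]

def index_nested_loop_join():
    """Yield (r, s) pairs satisfying R.A < S.B."""
    for r in R:                                  # 1. read one row of R
        a = r["A"]                               # 2. get the value of R.A
        start = bisect.bisect_right(s_keys, a)   # 3. start index scan at S.B > a
        for pos in s_index[start:]:              # 4. return rows as normal
            yield (r, S[pos])

result = [(r["A"], s["B"]) for r, s in index_nested_loop_join()]
```

No index is built on R; each row of R costs one lookup plus a scan over its matches in S.
</script>
</section>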
</section>
<section>
<section>
Aggregation
</section>
<section>
<h3>Aggregation</h3>
<dl>
<div class="fragment">
<dt>Normal Aggregates</dt>
<dd><code class="sql">SELECT COUNT(*) FROM R</code></dd>
<dd><code class="sql">SELECT SUM(A) FROM R</code></dd>
</div>
<div class="fragment">
<dt>Group-By Aggregates</dt>
<dd><code class="sql">SELECT A, SUM(B) FROM R GROUP BY A</code></dd>
</div>
<div class="fragment">
<dt>Distinct</dt>
<dd><code class="sql">SELECT DISTINCT A FROM R</code></dd>
<dd class="fragment"><code class="sql">SELECT A FROM R GROUP BY A</code></dd>
</div>
</dl>
</section>
<section>
<h3>Normal Aggregates</h3>
<table style="font-size: 80%;" class="fragment">
<tr><th>TREE_ID</th><th>SPC_COMMON</th><th>BORONAME</th><th>TREE_DBH</th></tr>
<tr><td>180683</td><td>'red maple'</td><td>'Queens'</td><td>3</td></tr>
<tr><td>315986</td><td>'pin oak'</td><td>'Queens'</td><td>21</td></tr>
<tr><td>204026</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>3</td></tr>
<tr><td>204337</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>10</td></tr>
<tr><td>189565</td><td>'American linden'</td><td>'Brooklyn'</td><td>21</td></tr>
<tr><td style="font-weight: bold;" colspan="3">... and 683783 more</td></tr>
</table>
<div style="margin-top: 60px">
<code class="sql fragment">SELECT COUNT(*) FROM TREES</code>
</div>
</section>
<section>
<table style="font-size: 70%;">
<tr><th>TREE_ID</th><th>SPC_COMMON</th><th>BORONAME</th><th>TREE_DBH</th></tr>
<tr class="fragment" data-fragment-index="1"><td colspan="4">COUNT = 0</td></tr>
<tr class="fragment" data-fragment-index="2" style="font-size: smaller;"><td>180683</td><td>'red maple'</td><td>'Queens'</td><td>3</td></tr>
<tr class="fragment" data-fragment-index="3"><td colspan="4">COUNT = 1</td></tr>
<tr class="fragment" data-fragment-index="4" style="font-size: smaller;"><td>315986</td><td>'pin oak'</td><td>'Queens'</td><td>21</td></tr>
<tr class="fragment" data-fragment-index="5"><td colspan="4">COUNT = 2</td></tr>
<tr class="fragment" data-fragment-index="6" style="font-size: smaller;"><td>204026</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>3</td></tr>
<tr class="fragment" data-fragment-index="6"><td colspan="4">COUNT = 3</td></tr>
<tr class="fragment" data-fragment-index="7" style="font-size: smaller;"><td>204337</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>10</td></tr>
<tr class="fragment" data-fragment-index="7"><td colspan="4">COUNT = 4</td></tr>
<tr class="fragment" data-fragment-index="8" style="font-size: smaller;"><td>189565</td><td>'American linden'</td><td>'Brooklyn'</td><td>21</td></tr>
<tr class="fragment" data-fragment-index="8"><td colspan="4">COUNT = 5</td></tr>
<tr class="fragment" data-fragment-index="9"><td style="font-weight: bold;" colspan="3">... and 683783 more</td></tr>
<tr class="fragment" data-fragment-index="9"><td colspan="4">COUNT = 683788</td></tr>
</table>
</section>
<section>
<div style="margin-bottom: 60px">
<code class="sql">SELECT SUM(TREE_DBH) FROM TREES</code>
</div>
<table style="font-size: 70%;">
<tr><th>TREE_ID</th><th>SPC_COMMON</th><th>BORONAME</th><th>TREE_DBH</th></tr>
<tr class="fragment" data-fragment-index="1"><td colspan="4">SUM = 0</td></tr>
<tr class="fragment" data-fragment-index="2" style="font-size: smaller;"><td>180683</td><td>'red maple'</td><td>'Queens'</td><td>3</td></tr>
<tr class="fragment" data-fragment-index="3"><td colspan="4">SUM = 3</td></tr>
<tr class="fragment" data-fragment-index="4" style="font-size: smaller;"><td>315986</td><td>'pin oak'</td><td>'Queens'</td><td>21</td></tr>
<tr class="fragment" data-fragment-index="5"><td colspan="4">SUM = 24</td></tr>
<tr class="fragment" data-fragment-index="6" style="font-size: smaller;"><td>204026</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>3</td></tr>
<tr class="fragment" data-fragment-index="6"><td colspan="4">SUM = 27</td></tr>
<tr class="fragment" data-fragment-index="7" style="font-size: smaller;"><td>204337</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>10</td></tr>
<tr class="fragment" data-fragment-index="7"><td colspan="4">SUM = 37</td></tr>
<tr class="fragment" data-fragment-index="8" style="font-size: smaller;"><td>189565</td><td>'American linden'</td><td>'Brooklyn'</td><td>21</td></tr>
<tr class="fragment" data-fragment-index="8"><td colspan="4">SUM = 58</td></tr>
<tr class="fragment" data-fragment-index="9"><td style="font-weight: bold;" colspan="3">... and 683783 more</td></tr>
</table>
</section>
<section>
<h3>Basic Aggregate Pattern</h3>
<p class="fragment" style="font-size: 70%">This is also sometimes called a "fold"</p>
<dl>
<dt>Init</dt>
<dd>Define a starting value for the accumulator</dd>
<dt>Fold(Accum, New)</dt>
<dd>Merge a new value into the accumulator</dd>
</dl>
</section>
<section>
<h3>COUNT(*)</h3>
<dl>
<dt>Init</dt>
<dd class="fragment">$0$</dd>
<dt>Fold(Accum, New)</dt>
<dd class="fragment">$Accum + 1$</dd>
</dl>
</section>
<section>
<h3>SUM(A)</h3>
<dl>
<dt>Init</dt>
<dd class="fragment">$0$</dd>
<dt>Fold(Accum, New)</dt>
<dd class="fragment">$Accum + New$</dd>
</dl>
</section>
<section>
<h3>AVG(A)</h3>
<dl>
<dt>Init</dt>
<dd class="fragment">$\{ sum = 0, count = 0 \}$</dd>
<dt>Fold(Accum, New)</dt>
<dd class="fragment">$\{ sum = Accum.sum + New, \\\;count = Accum.count + 1\}$</dd>
<dt class="fragment">Finalize(Accum)</dt>
<dd class="fragment">$\frac{Accum.sum}{Accum.count}$</dd>
</dl>
</section>
<section>
<h3>Basic Aggregate Pattern</h3>
<dl>
<dt>Init</dt>
<dd>Define a starting value for the accumulator</dd>
<dt>Fold(Accum, New)</dt>
<dd>Merge a new value into the accumulator</dd>
<dt>Finalize(Accum)</dt>
<dd>Extract the aggregate from the accumulator.</dd>
</dl>
</section>
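<section data-markdown>
<script type="text/template">
One possible Python rendering of the Init / Fold / Finalize pattern. The triple layout and the helper `aggregate` are choices made for this sketch, and AVG here assumes at least one input value.

```python
from functools import reduce

# Each aggregate is an (init, fold, finalize) triple.
COUNT = (0, lambda accum, new: accum + 1, lambda accum: accum)
SUM = (0, lambda accum, new: accum + new, lambda accum: accum)
AVG = (
    {"sum": 0, "count": 0},
    lambda accum, new: {"sum": accum["sum"] + new, "count": accum["count"] + 1},
    lambda accum: accum["sum"] / accum["count"],  # fails on empty input
)

def aggregate(agg, values):
    """Fold every value into the accumulator, then extract the result."""
    init, fold, finalize = agg
    return finalize(reduce(fold, values, init))

# TREE_DBH values from the five sample rows of TREES.
tree_dbh = [3, 21, 3, 10, 21]
count = aggregate(COUNT, tree_dbh)
total = aggregate(SUM, tree_dbh)
average = aggregate(AVG, tree_dbh)
```

For COUNT and SUM the finalize step is the identity, which is why those slides omit it.
</script>
</section>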
<section>
<h3>Basic Aggregate Types</h3>
<p class="fragment" style="font-size: 60%">Gray et al., "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals"</p>
<dl>
<dt>Distributive</dt>
<dd>Finite-sized accumulator and doesn't need a finalize (COUNT, SUM)</dd>
<dt>Algebraic</dt>
<dd>Finite-sized accumulator but needs a finalize (AVG)</dd>
<dt>Holistic</dt>
<dd>Unbounded accumulator (MEDIAN)</dd>
</dl>
</section>
</section>
<section>
<section>
<h3>Group-By Aggregates</h3>
<div style="margin-top: 60px">
<code class="sql">SELECT SPC_COMMON, COUNT(*) FROM TREES GROUP BY SPC_COMMON</code>
</div>
</section>
<section>
<p><b>Naive Idea:</b> Keep a separate accumulator for each group</p>
</section>
<section>
<table style="font-size: 70%;">
<tr><th>TREE_ID</th><th>SPC_COMMON</th><th>BORONAME</th><th>TREE_DBH</th></tr>
<tr class="fragment" data-fragment-index="1"><td colspan="4">{}</td></tr>
<tr class="fragment" data-fragment-index="2" style="font-size: smaller;"><td>180683</td><td>'red maple'</td><td>'Queens'</td><td>3</td></tr>
<tr class="fragment" data-fragment-index="3"><td colspan="4">{ 'red maple' = 1 }</td></tr>
<tr class="fragment" data-fragment-index="4" style="font-size: smaller;"><td>204337</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>10</td></tr>
<tr class="fragment" data-fragment-index="5"><td colspan="4">{ 'red maple' = 1, 'honeylocust' = 1 }</td></tr>
<tr class="fragment" data-fragment-index="6" style="font-size: smaller;"><td>315986</td><td>'pin oak'</td><td>'Queens'</td><td>21</td></tr>
<tr class="fragment" data-fragment-index="7"><td colspan="4">{ 'red maple' = 1, 'honeylocust' = 1, 'pin oak' = 1 }</td></tr>
<tr class="fragment" data-fragment-index="8" style="font-size: smaller;"><td>204026</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>3</td></tr>
<tr class="fragment" data-fragment-index="9"><td colspan="4">{ 'red maple' = 1, 'honeylocust' = 2, 'pin oak' = 1 }</td></tr>
</table>
</section>
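<section data-markdown>
<script type="text/template">
A short Python sketch of the naive idea, assuming a dictionary holds one COUNT accumulator per group. The function name `group_by_count` is hypothetical.

```python
# (SPC_COMMON, TREE_DBH) pairs in the order the rows are read in the walkthrough.
rows = [("red maple", 3), ("honeylocust", 10), ("pin oak", 21), ("honeylocust", 3)]

def group_by_count(rows):
    """SELECT SPC_COMMON, COUNT(*) ... GROUP BY SPC_COMMON, one accumulator per group."""
    accumulators = {}
    for species, _dbh in rows:
        accum = accumulators.get(species, 0)  # Init on first sight of a group
        accumulators[species] = accum + 1     # Fold
    return accumulators

counts = group_by_count(rows)
```

Every group's accumulator stays in memory until the last row has been read.
</script>
</section>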
<section>
<p>What could go wrong?</p>
</section>
<section>
<h3>Alternative Grouping Algorithms</h3>
<dl>
<dt>2-pass Hash Aggregate</dt>
<dd>Like 2-pass Hash Join: Distribute groups across buckets, then do an in-memory aggregate for each bucket.</dd>
<dt>Sort-Aggregate</dt>
<dd>Like Sort-Merge Join: Sort the data by group key so that each group's rows are adjacent.</dd>
</dl>
</section>
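<section data-markdown>
<script type="text/template">
A rough Python sketch of the 2-pass hash aggregate, assuming in-memory lists stand in for the on-disk buckets a real system would write. The name `two_pass_hash_count` and the `num_buckets` parameter are hypothetical.

```python
def two_pass_hash_count(keys, num_buckets=2):
    """COUNT(*) per group key, aggregating one bucket at a time."""
    # Pass 1: distribute rows across buckets by a hash of the group key.
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[hash(key) % num_buckets].append(key)

    # Pass 2: every group lives in exactly one bucket, so only that
    # bucket's accumulators need to be in memory at a time.
    result = {}
    for bucket in buckets:
        accumulators = {}
        for key in bucket:
            accumulators[key] = accumulators.get(key, 0) + 1
        result.update(accumulators)  # stands in for emitting finished groups
    return result

species = ["red maple", "honeylocust", "pin oak", "honeylocust"]
counts = two_pass_hash_count(species)
```

Which bucket a group lands in depends on the hash, but the final counts do not.
</script>
</section>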
<section>
<table style="font-size: 70%;">
<tr><th>TREE_ID</th><th>SPC_COMMON</th><th>BORONAME</th><th>TREE_DBH</th></tr>
<tr class="fragment"><td colspan="4">{}</td></tr>
<tr class="fragment" style="font-size: smaller;"><td>204337</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>10</td></tr>
<tr class="fragment"><td colspan="4">{ 'honeylocust' = 1 }</td></tr>
<tr class="fragment" style="font-size: smaller;"><td>204026</td><td>'honeylocust'</td><td>'Brooklyn'</td><td>3</td></tr>
<tr class="fragment"><td colspan="4">{ 'honeylocust' = 2 }</td></tr>
<tr class="fragment"><td colspan="4" style="font-weight: bold">... and more</td></tr>
<tr class="fragment" style="font-size: smaller;"><td>315986</td><td>'pin oak'</td><td>'Queens'</td><td>21</td></tr>
<tr class="fragment"><td colspan="4">{ <span class="fragment highlight-grey">'honeylocust' = 3206,</span> 'pin oak' = 1 }</td></tr>
<tr class="fragment"><td colspan="4" style="font-weight: bold">... and more</td></tr>
<tr class="fragment" style="font-size: smaller;"><td>180683</td><td>'red maple'</td><td>'Queens'</td><td>3</td></tr>
<tr class="fragment"><td colspan="4">{ <span class="fragment highlight-grey">'pin oak' = 53814,</span> 'red maple' = 1 }</td></tr>
</table>
</section>
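<section data-markdown>
<script type="text/template">
The sort-aggregate walkthrough could look like this in Python, assuming the built-in `sorted` stands in for an external sort. The name `sort_aggregate_count` is hypothetical.

```python
def sort_aggregate_count(keys):
    """Yield (group, count) pairs, holding only one accumulator at a time."""
    current, count = None, 0
    for key in sorted(keys):        # sort so each group's rows are adjacent
        if count > 0 and key != current:
            yield (current, count)  # group finished: emit it and forget it
            count = 0
        current = key
        count += 1
    if count > 0:
        yield (current, count)      # emit the final group

species = ["red maple", "honeylocust", "pin oak", "honeylocust"]
counts = list(sort_aggregate_count(species))
```

Once the key changes, the previous group can never appear again, so its accumulator is safe to discard.
</script>
</section>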
</section>
<!-- ============================================ -->
</div></div>
<script src="../reveal.js-3.6.0/js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: true,
progress: true,
history: true,
center: true,
slideNumber: true,
transition: 'fade', // none/fade/slide/convex/concave/zoom
chart: {
defaults: {
global: {
title: { fontColor: "#333", fontSize: 24 },
legend: {
labels: { fontColor: "#333", fontSize: 20 },
},
responsiveness: true
},
scale: {
scaleLabel: { fontColor: "#333", fontSize: 20 },
gridLines: { color: "#333", zeroLineColor: "#333" },
ticks: { fontColor: "#333", fontSize: 16 },
}
},
line: { borderColor: [ "rgba(20,220,220,.8)" , "rgba(220,120,120,.8)", "rgba(20,120,220,.8)" ], "borderDash": [ [5,10], [0,0] ]},
bar: { backgroundColor: [
"rgba(220,220,220,0.8)",
"rgba(151,187,205,0.8)",
"rgba(205,151,187,0.8)",
"rgba(187,205,151,0.8)"
]
},
pie: { backgroundColor: [ ["rgba(0,0,0,.8)" , "rgba(220,20,20,.8)", "rgba(20,220,20,.8)", "rgba(220,220,20,.8)", "rgba(20,20,220,.8)"] ]},
radar: { borderColor: [ "rgba(20,220,220,.8)" , "rgba(220,120,120,.8)", "rgba(20,120,220,.8)" ]},
},
// Optional reveal.js plugins
dependencies: [
{ src: '../reveal.js-3.6.0/lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: '../reveal.js-3.6.0/plugin/math/math.js',
condition: function() { return true; },
mathjax: '../reveal.js-3.6.0/js/MathJax.js'
},
{ src: '../reveal.js-3.6.0/plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../reveal.js-3.6.0/plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../reveal.js-3.6.0/plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'pre code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: '../reveal.js-3.6.0/plugin/zoom-js/zoom.js', async: true },
{ src: '../reveal.js-3.6.0/plugin/notes/notes.js', async: true },
// Chart.min.js
{ src: '../reveal.js-3.6.0/plugin/chart/Chart.min.js'},
// the plugin
{ src: '../reveal.js-3.6.0/plugin/chart/csv2chart.js'},
{ src: '../reveal.js-3.6.0/plugin/svginline/es6-promise.auto.js', async: false },
{ src: '../reveal.js-3.6.0/plugin/svginline/data-src-svg.js', async: false }
]
});
</script>
</body>
</html>