class updates

Oliver Kennedy 2019-02-26 21:04:16 -05:00
parent a612559c29
commit 7f0710f4a5
5 changed files with 917 additions and 976 deletions


@ -13,7 +13,7 @@ title: CSE-4/562; Checkpoint 1
</ul>
</li>
</ul>
In this project, you will implement a simple SQL query evaluator with support for Select, Project, Join, Bag Union, and Aggregate operations.  You will receive a set of data files, schema information, and be expected to evaluate multiple SELECT queries over those data files.
In this project, you will implement a simple SQL query evaluator with support for Select, Project, Join, and Bag Union operations.  You will receive a set of data files, schema information, and be expected to evaluate multiple SELECT queries over those data files.
Your code is expected to evaluate the SELECT statements on provided data, and produce output in a standardized form. Your code will be evaluated for both correctness and performance (in comparison to a naive evaluator based on iterators and nested-loop joins).
<h1>Parsing SQL</h1>


@ -5,6 +5,10 @@ date: February 25, 2019
textbook: Ch. 16
---
<!-- 2019 by OK
This went pretty well. If anything, it might be nice to adjust the \select \distinct propagation formula (uniform prior section) lists into animated tables for consistency with the rest of the presentation.
-->
<section>
<section>
<h3>General Query Optimizers</h3>
@ -216,26 +220,26 @@ textbook: Ch. 16
<section>
<table style="font-size: 70%">
<tr><th>Symbol</th><th>Parameter</th><th>Source</th></tr>
<tr><th>Symbol</th><th>Parameter</th><th>Type</th></tr>
<tr>
<td>$\mathcal P$</td><td>Tuples Per Page</td>
<td class="fragment" data-fragment-index="1">Data, Schema</td>
<td class="fragment" data-fragment-index="1">Fixed ($\frac{|\text{page}|}{|\text{tuple}|}$)</td>
</tr>
<tr>
<td>$|R|$</td><td>Size of $R$</td>
<td class="fragment" data-fragment-index="2">Data<span class="fragment" data-fragment-index="6">$^*$</span></td>
<td class="fragment" data-fragment-index="2">Precomputed<span class="fragment" data-fragment-index="6">$^*$</span> ($|R|$)</td>
</tr>
<tr>
<td>$\mathcal B$</td><td>Pages of Buffer</td>
<td class="fragment" data-fragment-index="3">User</td>
<td class="fragment" data-fragment-index="3">Configurable Parameter</td>
</tr>
<tr>
<td>$\mathcal I$</td><td>Keys per Index Page</td>
<td class="fragment" data-fragment-index="4">Data</td>
<td class="fragment" data-fragment-index="4">Fixed ($\frac{|\text{page}|}{|\text{key+pointer}|}$)</td>
</tr>
<tr>
<td>$adom(A)$</td><td>Number of distinct values of $A$</td>
<td class="fragment" data-fragment-index="5">Data<span class="fragment" data-fragment-index="6">$^*$</span></td>
<td class="fragment" data-fragment-index="5">Precomputed<span class="fragment" data-fragment-index="6">$^*$</span> ($|\delta_A(R)|$)</td>
</tr>
</table>
<p class="fragment" data-fragment-index="6" style="font-size: 50%">* unless $R$ is a query</p>
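The two "Fixed" parameters in the table above are simple ratios of sizes. A minimal sketch of how they might be computed follows. The function names and the byte sizes in the example are illustrative assumptions, not values from the slides, and rounding down assumes that a tuple or index entry never spans two pages.

```python
def tuples_per_page(page_bytes: int, tuple_bytes: int) -> int:
    """P = |page| / |tuple|, rounded down (assumes tuples never span pages)."""
    return page_bytes // tuple_bytes


def keys_per_index_page(page_bytes: int, key_bytes: int, pointer_bytes: int) -> int:
    """I = |page| / |key + pointer|, rounded down (same no-spanning assumption)."""
    return page_bytes // (key_bytes + pointer_bytes)


if __name__ == "__main__":
    # Hypothetical sizes: 4096-byte pages, 100-byte tuples, 8-byte keys and pointers.
    print(tuples_per_page(4096, 100))       # 40
    print(keys_per_index_page(4096, 8, 8))  # 256
```

Both values depend only on the page layout and the schema, which is why the table lists them as fixed rather than precomputed from the data.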
@ -246,7 +250,7 @@ textbook: Ch. 16
<section>
<section>
<p>Estimating IOs requires Estimating $|Q(R)|$</p>
<p>Estimating IOs requires Estimating $|Q(R)|$, $|\delta_A(Q(R))|$</p>
</section>
<section>
@ -412,22 +416,6 @@ textbook: Ch. 16
<section>
<section>
<h3>COUNT(DISTINCT A)</h3>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$\texttt{UNIQ}(A, \pi_{A, \ldots}(R)) = \texttt{UNIQ}(A, R)$</p>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$\texttt{UNIQ}(A, \sigma(R)) \approx \texttt{UNIQ}(A, R)$</p>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$\texttt{UNIQ}(A, R \times S) = \texttt{UNIQ}(A, R)$ or $\texttt{UNIQ}(A, S)$</p>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$$max(\texttt{UNIQ}(A, R), \texttt{UNIQ}(A, S)) \leq\\ \texttt{UNIQ}(A, R \uplus S)\\ \leq \texttt{UNIQ}(A, R) + \texttt{UNIQ}(A, S)$$</p>
</section>
<section>
<h3>MIN(A), MAX(A)</h3>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$min_A(\pi_{A, \ldots}(R)) = min_A(R)$</p>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$min_A(\sigma_{A, \ldots}(R)) \approx min_A(R)$</p>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$min_A(R \times S) = min_A(R)$ or $min_A(S)$</p>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$min_A(R \uplus S) = min(min_A(R), min_A(S))$</p>
</section>
<section>
<h3>Uniform Prior</h3>
@ -442,8 +430,27 @@ textbook: Ch. 16
<li>No inter-attribute correlations.</li>
</ol>
<p class="fragment" style="font-size: 80%; font-weight: bold; margin-top: 20px;">
If the above isn't true, fall back to the 10% rule.
If the necessary statistics aren't available (point 1), fall back to the 10% rule.
</p>
<p class="fragment" style="font-size: 80%; font-weight: bold; margin-top: 20px;">
If the statistical assumptions (points 2, 3) don't hold exactly, the resulting estimate will still likely be better than the 10% rule.
</p>
</section>
<section>
<h3>COUNT(DISTINCT A)</h3>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$\texttt{UNIQ}(A, \pi_{A, \ldots}(R)) = \texttt{UNIQ}(A, R)$</p>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$\texttt{UNIQ}(A, \sigma(R)) \approx \texttt{UNIQ}(A, R)$</p>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$\texttt{UNIQ}(A, R \times S) = \texttt{UNIQ}(A, R)$ or $\texttt{UNIQ}(A, S)$</p>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$$max(\texttt{UNIQ}(A, R), \texttt{UNIQ}(A, S)) \leq\\ \texttt{UNIQ}(A, R \uplus S)\\ \leq \texttt{UNIQ}(A, R) + \texttt{UNIQ}(A, S)$$</p>
</section>
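The COUNT(DISTINCT A) propagation rules above can be sketched as plain functions. This is an illustrative sketch only: the function names are hypothetical, per-relation statistics are assumed to be dictionaries mapping attribute names to distinct counts, and the cross-product rule assumes each attribute name belongs to exactly one input.

```python
from typing import Dict, Tuple


def uniq_project(uniq_r: int) -> int:
    """UNIQ(A, pi_{A,...}(R)) = UNIQ(A, R): projection keeps every value of A."""
    return uniq_r


def uniq_select(uniq_r: int) -> int:
    """UNIQ(A, sigma(R)) ~ UNIQ(A, R): an approximation, never an underestimate."""
    return uniq_r


def uniq_cross(attr: str, stats_r: Dict[str, int], stats_s: Dict[str, int]) -> int:
    """UNIQ(A, R x S) comes from whichever input owns A (names assumed disjoint)."""
    return stats_r[attr] if attr in stats_r else stats_s[attr]


def uniq_bag_union(uniq_r: int, uniq_s: int) -> Tuple[int, int]:
    """Return (lower, upper) bounds on UNIQ(A, R bag-union S)."""
    return max(uniq_r, uniq_s), uniq_r + uniq_s


if __name__ == "__main__":
    r = {"A": 10, "B": 3}   # hypothetical distinct counts for R
    s = {"C": 7}            # hypothetical distinct counts for S
    print(uniq_cross("A", r, s))   # 10
    print(uniq_bag_union(10, 4))   # (10, 14)
```

Bag union only yields a range: the lower bound is reached when one input's values are a subset of the other's, and the upper bound when the two inputs share no values of A.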
<section>
<h3>MIN(A), MAX(A)</h3>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$min_A(\pi_{A, \ldots}(R)) = min_A(R)$</p>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$min_A(\sigma_{A, \ldots}(R)) \approx min_A(R)$</p>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$min_A(R \times S) = min_A(R)$ or $min_A(S)$</p>
<p class="fragment" style="font-size: 70%; margin-top: 50px;">$min_A(R \uplus S) = min(min_A(R), min_A(S))$</p>
</section>
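The MIN rules above follow the same pattern. The sketch below uses hypothetical function names and assumes that statistics are dictionaries keyed by attribute name. The slide only writes out the MIN case; the MAX function is the symmetric rule, added here as an assumption.

```python
from typing import Dict


def min_project(min_r: float) -> float:
    """min_A(pi_{A,...}(R)) = min_A(R)."""
    return min_r


def min_select(min_r: float) -> float:
    """min_A(sigma(R)) ~ min_A(R): the true minimum can only be >= this value."""
    return min_r


def min_cross(attr: str, mins_r: Dict[str, float], mins_s: Dict[str, float]) -> float:
    """min_A(R x S) comes from whichever input owns A (names assumed disjoint)."""
    return mins_r[attr] if attr in mins_r else mins_s[attr]


def min_bag_union(min_r: float, min_s: float) -> float:
    """min_A(R bag-union S) = min(min_A(R), min_A(S)), an exact rule."""
    return min(min_r, min_s)


def max_bag_union(max_r: float, max_s: float) -> float:
    """Symmetric MAX rule (an assumption; not written out on the slide)."""
    return max(max_r, max_s)


if __name__ == "__main__":
    print(min_bag_union(3, 5))  # 3
    print(max_bag_union(3, 5))  # 5
```

Unlike the distinct-count case, bag union gives an exact answer here, because the minimum of the combined bag is simply the smaller of the two input minimums.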
<section>


@ -0,0 +1,279 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="113.48098mm"
height="129.51984mm"
viewBox="0 0 113.48098 129.51984"
version="1.1"
id="svg8"
inkscape:version="0.92.2 5c3e80d, 2017-08-06"
sodipodi:docname="2018-03-05-JoinIssue.svg">
<defs
id="defs2" />
<sodipodi:namedview
id="base"
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1.0"
inkscape:pageopacity="0.0"
inkscape:pageshadow="2"
inkscape:zoom="0.64"
inkscape:cx="214.95617"
inkscape:cy="143.89299"
inkscape:document-units="mm"
inkscape:current-layer="layer2"
showgrid="false"
fit-margin-top="0"
fit-margin-left="0"
fit-margin-right="0"
fit-margin-bottom="0"
inkscape:window-width="1440"
inkscape:window-height="852"
inkscape:window-x="0"
inkscape:window-y="0"
inkscape:window-maximized="1" />
<metadata
id="metadata5">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title></dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1"
transform="translate(-11.141665,-21.581365)">
<text
xml:space="preserve"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:22.57777786px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332"
x="82.398811"
y="34.684521"
id="text12"><tspan
sodipodi:role="line"
x="82.398811"
y="34.684521"
style="font-size:22.57777786px;stroke-width:0.26458332"
id="tspan16">⋈</tspan></text>
<text
id="text24"
y="70.970238"
x="58.208336"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:22.57777786px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332"
xml:space="preserve"><tspan
id="tspan22"
style="font-size:22.57777786px;stroke-width:0.26458332"
y="70.970238"
x="58.208336"
sodipodi:role="line">⋈</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:22.57777786px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332"
x="30.994051"
y="113.30357"
id="text28"><tspan
sodipodi:role="line"
x="30.994051"
y="113.30357"
style="font-size:22.57777786px;stroke-width:0.26458332"
id="tspan32">σ</tspan></text>
<flowRoot
xml:space="preserve"
id="flowRoot38"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:48px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none"
transform="scale(0.26458333)"><flowRegion
id="flowRegion40"><rect
id="rect42"
width="108.57143"
height="608.57141"
x="114.28571"
y="525.37683" /></flowRegion><flowPara
id="flowPara44"></flowPara></flowRoot> <text
xml:space="preserve"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:19.75555611px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332"
x="32.203571"
y="151.1012"
id="text57"><tspan
sodipodi:role="line"
id="tspan55"
x="32.203571"
y="151.1012"
style="font-size:19.75555611px;stroke-width:0.26458332">R</tspan></text>
<text
id="text61"
y="114.05953"
x="79.072617"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:19.75555611px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332"
xml:space="preserve"><tspan
style="font-size:19.75555611px;stroke-width:0.26458332"
y="114.05953"
x="79.072617"
id="tspan59"
sodipodi:role="line">S</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:19.75555611px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332"
x="111.57857"
y="72.482147"
id="text65"><tspan
sodipodi:role="line"
id="tspan63"
x="111.57857"
y="72.482147"
style="font-size:19.75555611px;stroke-width:0.26458332">T</tspan></text>
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
d="M 66.145832,56.807941 89.296874,36.550779 114.51497,54.740885"
id="path69"
inkscape:connector-curvature="0" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
d="M 84.335936,96.49544 65.732421,71.690753 37.207031,98.149088"
id="path71"
inkscape:connector-curvature="0" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
d="m 37.207031,115.09896 v 18.60351"
id="path73"
inkscape:connector-curvature="0" />
</g>
<g
inkscape:groupmode="layer"
id="layer2"
inkscape:label="Layer 2"
transform="translate(-11.141665,-21.581365)">
<g
id="g893"
transform="translate(-5.374349,-3.3072916)"
class="fragment">
<rect
ry="2.4804688"
y="138.24998"
x="17.016014"
height="14.882812"
width="53.330074"
id="rect884"
style="fill:#0000ff;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none" />
<text
id="text888"
y="148.02716"
x="43.424736"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:12.69999981px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#cccccc;fill-opacity:1;stroke:none;stroke-width:0.26458332"
xml:space="preserve"><tspan
style="font-size:8.46666622px;text-align:center;text-anchor:middle;fill:#cccccc;stroke-width:0.26458332"
y="148.02716"
x="43.424736"
id="tspan886"
sodipodi:role="line">100 Tuples</tspan></text>
</g>
<g
transform="translate(-5.374349,-40.927734)"
id="g901"
class="fragment">
<rect
style="fill:#0000ff;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none"
id="rect895"
width="53.330074"
height="14.882812"
x="17.016014"
y="138.24998"
ry="2.4804688" />
<text
xml:space="preserve"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:12.69999981px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#cccccc;fill-opacity:1;stroke:none;stroke-width:0.26458332"
x="43.424736"
y="148.02716"
id="text899"><tspan
sodipodi:role="line"
id="tspan897"
x="43.424736"
y="148.02716"
style="font-size:8.46666622px;text-align:center;text-anchor:middle;fill:#cccccc;stroke-width:0.26458332">10 Tuples</tspan></text>
</g>
<g
class="fragment"
transform="translate(53.776558,-40.927734)"
id="g949">
<rect
style="fill:#0000ff;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none"
id="rect943"
width="53.330074"
height="14.882812"
x="17.016014"
y="138.24998"
ry="2.4804688" />
<text
xml:space="preserve"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:12.69999981px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#cccccc;fill-opacity:1;stroke:none;stroke-width:0.26458332"
x="43.424736"
y="148.02716"
id="text947"><tspan
sodipodi:role="line"
id="tspan945"
x="43.424736"
y="148.02716"
style="font-size:8.46666622px;text-align:center;text-anchor:middle;fill:#cccccc;stroke-width:0.26458332">100 Tuples</tspan></text>
</g>
<g
class="fragment"
id="g941"
transform="translate(20.257161,-80.615234)">
<rect
ry="2.4804688"
y="138.24998"
x="17.016014"
height="14.882812"
width="53.330074"
id="rect935"
style="fill:#0000ff;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none" />
<text
id="text939"
y="148.02716"
x="43.424736"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:12.69999981px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#cccccc;fill-opacity:1;stroke:none;stroke-width:0.26458332"
xml:space="preserve"><tspan
style="font-size:8.46666622px;text-align:center;text-anchor:middle;fill:#cccccc;stroke-width:0.26458332"
y="148.02716"
x="43.424736"
id="tspan937"
sodipodi:role="line">0 Tuples</tspan></text>
</g>
<g
transform="translate(46.302083,-116.16862)"
id="g925"
class="fragment">
<rect
style="fill:#0000ff;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none"
id="rect919"
width="53.330074"
height="14.882812"
x="17.016014"
y="138.24998"
ry="2.4804688" />
<text
xml:space="preserve"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:12.69999981px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#cccccc;fill-opacity:1;stroke:none;stroke-width:0.26458332"
x="43.424736"
y="148.02716"
id="text923"><tspan
sodipodi:role="line"
id="tspan921"
x="43.424736"
y="148.02716"
style="font-size:8.46666622px;text-align:center;text-anchor:middle;fill:#cccccc;stroke-width:0.26458332">0 Tuples</tspan></text>
</g>
</g>
</svg>



@ -73,26 +73,10 @@ emails:
papers:
- title: "HomeRun: Scalable Sparse-Spectrum Reconstruction of Aggregated Historical Data"
- title: "Incremental Knowledge Base Construction Using DeepDive"
- name: Sneha Sudhakaran Nair
- name: Poojitha Alahari
papers:
- title: Big Data Linkage for Product Specification Pages
- title: "ICARUS: Minimizing Human Effort in Iterative Data Completion"
- title: "Data canopy: Accelerating Exploratory Statistical Analysis"
- name: Meha Rajeev Raote
papers:
- title: "Synthesizing Type-Detection Logic for Rich Semantic Data Types using Open-Source Code"
- title: "Data canopy: Accelerating exploratory statistical analysis"
- title: "Incremental knowledge base construction using deepdive"
- name: Diksha Harishchandra Marathe
papers:
- title: "Data canopy: Accelerating exploratory statistical analysis."
- title: "Incremental knowledge base construction using deepdive."
- title: "Synthesizing Type-Detection Logic for Rich Semantic Data Types using Open-source Code."
- title: "Northstar: An Interactive Data Science System"
claims:
- title: "A sample-and-clean framework for fast and accurate query processing on dirty data"
speaker: Jason Kim
- title: "Wrangler: Interactive visual specification of data transformation scripts"
speaker: Poonam Kumari
- title: "HomeRun: Scalable Sparse-Spectrum Reconstruction of Aggregated Historical Data"
speaker: William Spoth
- title: "SMOKE: Fine-grained Lineage at Interactive Speed"
@ -119,7 +103,13 @@ schedule:
- date: Mar 13
event: Spring Break
- date: Mar 20
speakers:
- title: "Wrangler: Interactive visual specification of data transformation scripts"
speaker: Poonam Kumari
- date: Mar 27
speakers:
- title: "Northstar: An Interactive Data Science System"
speaker: Poojitha Alahari
- date: Apr 3
- date: Apr 10
- date: Apr 17