Minor talk tweaks

pull/1/head
Oliver Kennedy 2017-10-06 09:27:25 -04:00
parent 7d09995b63
commit bd9d0e19a2
1 changed files with 70 additions and 50 deletions


@@ -290,7 +290,8 @@
</section>
<section>
<h2>Data Cleaning is Hard!</h2>
<p>Loading requires curation...</p>
<h2 class="fragment">Data Curation is Hard!</h2>
</section>
<section>
@@ -299,21 +300,39 @@
<img src="graphics/BI-Analyst.jpg" height="400" />
<imagecredits>(skilledup.com)</imagecredits>
<p>Alice spends weeks cleaning her data before using it.</p>
<p>Alice spends weeks curating her data before using it.</p>
</section>
<section>
<h3>Relational databases make this worse...</h3>
<p>The data needs...
<ul>
<li>... a complete schema (e.g., Tables, Columns, Types, ...).</li>
<li>... to satisfy constraints (e.g., NOT NULL, Key, F-Key).</li>
</ul>
</p>
<p class="fragment">This is all required upfront. <b>Before asking a single question</b>.</p>
</section>
</section>
<section>
<section>
<h2>The database is in the way</h2>
<h3 class="fragment">Why?</h3>
<p>Relational DBs are useless in early stages of curation.</p>
<h2>Why?</h2>
</section>
<section>
<h3>
In the name of Codd,<br/><span class="fragment grow highlight-current-blue">thou shalt not give the user a wrong answer.</span>
In the name of Codd,<br/><span class="fragment grow highlight-current-blue" data-fragment-index="2">thou shalt not give the user a wrong answer.</span>
</h3>
<p class="fragment" data-fragment-index="1" style="margin-top: 80px;">There are tons of good heuristics available for <span class="fragment highlight-current-blue" data-fragment-index="2">guessing</span> how to clean data.</p>
</section>
<section>
<p>
Thou shalt not give the user a wrong answer.
</p>
<h4 class="fragment">
... but what if we did?
@@ -417,7 +436,7 @@
width="93" height="103"
x="0" y="10"
/>
<g class="fragment" data-fragment-index="1">
<g class="fragment" data-fragment-index="2">
<image
xlink:href="graphics/db.svg"
width="93" height="103"
@@ -437,7 +456,7 @@
</g>
<g
transform="translate(250, 0)"
class="fragment" data-fragment-index="2"
class="fragment" data-fragment-index="1"
style="
fill: rgba(200, 50, 50, 0);
stroke-width: 4;
@@ -488,7 +507,7 @@
</g>
<g
transform="translate(0, 0)"
class="fragment" data-fragment-index="7"
class="fragment" data-fragment-index="6"
style="
fill: rgba(200, 50, 50, 0);
stroke-width: 4;
@@ -536,34 +555,33 @@
points="20,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(0,390) rotate(-60)"
/>
<text x="120" y="230">Probab.</text>
<text x="120" y="280">Cert. A.</text>
<polyline
points="110,270 240,270"
style="stroke: red;"
class="fragment" data-fragment-index="5"
/>
<g class="fragment" data-fragment-index="8">
<g style="font-size: 18px; stroke-width: 0; fill: rgba(120,120,120,1); ">
<text x="130" y="211">Probability</text>
<text x="130" y="237">Expectation</text>
<text x="130" y="263">Variance</text>
<text x="130" y="289">Histogram</text>
</g>
<g class="fragment" data-fragment-index="7">
<image
xlink:href="graphics/dagobert83-female-user-icon-800px.png"
width="100" height="100"
x="110" y="180"
x="110" y="190"
/>
</g>
</g>
</svg>
<p class="fragment" data-fragment-index="6" style="font-size: smaller">
<p class="fragment" data-fragment-index="5" style="font-size: smaller">
We've gotten good at query processing on uncertain data.<br/>
<span class="fragment" data-fragment-index="7">But not at "sourcing" uncertain data
<span class="fragment" data-fragment-index="8">... or communicating results.</span></span>
<span class="fragment" data-fragment-index="6">But not sourcing uncertain data
<span class="fragment" data-fragment-index="7">... or communicating results to humans.</span></span>
</p>
</section>
<section>
<h3>Challenges</h3>
<ul>
<li>Where do Probabilities/Possible Worlds Come From?</li>
<li>How do I use the output of a probablistic DB query?</li>
<li>Where do probabilities/possible worlds come from?</li>
<li>How do humans use the output of probabilistic queries?</li>
<li class="fragment">Probabilistic DB queries are sloooooow.</li>
</ul>
<p class="fragment" style="font-size: smaller;">A small shift in how we think about PDBs addresses all three points.</p>
@@ -948,18 +966,18 @@
<center><div style="width: 600px" class="fragment" data-fragment-index="1">
<table style="float: left">
<thead>
<tr><th>R |</th><th>A</th><th>B</th></tr>
<tr><th style="border-right: 1px solid;">R</th><th>A</th><th>B</th></tr>
</thead><tbody>
<tr><td align="right">|</td><td>1</td><td>2</td></tr>
<tr><td align="right">|</td><td>3</td><td>4</td></tr>
<tr><td align="right">|</td><td>5</td><td>4</td></tr>
<tr><td style="border-right: 1px solid;"></td><td>1</td><td>2</td></tr>
<tr><td style="border-right: 1px solid;"></td><td>3</td><td>4</td></tr>
<tr><td style="border-right: 1px solid;"></td><td>5</td><td>4</td></tr>
</tbody>
</table>
<table style="float: right" class="fragment" data-fragment-index="2">
<tr><th>A</th><th>C</th></tr>
<tr><td>1</td><td>$X_2$</td></tr>
<tr><td>3</td><td>$X_4$</td></tr>
<tr><td>5</td><td>$X_4$</td></tr>
<tr><th style="border-right: 1px solid;">Q(R)</th><th>A</th><th>C</th></tr>
<tr><td style="border-right: 1px solid;"></td><td>1</td><td>$X_2$</td></tr>
<tr><td style="border-right: 1px solid;"></td><td>3</td><td>$X_4$</td></tr>
<tr><td style="border-right: 1px solid;"></td><td>5</td><td>$X_4$</td></tr>
</table>
</div></center>
<div style="clear: both;">&nbsp;</div>
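The R and Q(R) tables in this hunk are consistent with a query that keeps A and replaces B by a symbolic variable $X_B$, so rows that share a B value (here B = 4) share one variable. The query itself is not shown in the diff; the minimal Python sketch below assumes that reading, and the names `q`, `var_name` and `bind` are hypothetical. It builds the symbolic result, then instantiates one possible world by binding each variable to an arbitrary illustrative value.

```python
# Minimal sketch of a C-table-style result whose C column holds symbolic
# variables X_<B>. ASSUMPTION: Q(R) keeps A and emits the variable X_B; this
# is inferred from the R and Q(R) tables, not taken from the slide source.

R = [(1, 2), (3, 4), (5, 4)]  # rows of R(A, B), as in the R table


def var_name(b):
    """Name of the (hypothetical) variable indexed by a B value."""
    return f"X_{b}"


def q(rows):
    """Symbolic query: each output row is (A, variable name)."""
    return [(a, var_name(b)) for a, b in rows]


def bind(symbolic_rows, assignment):
    """Instantiate one possible world by replacing each variable with a value."""
    return [(a, assignment[v]) for a, v in symbolic_rows]


symbolic = q(R)
print(symbolic)  # [(1, 'X_2'), (3, 'X_4'), (5, 'X_4')]

# One possible world with arbitrary values; rows 3 and 5 share X_4,
# so they always agree on C in every world.
world = bind(symbolic, {"X_2": 10, "X_4": 20})
print(world)  # [(1, 10), (3, 20), (5, 20)]
```

Keeping the variable name rather than a value is what lets one symbolic table stand for every possible world at once.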
@@ -1223,7 +1241,7 @@
<section>
<section>
<h3>ETL Question 1</h3>
<h3>Provenance Question 1</h3>
<p>How much of my query result is affected by unvalidated variables?</p>
<p class="fragment"><b>Idea:</b> Mark values in query results that depend on unvalidated variables.</p>
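One way to picture the idea in Provenance Question 1 is to let every result value carry the set of variables it was computed from, then flag any value whose set contains a variable the user has not validated yet. This is a hedged Python sketch, not Mimir's implementation: `Cell`, `mark` and `affected_fraction` are hypothetical names, and the rows and values are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cell:
    """A result value plus the (hypothetical) set of variables it depends on."""
    value: object
    deps: frozenset = field(default_factory=frozenset)


def mark(rows, validated):
    """Pair every cell with True if it depends on at least one unvalidated variable."""
    return [[(c.value, bool(c.deps - validated)) for c in row] for row in rows]


def affected_fraction(marked):
    """Share of result cells that carry a mark."""
    flags = [flag for row in marked for _, flag in row]
    return sum(flags) / len(flags)


# Illustrative result rows (A, C); each C value depends on one variable.
rows = [
    [Cell(1), Cell(10, frozenset({"X_2"}))],
    [Cell(3), Cell(20, frozenset({"X_4"}))],
]

marked = mark(rows, validated={"X_2"})
print(marked)                     # [[(1, False), (10, False)], [(3, False), (20, True)]]
print(affected_fraction(marked))  # 0.25
```

The fraction of marked cells gives a crude answer to "how much of my result is affected"; a real system would propagate the dependency sets through each operator rather than attach them by hand.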
@@ -1360,7 +1378,8 @@ CREATE VIEW R_CLEANED AS
<section>
<section>
<h3>Which variables affect my query results?</h3>
<h3>Provenance Question 2</h3>
<p>Which variables affect my query results?</p>
<p class="fragment" data-fragment-index="1"><b>Idea: </b> Static dependency analysis produces a list of variable families and queries to generate all relevant indexes.</p>
<citation class="fragment" data-fragment-index="1">Mimir: Bringing CTables into Practice; Nandi et al.; arXiv</citation>
</section>
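The static dependency analysis in Provenance Question 2 can be illustrated on a toy scale: walk a query's output expression without running it, collect which variable families it references and which column indexes them, then produce a query that lists the relevant index values. The expression encoding, `families` and `index_query` below are all hypothetical and only sketch the idea; the sample data reuses the R table from the hunk above.

```python
# Hypothetical expression encoding:
#   ("col", name)              -> column reference
#   ("var", family, col)       -> variable family[value of col]
#   ("op", name, e1, e2, ...)  -> any other operator


def families(expr):
    """Statically collect the (family, index column) pairs referenced by expr."""
    kind = expr[0]
    if kind == "var":
        return {(expr[1], expr[2])}
    if kind == "op":
        out = set()
        for sub in expr[2:]:
            out |= families(sub)
        return out
    return set()  # plain column references introduce no variables


def index_query(rows, columns, index_col):
    """Stand-in for a generated query: the distinct values of index_col."""
    pos = columns.index(index_col)
    return sorted({row[pos] for row in rows})


R_cols = ["A", "B"]
R = [(1, 2), (3, 4), (5, 4)]
expr = ("op", "+", ("col", "A"), ("var", "X", "B"))  # e.g. A + X_B

for family, col in sorted(families(expr)):
    print(family, index_query(R, R_cols, col))  # X [2, 4]
```

The analysis touches only the expression, so it can tell the user which variables (here $X_2$ and $X_4$) matter before any value is guessed.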
@@ -1371,7 +1390,8 @@ CREATE VIEW R_CLEANED AS
<section>
<h3>How bad is the situation?</h3>
<h3>Provenance Question 3</h3>
<p>How bad is the situation?</p>
<p class="fragment"><b>Idea: </b> Sample from the space of alternatives to...
<ul>
<li class="fragment">Estimate error, expectations, or other statistical measures.</li>
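Provenance Question 3 proposes sampling from the space of alternatives to estimate error, expectations and similar measures. The Python sketch below is a plain Monte Carlo estimator under invented assumptions: the alternatives in `ALTERNATIVES` are hypothetical, each variable is drawn independently and uniformly, and the query is SUM(C) over the symbolic rows of the Q(R) table. For these made-up inputs the exact answers are E = 11 + 25 + 25 = 61 and Var = 1 + 4 * 25 = 101 (X_4 appears twice), so the estimates can be checked by hand.

```python
import random
import statistics

# Hypothetical alternatives for each variable (e.g. candidate repairs);
# each is chosen uniformly at random in a sampled world.
ALTERNATIVES = {"X_2": [10, 12], "X_4": [20, 30]}

# Symbolic query result: (A, variable) rows, as in the Q(R) table.
SYMBOLIC = [(1, "X_2"), (3, "X_4"), (5, "X_4")]


def sample_world(rng):
    """Draw one possible world: one value per variable, shared by all its rows."""
    return {v: rng.choice(vals) for v, vals in ALTERNATIVES.items()}


def query_sum(world):
    """Example aggregate over the instantiated result: SUM(C)."""
    return sum(world[v] for _, v in SYMBOLIC)


def estimate(n, seed=0):
    """Sample n worlds and return the sample mean and variance of SUM(C)."""
    rng = random.Random(seed)
    samples = [query_sum(sample_world(rng)) for _ in range(n)]
    return statistics.mean(samples), statistics.variance(samples)


mean, var = estimate(10_000)
print(f"E[SUM(C)] ~ {mean:.2f}, Var ~ {var:.2f}")
```

Sampling whole worlds, rather than each row on its own, keeps the two $X_4$ rows correlated, which is why the variance is 101 and not 51.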
@@ -1403,21 +1423,21 @@ CREATE VIEW R_CLEANED AS
<table>
<tr><td>
<table>
<tr><th style="border-right: 1px black;">$R_1$</th><th>A</th><th>B</th></tr>
<tr><td style="border-right: 1px black;"></td><td>1</td><td>2</td></tr>
<tr><td style="border-right: 1px black;"></td><td>3</td><td>4</td></tr>
<tr><th style="border-right: 1px solid;">$R_1$</th><th>A</th><th>B</th></tr>
<tr><td style="border-right: 1px solid;"></td><td>1</td><td>2</td></tr>
<tr><td style="border-right: 1px solid;"></td><td>3</td><td>4</td></tr>
<tr><td></td><td></td><td></td></tr>
<tr><th style="border-right: 1px black;">$R_2$</th><th>A</th><th>B</th></tr>
<tr><td style="border-right: 1px black;"></td><td>1</td><td>5</td></tr>
<tr><th style="border-right: 1px solid;">$R_2$</th><th>A</th><th>B</th></tr>
<tr><td style="border-right: 1px solid;"></td><td>1</td><td>5</td></tr>
</table>
</td><td style="vertical-align: middle;">
</td><td style="vertical-align: middle;">
<table>
<tr><th style="border-right: 1px black;">$R_{sparse}$</th><th>A</th><th>B</th><th>S#</th></tr>
<tr><td style="border-right: 1px black;"></td><td>1</td><td>2</td><td>1</td></tr>
<tr><td style="border-right: 1px black;"></td><td>3</td><td>4</td><td>1</td></tr>
<tr><td style="border-right: 1px black;"></td><td>1</td><td>5</td><td>2</td></tr>
<tr><th style="border-right: 1px solid;">$R_{sparse}$</th><th>A</th><th>B</th><th>S#</th></tr>
<tr><td style="border-right: 1px solid;"></td><td>1</td><td>2</td><td>1</td></tr>
<tr><td style="border-right: 1px solid;"></td><td>3</td><td>4</td><td>1</td></tr>
<tr><td style="border-right: 1px solid;"></td><td>1</td><td>5</td><td>2</td></tr>
</table>
</td></tr>
</table>
@@ -1428,20 +1448,20 @@ CREATE VIEW R_CLEANED AS
<table>
<tr><td>
<table>
<tr><th style="border-right: 1px black;">$R_1$</th><th>A</th><th>B</th></tr>
<tr><td style="border-right: 1px black;"></td><td>1</td><td>2</td></tr>
<tr><td style="border-right: 1px black;"></td><td>3</td><td>4</td></tr>
<tr><th style="border-right: 1px solid;">$R_1$</th><th>A</th><th>B</th></tr>
<tr><td style="border-right: 1px solid;"></td><td>1</td><td>2</td></tr>
<tr><td style="border-right: 1px solid;"></td><td>3</td><td>4</td></tr>
<tr><td></td><td></td><td></td></tr>
<tr><th style="border-right: 1px black;">$R_2$</th><th>A</th><th>B</th></tr>
<tr><td style="border-right: 1px black;"></td><td>1</td><td>5</td></tr>
<tr><th style="border-right: 1px solid;">$R_2$</th><th>A</th><th>B</th></tr>
<tr><td style="border-right: 1px solid;"></td><td>1</td><td>5</td></tr>
</table>
</td><td style="vertical-align: middle;">
</td><td style="vertical-align: middle;">
<table>
<tr><th style="border-right: 1px black;">$R_{bundle}$</th><th>A</th><th>B</th><th>$\phi$</th></tr>
<tr><td style="border-right: 1px black;"></td><td>1</td><td>[2,5]</td><td>[T,T]</td></tr>
<tr><td style="border-right: 1px black;"></td><td>3</td><td>4</td><td>[T,F]</td></tr>
<tr><th style="border-right: 1px solid;">$R_{bundle}$</th><th>A</th><th>B</th><th>$\phi$</th></tr>
<tr><td style="border-right: 1px solid;"></td><td>1</td><td>[2,5]</td><td>[T,T]</td></tr>
<tr><td style="border-right: 1px solid;"></td><td>3</td><td>4</td><td>[T,F]</td></tr>
</table>
</td></tr>
</table>
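The $R_1$/$R_2$, $R_{sparse}$ and $R_{bundle}$ tables encode the same two possible worlds in two ways: one row per tuple per world tagged with a world number S#, or one row per tuple "bundle" with a per-world list of B values and a presence vector $\phi$. This minimal Python sketch derives both from the worlds. It assumes A identifies a bundle and is unique within each world; the slides do not state the grouping key, and `to_sparse` and `to_bundle` are hypothetical names.

```python
WORLDS = [
    [(1, 2), (3, 4)],  # R_1
    [(1, 5)],          # R_2
]


def to_sparse(worlds):
    """Flatten all worlds into one table, tagging each row with its world number S#."""
    return [(a, b, s) for s, rows in enumerate(worlds, start=1) for a, b in rows]


def to_bundle(worlds):
    """Group rows by A (ASSUMED bundle key, unique per world).

    B becomes a per-world list (None where the bundle is absent) and phi
    marks which worlds contain the bundle.
    """
    n = len(worlds)
    per_key = {}  # dicts keep insertion order, so bundles follow first appearance
    for i, rows in enumerate(worlds):
        for a, b in rows:
            per_key.setdefault(a, [None] * n)[i] = b
    bundles = []
    for a, bs in per_key.items():
        phi = [b is not None for b in bs]
        present = {b for b in bs if b is not None}
        # Collapse B to a scalar when it is identical in every world that has it.
        b_col = present.pop() if len(present) == 1 else bs
        bundles.append((a, b_col, phi))
    return bundles


print(to_sparse(WORLDS))  # [(1, 2, 1), (3, 4, 1), (1, 5, 2)]
print(to_bundle(WORLDS))  # [(1, [2, 5], [True, True]), (3, 4, [True, False])]
```

The bundle form stores the constant attribute A once per bundle instead of once per world, which is the usual motivation for bundling when most attributes agree across worlds.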