<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Embracing Uncertainty</title>
<meta name="description" content="Mimir">
<meta name="author" content="Oliver Kennedy">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="../../reveal.js-3.1.0/css/reveal.css">
<link rel="stylesheet" href="ubodin.css" id="theme">
<!-- Code syntax highlighting -->
<link rel="stylesheet" href="../../reveal.js-3.1.0/lib/css/zenburn.css">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? '../../reveal.js-3.1.0/css/print/pdf.css' : '../../reveal.js-3.1.0/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="../../reveal.js-3.1.0/lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="header">
<!-- Any Talk-Specific Header Content Goes Here -->
Embracing Uncertainty
</div>
<div class="footer">
<!-- Any Talk-Specific Footer Content Goes Here -->
<div style="float: left; margin-top: 15px; font-size: 12pt;">
Exploring <u><b>O</b></u>nline <u><b>D</b></u>ata <u><b>In</b></u>teractions
</div>
<img src="graphics/FullText-white.png" height="40" style="float: right;"/>
</div>
<div class="slides">
<section>
<h4>Embracing uncertainty with</h4>
<img src="graphics/mimir_logo_final.png" />
</section>
<section>
<dl>
<dt>Student Collaborators (PhD/MS/BS)</dt>
<dd>
Poonam Kumari, William Spoth, Aaron Huber, <br/>
Lisa Lu, Olivia Alphonce, Shivang Aggarwal
</dd>
<dt>Alumni</dt>
<dd>
Niccolo Meneghetti, Arindam Nandi <span style="font-size: 14pt;">(both HPE/Vertica)</span>,<br/>
Vinayak Karuppasamy <span style="font-size: 14pt;">(Bloomberg)</span>, Ying Yang <span style="font-size: 14pt;">(Oracle)</span>
</dd>
<dt>Other Collaborators</dt>
<dd>
Mike Brachmann (UB), Ronny Fehling (Airbus), <br/>
Zhen-Hua Liu (Oracle), Dieter Gawlick (Oracle), <br/>
Boris Glavic (IIT), Juliana Freire (NYU)
</dd>
</dl>
</section>
<section>
<section>
<h3>A Big Data Fairy Tale</h3>
</section>
<section>
<img src="graphics/dagobert83-female-user-icon-800px.png" height="300" />
<h4>Meet Alice</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<img src="graphics/dagobert83-female-user-icon-800px.png" height="300" />
<img src="graphics/littlestorefront-800px.png" height="300" />
<h4>Alice has a Store</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<img src="graphics/littlestorefront-800px.png" height="300" style=" vertical-align: middle;"/>
<span style="font-size: 3em; vertical-align: middle;">&rarr;</span>
<img src="graphics/matt-icons_text-x-log-300px.png" height="300" style=" vertical-align: middle;" />
<h4>Alice's store collects sales data</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<img src="graphics/dagobert83-female-user-icon-800px.png" height="300" style=" vertical-align: middle;"/>
<span style="font-size: 3em; vertical-align: middle;">+</span>
<img src="graphics/matt-icons_text-x-log-300px.png" height="300" style=" vertical-align: middle;" />
<span style="font-size: 3em; vertical-align: middle;">=</span>
<img src="graphics/saco-800px.png" height="300" style=" vertical-align: middle;" />
<h4>Alice wants to use her sales data to run a promotion</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<img src="graphics/matt-icons_text-x-log-300px.png" height="300" style=" vertical-align: middle;"/>
<span style="font-size: 3em; vertical-align: middle;">&rarr;</span>
<img src="graphics/database-server-800px.png" height="300" style=" vertical-align: middle;" />
<h4>So Alice loads her sales data into her trusty database/Hadoop/Spark/etc. server.</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<img src="graphics/database-server-800px.png" height="300" style=" vertical-align: middle;" />
<span style="font-size: 3em; vertical-align: middle;">+&nbsp;?</span>
<h4>... asks her question ...</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<img src="graphics/database-server-800px.png" height="300" style=" vertical-align: middle;" />
<span style="font-size: 3em; vertical-align: middle;">+&nbsp;?&nbsp;</span>
<img src="graphics/crystalball-800px.png" height="300" style=" vertical-align: middle;" />
<h4>... and basks in the limitless possibilities of big data.</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
</section>
<section>
<section>
<h2>Why is this a fairy tale?</h2>
</section>
<section>
<img src="graphics/matt-icons_text-x-log-300px.png" height="300" style=" vertical-align: middle;"/>
<span style="font-size: 3em; vertical-align: middle;">&rarr;</span>
<img src="graphics/database-server-800px.png" height="300" style=" vertical-align: middle;" />
<h4>It's never this easy...</h4>
</section>
</section>
<section>
<section>
<h2>CSV Import</h2>
<h4>Run a <code>SELECT</code> on a raw CSV File</h4>
<ul class="fragment">
<li>File may not have column headers</li>
<li>CSV does not provide "types"</li>
<li>Lines may be missing fields</li>
<li>Fields may be mistyped (typo, missing comma)</li>
<li>Comment text can be inlined into the file</li>
</ul>
<p class="fragment">
<b>State of the art</b>: External table definition <span class="fragment">+ "manually" edit the CSV</span>
</p>
</section>
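<section data-markdown>
<script type="text/template">
One of the problems listed for CSV import, lines with missing fields, can be detected mechanically before any query runs. The following is a minimal sketch rather than Mimir's importer: `checkRows` is a hypothetical helper, the sample rows are made up, and the naive comma split assumes no quoted fields.

```javascript
// Hypothetical helper: report lines whose field count differs from the
// expected one. Assumes plain commas with no quoting or escaping.
function checkRows(csvText, expectedFields) {
  const problems = [];
  csvText.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines
    const found = line.split(",").length;
    if (found !== expectedFields) {
      problems.push({ line: index + 1, found: found });
    }
  });
  return problems;
}

// Made-up sample: line 2 is missing its third field.
console.log(checkRows("1,apple,0.50\n2,pear\n3,kiwi,0.75", 3));
```

The helper only flags suspicious lines; deciding how to repair them is left to the user or to a later guess.
</script>
</section>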
<section>
<h2>Merge Two Datasets</h2>
<h4><code>UNION</code> two data sources</h4>
<ul class="fragment">
<li>Schema matching</li>
<li>Deduplication</li>
<li>Format alignment (GIS coordinates, $ vs €)</li>
<li>Precision alignment (State vs County)</li>
</ul>
<p class="fragment">
<b>State of the art</b>: Manually map schema
</p>
</section>
<section>
<h2>JSON Shredding</h2>
<h4>Run a <code>SELECT</code> on JSON or a Doc Store</h4>
<ul class="fragment">
<li>Separating fields and record sets:<br/>(e.g., <code>{ A: "Bob", B: "Alice" }</code>)</li>
<li>Missing fields (Records with no 'address')</li>
<li>Type alignment (Records with 'address' as an array)</li>
<li>Schema matching$^2$</li>
</ul>
<p class="fragment">
<b>State of the art</b>: DataGuide, Wrangler, etc...
</p>
</section>
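<section data-markdown>
<script type="text/template">
Missing fields and type mismatches in a set of JSON records can be surfaced with a single pass over the data. This is an illustrative sketch only: `profileFields` is a hypothetical helper, the records are made up, and it is not how DataGuide or Wrangler work.

```javascript
// Hypothetical helper: for each field, count the records that carry it
// and collect the distinct value types seen.
function profileFields(records) {
  const profile = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      const type = Array.isArray(value) ? "array" : typeof value;
      const entry = profile[field] || (profile[field] = { count: 0, types: [] });
      entry.count += 1;
      if (!entry.types.includes(type)) entry.types.push(type);
    }
  }
  return profile;
}

// Made-up records: one has no address, one stores address as an array.
const profile = profileFields([
  { name: "Alice", address: "123 Main St" },
  { name: "Bob", address: ["PO Box 9", "Suite 4"] },
  { name: "Carol" }
]);
console.log(profile.address);
```

A count below the number of records signals a missing field; more than one entry in `types` signals a type-alignment problem.
</script>
</section>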
</section>
<section>
<section>
<h2>Data Cleaning is Hard!</h2>
</section>
<section>
<h3>State of the Art</h3>
<img src="graphics/BI-Analyst.jpg" height="400" />
<attribution>(skilledup.com)</attribution>
<p>Alice spends weeks cleaning her data before using it.</p>
</section>
<section>
<h3>Newer State of the Art</h3>
<img src="graphics/iu.jpeg" height=500 />
<attribution>(azure.microsoft.com)</attribution>
</section>
<section>
<img src="graphics/data-lake-to-data-swamp.jpg" height=500 />
<attribution>(timoelliott.com)</attribution>
</section>
</section>
<section>
<section>
<h2>Structure is hard!</h2>
<ul>
<li class="fragment">Structured models (RelDBs) force curation during loading.
<ul><li class="fragment"><b>Problem:</b> All curation costs are upfront.</li></ul>
</li>
<li class="fragment">Unstructured models (NoSQL) force curation into queries.
<ul><li class="fragment"><b>Problem:</b> Complexity/redundancy blowup in queries.</li></ul>
</li>
</ul>
<p class="fragment" style="margin-top: 50px;">Add structure and curation effort <b>On-Demand</b></p>
</section>
<section>
<h3>But... you still need some sort of structure?!?</h3>
<h3 class="fragment">Let the database make a guess!</h3>
</section>
<section>
<h3>
In the name of Codd,<br/><span class="fragment grow highlight-current-blue">thou shalt not give the user a wrong answer.</span>
</h3>
<h4 class="fragment">
... but what if we did?
</h4>
<h4 class="fragment">
What would it take for that to be ok?
</h4>
</section>
</section>
<section>
<section>
<h2>Industry says...</h2>
</section>
<section>
<img src="graphics/maybe-screen.png" height="500px" />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<img src="graphics/maybe-detail.png" height="500px" class="fragment" /><br/>
<p class="fragment">My phone is guessing, but it lets me know that it did.</p>
</section>
<section>
<img src="graphics/Calendar_Base.png" height="500px" />
</section>
<section>
<img src="graphics/Calendar_Explain.png" height="500px" />
<p>Easy interactions to <i>accept</i>, <i>reject</i>, or <i>explain</i> uncertainty</p>
</section>
<section>
<img src="graphics/Bing-Translate.png" height="500px" />
<p class="fragment">Good Explanations, Alternatives, and Feedback Vectors</p>
</section>
<section>
<h2>Communication</h2>
<ul>
<li>What data is uncertain?</li>
<li>Why is my data uncertain?</li>
<li>How bad is it?</li>
<li>What can I do about it?</li>
</ul>
</section>
<section>
<h2>What if a database did the same?</h2>
</section>
<section>
<ul style="width:35%; font-size: 24pt; margin-top: 50px;">
<li class="fragment"><b>A:</b> Standard SQL.</li>
<li class="fragment"><b>B:</b> Annotated Output.</li>
<li class="fragment"><b>C:</b> Lens Diagram.</li>
<li class="fragment"><b>D:</b> Result Explanations.</li>
</ul>
<img src="graphics/UIExample.png" style="width:60%; float:right"/>
</section>
</section>
<section>
<section>
<h3>Lenses</h3>
<p class="fragment">Here's a problem with my data. <span class="fragment">Fix it.</span></p>
<ul>
<li class="fragment">What type is this column? (majority vote)</li>
<li class="fragment">How do the columns of these relations line up? (pick your favorite schema matching paper)</li>
<li class="fragment">How do I query heterogeneous JSON objects? (see above)</li>
<li class="fragment">What should these missing values be? (learning-based interpolation)</li>
</ul>
</section>
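<section data-markdown>
<script type="text/template">
The "majority vote" answer to "What type is this column?" can be sketched in a few lines. This is an assumption-laden illustration, not Mimir's implementation: `guessType` and `inferColumnType` are hypothetical names, and only three candidate types are considered.

```javascript
// Hypothetical helper: classify one raw cell as a candidate type.
function guessType(cell) {
  const text = cell.trim();
  if (/^-?\d+$/.test(text)) return "int";
  if (/^-?\d+\.\d+$/.test(text)) return "float";
  return "string";
}

// Majority vote over a non-empty column; the winning share of the votes
// is kept so the guess can later be reported as uncertain.
function inferColumnType(cells) {
  const votes = {};
  for (const cell of cells) {
    const type = guessType(cell);
    votes[type] = (votes[type] || 0) + 1;
  }
  let best = null;
  for (const type of Object.keys(votes)) {
    if (best === null || votes[type] > votes[best]) best = type;
  }
  return { type: best, confidence: votes[best] / cells.length };
}

const guess = inferColumnType(["12", "7", "n/a", "42"]);
console.log(guess.type, guess.confidence); // prints "int 0.75"
```

Returning the vote share alongside the type is the design choice that matters here: the guess stays usable, and the system still knows it was a guess.
</script>
</section>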
<section>
<svg width=500 height=350>
<g transform="scale(1.2)">
<text x="0" y="45">View:</text>
<image xlink:href="graphics/db.svg" x="130" y="10" height="50px" width="50px"/>
<text x="225" y="20" style="font-family: courier; font-size: 60%">SELECT</text>
<polygon
points="190,35 340,35 325,30 325,40 340,35"
style="
stroke: black;
fill: black;
stroke-width: 2;
"
/>
<image xlink:href="graphics/jean-victor-balin-icon-table.svg" x="350" y="10" height="50px" width="50px"/>
</g>
<g transform="translate(0,150) scale(1.2)" class="fragment">
<text x="0" y="45">Lens:</text>
<image xlink:href="graphics/db.svg" x="130" y="10" height="50px" width="50px"/>
<text x="225" y="20" style="font-family: courier; font-size: 60%">SELECT</text>
<polygon
points="190,35 340,35 325,30 325,40 340,35"
style="
stroke: black;
fill: black;
stroke-width: 2;
"
/>
<image xlink:href="graphics/jean-victor-balin-icon-table.svg" x="350" y="10" height="50px" width="50px"/>
<g class="fragment">
<text x="212" y="20" style="font-family: courier; font-size: 60%">[&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]</text>
<image xlink:href="graphics/jean-victor-balin-icon-table.svg" x="355" y="15" height="50px" width="50px"/>
<image xlink:href="graphics/jean-victor-balin-icon-table.svg" x="360" y="20" height="50px" width="50px"/>
<image xlink:href="graphics/jean-victor-balin-icon-table.svg" x="365" y="25" height="50px" width="50px"/>
<g class="fragment">
<image xlink:href="graphics/jean-victor-balin-icon-table.svg" x="350" y="110" height="60px" width="60px"/>
<polygon
points="380,80 380,105 385,90 375,90 380,105"
style="
stroke: black;
fill: black;
stroke-width: 2;
"
/>
<text x="220" y="142" style="font-size: 60%">(best guess)</text>
</g>
</g>
</g>
</svg>
<p class="fragment">Lenses introduce <i>uncertainty</i></p>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<h2>The User's View</h2>
<pre><code>
SELECT NAME, DEPARTMENT FROM PRODUCTS;
</code></pre>
<table class="fragment" data-fragment-index="1">
<tr><th>Name</th><th>Department</th></tr>
<tr><td>Apple 6s, White</td><td>Phone</td></tr>
<tr><td>Dell, Intel 4 core</td><td>Computer</td></tr>
<tr><td>HP, AMD 2 core</td><td class="fragment highlight-red" data-fragment-index="2">Computer</td></tr>
<tr><td>...</td><td>...</td></tr>
</table>
<p class="fragment" data-fragment-index="2"><b>Simple UI:</b> Highlight values that are based on guesses.</p>
</section>
<section>
<pre><code>
SELECT NAME, DEPARTMENT FROM PRODUCTS;
</code></pre>
<small>
<table>
<tr><th>Name</th><th>Department</th></tr>
<tr><td>Apple 6s, White</td><td>Phone</td></tr>
<tr><td>Dell, Intel 4 core</td><td>Computer</td></tr>
<tr><td>HP, AMD 2 core</td><td style="color: red;">Computer</td></tr>
<tr><td>...</td><td>...</td></tr>
</table>
</small>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" version="1.1" viewBox="241 277 265 125" width="265pt" height="125pt" xmlns:dc="http://purl.org/dc/elements/1.1/" class="fragment" data-fragment-index="1">
<metadata> Produced by OmniGraffle 6.2.5 <dc:date>2015-09-20 14:45:55 +0000</dc:date></metadata>
<defs><font-face font-family="Helvetica Neue" font-size="16" panose-1="2 0 8 3 0 0 0 9 0 4" units-per-em="1000" underline-position="-100" underline-thickness="50" slope="0" x-height="517" cap-height="714" ascent="975.0061" descent="-216.99524" font-weight="bold"><font-face-src><font-face-name name="HelveticaNeue-Bold"/></font-face-src></font-face><font-face font-family="Helvetica Neue" font-size="16" panose-1="2 0 5 3 0 0 0 2 0 4" units-per-em="1000" underline-position="-100" underline-thickness="50" slope="0" x-height="517" cap-height="714" ascent="951.99585" descent="-212.99744" font-weight="500"><font-face-src><font-face-name name="HelveticaNeue"/></font-face-src></font-face></defs>
<g stroke="none" stroke-opacity="1" stroke-dasharray="none" fill="none" fill-opacity="1">
<title>Canvas 1</title>
<g>
<title>Layer 1</title>
<path d="M 279 351 L 243 369 L 279 387 L 279 389 C 279 394.52285 283.47715 399 289 399 L 494 399 C 499.52285 399 504 394.52285 504 389 L 504 289 C 504 283.47715 499.52285 279 494 279 L 289 279 C 283.47715 279 279 283.47715 279 289 Z" fill="white"/>
<path d="M 279 351 L 243 369 L 279 387 L 279 389 C 279 394.52285 283.47715 399 289 399 L 494 399 C 499.52285 399 504 394.52285 504 389 L 504 289 C 504 283.47715 499.52285 279 494 279 L 289 279 C 283.47715 279 279 283.47715 279 289 Z" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(293 293)" fill="black"><tspan font-family="Helvetica Neue" font-size="16" font-weight="bold" x="0" y="16" textLength="16.896" class="fragment" data-fragment-index="2">Pr</tspan><tspan font-family="Helvetica Neue" font-size="16" font-weight="bold" x="16.608" y="16" textLength="69.28" class="fragment" data-fragment-index="2">obability:</tspan><tspan font-family="Helvetica Neue" font-size="16" font-weight="500" x="85.888" y="16" textLength="38.24" class="fragment" data-fragment-index="2"> 95%</tspan><tspan font-family="Helvetica Neue" font-size="16" font-weight="bold" x="0" y="53" textLength="62.224" class="fragment" data-fragment-index="3">Reason:</tspan><tspan font-family="Helvetica Neue" font-size="16" font-weight="500" x="62.224" y="53" textLength="144.912" class="fragment" data-fragment-index="3"> Because I guessed </tspan><tspan font-family="Helvetica Neue" font-size="16" font-weight="500" x="0" y="71" textLength="206.592" class="fragment" data-fragment-index="3">Computer for Department </tspan><tspan font-family="Helvetica Neue" font-size="16" font-weight="500" x="0" y="89" textLength="196.16" class="fragment" data-fragment-index="3">on Row 3 of PRODUCTS</tspan></text>
</g>
</g>
</svg>
<p class="fragment" data-fragment-index="1">Allow users to <code>EXPLAIN</code> uncertain outputs</p>
<p class="fragment" data-fragment-index="3">Explanations include reasons given in English</p>
</section>
<!--
<section>
<div style="padding: 30px;">
<p>$PRODUCTS.DEPARTMENT_{3}$</p>
<div style="font-size: 2em">⬍</div>
<p>"I guessed 'Computer' for 'Department' on Row '3'"</p>
</div>
</section>
-->
<section>
<h3>Explanations</h3>
<ol>
<li>Mark <i>uncertain</i> data and results.</li>
<li>Upon request, provide more detail:
<ul style="font-size:80%; width: 600px">
<li>Why is my data uncertain? <span style="float:right; font-size:80%; margin-top: 5px">(provenance)</span></li>
<li>How bad is it? <span style="float:right; font-size:80%; margin-top: 5px">(confidence, entropy, bounds)</span></li>
<li>What are other possible answers? <span style="float:right; font-size:80%; margin-top: 5px">(samples)</span></li>
<li>What can I do to fix it? <span style="float:right; font-size:80%; margin-top: 5px">(repairs)</span></li>
</ul></li>
</ol>
</section>
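<section data-markdown>
<script type="text/template">
The two steps above, marking uncertain values and explaining them on request, suggest a simple data shape. The sketch below is hypothetical: `uncertainCell` and `explain` are illustrative names, the alternative value is made up, and a real system would derive the metadata from provenance rather than store strings.

```javascript
// Hypothetical shape for a guessed cell: the best-guess value plus the
// metadata needed to explain it later.
function uncertainCell(value, confidence, reason, alternatives) {
  return { value, uncertain: true, confidence, reason, alternatives };
}

// Build the text shown when the user asks for an explanation.
function explain(cell) {
  if (!cell.uncertain) return "This value was read directly from the data.";
  const percent = Math.round(cell.confidence * 100);
  return "Probability: " + percent + "%. Reason: " + cell.reason +
    ". Other possible answers: " + cell.alternatives.join(", ") + ".";
}

const department = uncertainCell(
  "Computer", 0.95,
  "I guessed Computer for Department on Row 3 of PRODUCTS",
  ["Phone"] // made-up alternative for illustration
);
console.log(explain(department));
```

Keeping the explanation separate from the value lets ordinary queries run on `value` alone while the interface decides when to surface the rest.
</script>
</section>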
</section>
<section>
<p><b>Email:</b> okennedy@buffalo.edu</p>
<p><b>Office:</b> Davis 338H</p>
<p><b>Web: </b> <a href="https://odin.cse.buffalo.edu">https://odin.cse.buffalo.edu</a></p>
<p><b>Mimir: </b> <a href="http://mimirdb.info">http://mimirdb.info</a></p>
<p class="fragment" style="margin-top: 100px">Today's password is <b>Frances Allen</b></p>
</section>
</div></div>
<script src="../../reveal.js-3.1.0/lib/js/head.min.js"></script>
<script src="../../reveal.js-3.1.0/js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: false,
progress: true,
history: true,
center: true,
slideNumber: true,
transition: 'fade', // none/fade/slide/convex/concave/zoom
// Optional reveal.js plugins
dependencies: [
{ src: '../../reveal.js-3.1.0/lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: '../../reveal.js-3.1.0/plugin/math/math.js',
condition: function() { return true; },
mathjax: '../../reveal.js-3.1.0/js/MathJax.js'
},
{ src: '../../reveal.js-3.1.0/plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../../reveal.js-3.1.0/plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../../reveal.js-3.1.0/plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'pre code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: '../../reveal.js-3.1.0/plugin/zoom-js/zoom.js', async: true },
{ src: '../../reveal.js-3.1.0/plugin/notes/notes.js', async: true }
]
});
</script>
</body>
</html>