<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Embracing Uncertainty</title>
<meta name="description" content="Mimir, an awesome system for embracing uncertainty">
<meta name="author" content="Oliver Kennedy">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="../reveal.js-3.1.0/css/reveal.css">
<link rel="stylesheet" href="ubodin.css" id="theme">
<!-- Code syntax highlighting -->
<link rel="stylesheet" href="../reveal.js-3.1.0/lib/css/zenburn.css">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? '../reveal.js-3.1.0/css/print/pdf.css' : '../reveal.js-3.1.0/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="../reveal.js-3.1.0/lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="header">
<!-- Any Talk-Specific Header Content Goes Here -->
Embracing Uncertainty &amp; ODIn Lab Overview
</div>
<div class="footer">
<!-- Any Talk-Specific Footer Content Goes Here -->
<div style="float: left; margin-top: 15px; ">
Exploring <u><b>O</b></u>nline <u><b>D</b></u>ata <u><b>In</b></u>teractions
</div>
<img src="graphics/FullText-white.png" height="40" style="float: right;"/>
</div>
<div class="slides">
<section>
<h2>Embracing Uncertainty</h2>
<h4>ODIn Lab</h4>
<h5><a href="https://odin.cse.buffalo.edu">https://odin.cse.buffalo.edu</a></h5>
<img src="graphics/qrcode.31361737.png" />
</section>
<section>
<h2>Embracing Uncertainty</h2>
<h4>Oliver Kennedy</h4>
<h4 style="color: blue">Ying Yang, Niccolo Meneghetti, <br/> Arindam Nandi, Vinayak Karuppasamy<br/>(UB)</h4>
<h4 style="color: red">Ronny Fehling, Zhen-Hua Liu, Dieter Gawlick<br/>(Oracle)</h4>
</section>
<section>
<h2>Before we begin...</h2>
</section>
<section>
<h2>Insider Threats</h2>
<ul>
<li>How do we identify <i>abnormal</i> query behavior from users?</li>
<li>What is <i>normal</i> user behavior?</li>
<li>Multiple gigs of query logs from M&amp;T</li>
</ul>
<p>...with <b>Gokhan Kul, Duc Thanh Anh Luong, Ting Xie</b>, Shambhu, Varun, Hung</p>
</section>
<section>
<h2>Pocket Data</h2>
<ul>
<li>Months of query logs from PhoneLab Phones (2 queries per phone per second)</li>
<li>SQLite is inefficient</li>
<li>SQLite is being used inefficiently</li>
<li>Let's develop a benchmark to help shine a light on these inefficiencies</li>
</ul>
<p>...with <b>Jerry Ajay</b>, Geoff, Luke</p>
</section>
<section>
<h2>Just-in-Time Data Structures</h2>
<ul>
<li>Decouple Physical Structure from Logical Interface.</li>
<li>Express Data Structure Organization through Rewrite Rules.</li>
<li>...allows hybridized data structures for intermediate tradeoffs.</li>
<li>...allows for semifunctional data structures with all the benefits but fewer tradeoffs.</li>
</ul>
<p>...with Luke</p>
</section>
<section>
<section>
<h3>A Big Data Fairy Tale</h3>
</section>
<section>
<img src="graphics/dagobert83-female-user-icon-800px.png" height="300" />
<h4>Meet Alice</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<img src="graphics/dagobert83-female-user-icon-800px.png" height="300" />
<img src="graphics/littlestorefront-800px.png" height="300" />
<h4>Alice has a Store</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<img src="graphics/littlestorefront-800px.png" height="300" style=" vertical-align: middle;"/>
<span style="font-size: 3em; vertical-align: middle;">&rarr;</span>
<img src="graphics/matt-icons_text-x-log-300px.png" height="300" style=" vertical-align: middle;" />
<h4>Alice's store collects sales data</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<img src="graphics/dagobert83-female-user-icon-800px.png" height="300" style=" vertical-align: middle;"/>
<span style="font-size: 3em; vertical-align: middle;">+</span>
<img src="graphics/matt-icons_text-x-log-300px.png" height="300" style=" vertical-align: middle;" />
<span style="font-size: 3em; vertical-align: middle;">=</span>
<img src="graphics/saco-800px.png" height="300" style=" vertical-align: middle;" />
<h4>Alice wants to use her sales data to run a promotion</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<img src="graphics/matt-icons_text-x-log-300px.png" height="300" style=" vertical-align: middle;"/>
<span style="font-size: 3em; vertical-align: middle;">&rarr;</span>
<img src="graphics/database-server-800px.png" height="300" style=" vertical-align: middle;" />
<h4>So Alice loads her sales data into her trusty database/Hadoop/Spark/etc. server.</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<img src="graphics/database-server-800px.png" height="300" style=" vertical-align: middle;" />
<span style="font-size: 3em; vertical-align: middle;">+&nbsp;?</span>
<h4>... asks her question ...</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
<section>
<img src="graphics/database-server-800px.png" height="300" style=" vertical-align: middle;" />
<span style="font-size: 3em; vertical-align: middle;">+&nbsp;?&nbsp;</span>
<img src="graphics/crystalball-800px.png" height="300" style=" vertical-align: middle;" />
<h4>... and basks in the limitless possibilities of big data.</h4>
<attribution>(OpenClipArt.org)</attribution>
</section>
</section>
<section>
<section>
<h2>Why is this a fairy tale?</h2>
</section>
<section>
<img src="graphics/matt-icons_text-x-log-300px.png" height="300" style=" vertical-align: middle;"/>
<span style="font-size: 3em; vertical-align: middle;">&rarr;</span>
<img src="graphics/database-server-800px.png" height="300" style=" vertical-align: middle;" />
<h4>It's never this easy...</h4>
</section>
</section>
<section>
<section>
<h2>Data Cleaning is Hard!</h2>
</section>
<section>
<h3>State of the Art</h3>
<img src="graphics/BI-Analyst.jpg" height="400" />
<attribution>(skilledup.com)</attribution>
<p>Alice spends weeks cleaning her data before using it.</p>
</section>
<section>
<h3>Newer State of the Art</h3>
<img src="graphics/azure-data-lake.png" height=500 />
<attribution>(azure.microsoft.com)</attribution>
</section>
<section>
<img src="graphics/data-lake-to-data-swamp.jpg" height=500 />
<attribution>(timoelliott.com)</attribution>
</section>
</section>
<section>
<section>
<h2>Making Cleaning Easier</h2>
<svg width=500 height=300>
<polygon
points="60,50 60,60 40,50 60,40 60,50 440,50 440,40 460,50 440,60 440,50"
style="
stroke: black;
fill: black;
stroke-width: 2;
"
/>
<text x=0 y=30 style="font-size: 0.75em">Scalability</text>
<text x=370 y=30 style="font-size: 0.75em">Reliability</text>
<text class="fragment" x=-220 y=400 style="font-size: 0.75em" transform="rotate(-90 20,20)">Expert Analysis</text>
<text class="fragment" x=-220 y=250 style="font-size: 0.75em" transform="rotate(-90 20,20)">Crowdsourcing</text>
<text class="fragment" x=-180 y=100 style="font-size: 0.75em" transform="rotate(-90 20,20)">Automation</text>
</svg>
<p class="fragment">Can we start with automation and work our way up?</p>
</section>
</section>
<section>
<ul>
<li>Automate educated guesses for fast cleaning<ul>
<li><b class="fragment highlight-blue" data-fragment-index="5">Lenses</b>: A family of simple data-cleaning operators</li>
<li class="fragment" data-fragment-index="1">... but what if the guesses are wrong?</li>
</ul></li>
<li class="fragment" data-fragment-index="2">Annotate 'best guess' relations with the guesses<ul>
<li class="fragment shrink fade-out" data-fragment-index="5"><b>Virtual C-Tables</b>: A lineage model based on views, labeled nulls, and lazy evaluation.</li>
<li class="fragment" data-fragment-index="3">... so now the user needs to interpret your guesses?</li>
</ul></li>
<li class="fragment" data-fragment-index="4">Rank guesses by their impact on result uncertainty<ul>
<li class="fragment shrink fade-out" data-fragment-index="5"><b>CPI</b>: A greedy heuristic for ranking sources of uncertainty.</li>
</ul></li>
</ul>
</section>
<section>
<section>
<h3>Lenses</h3>
<p class="fragment">Here's a problem with my data. <span class="fragment">Fix it.</span></p>
<ul>
<li class="fragment">What type is this column? (majority vote)</li>
<li class="fragment">How do the columns of these relations line up? (pick your favorite schema matching paper)</li>
<li class="fragment">How do I query heterogeneous JSON objects? (see above)</li>
<li class="fragment">What should these missing values be? (learning-based interpolation)</li>
</ul>
</section>
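<section data-markdown>
<script type="text/template">
### Lenses: majority vote, sketched

A minimal sketch of what "What type is this column? (majority vote)" could look like. The helper names `guess_type` and `infer_column_type` and the sample column are assumptions made for illustration, not Mimir's implementation.

```python
from collections import Counter

def guess_type(value):
    """Classify one raw string as 'int', 'float', or 'string'."""
    for name, parse in (("int", int), ("float", float)):
        try:
            parse(value)
            return name
        except ValueError:
            pass
    return "string"

def infer_column_type(values):
    """Return (winning type, fraction of values that voted for it)."""
    votes = Counter(guess_type(v) for v in values)
    winner, count = votes.most_common(1)[0]
    return winner, count / len(values)

column = ["12", "7", "n/a", "42"]   # hypothetical raw column
print(infer_column_type(column))    # prints ('int', 0.75)
```

The vote share doubles as a rough confidence for the guess.
</script>
</section>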
<section>
<h3>Lenses</h3>
<p>Each lens implements one automated data repair task with <b>minimal configuration or training</b>.</p>
<ul>
<li class="fragment">A "SQL" Expression</li>
<li class="fragment">A Model that defines configuration parameters and best-guesses for data repairs.</li>
</ul>
</section>
<section>
<pre><code>
CREATE LENS PRODUCTS
AS SELECT * FROM PRODUCTS_RAW
USING DOMAIN_REPAIR(DEPARTMENT NOT NULL);
</code></pre>
<ul>
<li><code>AS</code> clause defines source data.</li>
<li><code>USING</code> clause requests repairs.</li>
</ul>
</section>
<section>
<pre><code>
CREATE LENS PRODUCTS
AS SELECT * FROM PRODUCTS_RAW
USING DOMAIN_REPAIR(DEPARTMENT NOT NULL);
</code></pre>
<div>
<h4>The Query</h4>
<pre><code>
CREATE VIEW PRODUCTS
AS SELECT ID, NAME, ...,
CASE WHEN DEPARTMENT IS NOT NULL THEN DEPARTMENT
ELSE VAR('PRODUCTS.DEPARTMENT', ROWID)
END AS DEPARTMENT
FROM PRODUCTS_RAW;
</code></pre>
</div>
<small class="fragment">
<table>
<tr><th>ID</th><th>Name</th><th>...</th><th>Department</th></tr>
<tr><td>123</td><td>Apple 6s, White</td><td>...</td><td>Phone</td></tr>
<tr><td>34234</td><td>Dell, Intel 4 core</td><td>...</td><td>Computer</td></tr>
<tr><td>34235</td><td>HP, AMD 2 core</td><td>...</td><td class="fragment">$Prod.Dept_3$</td></tr>
<tr><td>...</td><td>...</td><td>...</td><td>...</td></tr>
</table>
</small>
</section>
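<section data-markdown>
<script type="text/template">
### The Query, sketched

A runnable approximation of the rewritten query using Python's built-in `sqlite3`. The `VAR` function registered here is a stand-in that only emits a labeled placeholder; the placeholder format and the three-column table are assumptions, not Mimir's actual behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS_RAW (ID INTEGER, NAME TEXT, DEPARTMENT TEXT)")
conn.executemany(
    "INSERT INTO PRODUCTS_RAW VALUES (?, ?, ?)",
    [(123, "Apple 6s, White", "Phone"),
     (34234, "Dell, Intel 4 core", "Computer"),
     (34235, "HP, AMD 2 core", None)],
)

# Stand-in for VAR(): label the missing value instead of repairing it.
def var(name, rowid):
    return "{%s_%d}" % (name, rowid)

conn.create_function("VAR", 2, var)

rows = conn.execute("""
    SELECT ID, NAME,
           CASE WHEN DEPARTMENT IS NOT NULL THEN DEPARTMENT
                ELSE VAR('PRODUCTS.DEPARTMENT', ROWID)
           END AS DEPARTMENT
    FROM PRODUCTS_RAW
    ORDER BY ROWID
""").fetchall()

for row in rows:
    print(row)
```

Only the third row, whose `DEPARTMENT` is `NULL`, comes back with a placeholder.
</script>
</section>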
<section>
<pre><code>
CREATE LENS PRODUCTS
AS SELECT * FROM PRODUCTS_RAW
USING DOMAIN_REPAIR(DEPARTMENT NOT NULL);
</code></pre>
<div>
<h4>The Model</h4>
<pre><code>
SELECT * FROM PRODUCTS_RAW;
</code></pre>
</div>
<div class="fragment">
<div style="font-size: 1em; vertical-align: middle;"></div>
<div>
<img src="graphics/weka.png" />
</div>
</div>
<div class="fragment">
<div style="font-size: 1em; vertical-align: middle;"></div>
<div><p>An estimator for each <small style="vertical-align: baseline;">$Prod.Dept_{ROWID}$</small></p></div>
</div>
</section>
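<section data-markdown>
<script type="text/template">
### The Model, sketched

A toy estimator trained on the rows that do have a department. This word-vote classifier is a deliberately simple substitute for a real learner; the function names and the scoring rule are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Rows from PRODUCTS_RAW as (name, department); None marks a missing value.
rows = [
    ("Apple 6s, White", "Phone"),
    ("Dell, Intel 4 core", "Computer"),
    ("HP, AMD 2 core", None),
]

def words(name):
    return name.replace(",", " ").lower().split()

# Count how often each word co-occurs with each known department.
word_votes = defaultdict(Counter)
for name, dept in rows:
    if dept is not None:
        for w in words(name):
            word_votes[w][dept] += 1

def estimate(name):
    """Return (best-guess department, its share of the word votes)."""
    tally = Counter()
    for w in words(name):
        tally.update(word_votes[w])
    if not tally:                      # no known word: no guess
        return None, 0.0
    dept, votes = tally.most_common(1)[0]
    return dept, votes / sum(tally.values())

print(estimate("HP, AMD 2 core"))      # prints ('Computer', 1.0)
```

Here the guess rests on a single shared word ("core"), which is why reporting the evidence alongside the guess matters.
</script>
</section>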
</section>
<section>
<section>
<h2>The User's View</h2>
<pre><code>
SELECT NAME, DEPARTMENT FROM PRODUCTS;
</code></pre>
<table class="fragment" data-fragment-index="1">
<tr><th>Name</th><th>Department</th></tr>
<tr><td>Apple 6s, White</td><td>Phone</td></tr>
<tr><td>Dell, Intel 4 core</td><td>Computer</td></tr>
<tr><td>HP, AMD 2 core</td><td class="fragment highlight-red" data-fragment-index="2">Computer</td></tr>
<tr><td>...</td><td>...</td></tr>
</table>
<p class="fragment" data-fragment-index="2"><b>Simple UI:</b> Highlight values (and rows) based on guesses.</p>
</section>
<section>
<pre><code>
SELECT NAME, DEPARTMENT FROM PRODUCTS;
</code></pre>
<small>
<table>
<tr><th>Name</th><th>Department</th></tr>
<tr><td>Apple 6s, White</td><td>Phone</td></tr>
<tr><td>Dell, Intel 4 core</td><td>Computer</td></tr>
<tr><td>HP, AMD 2 core</td><td style="color: red;">Computer</td></tr>
<tr><td>...</td><td>...</td></tr>
</table>
</small>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" version="1.1" viewBox="241 277 265 125" width="265pt" height="125pt" xmlns:dc="http://purl.org/dc/elements/1.1/" class="fragment" data-fragment-index="1">
<metadata> Produced by OmniGraffle 6.2.5 <dc:date>2015-09-20 14:45:55 +0000</dc:date></metadata>
<defs><font-face font-family="Helvetica Neue" font-size="16" panose-1="2 0 8 3 0 0 0 9 0 4" units-per-em="1000" underline-position="-100" underline-thickness="50" slope="0" x-height="517" cap-height="714" ascent="975.0061" descent="-216.99524" font-weight="bold"><font-face-src><font-face-name name="HelveticaNeue-Bold"/></font-face-src></font-face><font-face font-family="Helvetica Neue" font-size="16" panose-1="2 0 5 3 0 0 0 2 0 4" units-per-em="1000" underline-position="-100" underline-thickness="50" slope="0" x-height="517" cap-height="714" ascent="951.99585" descent="-212.99744" font-weight="500"><font-face-src><font-face-name name="HelveticaNeue"/></font-face-src></font-face></defs>
<g stroke="none" stroke-opacity="1" stroke-dasharray="none" fill="none" fill-opacity="1">
<title>Canvas 1</title>
<g>
<title>Layer 1</title>
<path d="M 279 351 L 243 369 L 279 387 L 279 389 C 279 394.52285 283.47715 399 289 399 L 494 399 C 499.52285 399 504 394.52285 504 389 L 504 289 C 504 283.47715 499.52285 279 494 279 L 289 279 C 283.47715 279 279 283.47715 279 289 Z" fill="white"/>
<path d="M 279 351 L 243 369 L 279 387 L 279 389 C 279 394.52285 283.47715 399 289 399 L 494 399 C 499.52285 399 504 394.52285 504 389 L 504 289 C 504 283.47715 499.52285 279 494 279 L 289 279 C 283.47715 279 279 283.47715 279 289 Z" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(293 293)" fill="black"><tspan font-family="Helvetica Neue" font-size="16" font-weight="bold" x="0" y="16" textLength="16.896" class="fragment" data-fragment-index="2">Pr</tspan><tspan font-family="Helvetica Neue" font-size="16" font-weight="bold" x="16.608" y="16" textLength="69.28" class="fragment" data-fragment-index="2">obability:</tspan><tspan font-family="Helvetica Neue" font-size="16" font-weight="500" x="85.888" y="16" textLength="38.24" class="fragment" data-fragment-index="2"> 95%</tspan><tspan font-family="Helvetica Neue" font-size="16" font-weight="bold" x="0" y="53" textLength="62.224" class="fragment" data-fragment-index="3">Reason:</tspan><tspan font-family="Helvetica Neue" font-size="16" font-weight="500" x="62.224" y="53" textLength="144.912" class="fragment" data-fragment-index="3"> Because I guessed </tspan><tspan font-family="Helvetica Neue" font-size="16" font-weight="500" x="0" y="71" textLength="206.592" class="fragment" data-fragment-index="3">Computer for Department </tspan><tspan font-family="Helvetica Neue" font-size="16" font-weight="500" x="0" y="89" textLength="196.16" class="fragment" data-fragment-index="3">on Row 3 of PRODUCTS</tspan></text>
</g>
</g>
</svg>
<p class="fragment" data-fragment-index="1">Allow users to <code>EXPLAIN</code> uncertain outputs</p>
<p class="fragment" data-fragment-index="3">Explanations include reasons given in English</p>
</section>
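<section data-markdown>
<script type="text/template">
### EXPLAIN, sketched

One way to carry enough provenance with each value to highlight it and explain it. The `Cell` and `Guess` records and the `explain` helper are hypothetical names; the wording mirrors the callout on the previous slide.

```python
from collections import namedtuple

# A cell is a value plus, if it was guessed, where the guess came from.
Guess = namedtuple("Guess", "table column row probability")
Cell = namedtuple("Cell", "value guess")

def explain(cell):
    """Return an English explanation for a guessed cell, or None if certain."""
    if cell.guess is None:
        return None
    g = cell.guess
    return ("Probability: %d%%. Reason: Because I guessed %s for %s on Row %d of %s"
            % (round(g.probability * 100), cell.value, g.column, g.row, g.table))

certain = Cell("Phone", None)
guessed = Cell("Computer", Guess("PRODUCTS", "Department", 3, 0.95))

for cell in (certain, guessed):
    flag = "guess" if cell.guess else "known"
    print(cell.value, flag, explain(cell))
```

A UI can highlight any cell whose `guess` field is set and show `explain(cell)` on demand.
</script>
</section>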
<section>
<h3>Other Lenses</h3>
<ul>
<li>Schema Matching (equivalently JSON/XML import)</li>
<li>Archival (how stale is my data?)</li>
<li>Type Inference</li>
<li style="color: grey;">Deduplication / Entity Resolution</li>
<li style="color: grey;">Schema Name Inference</li>
<li>And more...</li>
</ul>
</section>
</section>
<section>
<section>
<h2>Demo (Mimir)</h2>
<p><a href="http://demo.odin.cse.buffalo.edu"><img src="https://odin.cse.buffalo.edu/wp-content/uploads/2015/08/Mimir_Screenshot.png" height="400"/></a></p>
</section>
<section>
<h2>Intuitive Uncertainty</h2>
<p><b>UB</b>: Ying Yang, Niccolo Meneghetti, <br/> Arindam Nandi, Vinayak Karuppasamy</p>
<p><b>Oracle</b>: Ronny Fehling, Zhen-Hua Liu, Dieter Gawlick</p>
<h4>Thanks to Oracle for multiple gifts that made this research possible</h4>
</section>
</section>
</div></div>
<script src="../reveal.js-3.1.0/lib/js/head.min.js"></script>
<script src="../reveal.js-3.1.0/js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: false,
progress: true,
history: true,
center: true,
slideNumber: true,
transition: 'fade', // none/fade/slide/convex/concave/zoom
// Optional reveal.js plugins
dependencies: [
{ src: '../reveal.js-3.1.0/lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: '../reveal.js-3.1.0/plugin/math/math.js',
condition: function() { return true; },
mathjax: '../reveal.js-3.1.0/js/MathJax.js'
},
{ src: '../reveal.js-3.1.0/plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../reveal.js-3.1.0/plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../reveal.js-3.1.0/plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'pre code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: '../reveal.js-3.1.0/plugin/zoom-js/zoom.js', async: true },
{ src: '../reveal.js-3.1.0/plugin/notes/notes.js', async: true }
]
});
</script>
</body>
</html>