<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Leaky Joins</title>
<meta name="description" content="Convergent Interactive Inference with Leaky Joins">
<meta name="author" content="Oliver Kennedy">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="../reveal.js-3.1.0/css/reveal.css">
<link rel="stylesheet" href="ubodin.css" id="theme">
<!-- Code syntax highlighting -->
<link rel="stylesheet" href="../reveal.js-3.1.0/lib/css/zenburn.css">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? '../reveal.js-3.1.0/css/print/pdf.css' : '../reveal.js-3.1.0/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="../reveal.js-3.1.0/lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="header">
<!-- Any Talk-Specific Header Content Goes Here -->
<center>
<a href="http://www.buffalo.edu" target="_blank">
<img src="../graphics/logos/ub-1line-ro-white.png" height="20"/>
</a>
</center>
</div>
<div class="footer">
<!-- Any Talk-Specific Footer Content Goes Here -->
<div style="float: left; margin-top: 15px; ">
Exploring <u><b>O</b></u>nline <u><b>D</b></u>ata <u><b>In</b></u>teractions
</div>
<a href="https://odin.cse.buffalo.edu" target="_blank">
<img src="../graphics/logos/odin-1line-white.png" height="40" style="float: right;"/>
</a>
</div>
<div class="slides">
<section>
<section>
<h1>Leaky Joins</h1>
<h3><u>Ying Yang</u>, Oliver Kennedy</h3>
</section>
<section>
<h1>Leaky Joins</h1>
<h3>Ying Yang, <u>Oliver Kennedy</u></h3>
</section>
<section>
<img src="graphics/yingyang.jpg" style="float: right; margin-left: 20px; padding-top: 20px" />
<h3>Disclaimer</h3>
<p>Ying could not be here today. If you like her ideas, get in touch with her. </p>
<p class="fragment">(If you don't, blame my presentation)</p>
<p class="fragment">(Also, Ying is on the job market)</p>
</section>
</section>
<section>
<section>
<img src="graphics/mimir_logo_final.png" />
<p><a href="http://mimirdb.info">http://mimirdb.info</a></p>
<p style="font-size: smaller" class="fragment">(not immediately relevant to the talk, but you should check it out)</p>
</section>
<section>
<h2>Roughly 1-2 years ago...</h2>
<p><b>Ying</b>: To implement {cool feature in Mimir}, we'll need to be able to perform <u>inference</u> on <u>Graphical Models</u>, but we will <u>not know how complex they are</u>.</p>
</section>
<section>
<h2>Graphical Models</h2>
<p>Joint probability distributions are expensive to store<br/>
$$p(D, I, G, S, J)$$</p>
<p class="fragment">Bayes rule lets us break apart the distribution<br/>
$$= p(D, I, G, S) \cdot p(J | D, I, G, S)$$</p>
<p class="fragment">And conditional independence lets us further simplify<br/>
$$= p(D, I, G, S) \cdot p(J | G, S)$$</p>
<p class="fragment">This is basis for a type of graphical model called a "Bayes Net"</p>
</section>
<section>
<h2>Bayesian Networks</h2>
<svg width="500" height="400">
<image xlink:href="graphics/studentBN.svg" x="-125" y="-90"
height="650" width="650" />
<rect
class="fragment" data-fragment-index="1"
x="62" y="78" width="78" height="15"
style="fill: rgba(0,0,0,0); stroke: red; stroke-width: 3"
/>
<rect
class="fragment" data-fragment-index="2"
x="253" y="58.5" width="78" height="15"
style="fill: rgba(0,0,0,0); stroke: red; stroke-width: 3"
/>
<rect
class="fragment" data-fragment-index="3"
x="316" y="120" width="135" height="15"
style="fill: rgba(0,0,0,0); stroke: red; stroke-width: 3"
/>
<rect
class="fragment" data-fragment-index="4"
x="1" y="204" width="169" height="15"
style="fill: rgba(0,0,0,0); stroke: red; stroke-width: 3"
/>
<rect
class="fragment" data-fragment-index="5"
x="276" y="249" width="194" height="15"
style="fill: rgba(0,0,0,0); stroke: red; stroke-width: 3"
/>
</svg>
<p>$p(D=1, I=0, S=0, G=2, J=1)$<br/>
<span class="fragment" data-fragment-index="1">$=\;0.5$</span>
<span class="fragment" data-fragment-index="2">$\cdot\;0.7$</span>
<span class="fragment" data-fragment-index="3">$\cdot\;0.95$</span>
<span class="fragment" data-fragment-index="4">$\cdot\;0.25$</span>
<span class="fragment" data-fragment-index="5">$\cdot\;0.8$</span>
<span class="fragment" data-fragment-index="6">$=\;0.0665$</span>
</p>
</section>
<section>
<p>
$p(D,I,S,G,J)$ <br/>
$=$<br/>
$p(D) \bowtie p(I) \bowtie p(S|I) \bowtie p(G|D,I) \bowtie p(J|G,S)$
</p>
</section>
<section>
<h2>Inference</h2>
<p>$p(J) = \sum_{D,I,S,G}p(D,I,S,G,J)$</p>
<p>(a.k.a. computing the marginal probability)</p>
</section>
<section>
<h2>Inference Algorithms</h2>
<dl>
<dt>Exact (e.g. Variable Elimination)</dt>
<dd>Fast and precise, but scales poorly with graph complexity.</dd>
<dt>Approximate (e.g. Gibbs Sampling)</dt>
<dd>Consistent performance, but only asymptotic convergence.</dd>
</dl>
<p class="fragment"><b>Key Challenge</b>: For {really cool feature} we don't know whether we should use exact or approximate inference.</p>
</section>
<section>
<p>Can we gracefully degrade from exact to approximate inference?</p>
</section>
</section>
<section>
<section>
<pre><code>
SELECT J.J, SUM(D.p * I.p * S.p * G.p * J.p) AS p
FROM D NATURAL JOIN I NATURAL JOIN S
NATURAL JOIN G NATURAL JOIN J
GROUP BY J.J
</code></pre>
<p class="fragment">(Inference is essentially a big group-by aggregate join query)</p>
<p class="fragment" style="font-size: smaller">(Variable elimination is Aggregate Pushdown + Join Ordering)</p>
</section>
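<section>
<p style="font-size: smaller">A minimal Python rendering of the query above (a sketch for illustration; the factor-table shapes and names are assumptions, not the paper's code):</p>
<pre><code>
from itertools import product
from collections import defaultdict

# Factors as nested lists: pD[d], pI[i], pS[i][s], pG[d][i][g],
# pJ[g][s][j]; doms maps a variable name to its domain size.
# Dense tables let us enumerate the join output directly.
def marginal_J(pD, pI, pS, pG, pJ, doms):
    p = defaultdict(float)
    for d, i, s, g, j in product(*(range(doms[v]) for v in "DISGJ")):
        p[j] += pD[d] * pI[i] * pS[i][s] * pG[d][i][g] * pJ[g][s][j]
    return dict(p)   # the GROUP BY J: j -> marginal probability
</code></pre>
</section>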
<section>
<h2>Idea: Online Aggregation</h2>
</section>
<section>
<h2>Online Aggregation (OLA)</h2>
<p style="margin-top: 60px;">$Avg(3,6,10,9,1,3,9,7,9,4,7,9,2,1,2,4,10,8,9,7) = 6$</p>
<p class="fragment">$Avg(3,6,10,9,1) = 5.8$ <span class="fragment">$\approx 6$</span></p>
<p class="fragment">$Sum\left(\frac{k}{N} Samples\right) \cdot \frac{N}{k} \approx Sum(*)$</p>
<p class="fragment" style="font-weight: bold; margin-top: 60px;">Sampling lets you approximate aggregate values with orders of magnitude less data.</p>
</section>
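<section>
<p style="font-size: smaller">A quick sketch of the scale-up rule, reusing this slide's numbers (an illustration, not from the talk):</p>
<pre><code>
import random

data = [3, 6, 10, 9, 1, 3, 9, 7, 9, 4, 7, 9, 2, 1, 2, 4, 10, 8, 9, 7]
k = 5
sample = random.sample(data, k)       # k of N values, without replacement
est = sum(sample) * len(data) / k     # scale the partial sum by N/k
print(est, "vs true sum", sum(data))  # e.g. 116.0 vs true sum 120
</code></pre>
<p class="fragment" style="font-size: smaller">With $k \ll N$ the estimate is already close; as $k \to N$ it becomes exact.</p>
</section>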
<section>
<h2>Typical OLA Challenges</h2>
<dl>
<dt>Birthday Paradox</dt>
<dd>$Sample(R) \bowtie Sample(S)$ is likely to be empty.</dd>
<dt>Stratified Sampling</dt>
<dd>No matter how important they are to the aggregate, rare groups are still rarely sampled.</dd>
<dt>Replacement</dt>
<dd> Does the sampling algorithm converge exactly or asymptotically?</dd>
</dl>
</section>
<section>
<h2>Replacement</h2>
<dl>
<dt>Sampling Without Replacement</dt>
<dd>... eventually converges to a precise answer.</dd>
<dt>Sampling With Replacement</dt>
<dd>... doesn't need to track what's been sampled.</dd>
<dd>... produces a better behaved estimate distribution.</dd>
</dl>
</section>
<section>
<h2>OLA over GMs</h2>
<dl>
<dt>Tables are Small</dt>
<dd>Compute, not IO, is the bottleneck.</dd>
<dt>Tables are Dense</dt>
<dd>The Birthday Paradox and stratified sampling are irrelevant.</dd>
<dt>Queries have High Tree-Width</dt>
<dd>Intermediate tables are large.</dd>
</dl>
<p class="fragment" style="font-weight: bold;">Classical OLA techniques aren't entirely appropriate.</p>
</section>
</section>
<section>
<section>
<h2>(Naive) OLA: Cyclic Sampling</h2>
</section>
<section>
<h2>A Few Quick Insights</h2>
<ol>
<li class="fragment">Small Tables make random access to data possible.</li>
<li class="fragment">Dense Tables mean we can sample directly from join outputs.</li>
<li class="fragment">Cyclic PRNGs like Linear Congruential Generators can be used to generate a <u>randomly ordered</u>, but <u>non-repeating</u> sequence of integers from $0$ to any $N$ in constant memory.</li>
</ol>
</section>
<section>
<h2>Linear Congruential Generators</h2>
<p>If you pick $a$, $b$, and $N$ correctly, then the sequence:</p>
<p>$K_i = (a\cdot K_{i-1}+b)\;mod\;N$</p>
<p>will produce $N$ distinct, pseudorandom integers $K_i \in [0, N)$</p>
</section>
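<section>
<p style="font-size: smaller">A minimal sketch of such a generator (my illustration; the parameter choice follows the Hull-Dobell full-period conditions):</p>
<pre><code>
def full_cycle_lcg(n, a, b, k=0):
    """Yield every integer in [0, n) exactly once, in pseudorandom
    order, in constant memory (no "already seen" set needed).
    Requires (Hull-Dobell): gcd(b, n) = 1; a - 1 divisible by every
    prime factor of n; and a - 1 divisible by 4 if n is."""
    for _ in range(n):
        yield k
        k = (a * k + b) % n

# n = 16, a = 5, b = 3 satisfy the conditions:
print(list(full_cycle_lcg(16, 5, 3)))
# [0, 3, 2, 13, 4, 7, 6, 1, 8, 11, 10, 5, 12, 15, 14, 9]
</code></pre>
</section>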
<section>
<h2>Cyclic Sampling</h2>
<p>To marginalize $p(\{X_i\})$...</p>
<ol>
<li>Init an LCG with a cycle of $N = \prod_i |dom(X_i)|$</li>
<li>Use the LCG to sample $\{x_i\} \in \{X_i\}$</li>
<li>Incorporate $p(x_i = X_i)$ into the OLA estimate</li>
<li>Repeat from 2 until done (sketched on the next slide)</li>
</ol>
</section>
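<section>
<p style="font-size: smaller">A hypothetical sketch (mine, not the paper's code); for illustration it marginalizes onto the last variable, and <code>prob</code> is an assumed callback returning the joint probability of one full assignment:</p>
<pre><code>
def full_cycle_lcg(n, a, b, k=0):       # from the previous slide
    for _ in range(n):
        yield k
        k = (a * k + b) % n

def cyclic_sampling(domains, prob, a, b):
    n = 1
    for d in domains:
        n *= d                          # N = prod_i |dom(X_i)|
    marginal = [0.0] * domains[-1]
    for i, k in enumerate(full_cycle_lcg(n, a, b), 1):
        xs = []                         # decode k as mixed-radix digits
        for d in domains:
            xs.append(k % d)
            k //= d
        marginal[xs[-1]] += prob(xs)
        yield [m * n / i for m in marginal]   # OLA scale-up by N/i
    # the final estimate (i = n) is exact: every assignment was
    # visited exactly once, i.e. sampling without replacement
</code></pre>
</section>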
<section>
<h2>Accuracy</h2>
<dl>
<dt>Sampling with Replacement</dt>
<dd>Chernoff and Hoeffding bounds give an $\epsilon$-$\delta$ guarantee on the sum/average of a sample <u>with replacement</u>.</dd>
<dt>Without Replacement?</dt>
<dd class="fragment">Serfling et. al. have a variant of Hoeffding Bounds for sampling without replacement.</dd>
</dl>
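<p class="fragment" style="font-size: smaller">(Roughly: for $k$ of $N$ values in $[0,1]$, Hoeffding gives $P(|\hat{\mu} - \mu| \geq \epsilon) \leq 2e^{-2k\epsilon^2}$; Serfling sharpens the exponent to $-2k\epsilon^2 / (1 - \frac{k-1}{N})$, tightening rapidly as $k \to N$.)</p>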
</section>
<section>
<h2>Cyclic Sampling</h2>
<dl>
<dt>Advantages</dt>
<dd>Progressively better estimates over time.</dd>
<dd>Converges in bounded time.</dd>
<dt>Disadvantages</dt>
<dd>Exponential time in the number of variables.</dd>
</dl>
</section>
</section>
<section>
<section>
<h2>Better OLA: Leaky Joins</h2>
<p class="fragment">Make Cyclic Sampling into a composable operator</p>
</section>
<section>
<img src="graphics/JoinGraph.svg" height="400" style="float:left"/>
<table style="float:right">
<tr><th>$G$</th><th>#</th><th>$\sum p_{\psi_2}$</th></tr>
<tr class="fragment" data-fragment-index="2"><td>1</td><td>1</td><td>0.126</td></tr>
<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
</table>
<table>
<tr><th>$I$</th><th>$G$</th><th>#</th><th>$\sum p_{\psi_1}$</th></tr>
<tr class="fragment" data-fragment-index="1"><td>0</td><td>1</td><td>1</td><td>0.18</td></tr>
<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
</table>
</section>
<section>
<img src="graphics/JoinGraph.svg" height="400" style="float:left"/>
<table style="float:right">
<tr><th>$G$</th><th>#</th><th>$\sum p_{\psi_2}$</th></tr>
<tr><td>1</td><td>3</td><td>0.348</td></tr>
<tr><td>2</td><td>4</td><td>0.288</td></tr>
<tr><td>3</td><td>4</td><td>0.350</td></tr>
</table>
<table>
<tr><th>$I$</th><th>$G$</th><th>#</th><th>$\sum p_{\psi_1}$</th></tr>
<tr><td>0</td><td>1</td><td>2</td><td>0.140</td></tr>
<tr><td>1</td><td>1</td><td>2</td><td>0.222</td></tr>
<tr><td>0</td><td>2</td><td>2</td><td>0.238</td></tr>
<tr><td>1</td><td>2</td><td>2</td><td>0.050</td></tr>
<tr><td>0</td><td>3</td><td>2</td><td>0.322</td></tr>
<tr><td>1</td><td>3</td><td>2</td><td>0.028</td></tr>
</table>
</section>
<section>
<img src="graphics/JoinGraph.svg" height="400" style="float:left"/>
<table style="float:right">
<tr><th>$G$</th><th>#</th><th>$\sum p_{\psi_2}$</th></tr>
<tr><td>1</td><td>4</td><td>0.362</td></tr>
<tr><td>2</td><td>4</td><td>0.288</td></tr>
<tr><td>3</td><td>4</td><td>0.350</td></tr>
</table>
<table>
<tr><th>$I$</th><th>$G$</th><th>#</th><th>$\sum p_{\psi_1}$</th></tr>
<tr><td>0</td><td>1</td><td>2</td><td>0.140</td></tr>
<tr><td>1</td><td>1</td><td>2</td><td>0.222</td></tr>
<tr><td>0</td><td>2</td><td>2</td><td>0.238</td></tr>
<tr><td>1</td><td>2</td><td>2</td><td>0.050</td></tr>
<tr><td>0</td><td>3</td><td>2</td><td>0.322</td></tr>
<tr><td>1</td><td>3</td><td>2</td><td>0.028</td></tr>
</table>
</section>
<section>
<h2>Leaky Joins</h2>
<ol>
<li>Build a normal join/aggregate graph, as in variable elimination: one Cyclic Sampler for each Join+Aggregate.</li>
<li>Keep advancing the Cyclic Samplers in parallel, resetting their outputs after every cycle so that samples "leak" through to downstream samplers.</li>
<li>When a sampler completes a full cycle over complete inputs, its output is exact: mark it complete and stop advancing it.</li>
<li>Continue until the desired accuracy is reached or every table is marked complete (sketched on the next slide).</li>
</ol>
</section>
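<section>
<p style="font-size: smaller">A heavily simplified sketch of the control flow (my illustration, not the paper's implementation) for a two-sampler chain $\psi_1(B) = \sum_A f_1(A)\,f_2(A,B)$, $\psi_2(C) = \sum_B \psi_1(B)\,f_3(B,C)$:</p>
<pre><code>
def full_cycle_lcg(n, a, b, k=0):        # from the LCG slide
    for _ in range(n):
        yield k
        k = (a * k + b) % n

def leaky_chain(f1, f2, f3, na, nb, nc, p1=(5, 3), p2=(5, 3)):
    # Default LCG parameters assume power-of-two cycle lengths.
    n1, n2 = na * nb, nb * nc            # join-output domain sizes
    psi1, done1 = [0.0] * nb, False      # upstream sampler's output
    psi2, clean = [0.0] * nc, False      # downstream sampler's output
    gen1 = full_cycle_lcg(n1, *p1)
    gen2 = full_cycle_lcg(n2, *p2)
    while True:
        if not done1:                    # advance the upstream sampler
            k = next(gen1, None)
            if k is None:
                done1 = True             # psi1 is now exact
            else:
                va, vb = k % na, k // na
                psi1[vb] += f1[va] * f2[va][vb]
        k = next(gen2, None)             # advance the downstream sampler
        if k is None:                    # it finished a full cycle
            if clean:                    # ...built entirely on exact psi1,
                return psi2              # so psi2 is the exact marginal
            clean = done1                # the next cycle may be the clean one
            psi2 = [0.0] * nc            # reset, letting fresher samples
            gen2 = full_cycle_lcg(n2, *p2)   # "leak" through again
            k = next(gen2)
        vb, vc = k % nb, k // nb
        psi2[vc] += psi1[vb] * f3[vb][vc]
        # scaling psi2 by n2 / (steps this cycle) gives the running
        # OLA estimate; the paper's bounds also correct for psi1
</code></pre>
</section>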
<section>
<p>There's a bit of extra math to compute $\epsilon$-$\delta$ bounds by adapting Serfling's results. It's in the paper.</p>
</section>
</section>
<section>
<section>
<h2>Experiments</h2>
<dl>
<dt>Microbenchmarks</dt>
<dd>Fix time, vary domain size, measure accuracy</dd>
<dd>Fix domain size, vary time, measure accuracy</dd>
<dd>Vary domain size, measure time to completion</dd>
<dt>Macrobenchmarks</dt>
<dd>4 graphs from the bnlearn Repository</dd>
</dl>
</section>
<section>
<h2>Microbenchmarks</h2>
<img src="graphics/extended_student.png" />
<p><b>Student</b>: A common benchmark graph.</p>
</section>
<section>
<h2>Accuracy vs Domain</h2>
<img src="graphics/fixed_time.png" height="400"/>
<p class="fragment" style="font-weight: bold">VE is binary: It completes, or it doesn't.</p>
</section>
<section>
<h2>Accuracy vs Time</h2>
<img src="graphics/student_avg.png" height="400"/>
<p class="fragment" style="font-weight: bold">CS gets early results faster, but is overtaken by LJ.</p>
</section>
<section>
<h2>Domain vs Time to 100%</h2>
<img src="graphics/student_scaling.png" height="400"/>
<p class="fragment" style="font-weight: bold">LJ is only 3-5x slower than VE.</p>
</section>
<section>
<h2>"Child"</h2>
<div>
<img src="graphics/child.png" style="float:left" height="280">
<img src="graphics/child_avg.png" height="350">
</div>
<p class="fragment" style="font-weight: bold; float:clear">LJ converges to an exact result before Gibbs gets an approx.</p>
</section>
<section>
<h2>"Insurance"</h2>
<div>
<img src="graphics/insurance.png" style="float:left" height="280">
<img src="graphics/insurance_avg.png" height="350">
</div>
<p class="fragment" style="font-weight: bold; float:clear">On some graphs Gibbs is better, but only marginally.</p>
</section>
<section>
<h2>More graphs in the paper.</h2>
</section>
</section>
<section>
<h2>Leaky Joins</h2>
<ul>
<li>Classical OLA isn't appropriate for GMs.</li>
<li><b>Idea 1</b>: LCGs can sample <u>without</u> replacement.</li>
<li><b>Idea 2</b>: "Leak" samples through a normal join graph.</li>
<li>Compared to both Variable Elim. and Gibbs Sampling, Leaky Joins are often better and never drastically worse.</li>
</ul>
<p class="fragment" style="font-weight: bold">Questions?</p>
</section>
</div></div>
<script src="../reveal.js-3.1.0/lib/js/head.min.js"></script>
<script src="../reveal.js-3.1.0/js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: false,
progress: true,
history: true,
center: true,
slideNumber: true,
transition: 'fade', // none/fade/slide/convex/concave/zoom
// Optional reveal.js plugins
dependencies: [
{ src: '../reveal.js-3.1.0/lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: '../reveal.js-3.1.0/plugin/math/math.js',
condition: function() { return true; },
mathjax: '../reveal.js-3.1.0/js/MathJax.js'
},
{ src: '../reveal.js-3.1.0/plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../reveal.js-3.1.0/plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../reveal.js-3.1.0/plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'pre code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: '../reveal.js-3.1.0/plugin/zoom-js/zoom.js', async: true },
{ src: '../reveal.js-3.1.0/plugin/notes/notes.js', async: true }
]
});
</script>
</body>
</html>