<!doctype html>
<html lang="en">

<head>
  <meta charset="utf-8">

  <title>Leaky Joins</title>

  <meta name="description" content="Convergent Interactive Inference with Leaky Joins">
  <meta name="author" content="Oliver Kennedy">

  <meta name="apple-mobile-web-app-capable" content="yes" />
  <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />

  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">

  <link rel="stylesheet" href="../reveal.js-3.1.0/css/reveal.css">
  <link rel="stylesheet" href="ubodin.css" id="theme">

  <!-- Code syntax highlighting -->
  <link rel="stylesheet" href="../reveal.js-3.1.0/lib/css/zenburn.css">

  <!-- Printing and PDF exports -->
  <script>
    var link = document.createElement( 'link' );
    link.rel = 'stylesheet';
    link.type = 'text/css';
    link.href = window.location.search.match( /print-pdf/gi ) ? '../reveal.js-3.1.0/css/print/pdf.css' : '../reveal.js-3.1.0/css/print/paper.css';
    document.getElementsByTagName( 'head' )[0].appendChild( link );
  </script>

  <!--[if lt IE 9]>
  <script src="../reveal.js-3.1.0/lib/js/html5shiv.js"></script>
  <![endif]-->
</head>
<body>

  <div class="reveal">
    <!-- Any section element inside of this container is displayed as a slide -->

    <div class="header">
      <!-- Any Talk-Specific Header Content Goes Here -->
      <center>
        <a href="http://www.buffalo.edu" target="_blank">
          <img src="../graphics/logos/ub-1line-ro-white.png" height="20"/>
        </a>
      </center>
    </div>
    <div class="footer">
      <!-- Any Talk-Specific Footer Content Goes Here -->
      <div style="float: left; margin-top: 15px; ">
        Exploring <u><b>O</b></u>nline <u><b>D</b></u>ata <u><b>In</b></u>teractions
      </div>
      <a href="http://odin.cse.buffalo.edu" target="_blank">
        <img src="../graphics/logos/odin-1line-white.png" height="40" style="float: right;"/>
      </a>
    </div>

    <div class="slides">
      <section>
        <section>
          <h1>Leaky Joins</h1>
          <h3><u>Ying Yang</u>, Oliver Kennedy</h3>
        </section>

        <section>
          <h1>Leaky Joins</h1>
          <h3>Ying Yang, <u>Oliver Kennedy</u></h3>
        </section>

        <section>
          <img src="graphics/yingyang.jpg" style="float: right; margin-left: 20px;" />
          <h3>Disclaimer</h3>
          <p>Ying could not be here today. If you like her ideas, get in touch with her.</p>
          <p class="fragment">(she's on the job market)</p>
          <p class="fragment">(If you don't, blame my presentation)</p>
        </section>
      </section>
      <section>
        <section>
          <h2>Online Aggregation (OLA)</h2>

          <p style="margin-top: 60px;">$Avg(3,6,10,9,1,3,9,7,9,4,7,9,2,1,2,4,10,8,9,7) = 6$</p>
          <p class="fragment">$Avg(3,6,10,9,1) = 5.8$ <span class="fragment">$\approx 6$</span></p>

          <p class="fragment">$Sum\left(\frac{k}{N} Samples\right) \cdot \frac{N}{k} \approx Sum(*)$</p>

          <p class="fragment" style="font-weight: bold; margin-top: 60px;">Sampling lets you approximate aggregate values with orders of magnitude less data.</p>
        </section>
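        <section>
          <p>The slide's arithmetic can be sanity-checked with a short sketch. This is a minimal illustration, not code from the talk; the helper name <code>ola_estimates</code> is made up.</p>

```python
import random

def ola_estimates(data, batch=5, seed=None):
    """Yield a running average as progressively more tuples are
    sampled (without replacement), as in online aggregation."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    total, count = 0.0, 0
    for i in range(0, len(shuffled), batch):
        for x in shuffled[i:i + batch]:
            total += x
            count += 1
        yield total / count  # current estimate of Avg(*)

data = [3, 6, 10, 9, 1, 3, 9, 7, 9, 4, 7, 9, 2, 1, 2, 4, 10, 8, 9, 7]
# First five values, as on the slide: Avg(3,6,10,9,1) = 5.8, close to the true 6
print(sum(data[:5]) / 5)              # 5.8
# The slide's Sum rescaling: Sum(k/N samples) * N/k, here k=5, N=20
print(sum(data[:5]) * len(data) / 5)  # 116.0, vs. Sum(*) = 120
```

          <p>Each yielded estimate refines the previous one; once the data is exhausted, the estimate equals the exact answer.</p>
        </section>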
        <section>
          <h2>Typical OLA Challenges</h2>
          <dl>
            <dt>Birthday Paradox</dt>
            <dd>$Sample(R) \bowtie Sample(S)$ is likely to be empty.</dd>
            <dt>Stratified Sampling</dt>
            <dd>No matter how important they are to the aggregate, rare samples are still rare.</dd>
            <dt>Replacement</dt>
            <dd>Does the sampling algorithm converge exactly or asymptotically?</dd>
          </dl>
        </section>

        <section>
          <h2>Replacement</h2>
          <dl>
            <dt>Sampling Without Replacement</dt>
            <dd>... eventually converges to a precise answer.</dd>
            <dt>Sampling With Replacement</dt>
            <dd>... doesn't need to track what's been sampled.</dd>
            <dd>... produces a better-behaved estimate distribution.</dd>
          </dl>
        </section>
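        <section>
          <p>A minimal sketch of the tradeoff (illustrative helper names, not from the paper): without replacement, the running mean is exact once the data is exhausted; with replacement, it only converges asymptotically.</p>

```python
import random

def estimates_without_replacement(data, rng):
    """Exhausts the data; the final estimate is exactly the true mean."""
    remaining = list(data)
    rng.shuffle(remaining)
    total = 0.0
    for i, x in enumerate(remaining, start=1):
        total += x
        yield total / i

def estimates_with_replacement(data, rng, draws):
    """No bookkeeping of what's been sampled; converges only asymptotically."""
    total = 0.0
    for i in range(1, draws + 1):
        total += rng.choice(data)
        yield total / i

rng = random.Random(42)
data = [3, 6, 10, 9, 1, 3, 9, 7, 9, 4]
exact = list(estimates_without_replacement(data, rng))[-1]      # exactly 6.1
approx = list(estimates_with_replacement(data, rng, 1000))[-1]  # near 6.1
```

          <p>The with-replacement draws are i.i.d., which is what makes the estimate distribution easier to analyze.</p>
        </section>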
</section>
      <section>
        <section>
          <img src="graphics/mimir_logo_final.png" />
          <p><a href="http://mimirdb.info">http://mimirdb.info</a></p>
          <p style="font-size: smaller" class="fragment">(not immediately relevant to the talk, but you should check it out)</p>
        </section>

        <section>
          <h2>Roughly 1-2 years ago...</h2>
          <p><b>Ying</b>: To implement {cool feature in Mimir}, we'll need to be able to perform <u>inference</u> on <u>Graphical Models</u>, but we will <u>not know how complex they are</u>.</p>
        </section>

        <section>
          <h2>Graphical Models</h2>
          <p>Joint probability distributions are expensive to store<br/>
          $$p(D, I, G, S, J)$$</p>
          <p>Bayes' rule lets us break apart the distribution<br/>
          $$= p(D, I, G, S) \cdot p(J | D, I, G, S)$$</p>
          <p>And conditional independence lets us further simplify<br/>
          $$= p(D, I, G, S) \cdot p(J | G, S)$$</p>
          <p class="fragment">This is the basis for a type of graphical model called a "Bayes Net"</p>
        </section>

        <section>
          <h2>Bayesian Networks</h2>
          <svg width="500" height="400">
            <image xlink:href="graphics/studentBN.svg" x="-125" y="-90"
                   height="650" width="650" />
            <rect
              class="fragment" data-fragment-index="1"
              x="62" y="78" width="78" height="15"
              style="fill: rgba(0,0,0,0); stroke: red; stroke-width: 3"
            />
            <rect
              class="fragment" data-fragment-index="2"
              x="253" y="58.5" width="78" height="15"
              style="fill: rgba(0,0,0,0); stroke: red; stroke-width: 3"
            />
            <rect
              class="fragment" data-fragment-index="3"
              x="316" y="120" width="135" height="15"
              style="fill: rgba(0,0,0,0); stroke: red; stroke-width: 3"
            />
            <rect
              class="fragment" data-fragment-index="4"
              x="1" y="204" width="169" height="15"
              style="fill: rgba(0,0,0,0); stroke: red; stroke-width: 3"
            />
            <rect
              class="fragment" data-fragment-index="5"
              x="276" y="249" width="194" height="15"
              style="fill: rgba(0,0,0,0); stroke: red; stroke-width: 3"
            />
          </svg>
          <p>$p(D=1, I=0, S=0, G=2, J=1)$<br/>
            <span class="fragment" data-fragment-index="1">$=\;0.5$</span>
            <span class="fragment" data-fragment-index="2">$\cdot\;0.7$</span>
            <span class="fragment" data-fragment-index="3">$\cdot\;0.95$</span>
            <span class="fragment" data-fragment-index="4">$\cdot\;0.25$</span>
            <span class="fragment" data-fragment-index="5">$\cdot\;0.8$</span>
          </p>
        </section>
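        <section>
          <p>The slide's computation is just one lookup per conditional probability table (CPT), multiplied together. A sketch, assuming the CPTs are stored as dictionaries (only the entries highlighted on the slide are filled in; the representation is ours, not the paper's):</p>

```python
# Conditional probability tables (CPTs) as dictionaries; only the
# entries highlighted on the slide are included.
p_D = {(1,): 0.5}            # p(D=1)
p_I = {(0,): 0.7}            # p(I=0)
p_S = {(0, 0): 0.95}         # p(S=0 | I=0)
p_G = {(2, 1, 0): 0.25}      # p(G=2 | D=1, I=0)
p_J = {(1, 2, 0): 0.8}       # p(J=1 | G=2, S=0)

# The factored joint is one lookup per CPT, multiplied together:
p = p_D[(1,)] * p_I[(0,)] * p_S[(0, 0)] * p_G[(2, 1, 0)] * p_J[(1, 2, 0)]
print(p)  # = 0.5 · 0.7 · 0.95 · 0.25 · 0.8 ≈ 0.0665
```

        </section>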
        <section>
          <p>
            $p(D,I,S,G,J)$ <br/>
            $=$<br/>
            $p(D) \bowtie p(I) \bowtie p(S|I) \bowtie p(G|D,I) \bowtie p(J|G,S)$
          </p>
        </section>

        <section>
          <h2>Inference</h2>
          <p>$p(J) = \sum_{D,I,S,G}p(D,I,S,G,J)$</p>
          (a.k.a. computing the marginal probability)
        </section>
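        <section>
          <p>Brute-force marginalization can be sketched as a sum over all assignments of the factored joint. The CPT numbers below are made up for illustration (binary domains); only the factorization structure comes from the slide.</p>

```python
from itertools import product

# Made-up binary CPTs with the slide's factorization structure:
p_D = {0: 0.4, 1: 0.6}                                        # p(D)
p_I = {0: 0.7, 1: 0.3}                                        # p(I)
p_S = {(0, 0): 0.95, (1, 0): 0.05,
       (0, 1): 0.2,  (1, 1): 0.8}                             # p(S | I), key (s, i)
p_G = {(0, 0, 0): 0.3, (1, 0, 0): 0.7, (0, 1, 0): 0.9, (1, 1, 0): 0.1,
       (0, 0, 1): 0.6, (1, 0, 1): 0.4, (0, 1, 1): 0.99, (1, 1, 1): 0.01}  # p(G | D, I), key (g, d, i)
p_J = {(0, 0, 0): 0.8, (1, 0, 0): 0.2, (0, 1, 0): 0.4, (1, 1, 0): 0.6,
       (0, 0, 1): 0.5, (1, 0, 1): 0.5, (0, 1, 1): 0.1, (1, 1, 1): 0.9}    # p(J | G, S), key (j, g, s)

def marginal_J():
    """p(J) = sum over D, I, S, G of p(D) p(I) p(S|I) p(G|D,I) p(J|G,S)."""
    pJ = {0: 0.0, 1: 0.0}
    for d, i, s, g, j in product((0, 1), repeat=5):
        pJ[j] += p_D[d] * p_I[i] * p_S[(s, i)] * p_G[(g, d, i)] * p_J[(j, g, s)]
    return pJ
```

          <p>Enumerating every assignment is exponential in the number of variables, which is why the ordering tricks on the next slides matter.</p>
        </section>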
        <section>
          <h2>Inference Algorithms</h2>
          <dl>
            <dt>Exact (e.g. Variable Elimination)</dt>
            <dd>Fast and precise, but scales poorly with graph complexity.</dd>
            <dt>Approximate (e.g. Gibbs Sampling)</dt>
            <dd>Consistent performance, but only asymptotic convergence.</dd>
          </dl>
          <p class="fragment"><b>Key Challenge</b>: For {really cool feature} we don't know whether we should use exact or approximate inference.</p>
        </section>

        <section>
          <p>Can we gracefully degrade from exact to approximate inference?</p>
        </section>
      </section>
      <section>
        <section>
          <pre><code>
SELECT J.J, SUM(D.p * I.p * S.p * G.p * J.p) AS p
FROM D NATURAL JOIN I NATURAL JOIN S
     NATURAL JOIN G NATURAL JOIN J
GROUP BY J.J
          </code></pre>
          <p class="fragment">(Inference is essentially a big group-by aggregate join query)</p>
          <p class="fragment" style="font-size: smaller">(Variable elimination is Aggregate Pushdown + Join Ordering)</p>
        </section>
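        <section>
          <p>One variable-elimination step mirrors the SQL above: join the factors that mention $D$, then GROUP BY the remaining variables and SUM $D$ out. A sketch with made-up binary CPTs (the table entries are illustrative, not the slide's):</p>

```python
from itertools import product

# Two of the factors, as tables with made-up binary entries:
p_D = {0: 0.4, 1: 0.6}                                                    # p(D)
p_G = {(0, 0, 0): 0.3, (1, 0, 0): 0.7, (0, 1, 0): 0.9, (1, 1, 0): 0.1,
       (0, 0, 1): 0.6, (1, 0, 1): 0.4, (0, 1, 1): 0.99, (1, 1, 1): 0.01}  # p(G | D, I), key (g, d, i)

def eliminate_D():
    """Compute tau(G, I) = sum_D p(D) * p(G | D, I).
    Relationally: p_D NATURAL JOIN p_G, then GROUP BY G, I and SUM."""
    tau = {}
    for g, d, i in product((0, 1), repeat=3):
        tau[(g, i)] = tau.get((g, i), 0.0) + p_D[d] * p_G[(g, d, i)]
    return tau
```

          <p>Repeating this step once per variable is variable elimination; choosing which variable to sum out next is exactly a join-ordering decision.</p>
        </section>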
        <section>
          <h2>Key Idea: OLA</h2>
          <p></p>
        </section>
      </section>

    </div>
  </div>
  <script src="../reveal.js-3.1.0/lib/js/head.min.js"></script>
  <script src="../reveal.js-3.1.0/js/reveal.js"></script>

  <script>
    // Full list of configuration options available at:
    // https://github.com/hakimel/reveal.js#configuration
    Reveal.initialize({
      controls: false,
      progress: true,
      history: true,
      center: true,
      slideNumber: true,

      transition: 'fade', // none/fade/slide/convex/concave/zoom

      // Optional reveal.js plugins
      dependencies: [
        { src: '../reveal.js-3.1.0/lib/js/classList.js', condition: function() { return !document.body.classList; } },
        { src: '../reveal.js-3.1.0/plugin/math/math.js',
          condition: function() { return true; },
          mathjax: '../reveal.js-3.1.0/js/MathJax.js'
        },
        { src: '../reveal.js-3.1.0/plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
        { src: '../reveal.js-3.1.0/plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
        { src: '../reveal.js-3.1.0/plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'pre code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
        { src: '../reveal.js-3.1.0/plugin/zoom-js/zoom.js', async: true },
        { src: '../reveal.js-3.1.0/plugin/notes/notes.js', async: true }
      ]
    });
  </script>

</body>
</html>