Oliver Kennedy 2017-09-25 13:20:04 -04:00
parent e1e470c642
commit 7bebdc85b3
4 changed files with 150 additions and 25 deletions

@ -246,28 +246,28 @@
<section>
<h2>CSV Import</h2>
<h4>Run a <code>SELECT</code> on a raw CSV File</h4>
<ul class="fragment">
<ul>
<li>File may not have column headers</li>
<li>CSV does not provide "types"</li>
<li>Lines may be missing fields</li>
<li>Fields may be mistyped (typo, missing comma)</li>
<li>Comment text can be inlined into the file</li>
</ul>
<p class="fragment">
<b>State of the art</b>: External Table Defn <span class="fragment">+ "Manually" edit CSV</span>
<p>
<b>State of the art</b>: External Table Defn <span>+ "Manually" edit CSV</span>
</p>
</section>
<section>
<h2>Merge Two Datasets</h2>
<h4><code>UNION</code> two data sources</h4>
<ul class="fragment">
<ul>
<li>Schema matching</li>
<li>Deduplication</li>
<li>Format alignment (GIS coordinates, $ vs €)</li>
<li>Precision alignment (State vs County)</li>
</ul>
<p class="fragment">
<p>
<b>State of the art</b>: Manually map schema
</p>
</section>
@ -275,19 +275,17 @@
<section>
<h2>JSON Shredding</h2>
<h4>Run a <code>SELECT</code> on JSON or a Doc Store</h4>
<ul class="fragment">
<ul>
<li>Separating fields and record sets:<br/>(e.g., <code>{ A: "Bob", B: "Alice" }</code>)</li>
<li>Missing fields (Records with no 'address')</li>
<li>Type alignment (Records with 'address' as an array)</li>
<li>Schema matching$^2$</li>
</ul>
<p class="fragment">
<p>
<b>State of the art</b>: DataGuide, Wrangler, etc.
</p>
</section>
</section>
<section>
<section>
<h2>Data Cleaning is Hard!</h2>
</section>
@ -300,21 +298,14 @@
<p>Alice spends weeks cleaning her data before using it.</p>
</section>
<section>
<h3>Newer State of the Art</h3>
<img src="graphics/iu.jpeg" height=500 />
<attribution>(azure.microsoft.com)</attribution>
</section>
<section>
<img src="graphics/data-lake-to-data-swamp.jpg" height=500 />
<attribution>(timoelliott.com)</attribution>
</section>
</section>
<section>
<section>
<h2>The database is in the way</h2>
</section>
<section>
<h3>
In the name of Codd,<br/><span class="fragment grow highlight-current-blue">thou shalt not give the user a wrong answer.</span>
@ -327,6 +318,7 @@
What would it take for that to be ok?
</h4>
</section>
<section>
<h2>Industry says...</h2>
</section>
@ -396,7 +388,7 @@
</section>
<section>
<h2>What if a database did the same?</h2>
<h3>What if a database did the same?</h3>
<h4 class="fragment">(they can)</h4>
</section>
@ -404,12 +396,145 @@
<section>
<h3>On representing incomplete information in a relational data base</h3>
<h4>T. Imielinski &amp; W. Lipski Jr.<span style="margin-left: 40px">(<i>VLDB <span class="fragment grow highlight-current-red">1981</span></i>)</span></h4>
<p class="fragment">
Incomplete and Probabilistic
<h4>T. Imielinski &amp; W. Lipski Jr.<span style="margin-left: 40px">(<i>VLDB <span class="fragment highlight-current-red" data-fragment-index="1">1981</span></i>)</span></h4>
<p class="fragment" data-fragment-index="1" style="margin-top: 60px">
Incomplete and Probabilistic Databases<br/>have existed since the 1980s
</p>
</section>
<section>
<svg width="800" height="500">
<g transform="translate(150,0)">
<image
xlink:href="graphics/db.svg"
width="93" height="103"
x="0" y="10"
/>
<image
xlink:href="graphics/db.svg"
width="93" height="103"
x="0" y="130"
/>
<image
xlink:href="graphics/db.svg"
width="93" height="103"
x="0" y="250"
/>
<image
xlink:href="graphics/db.svg"
width="93" height="103"
x="0" y="370"
/>
</g>
<g
transform="translate(250, 0)"
class="fragment"
style="
fill: rgba(200, 50, 50, 0);
stroke-width: 4;
stroke: rgba(200, 200, 200, 1);
">
<polyline
points="0,60 220,60 200,50 220,60 200,70 220,60 0,60"
transform="translate(0,0)"
/>
<polyline
points="0,60 220,60 200,50 220,60 200,70 220,60 0,60"
transform="translate(0,120)"
/>
<polyline
points="0,60 220,60 200,50 220,60 200,70 220,60 0,60"
transform="translate(0,240)"
/>
<polyline
points="0,60 220,60 200,50 220,60 200,70 220,60 0,60"
transform="translate(0,360)"
/>
<text x="60" y="50">Q(D)</text>
<text x="60" y="170">Q(D)</text>
<text x="60" y="290">Q(D)</text>
<text x="60" y="410">Q(D)</text>
<image
xlink:href="graphics/jean-victor-balin-icon-table.svg"
width="96" height="96"
x="230" y="15"
/>
<image
xlink:href="graphics/jean-victor-balin-icon-table.svg"
width="96" height="96"
x="230" y="135"
/>
<image
xlink:href="graphics/jean-victor-balin-icon-table.svg"
width="96" height="96"
x="230" y="255"
/>
<image
xlink:href="graphics/jean-victor-balin-icon-table.svg"
width="96" height="96"
x="230" y="375"
/>
</g>
<g
transform="translate(0, 0)"
class="fragment"
style="
fill: rgba(200, 50, 50, 0);
stroke-width: 4;
stroke: rgba(200, 200, 200, 1);
">
<polyline
points="20,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(0,200) rotate(-60)"
/>
<polyline
points="70,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(-15,200) rotate(-20)"
/>
<polyline
points="70,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(25,170) rotate(20)"
/>
<polyline
points="20,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(102,220) rotate(60)"
/>
<text x="40" y="250">?</text>
</g>
<g
transform="translate(540, 0)"
class="fragment"
style="
fill: rgba(200, 50, 50, 0);
stroke-width: 4;
stroke: rgba(200, 200, 200, 1);
">
<polyline
points="20,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(102,30) rotate(60)"
/>
<polyline
points="70,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(0,120) rotate(20)"
/>
<polyline
points="70,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(-40,240) rotate(-20)"
/>
<polyline
points="20,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(0,390) rotate(-60)"
/>
<image
xlink:href="graphics/dagobert83-female-user-icon-800px.png"
width="100" height="100"
x="110" y="180"
/>
<text x="180" y="190">?</text>
</g>
</svg>
</section>
</div></div>

@ -12,7 +12,7 @@ UB-CSE is celebrating [50 years of Computer Science and Engineering](http://cse.
The ODIn Lab will be showing up in force at the [CSE50 Undergraduate and Graduate conferences](https://engineering.buffalo.edu/computer-science-engineering/news-events/cse50.program.html).
* Lisa and Olivia will demo Mimir at the Undergraduate Event during the Welcome Reception on Thursday.
* Poonam, Will, Aaron, and Lisa will present on <a href="/papers/2017/CSE50/mimir.pdf">Mimir</a> at the Graduate Poster Session on Saturday.
* Poonam, Will, Aaron, Shivang, and Lisa will present on <a href="/papers/2017/CSE50/mimir.pdf">Mimir</a> at the Graduate Poster Session on Saturday.
* Saurav and Darshana will present on <a href="/papers/2017/CSE50/jitds.pdf">JITDs</a> at the Graduate Poster Session on Saturday.
* Duc, Ting, and Gokhan will present on <a href="/papers/2017/CSE50/insiderthreats.pdf">The Insider Threats project</a> at the Graduate Poster Session on Saturday.
* Gourab, Gokhan, and Carl will present on <a href="papers/2017/CSE50/pocketdata.pgf">The PocketData project</a> at the Graduate Poster Session on Saturday.