oops
parent e1e470c642
commit 7bebdc85b3
@@ -246,28 +246,28 @@
<section>
<h2>CSV Import</h2>
<h4>Run a <code>SELECT</code> on a raw CSV File</h4>
<ul class="fragment">
<ul>
<li>File may not have column headers</li>
<li>CSV does not provide "types"</li>
<li>Lines may be missing fields</li>
<li>Fields may be mistyped (typo, missing comma)</li>
<li>Comment text can be inlined into the file</li>
</ul>
<p class="fragment">
<b>State of the art</b>: External Table Defn <span class="fragment">+ "Manually" edit CSV</span>
<p>
<b>State of the art</b>: External Table Defn <span>+ "Manually" edit CSV</span>
</p>
</section>

<section>
<h2>Merge Two Datasets</h2>
<h4><code>UNION</code> two data sources</h4>
<ul class="fragment">
<ul>
<li>Schema matching</li>
<li>Deduplication</li>
<li>Format alignment (GIS coordinates, $ vs €)
<li>Precision alignment (State vs County)</li>
</ul>
<p class="fragment">
<p>
<b>State of the art</b>: Manually map schema
</p>
</section>
@@ -275,19 +275,17 @@
<section>
<h2>JSON Shredding</h2>
<h4>Run a <code>SELECT</code> on JSON or a Doc Store</h4>
<ul class="fragment">
<ul>
<li>Separating fields and record sets:<br/>(e.g., <code>{ A: "Bob", B: "Alice" }</code>)</li>
<li>Missing fields (Records with no 'address')</li>
<li>Type alignment (Records with 'address' as an array)</li>
<li>Schema matching$^2$</li>
</ul>
<p class="fragment">
<p>
<b>State of the art</b>: DataGuide, Wrangler, etc...
</p>
</section>
</section>

<section>
<section>
<h2>Data Cleaning is Hard!</h2>
</section>
@@ -300,21 +298,14 @@

<p>Alice spends weeks cleaning her data before using it.</p>
</section>

<section>
<h3>Newer State of the Art</h3>
<img src="graphics/iu.jpeg" height=500 />
<attribution>(azure.microsoft.com)</attribution>
</section>

<section>
<img src="graphics/data-lake-to-data-swamp.jpg" height=500 />
<attribution>(timoelliott.com)</attribution>
</section>
</section>

<section>

<section>
<h2>The database is in the way</h2>
</section>

<section>
<h3>
In the name of Codd,<br/><span class="fragment grow highlight-current-blue">thou shalt not give the user a wrong answer.</span>
@@ -327,6 +318,7 @@
What would it take for that to be ok?
</h4>
</section>

<section>
<h2>Industry says...</h2>
</section>
@@ -396,7 +388,7 @@
</section>

<section>
<h2>What if a database did the same?</h2>
<h3>What if a database did the same?</h3>
<h4 class="fragment">(they can)</h4>
</section>

@@ -404,12 +396,145 @@

<section>
<h3>On representing incomplete information in a relational data base</h3>
<h4>T. Imielinski & W. Lipski Jr.<span style="margin-left: 40px">(<i>VLDB <span class="fragment grow highlight-current-red">1981</span></i>)</span></h4>
<p class="fragment">
Incomplete and Probabilistic
<h4>T. Imielinski & W. Lipski Jr.<span style="margin-left: 40px">(<i>VLDB <span class="fragment highlight-current-red" data-fragment-index="1">1981</span></i>)</span></h4>
<p class="fragment" data-fragment-index="1" style="margin-top: 60px">
Incomplete and Probabilistic Databases<br/>have existed since the 1980s
</p>
</section>

<section>
<svg width="800" height="500">
<g transform="translate(150,0)">
<image
xlink:href="graphics/db.svg"
width="93" height="103"
x="0" y="10"
/>
<image
xlink:href="graphics/db.svg"
width="93" height="103"
x="0" y="130"
/>
<image
xlink:href="graphics/db.svg"
width="93" height="103"
x="0" y="250"
/>
<image
xlink:href="graphics/db.svg"
width="93" height="103"
x="0" y="370"
/>
</g>
<g
transform="translate(250, 0)"
class="fragment"
style="
fill: rgba(200, 50, 50, 0);
stroke-width: 4;
stroke: rgba(200, 200, 200, 1);
">
<polyline
points="0,60 220,60 200,50 220,60 200,70 220,60 0,60"
transform="translate(0,0)"
/>
<polyline
points="0,60 220,60 200,50 220,60 200,70 220,60 0,60"
transform="translate(0,120)"
/>
<polyline
points="0,60 220,60 200,50 220,60 200,70 220,60 0,60"
transform="translate(0,240)"
/>
<polyline
points="0,60 220,60 200,50 220,60 200,70 220,60 0,60"
transform="translate(0,360)"
/>
<text x="60" y="50">Q(D)</text>
<text x="60" y="170">Q(D)</text>
<text x="60" y="290">Q(D)</text>
<text x="60" y="410">Q(D)</text>
<image
xlink:href="graphics/jean-victor-balin-icon-table.svg"
width="96" height="96"
x="230" y="15"
/>
<image
xlink:href="graphics/jean-victor-balin-icon-table.svg"
width="96" height="96"
x="230" y="135"
/>
<image
xlink:href="graphics/jean-victor-balin-icon-table.svg"
width="96" height="96"
x="230" y="255"
/>
<image
xlink:href="graphics/jean-victor-balin-icon-table.svg"
width="96" height="96"
x="230" y="375"
/>
</g>
<g
transform="translate(0, 0)"
class="fragment"
style="
fill: rgba(200, 50, 50, 0);
stroke-width: 4;
stroke: rgba(200, 200, 200, 1);
">
<polyline
points="20,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(0,200) rotate(-60)"
/>
<polyline
points="70,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(-15,200) rotate(-20)"
/>
<polyline
points="70,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(25,170) rotate(20)"
/>
<polyline
points="20,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(102,220) rotate(60)"
/>
<text x="40" y="250">?</text>
</g>
<g
transform="translate(540, 0)"
class="fragment"
style="
fill: rgba(200, 50, 50, 0);
stroke-width: 4;
stroke: rgba(200, 200, 200, 1);
">
<polyline
points="20,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(102,30) rotate(60)"
/>
<polyline
points="70,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(0,120) rotate(20)"
/>
<polyline
points="70,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(-40,240) rotate(-20)"
/>
<polyline
points="20,60 140,60 120,50 140,60 120,70 140,60"
transform="translate(0,390) rotate(-60)"
/>
<image
xlink:href="graphics/dagobert83-female-user-icon-800px.png"
width="100" height="100"
x="110" y="180"
/>
<text x="180" y="190">?</text>
</g>

</svg>
</section>

</div></div>

@@ -12,7 +12,7 @@ UB-CSE is celebrating [50 years of Computer Science and Engineering](http://cse.
The ODIn Lab will be showing up in force at the [CSE50 Undergraduate and Graduate conferences](https://engineering.buffalo.edu/computer-science-engineering/news-events/cse50.program.html).

* Lisa and Olivia will demo Mimir at the Undergraduate Event during the Welcome Reception on Thursday.
* Poonam, Will, Aaron, and Lisa will present on <a href="/papers/2017/CSE50/mimir.pdf">Mimir</a> at the Graduate Poster Session on Saturday.
* Poonam, Will, Aaron, Shivang, and Lisa will present on <a href="/papers/2017/CSE50/mimir.pdf">Mimir</a> at the Graduate Poster Session on Saturday.
* Saurav and Darshana will present on <a href="/papers/2017/CSE50/jitds.pdf">JITDs</a> at the Graduate Poster Session on Saturday.
* Duc, Ting, and Gokhan will present on <a href="/papers/2017/CSE50/insiderthreats.pdf">The Insider Threats project</a> at the Graduate Poster Session on Saturday.
* Gourab, Gokhan, and Carl will present on <a href="papers/2017/CSE50/pocketdata.pgf">The PocketData project</a> at the Graduate Poster Session on Saturday.