diff --git a/slides/talks/2017-5-Tour-Mimir/index.html b/slides/talks/2017-5-Tour-Mimir/index.html
index ce3af00e..ee5acf71 100644
--- a/slides/talks/2017-5-Tour-Mimir/index.html
+++ b/slides/talks/2017-5-Tour-Mimir/index.html
@@ -290,7 +290,8 @@
Loading requires curation... Alice spends weeks cleaning her data before using it. Alice spends weeks curating her data before using it. The data needs...
+ Data Cleaning is Hard!
+ Data Curation is Hard!
Relational databases make this worse...
+
+
+
This is all required upfront. Before asking a single question.
Relational DBs are useless in early stages of curation.
+There are tons of good heuristics available for guessing how to clean data.
+ Thou shalt not give the user a wrong answer.
+
We've gotten good at query processing on uncertain data.
- But not at "sourcing" uncertain data
- ... or communicating results.
+ But not sourcing uncertain data
+ ... or communicating results to humans.
A small shift in how we think about PDBs addresses all three points.
@@ -948,18 +966,18 @@
| R | A | B |
|---|---|---|
|   | 1 | 2 |
|   | 3 | 4 |
|   | 5 | 4 |

| Q(R) | A | C |
|---|---|---|
|   | 1 | $X_2$ |
|   | 3 | $X_4$ |
|   | 5 | $X_4$ |
How much of my query result is affected by unvalidated variables?
Idea: Mark values in query results that depend on unvalidated variables.
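A minimal sketch of the marking idea, using the R and Q(R) example from the slide. The `Var` class, the `validated` set, and the function names are hypothetical illustrations and are not taken from the Mimir codebase; the sketch only assumes that each uncertain cell carries a variable such as $X_2$ and that the user has validated some subset of those variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Placeholder for an uncertain value, e.g. Var("X", 2) stands for X_2."""
    family: str
    index: int


def query(rows):
    """Q(R): keep A and replace B with the variable X_B, as in the slide's table."""
    return [(a, Var("X", b)) for a, b in rows]


def mark_unvalidated(result, validated):
    """Pair each result row with True when it depends on an unvalidated variable."""
    marked = []
    for row in result:
        tainted = any(isinstance(cell, Var) and cell not in validated for cell in row)
        marked.append((row, tainted))
    return marked


def affected_fraction(marked):
    """Fraction of result rows that depend on at least one unvalidated variable."""
    return sum(1 for _, tainted in marked if tainted) / len(marked)


R = [(1, 2), (3, 4), (5, 4)]
validated = {Var("X", 2)}  # assumption: the user has already checked X_2
marked = mark_unvalidated(query(R), validated)
flags = [tainted for _, tainted in marked]
fraction = affected_fraction(marked)
```

Here the two rows that depend on $X_4$ are flagged and the row that depends on $X_2$ is not, so two thirds of the result rests on unvalidated variables.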
@@ -1360,7 +1378,8 @@ CREATE VIEW R_CLEANED AS
Which variables affect my query results?
Idea: Static dependency analysis produces a list of variable families and queries to generate all relevant indexes.
How bad is the situation?
Idea: Sample from the space of alternatives to...
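The slide text is cut off here, so what the samples are used for is an assumption. As one possible reading, the sketch below draws random assignments for the variables in Q(R) and looks at how much an aggregate over the result moves between samples. The candidate values in `alternatives`, the choice of `SUM(C)` as the aggregate, and all function names are hypothetical.

```python
import random

# Q(R) from the earlier slide: column C holds variable names instead of values.
result = [(1, "X_2"), (3, "X_4"), (5, "X_4")]

# Hypothetical candidate values for each variable (not from the talk).
alternatives = {"X_2": [2], "X_4": [4, 40]}


def sample_worlds(result, alternatives, n_samples, seed=0):
    """Draw n_samples assignments of variables and instantiate the result for each."""
    rng = random.Random(seed)
    worlds = []
    for _ in range(n_samples):
        assignment = {var: rng.choice(values) for var, values in alternatives.items()}
        worlds.append([(a, assignment[var]) for a, var in result])
    return worlds


def sum_c(world):
    """SUM(C) over one instantiated result."""
    return sum(c for _, c in world)


worlds = sample_worlds(result, alternatives, n_samples=200)
sums = [sum_c(world) for world in worlds]
spread = max(sums) - min(sums)  # one rough measure of how much the answer can vary
```

A wide `spread` suggests that validating $X_4$ would matter a great deal for this aggregate, while a spread of zero would mean the remaining uncertainty does not affect it.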