Students |
---|
Poonam |
Will |
Aaron |
Shivang |
Lisa |
Olivia |
Alumni |
---|
Ying |
Niccolò |
Arindam |
Dev |
Mike |
External Collaborators |
---|
Dieter Gawlick (Oracle) |
Zhen Hua Liu (Oracle) |
Ronny Fehling (Airbus) |
Beda Hammerschmidt (Oracle) |
Boris Glavic (IIT) |
Juliana Freire (NYU) |
Wolfgang Gatterbauer (NEU) |
Heiko Mueller (NYU) |
Remi Rampin (NYU) |
SELECT on a raw CSV file
State of the art: external table definition + "manually" editing the CSV
UNION two data sources
State of the art: manually mapping the schemas
SELECT on JSON or a document store, e.g. { A: "Bob", B: "Alice" }
State of the art: DataGuide, Wrangler, etc.
Alice spends weeks cleaning her data before using it.
My phone is guessing, but it lets me know that it did.
Good Explanations, Alternatives, and Feedback Vectors
Incomplete and Probabilistic Databases have existed since the 1980s.
We've gotten good at query processing on uncertain data.
But not at "sourcing" uncertain data
... or communicating results.
A small shift in how we think about PDBs addresses all three points.
Time | Sensor Reading | Temp Around Sensor |
---|---|---|
1 | 31.6 | Roughly 31.6°C |
2 | -999 | Around 30°C? |
4 | 28.1 | Roughly 28.1°C? |
3 | 32.2 | Roughly 32.2°C |
The reading is deterministic
... but what we care about is what the reading measures
Insight: Treat data as 100% deterministic.
Instead, queries propose alternative interpretations.
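One way to picture this shift is a small Python sketch over the sensor table above. The raw readings stay untouched, and a query-side function proposes an interpretation of each reading, attaching an explanation whenever it had to guess. Everything here is an assumption made for illustration: the names `READINGS`, `SENTINEL`, `Guess`, and `best_guess_temps` are hypothetical, and filling in the mean of the valid readings is just one simple guessing strategy, not necessarily the one the talk has in mind.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple

# Raw data, stored exactly as received (time, sensor reading).
# Treated as 100% deterministic: nothing here is ever rewritten.
READINGS: List[Tuple[int, float]] = [
    (1, 31.6),
    (2, -999.0),
    (4, 28.1),
    (3, 32.2),
]

# Assumption: -999 is the sensor's "no reading" marker.
SENTINEL = -999.0


@dataclass
class Guess:
    time: int
    temp: float
    explanation: Optional[str]  # None when the reading was used as-is


def best_guess_temps(readings: List[Tuple[int, float]]) -> List[Guess]:
    """Interpret raw readings as temperatures, recording every guess made."""
    valid = [value for _, value in readings if value != SENTINEL]
    fallback = mean(valid)  # illustrative strategy: mean of the valid readings
    result = []
    for time, value in readings:
        if value == SENTINEL:
            note = (f"reading {value} looks like a missing-value marker; "
                    f"substituted the mean of {len(valid)} valid readings")
            result.append(Guess(time, fallback, note))
        else:
            result.append(Guess(time, value, None))
    return result


if __name__ == "__main__":
    for guess in best_guess_temps(READINGS):
        flag = "*" if guess.explanation else " "
        print(f"{flag} t={guess.time} temp={guess.temp:.1f}")
        if guess.explanation:
            print(f"    why: {guess.explanation}")
```

The point of the design is that the guess lives in the query result, not in the stored data: a different query could interpret the same `-999` differently, and the explanation string gives the user something to inspect or push back on.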
Introduce Best-Guess queries and the idea of explanations. Key points:
Optimizing sampling-based query evaluation