Learned Index Structures due Wednesday (in 1 week)
Lots of devices with...
Not all data sources are created equal.
Even within one data set, some data may be more trustworthy than others.
How do you train a classifier, neural net, Markov model, etc. on mixed-quality data?
Problem: It is usually easier to "fix" missing data than to label it.
But what if the data is already labeled?
Ideally, the model should also be interpretable.
Have a question?
Most people will give you a bad answer.
A few will give you a good answer.
Is the average of many bad answers and a few good answers a good answer?
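Whether the average is good depends on how the bad answers are wrong. A minimal sketch with hypothetical numbers: suppose the true answer is 10, the two good sources report 10, and the four bad sources all skew low. The trust weights are an assumption (in practice they would have to be estimated); weighting by trust is one simple way to act on the observation that some data is more trustworthy than other data.

```python
from typing import Sequence


def mean(xs: Sequence[float]) -> float:
    """Plain average: every source counts equally."""
    return sum(xs) / len(xs)


def weighted_mean(xs: Sequence[float], ws: Sequence[float]) -> float:
    """Average where each answer counts in proportion to its (assumed) trust weight."""
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)


# Hypothetical data: four low-skewed bad answers, two good answers (truth = 10).
answers = [3.0, 4.0, 5.0, 3.0, 10.0, 10.0]
# Hypothetical trust weights, assumed known for illustration.
trust = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9]

print(round(mean(answers), 2))                  # 5.83: pulled toward the shared bias
print(round(weighted_mean(answers, trust), 2))  # 8.86: closer to the true answer
```

If the bad answers were instead unbiased noise around the truth, the plain mean would do much better; it fails here because the bad answers err in the same direction.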
Problem: Often there is a very large number of possible worlds.
Solution: Describe possible worlds as combinations of individual choices.
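A minimal sketch of why choices help: each possible world is just one assignment of an alternative to every choice, so a handful of choices describes a number of worlds equal to the product of their alternative counts. The choice names and alternatives below are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical choices: each name maps to the alternatives it can take.
CHOICES: Dict[str, List[str]] = {
    "t1_present": ["yes", "no"],
    "t2_present": ["yes", "no"],
    "t3_value": ["FOO", "BAR", "BAZ"],
}


def possible_worlds(choices: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield every world as one assignment of an alternative to each choice."""
    names = list(choices)
    for combo in product(*(choices[n] for n in names)):
        yield dict(zip(names, combo))


worlds = list(possible_worlds(CHOICES))
print(len(worlds))  # 2 * 2 * 3 = 12
```

Three choices yield 12 worlds; n independent binary choices yield 2^n, so the choices stay small even when the set of worlds is too large to enumerate.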
Question: Which choices have the biggest impact on a query result?
Sensitivity analysis and explanations for robust query evaluation in probabilistic databases
Kanagal, Li, Deshpande (SIGMOD 2011)
Tracing data errors with view-conditioned causality
Meliou, Gatterbauer, Nath, Suciu (SIGMOD 2011)
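One simple way to rank choices by impact, sketched by brute force: treat each choice as an independent tuple that is present with some probability, write the query result's lineage as a boolean formula, and measure each tuple's influence as P(result | tuple present) - P(result | tuple absent). This is an illustrative measure, not claimed to be the exact method of either paper above; the formula `(a AND b) OR c` and the probabilities are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Optional

# A lineage formula: given which tuples are present, is the result true?
Lineage = Callable[[Dict[str, bool]], bool]


def prob(phi: Lineage, p: Dict[str, float],
         fixed: Optional[Dict[str, bool]] = None) -> float:
    """P(phi) by enumerating all worlds over the tuples not pinned in `fixed`."""
    fixed = fixed or {}
    free = [v for v in p if v not in fixed]
    total = 0.0
    for bits in product([True, False], repeat=len(free)):
        world = {**fixed, **dict(zip(free, bits))}
        weight = 1.0
        for v, present in zip(free, bits):
            weight *= p[v] if present else 1.0 - p[v]
        if phi(world):
            total += weight
    return total


def influence(phi: Lineage, p: Dict[str, float], var: str) -> float:
    """How much the result probability swings on the choice for `var`."""
    return prob(phi, p, {var: True}) - prob(phi, p, {var: False})


# Hypothetical lineage and tuple probabilities.
phi: Lineage = lambda w: (w["a"] and w["b"]) or w["c"]
p = {"a": 0.9, "b": 0.5, "c": 0.2}

ranking = sorted(p, key=lambda v: influence(phi, p, v), reverse=True)
print(round(prob(phi, p), 2))  # 0.56
print(ranking)                 # ['b', 'c', 'a']
```

Here `b` matters most (influence 0.72) even though it is not the least certain-looking tuple in isolation: the uncertainty that counts is the uncertainty that can flip the result.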
Unit of Choice: Is a tuple (fact) in the source data or not?
Let queries call a nondeterministic "choice" function that decides which "world" to visit.
-- VGTerm picks a value for each row; string literals use single quotes
SELECT CASE VGTerm('A', ROWID) WHEN 1 THEN 'FOO'
            ELSE 'BAR'
       END AS A, Input.*
FROM Input;
VGTerm('A', ROWID) generates a separate value for each row: passing ROWID distinguishes each row's choice from every other row's.
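To see the idea run, here is a minimal sketch using Python's built-in `sqlite3`. VGTerm is not a SQLite built-in, so `make_vgterm` registers a hypothetical stand-in: a world number seeds a hash of `(name, rowid)`, so within one world each row's choice is fixed, while different rows and different worlds get independent-looking 0/1 choices. The table contents and the hashing scheme are assumptions for illustration.

```python
import hashlib
import sqlite3
from typing import Callable, List, Tuple

QUERY = """
SELECT CASE VGTerm('A', ROWID) WHEN 1 THEN 'FOO'
            ELSE 'BAR'
       END AS A, Input.*
FROM Input
ORDER BY ROWID
"""


def make_vgterm(world: int) -> Callable[[str, int], int]:
    """Hypothetical VGTerm for one possible world: a deterministic 0/1 choice."""
    def vgterm(name: str, rowid: int) -> int:
        digest = hashlib.sha256(f"{world}:{name}:{rowid}".encode()).digest()
        return digest[0] % 2  # binary choice keyed by (world, name, rowid)
    return vgterm


def run_in_world(world: int) -> List[Tuple[str, str]]:
    """Build a small Input table and evaluate the query in the given world."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Input (B TEXT)")
    conn.executemany("INSERT INTO Input (B) VALUES (?)", [("x",), ("y",), ("z",)])
    conn.create_function("VGTerm", 2, make_vgterm(world))
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(run_in_world(0))  # same world, same answer on every run
print(run_in_world(1))  # a different world may assign FOO/BAR differently
```

Keying the choice on the world number makes "which world to visit" an explicit parameter, so re-running a query in the same world is repeatable.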