adding applications
parent 64ee16c4ff
commit 86687bfab0

@@ -324,6 +324,48 @@ In contrast, known approximation techniques in set-\abbrPDB\xplural are at most
%\OK{Atri's (and most theoretician's) statements about complexity always need to be suffixed with ``to within a log factor''}
%(iii) We finally observe that our results trivially extend to higher moments of the tuple multiplicity (instead of just the expectation).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\mypar{Applications}
Probabilistic databases have been explored extensively for a variety of specialized tasks, from probabilistic programming~\cite{DBLP:journals/tods/OlteanuS16} to simulations~\cite{DBLP:conf/sigmod/GaoLPJ17,DBLP:conf/sigmod/CaiVPAHJ13}, and more.
Especially notable is work on data cleaning~\cite{yang:2015:pvldb:lenses,DBLP:journals/vldb/SaRR0W0Z17,DBLP:journals/pvldb/RekatsinasCIR17,DBLP:journals/pvldb/BeskalesIG10}, where probabilistic databases facilitate queries over heuristically cleaned datasets.
This work is particularly important, as larger and noisier datasets make fully manual data cleaning progressively less feasible.
As observed by Feng et al.~\cite{feng:2019:sigmod:uncertainty}, users of classical, deterministic databases are forced to choose between discarding potentially irrelevant data due to the cost of cleaning it properly, and ignoring the errors in the hope that query outputs remain informative.
Thanks to existing work on probabilistic data cleaning, probabilistic databases can provide a convenient middle ground between these extremes, allowing users to avoid the upfront cost of data cleaning while still receiving meaningful query outputs (or, at least, outputs that indicate the need for manual cleaning).
Unfortunately, probabilistic databases remain impractically slow (e.g., by orders of magnitude~\cite{feng:2019:sigmod:uncertainty}).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\mypar{Overview of our Techniques} All of our results rely on working with a {\em reduced} form of the lineage polynomial $\poly$. In fact, it turns out that for the \abbrTIDB (and \abbrBIDB) case, computing the expected multiplicity is {\em exactly} the same as evaluating this reduced polynomial over the probabilities that define the \abbrTIDB/\abbrBIDB. Next, we motivate this reduced polynomial.
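For intuition, consider the \abbrTIDB setting, where each tuple variable $X_i$ is an independent Bernoulli random variable with $\Pr[X_i = 1] = p_i$ (a sketch only; we write $\widetilde{\poly}$ for the reduced polynomial purely as illustrative notation here). Since a Bernoulli variable satisfies
\[
  \mathbb{E}\left[X_i^k\right] = \Pr[X_i = 1] = p_i \qquad \text{for every } k \geq 1,
\]
replacing each power $X_i^k$ ($k \geq 1$) in $\poly$ by $X_i$ yields a multilinear polynomial $\widetilde{\poly}$ for which, by linearity of expectation and the independence of the $X_i$,
\[
  \mathbb{E}\left[\poly(X_1, \ldots, X_n)\right] = \widetilde{\poly}(p_1, \ldots, p_n).
\]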
Consider the query $\query$ defined as follows over the bag relations of \Cref{fig:two-step}:
\begin{lstlisting}
main.bib
@@ -664,3 +664,95 @@ Maximilian Schleich},
  biburl    = {https://dblp.org/rec/journals/siamcomp/AtseriasGM13.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

@article{DBLP:journals/vldb/SaRR0W0Z17,
  author    = {Christopher De Sa and
               Alexander Ratner and
               Christopher R{\'{e}} and
               Jaeho Shin and
               Feiran Wang and
               Sen Wu and
               Ce Zhang},
  title     = {Incremental knowledge base construction using DeepDive},
  journal   = {{VLDB} J.},
  volume    = {26},
  number    = {1},
  pages     = {81--105},
  year      = {2017}
}

@article{DBLP:journals/pvldb/RekatsinasCIR17,
  author    = {Theodoros Rekatsinas and
               Xu Chu and
               Ihab F. Ilyas and
               Christopher R{\'{e}}},
  title     = {HoloClean: Holistic Data Repairs with Probabilistic Inference},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {10},
  number    = {11},
  pages     = {1190--1201},
  year      = {2017}
}

@article{DBLP:journals/pvldb/BeskalesIG10,
  author    = {George Beskales and
               Ihab F. Ilyas and
               Lukasz Golab},
  title     = {Sampling the Repairs of Functional Dependency Violations under Hard
               Constraints},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {3},
  number    = {1},
  pages     = {197--207},
  year      = {2010}
}

@article{DBLP:journals/tods/OlteanuS16,
  author    = {Dan Olteanu and
               Sebastiaan J. van Schaik},
  title     = {ENFrame: {A} Framework for Processing Probabilistic Data},
  journal   = {{ACM} Trans. Database Syst.},
  volume    = {41},
  number    = {1},
  pages     = {3:1--3:44},
  year      = {2016}
}

@inproceedings{DBLP:conf/sigmod/GaoLPJ17,
  author    = {Zekai J. Gao and
               Shangyu Luo and
               Luis Leopoldo Perez and
               Chris Jermaine},
  title     = {The {BUDS} Language for Distributed Bayesian Machine Learning},
  booktitle = {{SIGMOD} Conference},
  pages     = {961--976},
  publisher = {{ACM}},
  year      = {2017}
}

@inproceedings{DBLP:conf/sigmod/CaiVPAHJ13,
  author    = {Zhuhua Cai and
               Zografoula Vagena and
               Luis Leopoldo Perez and
               Subramanian Arumugam and
               Peter J. Haas and
               Christopher M. Jermaine},
  title     = {Simulation of database-valued markov chains using SimSQL},
  booktitle = {{SIGMOD} Conference},
  pages     = {637--648},
  publisher = {{ACM}},
  year      = {2013}
}