(still in development)

## Summary

Probabilistic databases revolve around so-called possible-worlds semantics. An uncertain database is actually a set of possible databases, also termed possible worlds. To run a query on an uncertain database, you (in principle) run the query in each of the possible worlds and get a set of possible answers. Define a probability distribution over the possible worlds and you get a distribution over the possible answers: the probability of an answer is the total probability of the worlds that produce it. This pair of an uncertain database and a probability distribution over its worlds is what is commonly referred to as a probabilistic database.

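To make the semantics concrete, here is a minimal Python sketch that represents a probabilistic database naively, as an explicit list of (probability, world) pairs, and computes the distribution over query answers by running an ordinary query in every world. The `sightings` table, its data, and the helper names (`answer_distribution`, `tuple_probability`) are illustrative assumptions. Real systems avoid enumerating worlds, since their number is typically exponential in the size of the data.

```python
from collections import defaultdict
from fractions import Fraction

# A toy probabilistic database given explicitly as (probability, world) pairs.
# Each world maps a table name to a set of tuples; the data is invented for illustration.
WORLDS = [
    (Fraction(1, 2), {"sightings": {("alice", "owl"), ("bob", "hawk")}}),
    (Fraction(1, 3), {"sightings": {("alice", "owl")}}),
    (Fraction(1, 6), {"sightings": {("bob", "owl")}}),
]


def birds_seen(db):
    """An ordinary deterministic query: the set of species in `sightings`."""
    return frozenset(species for _, species in db["sightings"])


def answer_distribution(worlds, query):
    """Run `query` in every possible world; sum the probabilities of worlds per answer."""
    dist = defaultdict(Fraction)
    for prob, db in worlds:
        dist[query(db)] += prob
    return dict(dist)


def tuple_probability(worlds, query, item):
    """Marginal probability that `item` appears in the query's answer."""
    return sum((prob for prob, db in worlds if item in query(db)), Fraction(0))


dist = answer_distribution(WORLDS, birds_seen)
# {owl, hawk} has probability 1/2, and {owl} has probability 1/3 + 1/6 = 1/2.
# "owl" is a certain answer (probability 1); "hawk" has probability 1/2.
```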
Uncertain data typically appears in one of three forms: row-level uncertainty, where a tuple may or may not be in a given table; attribute-level uncertainty, where the exact value of an attribute is not known; and open-world uncertainty, where the set of tuples in a given result table is not known ahead of time. Note the distinction between row-level and open-world uncertainty: with the former, you can describe precisely and finitely which tuples could be in the table, while with the latter you cannot.

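For the two finite kinds of uncertainty, the Python sketch below expands a compact description into an explicit list of (probability, table) pairs. Open-world uncertainty has no such finite expansion. Beyond what the text says, the sketch assumes that uncertain rows are independent of one another (the tuple-independent model) and that an uncertain attribute takes one of a finite set of values. The function names and data are hypothetical.

```python
from fractions import Fraction
from itertools import product


def row_level_worlds(rows):
    """Row-level uncertainty: `rows` is a list of (tuple, probability_present).
    Assumes rows are independent, so there are 2**len(rows) worlds."""
    worlds = []
    for present in product([True, False], repeat=len(rows)):
        prob = Fraction(1)
        table = set()
        for (tup, p), keep in zip(rows, present):
            prob *= p if keep else 1 - p
            if keep:
                table.add(tup)
        worlds.append((prob, frozenset(table)))
    return worlds


def attribute_level_worlds(key, alternatives):
    """Attribute-level uncertainty: the tuple (key, value) always exists, but `value`
    is one of `alternatives`, a list of (value, probability) pairs summing to 1."""
    return [(p, frozenset({(key, value)})) for value, p in alternatives]


rows = [(("alice", "owl"), Fraction(1, 2)), (("bob", "hawk"), Fraction(1, 4))]
worlds = row_level_worlds(rows)  # 4 worlds; the empty table has probability 1/2 * 3/4 = 3/8
```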
## Surveys

* [Morgan & Claypool: Probabilistic Databases](http://www.morganclaypool.com/doi/abs/10.2200/S00362ED1V01Y201105DTM016)

A solid, theory-focused survey of techniques for probabilistic databases. A good starting point for anyone working in the space.

## Formal Systems for Incomplete Information

#### C-Tables

The C-Tables data representation is a way to compactly encode row- and attribute-level uncertainty in a classical deterministic database. The idea is to allow labeled nulls (variables) in place of attribute values, and to tag each row in an uncertain relation with a so-called 'local condition'. The local condition is a boolean formula whose atoms are comparisons over the labeled nulls. Every possible assignment of values to the labeled nulls (a valuation) defines a possible world: each labeled null is replaced by its assigned value, and a row is part of the world only if the valuation makes the row's local condition evaluate to true.

* [On representing incomplete information in a relational data base](http://dl.acm.org/citation.cfm?id=1286869)

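Here is a minimal Python sketch of C-Table semantics. It assumes each labeled null ranges over a small finite domain so that the worlds can be enumerated; the formalism itself does not require this. For brevity, local conditions are written as Python predicates over a valuation rather than as formulas built from comparison atoms. The `Null` class, the `possible_worlds` function, and the example data are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Null:
    """A labeled null: a named placeholder for an unknown value."""
    name: str


def instantiate(value, valuation):
    """Replace a labeled null by its value under `valuation`; constants pass through."""
    return valuation[value.name] if isinstance(value, Null) else value


def possible_worlds(ctable, domains):
    """`ctable` is a list of (row, local_condition) pairs, where a row may contain Nulls
    and local_condition maps a valuation (dict: null name -> value) to True/False.
    `domains` gives a finite set of candidate values for each null (an assumption)."""
    names = sorted(domains)
    worlds = []
    for values in product(*(domains[n] for n in names)):
        valuation = dict(zip(names, values))
        world = frozenset(
            tuple(instantiate(v, valuation) for v in row)
            for row, condition in ctable
            if condition(valuation)
        )
        worlds.append((valuation, world))
    return worlds


x = Null("x")
ctable = [
    (("alice", x), lambda val: True),                    # Alice's city is unknown (attribute-level)
    (("bob", "paris"), lambda val: val["x"] == "rome"),  # Bob's row exists only if x = rome (row-level)
]
worlds = possible_worlds(ctable, {"x": ["paris", "rome"]})
# x = paris -> {("alice", "paris")}; x = rome -> {("alice", "rome"), ("bob", "paris")}
```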
#### Three-Valued Logic

Somewhat related to uncertainty is a concept called three-valued logic, which extends classical boolean logic's True and False with a third "Unknown" value. This is the logic SQL uses when NULL values appear in queries, and it can produce surprising results. For example, any comparison with NULL evaluates to Unknown, and a WHERE clause keeps only rows whose condition is True. As a result, `WHERE x NOT IN (SELECT y FROM s)` returns no rows at all whenever `s.y` contains a NULL.

* [SQL’s Three-Valued Logic and Certain Answers](http://homepages.inf.ed.ac.uk/libkin/papers/icdt15.pdf)

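The Python sketch below models SQL's three-valued (Kleene) logic. It uses `None` for both a NULL data value and the Unknown truth value, which is a simplification. It is an illustration rather than a faithful SQL engine, and the function names are hypothetical. It shows how a single NULL in a `NOT IN` list causes every row to be filtered out.

```python
from typing import Optional

# SQL truth values: True, False, and None standing in for Unknown.
# None also stands in for a NULL data value (a simplification).
Truth = Optional[bool]


def and3(a: Truth, b: Truth) -> Truth:
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: Truth, b: Truth) -> Truth:
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a: Truth) -> Truth:
    return None if a is None else not a


def eq3(x, y) -> Truth:
    """SQL '=': a comparison involving NULL is Unknown, not False."""
    if x is None or y is None:
        return None
    return x == y


def not_in3(x, values) -> Truth:
    """x NOT IN (v1, ..., vn) means NOT (x = v1 OR ... OR x = vn)."""
    result: Truth = False
    for v in values:
        result = or3(result, eq3(x, v))
    return not3(result)


def where(rows, predicate):
    """A WHERE clause keeps a row only if its condition is True; Unknown is dropped."""
    return [r for r in rows if predicate(r) is True]


print(where([1, 2, 3], lambda x: not_in3(x, [2])))        # [1, 3]
print(where([1, 2, 3], lambda x: not_in3(x, [2, None])))  # []
```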
## Probabilistic Database Systems

#### MCDB

(UFL/Rice/IBM)

MCDB, or the Monte Carlo Database, introduced to probabilistic databases the idea of describing a probability distribution by a function that can draw a sample from it. VG-Functions are table-generating functions that output a random sample from the table's possible worlds. MCDB processes queries by (1) generating a set of sampled possible worlds, (2) factorizing those worlds into a more compact representation, and (3) running the query over each of the sampled worlds, conceptually in parallel. Conveniently, the factorized representation also admits a more efficient query evaluation strategy. The main advantage of this approach is that it is simple and expressive: if you can generate samples of an uncertain table, you can use MCDB over it. In contrast to many of the other systems here, MCDB can support open-world uncertainty. Sampling up front, however, can limit the accuracy of query results, particularly with an extremely selective filter predicate. If only a handful of the sampled worlds satisfy the predicate, any estimate computed over those worlds rests on just those few samples.

* [MCDB: a monte carlo approach to managing uncertain data](http://dl.acm.org/citation.cfm?id=1376686)
* [MCDB-R: risk analysis in the database](http://dl.acm.org/citation.cfm?id=1920941)

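Here is a minimal Python sketch of the sample-first approach. `vg_sales` plays the role of a VG-Function for a hypothetical `sales` table with normally distributed revenue per region. The table, its parameters, and the helper names are assumptions for illustration. Unlike MCDB, the sketch does not factorize the sampled worlds; it simply runs the query once per sample and summarizes the resulting answers.

```python
import random
import statistics


def vg_sales(rng: random.Random) -> list[tuple[str, float]]:
    """Hypothetical VG-style function: each call returns one sampled possible world
    of an uncertain `sales` table whose per-region revenue is normally distributed."""
    expected = {"east": 100.0, "west": 80.0}
    return [(region, rng.gauss(mu, 10.0)) for region, mu in expected.items()]


def total_revenue(table):
    """Deterministic query evaluated inside each sampled world."""
    return sum(revenue for _, revenue in table)


def monte_carlo(vg, query, n_worlds, seed=0):
    """Sample n_worlds possible worlds up front and run the query in each one."""
    rng = random.Random(seed)
    return [query(vg(rng)) for _ in range(n_worlds)]


answers = monte_carlo(vg_sales, total_revenue, n_worlds=1000)
expected_total = statistics.mean(answers)  # estimate of E[total], whose true value is 180
# Estimated probability of a selective predicate: fraction of sampled worlds satisfying it.
# The total has standard deviation of about 14, so total > 230 holds in well under 1% of
# worlds; with 1000 samples, p_high rests on a handful of hits at most, quite possibly none.
p_high = sum(a > 230 for a in answers) / len(answers)
```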
#### MayBMS

(Cornell)

The central idea behind MayBMS is a practical implementation of probabilistic C-Tables called U-Relations (in fact, it's not uncommon to hear U-Relations referred to as C-Tables). The idea is to avoid labeled nulls (which most databases do not support) and instead focus entirely on row-level uncertainty. As it turns out, if you're considering only finite, discrete (i.e., categorical) distributions, row-level uncertainty can encode attribute-level uncertainty as well. An attribute with k possible values becomes k alternative rows, each tagged with a different value of the same random variable. By further limiting the condition columns to conjunctions of equalities of the form variable = value (which is still sufficient to capture a significant class of queries), MayBMS can use a classical deterministic database engine to evaluate probabilistic queries.

* [MayBMS: Managing incomplete information with probabilistic world-set decompositions](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4221832)
* [10^(10^6) worlds and beyond: efficient representation and processing of incomplete information](http://dl.acm.org/citation.cfm?id=1644253)
* [MayBMS: a probabilistic database management system](http://dl.acm.org/citation.cfm?id=1559984)
* [A Compositional Query Algebra for Second-Order Logic and Uncertain Databases](http://dl.acm.org/citation.cfm?id=1514911)
* [On Query Algebras for Probabilistic Databases](http://dl.acm.org/citation.cfm?id=1519116)
* [On APIs for Probabilistic Databases](http://infoscience.epfl.ch/record/166848/files/20-mud2008.pdf)
* [From Complete to Incomplete Information and Back](http://dl.acm.org/citation.cfm?id=1247559)
* DEMO: [Query language support for incomplete information in the MayBMS system](http://dl.acm.org/citation.cfm?id=1326031)
* BOOK CHAPTER: [MayBMS: A system for managing large uncertain and probabilistic databases](http://link.springer.com/content/pdf/10.1007/978-0-387-09690-2.pdf#page=166)
* MANUAL: [MayBMS: A Probabilistic Database System.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.147.1226&rep=rep1&type=pdf)

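The U-Relation encoding can be sketched in Python as rows paired with a condition, plus a separate table of independent, categorically distributed random variables. Each condition is a conjunction of variable = value equalities stored as a dict. The variable table `W`, the data, and the function names are hypothetical. The brute-force `confidence` function enumerates assignments and is only meant to show the semantics, not how MayBMS actually computes confidences. Note that `select` and `project` never inspect the condition, which is why an ordinary deterministic engine suffices for those operators.

```python
from fractions import Fraction
from itertools import product

# Hypothetical world table: independent random variables with categorical distributions.
W = {
    "x": {"owl": Fraction(2, 3), "hawk": Fraction(1, 3)},  # which bird Alice saw
    "y": {1: Fraction(1, 2), 0: Fraction(1, 2)},           # whether Bob's row exists
}

# U-relation rows: (condition, tuple). Attribute-level uncertainty about Alice's bird
# becomes two alternative rows tagged with different values of the same variable x.
SIGHTINGS = [
    ({"x": "owl"}, ("alice", "owl")),
    ({"x": "hawk"}, ("alice", "hawk")),
    ({"y": 1}, ("bob", "owl")),
]


def select(urel, pred):
    """Selection filters on the data columns and carries conditions along unchanged."""
    return [(cond, row) for cond, row in urel if pred(row)]


def project(urel, cols):
    """Projection keeps each row's condition; duplicates stay as separate rows."""
    return [(cond, tuple(row[i] for i in cols)) for cond, row in urel]


def confidence(urel, row):
    """P(row is in the result) = P(at least one of its conditions holds), computed by
    brute-force enumeration of the variables involved (exponential; illustration only)."""
    conds = [c for c, r in urel if r == row]
    names = sorted({v for c in conds for v in c})
    total = Fraction(0)
    for values in product(*(W[v] for v in names)):
        assignment = dict(zip(names, values))
        if any(all(assignment[v] == val for v, val in c.items()) for c in conds):
            p = Fraction(1)
            for v, val in assignment.items():
                p *= W[v][val]
            total += p
    return total


owls = project(select(SIGHTINGS, lambda r: r[1] == "owl"), [1])
# P(x = owl OR y = 1) = 1 - (1/3)(1/2) = 5/6
owl_confidence = confidence(owls, ("owl",))
```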
#### Pip

(Cornell)

* [PIP: A database system for great and small expectations](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5447879)

#### MystiQ

(UWash)

* [Efficient query evaluation on probabilistic databases](http://link.springer.com/article/10.1007/s00778-006-0004-3)

#### Orion

(UMD)

#### PrDB

(UMD)

* PrDB: Managing and Exploiting Rich Correlations in Probabilistic Databases
* Lineage Processing Over Correlated Probabilistic Databases

#### Trio

(Stanford)

#### Sprout

(Oxford)

* Approximate Confidence Computation in Probabilistic Databases
* SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases
* A dichotomy for non-repeating queries with negation in probabilistic databases
* Anytime Approximation in Probabilistic Databases
* Aggregates in Probabilistic Databases via Knowledge Compilation
* Ranking in Probabilistic Databases: Complexity and Efficient Algorithms

#### Orchestra

(UPenn)

#### Mimir

(UBuff)

* [On-Demand Query Result Cleaning](http://www.vldb.org/2014/phd_workshop.proceedings_files/Camera-Ready%20Papers/Paper%201283/p1283-Yang.pdf)
* [Detecting the Temporal Context of Queries](http://link.springer.com/chapter/10.1007/978-3-662-46839-5_7)
* [Lenses: an on-demand approach to ETL](http://dl.acm.org/citation.cfm?id=2824055)

#### Jigsaw

(Cornell/Microsoft)

* [Jigsaw: Efficient optimization over uncertain enterprise data](http://dl.acm.org/citation.cfm?id=1989410)
* DEMO: [Fuzzy prophet: parameter exploration in uncertain enterprise scenarios](http://dl.acm.org/citation.cfm?id=1989482)

## Model Database Systems

* [Sensitivity Analysis and Explanations for Robust Query Evaluation in Probabilistic Databases](http://dl.acm.org/citation.cfm?id=1989411)
* [Lenses: An On-Demand Approach to ETL](http://dl.acm.org/citation.cfm?id=2824055)

## Information Fusion Systems

* [A Methodology to Evaluate Important Dimensions of Information Quality in Systems](http://dl.acm.org/citation.cfm?id=2744205)