Update ReadingList Probabilistic DBs

master
Oliver Kennedy 2019-02-21 12:05:00 -05:00
parent 24192057e6
commit 054370dcc2
1 changed files with 9 additions and 2 deletions

@ -159,7 +159,6 @@ Jigsaw is a variant of MCDB. The underlying system implementation basically fol
* [Jigsaw: Efficient optimization over uncertain enterprise data](http://dl.acm.org/citation.cfm?id=1989410)
* DEMO: [Fuzzy prophet: parameter exploration in uncertain enterprise scenarios](http://dl.acm.org/citation.cfm?id=1989482)
Querying Machine Learning Models
-----------------------------------------------------
Virtually all probabilistic database systems adopt a data model based on tuples. Several efforts have explored how to use similar techniques to directly query data defined by a graphical model, and/or how to represent graphical models in a database.
@ -305,4 +304,12 @@ Classically, PDBs assume that you come to them with data already annotated with
#### BetaPDBs
A popular model for probabilistic databases is the Tuple-Independent model (TI-PDBs for short). Tuple-independent probabilistic databases annotate each input tuple with a Bernoulli-distributed random variable. That is, we assume that each row of the input data is present according to a random coin flip. In a Beta-PDB, this is instead a Beta-Bernoulli distribution. It's still a coin flip, but the bias of the coin comes from training data summarized by two parameters (typically called a, b). Naively, these parameters represent samples: if you flip a coin a+b times and it comes up heads a times, that corresponds to a Beta distribution with parameters a, b. Propagating this training data through queries turns out to be surprisingly hard, which is the subject of this paper.
* http://odin.cse.buffalo.edu/papers/2017/SIGMOD-BetaPDBs-final.pdf
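
To make the coin-flip intuition concrete, here is a minimal sketch of a single Beta-Bernoulli tuple annotation. It is not taken from the paper: the function names and the example values `a = 8`, `b = 2` are assumptions for illustration, and it follows the naive reading above (a heads, b tails). It only shows how the two parameters determine the chance that a row is present, not how they propagate through queries.

```python
import random

def beta_bernoulli_mean(a, b):
    """Expected probability that the tuple is present: a / (a + b)."""
    return a / (a + b)

def sample_tuple_presence(a, b, rng):
    """Draw the coin's bias from Beta(a, b), then flip that coin once."""
    bias = rng.betavariate(a, b)
    return rng.random() < bias

# Hypothetical annotation: 8 "present" observations, 2 "absent" observations.
a, b = 8, 2
rng = random.Random(0)
trials = 20000
freq = sum(sample_tuple_presence(a, b, rng) for _ in range(trials)) / trials
print(f"analytic mean = {beta_bernoulli_mean(a, b):.3f}, simulated = {freq:.3f}")
```

The simulated frequency should land close to a / (a + b). What a plain Bernoulli annotation with the same mean would lose is the amount of training data behind the estimate: a = 8, b = 2 and a = 800, b = 200 have the same mean but very different confidence.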
#### PayGo
A Graph-ish database with missing values that prioritizes triples for cleaning based on the anticipated number of added results the triple could produce.
* [Pay-as-you-go user feedback for dataspace systems](https://dl.acm.org/citation.cfm?id=1376701)
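
The prioritization idea can be sketched as a simple ranking. This is an illustrative assumption, not the paper's algorithm: the `CandidateTriple` type, its `anticipated_added_results` field, and the example triples are all hypothetical, and how that estimate is actually computed is left to the paper.

```python
from dataclasses import dataclass

@dataclass
class CandidateTriple:
    subject: str
    predicate: str
    obj: str
    # Hypothetical estimate of how many extra query results
    # cleaning (confirming) this triple would produce.
    anticipated_added_results: float

def prioritize_for_cleaning(candidates):
    """Order triples so the most beneficial ones are cleaned first."""
    return sorted(candidates,
                  key=lambda t: t.anticipated_added_results,
                  reverse=True)

# Made-up example data.
candidates = [
    CandidateTriple("alice", "worksAt", "?", 3.0),
    CandidateTriple("bob", "livesIn", "?", 12.5),
    CandidateTriple("carol", "knows", "?", 0.4),
]
for triple in prioritize_for_cleaning(candidates):
    print(triple.subject, triple.predicate, triple.anticipated_added_results)
```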