19 lines
1.9 KiB
TeX
19 lines
1.9 KiB
TeX
%root: main.tex
|
|
%!TEX root=./main.tex
|
|
\begin{abstract}
|
|
The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
|
|
The analog for bag semantics is a natural number-valued polynomial over random variables that evaluates to the multiplicity of the tuple in each world.
|
|
In this work, we study the problem of calculating the expectation of such polynomials (a tuple's expected multiplicity) exactly and approximately.
|
|
For tuple-independent databases (TIDBs), the expected multiplicity of a query result tuple can trivially be computed in linear time in the size of the tuple's lineage, if this polynomial is encoded as a sum of products.
|
|
However, using a reduction from the problem of counting $k$-matchings, we demonstrate that calculating the expectation is \sharpwonehard when the polynomial is compressed, for example through factorization.
|
|
As we show, this result has a significant implication: a Bag-PDB doing exact computations will never be as fast as a classical (deterministic) database.
|
|
The problem stays hard even for polynomials generated by conjunctive queries (CQs) if all input tuples have a fixed probability $\prob$ (s.t. $\prob \in (0,1)$).
|
|
We proceed to study polynomials of result tuples of union of conjunctive queries (UCQs) over TIDBs and for a non-trivial subclass of block-independent databases (BIDBs). We develop a sampling algorithm that computes a $1 \pm \epsilon$-approximation of the expectation of such polynomials in linear time in the size of the polynomial, paving the way for PDBs to be competitive with deterministic databases.
|
|
|
|
\end{abstract}
|
|
|
|
%%% Local Variables:
|
|
%%% mode: latex
|
|
%%% TeX-master: "main"
|
|
%%% End:
|