abstract
This commit is contained in:
parent
8d8e369962
commit
51fcabfe5a
21
abstract.tex
21
abstract.tex
|
@ -2,17 +2,24 @@
|
||||||
%!TEX root=./main.tex
|
%!TEX root=./main.tex
|
||||||
\begin{abstract}
|
\begin{abstract}
|
||||||
% The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
|
% The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
|
||||||
The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) is arguably the most fundamental problem in set-PDBs.
|
The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) is a % arguably the most
|
||||||
|
fundamental problem in set-PDBs.
|
||||||
%can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
|
%can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
|
||||||
%The analog for bag semantics is a natural number-valued polynomial over random variables that evaluates to the multiplicity of the tuple in each world.
|
%The analog for bag semantics is a natural number-valued polynomial over random variables that evaluates to the multiplicity of the tuple in each world.
|
||||||
The analog for bag semantics is computing the expected multiplicity of a result tuple.
|
% The analog for bag semantics is computing the expected multiplicity of a result tuple.
|
||||||
%In this work, we study the problem of calculating the expectation of such polynomials (a tuple's expected multiplicity) exactly and approximately.
|
%In this work, we study the problem of calculating the expectation of such polynomials (a tuple's expected multiplicity) exactly and approximately.
|
||||||
In this work, we study the problem of a tuple's expected multiplicity exactly and approximately.
|
In this work, we study the analog problem for bag semantics: computing a tuple's expected multiplicity exactly and approximately.
|
||||||
We are specifically interested in the fine-grained complexity of this problem relative to the complexity of deterministic query evaluation --- if these complexities are comparable, it opens the door to practical deployment of probabilistic databases.
|
% Specifically, we are interested in the fine-grained complexity of computing this type of expectation based on a query result tuple's lineage polynomial which encodes how the tuple's multiplicity is computed based on the multiplicity of input tuples.
|
||||||
Unfortunately, we show the reverse; our results imply that computing expected multiplicities for Bag-PDB based on the results produced by such algorithms introduces super-linear overhead.
|
% Furthermore, we study how the complexity of this problem compares to
|
||||||
|
We are specifically
|
||||||
|
interested in the fine-grained complexity and how it compares to the complexity of deterministic query evaluation algorithms --- if these complexities are comparable, it opens the door to practical deployment of probabilistic databases.
|
||||||
|
Unfortunately, % we show the reverse;
|
||||||
|
our results imply that computing expected multiplicities for Bag-PDBs based on the results produced by such query evaluation algorithms introduces super-linear overhead.
|
||||||
% Such factorized representations are necessary to realize the performance of modern join algorithms (e.g., worst-case optimal joins), and so our results imply that a Bag-PDB doing exact computations (via these factorized representations) can never be as fast as a classical (deterministic) database.
|
% Such factorized representations are necessary to realize the performance of modern join algorithms (e.g., worst-case optimal joins), and so our results imply that a Bag-PDB doing exact computations (via these factorized representations) can never be as fast as a classical (deterministic) database.
|
||||||
The problem stays hard even if all input tuples have a fixed probability $\prob$ (s.t. $\prob \in (0,1)$).
|
% The problem stays hard even if
|
||||||
We proceed to study polynomials of result tuples of positive relational algebra queries ($\raPlus$) over TIDBs and for a non-trivial subclass of block-independent databases (BIDBs).
|
This is the case even if
|
||||||
|
all input tuples have a fixed probability $\prob$ (s.t. $\prob \in (0,1)$).\BG{Replace with this because notion of hardness unclear: This is the case even if \ldots}
|
||||||
|
We proceed to study how approximate multiplicities using lineage polynomials of result tuples of positive relational algebra queries ($\raPlus$) over TIDBs and for a non-trivial subclass of block-independent databases (BIDBs).
|
||||||
We develop a sampling algorithm that computes a $1 \pm \epsilon$-approximation of the expected multiplicity of an output tuple in linear time in the runtime of a comparable deterministic query.
|
We develop a sampling algorithm that computes a $1 \pm \epsilon$-approximation of the expected multiplicity of an output tuple in linear time in the runtime of a comparable deterministic query.
|
||||||
% By removing Bag-PDB's reliance on the sum-of-products representation of polynomials, this result paves the way for future work on PDBs that are competitive with deterministic databases.
|
% By removing Bag-PDB's reliance on the sum-of-products representation of polynomials, this result paves the way for future work on PDBs that are competitive with deterministic databases.
|
||||||
\end{abstract}
|
\end{abstract}
|
||||||
|
|
|
@ -151,9 +151,9 @@ We stress that this question is very well motivated, even for one of the simples
|
||||||
Lower bound on $\timeOf{}^*(\query,\pdb)$ & Num. $\pd$s & Hardness Assumption\\
|
Lower bound on $\timeOf{}^*(\query,\pdb)$ & Num. $\pd$s & Hardness Assumption\\
|
||||||
\hline
|
\hline
|
||||||
$\Omega\inparen{\inparen{\qruntime{\query, \dbbase}}^{1+\eps_0}}$ for {\em some} $\eps_0>0$ & Single & Triangle Detection hypothesis\\
|
$\Omega\inparen{\inparen{\qruntime{\query, \dbbase}}^{1+\eps_0}}$ for {\em some} $\eps_0>0$ & Single & Triangle Detection hypothesis\\
|
||||||
\hline
|
%\hline
|
||||||
$\omega\inparen{\inparen{\qruntime{\query, \dbbase}}^{C_0}}$ for {\em all} $C_0>0$ & Multiple &$\sharpwzero\ne\sharpwone$\\
|
$\omega\inparen{\inparen{\qruntime{\query, \dbbase}}^{C_0}}$ for {\em all} $C_0>0$ & Multiple &$\sharpwzero\ne\sharpwone$\\
|
||||||
\hline
|
%\hline
|
||||||
$\Omega\inparen{\inparen{\qruntime{\query, \dbbase}}^{c_0\cdot k}}$ for {\em some} $c_0>0$ & Multiple & Current $k$-matching algorithms\\
|
$\Omega\inparen{\inparen{\qruntime{\query, \dbbase}}^{c_0\cdot k}}$ for {\em some} $c_0>0$ & Multiple & Current $k$-matching algorithms\\
|
||||||
\hline
|
\hline
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
|
|
Loading…
Reference in a new issue