trimmed the abstract

Atri Rudra 2021-09-13 17:55:54 -04:00
parent a5667ee7d2
commit 656027c3c9

@@ -1,11 +1,15 @@
 %root: main.tex
 %!TEX root=./main.tex
 \begin{abstract}
-The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
-The analog for bag semantics is a natural number-valued polynomial over random variables that evaluates to the multiplicity of the tuple in each world.
-In this work, we study the problem of calculating the expectation of such polynomials (a tuple's expected multiplicity) exactly and approximately.
+% The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
+The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) is arguably the most fundamental problem in set-PDBs.
+%can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
+%The analog for bag semantics is a natural number-valued polynomial over random variables that evaluates to the multiplicity of the tuple in each world.
+The analog for bag semantics is computing the expected multiplicity of a result tuple.
+%In this work, we study the problem of calculating the expectation of such polynomials (a tuple's expected multiplicity) exactly and approximately.
+In this work, we study the problem of a tuple's expected multiplicity exactly and approximately.
 We are specifically interested in the fine-grained complexity of this problem relative to the complexity of deterministic query evaluation --- if these complexities are comparable, it opens the door to practical deployment of probabilistic databases.
-Unfortunately, we show the reverse; our results imply that computing probabilities for Bag-PDB based on the results produced by such algorithms introduces super-linear overhead.
+Unfortunately, we show the reverse; our results imply that computing expected multiplicities for Bag-PDB based on the results produced by such algorithms introduces super-linear overhead.
 % Such factorized representations are necessary to realize the performance of modern join algorithms (e.g., worst-case optimal joins), and so our results imply that a Bag-PDB doing exact computations (via these factorized representations) can never be as fast as a classical (deterministic) database.
 The problem stays hard even if all input tuples have a fixed probability $\prob$ (s.t. $\prob \in (0,1)$).
 We proceed to study polynomials of result tuples of positive relational algebra queries ($\raPlus$) over TIDBs and for a non-trivial subclass of block-independent databases (BIDBs).
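The quantity the revised abstract centers on, a result tuple's expected multiplicity, can be illustrated with a minimal sketch. It assumes a tuple-independent database (TIDB) in which each input tuple is present independently with its own probability, so every variable of the lineage polynomial is an independent 0/1 random variable. Under that assumption X^k = X, and by linearity of expectation plus independence each monomial contributes its coefficient times the product of the probabilities of its distinct variables. The monomial encoding, the function names, and the example polynomial 2XY + Y^2 are hypothetical illustrations, not the authors' method. The sketch operates on an expanded sum-of-monomials form and does not address the complexity questions the abstract raises.

```python
from itertools import product
from math import prod

# Hypothetical encoding: a polynomial maps each monomial to its natural-number
# coefficient; a monomial is a tuple of variable names where repeats encode
# powers, e.g. ("Y", "Y") is Y^2 and () is a constant term.
Polynomial = dict[tuple[str, ...], int]


def expected_multiplicity(poly: Polynomial, probs: dict[str, float]) -> float:
    """E[poly] assuming each variable is an independent 0/1 random variable.

    Since X^k == X for X in {0, 1}, a monomial's expectation is the product of
    the probabilities of its *distinct* variables (by independence).
    """
    return sum(
        coeff * prod(probs[v] for v in set(mono))
        for mono, coeff in poly.items()
    )


def expected_multiplicity_brute_force(poly: Polynomial, probs: dict[str, float]) -> float:
    """Reference check: sum of poly(world) * Pr[world] over all 2^n worlds."""
    names = sorted(probs)
    total = 0.0
    for bits in product((0, 1), repeat=len(names)):
        world = dict(zip(names, bits))
        # Probability of this possible world under tuple independence.
        weight = prod(probs[v] if world[v] else 1 - probs[v] for v in names)
        # Multiplicity of the result tuple in this world.
        value = sum(coeff * prod(world[v] for v in mono) for mono, coeff in poly.items())
        total += weight * value
    return total


# Hypothetical lineage polynomial 2*X*Y + Y^2 with every tuple present w.p. 0.5.
phi: Polynomial = {("X", "Y"): 2, ("Y", "Y"): 1}
p = {"X": 0.5, "Y": 0.5}
print(expected_multiplicity(phi, p))              # 1.0
print(expected_multiplicity_brute_force(phi, p))  # 1.0
```

The brute-force function enumerates all 2^n possible worlds and is included only as a sanity check on small inputs; the monomial-by-monomial formula is linear in the size of the expanded polynomial, which may itself be much larger than the query result's compact representation.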