%root: main.tex
%!TEX root=./main.tex
\begin{abstract}
% The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
Computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) is a % arguably the most
fundamental problem in set-PDBs.
%can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
%The analog for bag semantics is a natural number-valued polynomial over random variables that evaluates to the multiplicity of the tuple in each world.
% The analog for bag semantics is computing the expected multiplicity of a result tuple.
%In this work, we study the problem of calculating the expectation of such polynomials (a tuple's expected multiplicity) exactly and approximately.
In this work, we study the analogous problem for bag semantics: computing a tuple's expected multiplicity exactly and approximately.
% Specifically, we are interested in the fine-grained complexity of computing this type of expectation based on a query result tuple's lineage polynomial which encodes how the tuple's multiplicity is computed based on the multiplicity of input tuples.
% Furthermore, we study how the complexity of this problem compares to
We are specifically
interested in the fine-grained complexity of this problem and how it compares to the complexity of deterministic query evaluation algorithms: if the two are comparable, it opens the door to practical deployment of probabilistic databases.
Unfortunately, % we show the reverse;
our results imply that computing expected multiplicities for Bag-PDBs based on the results produced by such query evaluation algorithms introduces super-linear overhead (under parameterized-complexity hardness conjectures).
% Such factorized representations are necessary to realize the performance of modern join algorithms (e.g., worst-case optimal joins), and so our results imply that a Bag-PDB doing exact computations (via these factorized representations) can never be as fast as a classical (deterministic) database.
% The problem stays hard even if
% This is the case even if
%all input tuples have a fixed probability $\prob$ (s.t. $\prob \in (0,1)$).\BG{Replace with this because notion of hardness unclear: This is the case even if \ldots}
%Atri: Fair enough: dropped.
%We proceed to study how approximate multiplicities using lineage polynomials of result tuples of positive relational algebra queries ($\raPlus$) over TIDBs and for a non-trivial subclass of block-independent databases (BIDBs).
We then study the approximation of expected multiplicities of result tuples of positive relational algebra queries ($\raPlus$) over \AHchange{\abbrCTIDB\xplural} and over a non-trivial subclass of block-independent databases (\abbrBIDB\xplural).
We develop a sampling algorithm that computes a $(1 \pm \epsilon)$-approximation of the expected multiplicity of an output tuple in time linear in the runtime of a comparable deterministic query for any $\raPlus$ query.
% By removing Bag-PDB's reliance on the sum-of-products representation of polynomials, this result paves the way for future work on PDBs that are competitive with deterministic databases.
\end{abstract}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: