
%root: main.tex
%!TEX root=./main.tex
\begin{abstract}
%  The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
  The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) is a % arguably the most
  fundamental problem in set-PDBs.
%can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
  %The analog for bag semantics is a natural number-valued polynomial over random variables that evaluates to the multiplicity of the tuple in each world.
  % The analog for bag semantics is computing the expected multiplicity of a result tuple.
  %In this work, we study the problem of calculating the expectation of such polynomials (a tuple's expected  multiplicity) exactly and approximately.
  In this work, we study the analogous problem for bag semantics: computing a tuple's expected multiplicity exactly and approximately.
% Specifically, we are interested in the fine-grained complexity of computing this type of expectation based on a query result tuple's lineage polynomial which encodes how the tuple's multiplicity is computed based on the multiplicity of input tuples.
% Furthermore, we study how the complexity of this problem compares to
  We are specifically
   interested in the fine-grained complexity of this problem and how it compares to the complexity of deterministic query evaluation algorithms: if these complexities are comparable, it opens the door to practical deployment of probabilistic databases.
  Unfortunately, % we show the reverse;
  our results imply that computing expected multiplicities for Bag-PDBs based on the results produced by such query evaluation algorithms introduces super-linear overhead (under parameterized complexity hardness assumptions/conjectures).
  % Such factorized representations are necessary to realize the performance of modern join algorithms (e.g., worst-case optimal joins), and so our results imply that a Bag-PDB doing exact computations (via these factorized representations) can never be as fast as a classical (deterministic) database.
  % The problem stays hard even if
%  This is the case even if
%all input tuples have a fixed probability $\prob$ (s.t. $\prob \in (0,1)$).\BG{Replace with this because notion of hardness unclear: This is the case even if \ldots}
%Atri: Fair enough: droppped.
  %We proceed to study how approximate multiplicities using lineage polynomials of result tuples of positive relational algebra queries ($\raPlus$) over TIDBs and for a non-trivial subclass of block-independent databases (BIDBs).
  We proceed to study the approximation of expected multiplicities of result tuples of positive relational algebra queries ($\raPlus$) over \AHchange{\abbrCTIDB\xplural} and over a non-trivial subclass of block-independent databases (\abbrBIDB\xplural).
  We develop a sampling algorithm that computes a $(1 \pm \epsilon)$-approximation of the expected multiplicity of an output tuple in time linear in the runtime of a comparable deterministic query for any $\raPlus$ query.
  % By removing Bag-PDB's reliance on the sum-of-products representation of polynomials, this result paves the way for future work on PDBs that are competitive with deterministic databases.
\end{abstract}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: