
\section{Related Work}\label{sec:related-work}

Beyond probabilistic databases, our work is connected to compact representations of polynomials and builds on results in fine-grained complexity, which we review in \Cref{sec:compr-repr-polyn} and \Cref{sec:param-compl}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Probabilistic Databases}\label{sec:prob-datab}

Probabilistic databases have been studied predominantly under set semantics.
A multitude of probabilistic data models have been proposed for encoding a probabilistic database more compactly than as its set of possible worlds. Tuple-independent databases (\tis) consist of a classical database where each tuple is associated with a probability and tuples are treated as independent probabilistic events. Although they cannot encode correlations, \tis have received much attention because any finite probabilistic database can be encoded as a \ti together with a set of constraints that ``condition'' the \ti~\cite{VS17}. Block-independent databases (\bis) generalize \tis by partitioning the input into blocks, where tuples within each block are disjoint events and blocks are independent~\cite{RS07,BS06}. \emph{PC-tables}~\cite{GT06} pair a C-table~\cite{IL84a} with a probability distribution for each of its variables. This is similar to the $\semNX$-PDBs we use here, except that we do not allow variables as attribute values and, instead of local conditions (propositional formulas that may contain comparisons), we associate tuples with polynomials in $\semNX$.

Approaches for probabilistic query processing, i.e., computing the marginal probability for each result tuple of a query over a probabilistic database, fall into two broad categories. \emph{Intensional} (or \emph{grounded}) query evaluation approaches compute the \emph{lineage} of a tuple, which is a Boolean formula encoding the provenance of the tuple, and then compute the probability of the lineage formula. In this paper we also focus on intensional query evaluation, but use polynomials instead of Boolean formulas to deal with multisets. It is well known that computing the probability of a tuple in the result of a query over a probabilistic database (the \emph{marginal probability of a tuple}) is \sharpphard, which can be shown by a reduction from weighted model counting~\cite{provan-83-ccccptg,valiant-79-cenrp}, using the fact that the probability of a tuple's lineage formula is equal to the marginal probability of the tuple. The second category, \emph{extensional} query evaluation, avoids calculating the lineage. This approach is in \ptime, but is limited to certain classes of queries. Dalvi et al.~\cite{DS12} proved a dichotomy for unions of conjunctive queries (UCQs): for any UCQ the probabilistic query evaluation problem is either \sharpphard or \ptime. Olteanu et al.~\cite{FO16} presented dichotomies for two classes of queries with negation, and R\'e et al.~\cite{RS09b} presented a trichotomy for HAVING queries. Amarilli et al. investigated tractable classes of databases for more complex queries~\cite{AB15,AB15c}. Another line of work studies which structural properties of lineage formulas lead to tractable cases~\cite{kenig-13-nclexpdc,roy-11-f,sen-10-ronfqevpd}.
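
As a small illustration of the difference between Boolean lineage and polynomials, consider the following sketch. The variables $x_1, x_2, y_1, y_2$ and the probabilities $p_i, q_i$ are hypothetical and chosen only for this example; we assume that the variables are independent Boolean random variables with $\Pr[x_i = 1] = p_i$ and $\Pr[y_i = 1] = q_i$. Under set semantics, a result tuple with two derivations has the lineage formula $\phi$ below, and its marginal probability is the probability that at least one derivation exists:
\[
  \phi = (x_1 \wedge y_1) \vee (x_2 \wedge y_2),
  \qquad
  \Pr[\phi] = 1 - (1 - p_1 q_1)(1 - p_2 q_2).
\]
Under bag semantics, the same two derivations are instead recorded by a polynomial $\Phi$ whose value is the multiplicity of the tuple, and the quantity of interest is its expectation, which in this simple case follows from linearity of expectation and independence:
\[
  \Phi = x_1 y_1 + x_2 y_2,
  \qquad
  \mathrm{E}[\Phi] = p_1 q_1 + p_2 q_2.
\]
The closed form for $\Pr[\phi]$ relies on the two conjuncts sharing no variables; neither formula is meant to reflect the general case.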

Several techniques for approximating the probability of a query result tuple have been proposed in related work~\cite{FH13,heuvel-19-anappdsd,DBLP:conf/icde/OlteanuHK10,DS07,re-07-eftqevpd}. These approaches either rely on Monte Carlo sampling, e.g., \cite{DS07,re-07-eftqevpd}, or a branch-and-bound paradigm~\cite{DBLP:conf/icde/OlteanuHK10,fink-11}. The approximation algorithm for bag expectation we present in this work is based on sampling.

Fink et al.~\cite{FH12} study aggregate queries over a probabilistic version of the extension of K-relations for aggregate queries proposed in~\cite{AD11d} (this data model is referred to as \emph{pvc-tables}). As an extension of K-relations, this approach supports bags. Probabilities are computed using a decomposition approach~\cite{DBLP:conf/icde/OlteanuHK10} over the symbolic expressions that are used as tuple annotations and values in pvc-tables. \cite{FH12} identifies a tractable class of queries involving aggregation. In contrast, we study a less general data model and query class, but provide a linear-time approximation algorithm and new insights into the complexity of computing expectation (whereas \cite{FH12} computes probabilities for individual output annotations).

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: