%!TEX root=./main.tex
\section{Related Work}\label{sec:related-work}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\subsection{Probabilistic Databases}\label{sec:prob-datab}
\textbf{Probabilistic Databases} (PDBs) have been studied predominantly for set semantics.
Many data models have been proposed for encoding PDBs more compactly than as sets of possible worlds.
These include tuple-independent databases (\tis)~\cite{VS17}, block-independent databases (\bis)~\cite{RS07}, and \emph{PC-tables}~\cite{GT06}, which pair a C-table % ~\cite{IL84a}
with a probability distribution over its variables.
PC-tables are similar to our $\semNX$-PDBs, but annotate tuples with Boolean expressions instead of polynomials.
% Tuple-independent databases (\tis) consist of a classical database where each tuple associated with a probability and tuples are treated as independent probabilistic events.
% While unable to encode correlations directly, \tis are popular because any finite probabilistic database can be encoded as a \ti and a set of constraints that ``condition'' the \ti~\cite{VS17}.
% Block-independent databases (\bis) generalize \tis by partitioning the input into blocks of disjoint tuples, where blocks are independent~\cite{RS07}. %,BS06
% \emph{PC-tables}~\cite{GT06} pair a C-table % ~\cite{IL84a}
% with probability distribution over its variables. This is similar to our $\semNX$-PDBs, except that we do not allow for variables as attribute values and instead of local conditions (propositional formulas that may contain comparisons), we associate tuples with polynomials $\semNX$.
Approaches for probabilistic query processing (i.e., computing marginal probabilities for tuples) fall into two broad categories.
\emph{Intensional} (or \emph{grounded}) query evaluation computes the \emph{lineage} of a tuple % (a Boolean formula encoding the provenance of the tuple)
and then the probability of the lineage formula.
In this paper we focus on intensional query evaluation with polynomials.
It has been shown that computing the marginal probability of a tuple is \sharpphard~\cite{valiant-79-cenrp} (by reduction from weighted model counting).
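As a small illustration of the intensional approach (the formula and probabilities below are hypothetical and chosen only for exposition), consider the lineage $\phi = x_1 \wedge (x_2 \vee x_3)$ over independent tuple variables with $\Pr[x_i] = p_i$.
Since no variable occurs more than once, the probability factorizes along the structure of $\phi$:
\[
  \Pr[\phi] = p_1 \cdot \bigl(1 - (1 - p_2)(1 - p_3)\bigr),
\]
which for $p_1 = 0.5$, $p_2 = 0.4$, and $p_3 = 0.5$ gives $0.5 \cdot (1 - 0.6 \cdot 0.5) = 0.35$.
Such a factorization is generally not available once variables repeat across subformulas, and the general problem is hard, as noted above.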
The second category, \emph{extensional} query evaluation, % avoids calculating the lineage.
% This approach
is in \ptime, but is limited to certain classes of queries.
Dalvi et al.~\cite{DS12} proved a dichotomy for unions of conjunctive queries (UCQs):
for any UCQ, the probabilistic query evaluation problem is either \sharpphard (requiring intensional evaluation) or in \ptime (permitting extensional evaluation).
Olteanu et al.~\cite{FO16} presented dichotomies for two classes of queries with negation. % R\'e et al~\cite{RS09b} present a trichotomy for HAVING queries.
Amarilli et al. investigated tractable classes of databases for more complex queries~\cite{AB15}. %,AB15c
Another line of work studies which structural properties of lineage formulas lead to tractable cases~\cite{kenig-13-nclexpdc,roy-11-f,sen-10-ronfqevpd}.
Several techniques for approximating tuple probabilities have been proposed~\cite{FH13,heuvel-19-anappdsd,DBLP:conf/icde/OlteanuHK10,DS07}, relying on Monte Carlo sampling, e.g.,~\cite{DS07}, or on a branch-and-bound paradigm~\cite{DBLP:conf/icde/OlteanuHK10}.
Our approximation algorithm is also based on sampling.
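For intuition, a generic additive-error estimator can be sketched as follows; this is a textbook construction under the assumption of independent samples, and is neither the algorithm of this paper nor that of the cited works.
Draw $N$ possible worlds $w_1, \ldots, w_N$ independently from the PDB's distribution and let $\phi(w_i) \in \{0, 1\}$ indicate whether the lineage $\phi$ holds in $w_i$. Then
\[
  \hat{p} = \frac{1}{N} \sum_{i=1}^{N} \phi(w_i),
  \qquad
  \Pr\bigl[\,|\hat{p} - \Pr[\phi]| \geq \epsilon\,\bigr] \leq 2 e^{-2 N \epsilon^2}
\]
by Hoeffding's inequality, so $N \geq \ln(2/\delta) / (2\epsilon^2)$ samples suffice for error at most $\epsilon$ with probability at least $1 - \delta$ (e.g., $N = 738$ for $\epsilon = \delta = 0.05$).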
Fink et al.~\cite{FH12} study aggregate queries over a probabilistic version of the extension of K-relations for aggregate queries proposed in~\cite{AD11d} (this data model is referred to as \emph{pvc-tables}). As an extension of K-relations, this approach supports bags. Probabilities are computed using a decomposition approach~\cite{DBLP:conf/icde/OlteanuHK10}. % over the symbolic expressions that are used as tuple annotations and values in pvc-tables.
% \cite{FH12} identifies a tractable class of queries involving aggregation.
In contrast, we study a less general data model and query class, but provide a linear-time approximation algorithm and new insights into the complexity of computing expectations (whereas~\cite{FH12} computes probabilities for individual output annotations).
\noindent \textbf{Compressed Encodings} are used for Boolean formulas (e.g., various types of circuits, including OBDDs~\cite{jha-12-pdwm}) and polynomials (e.g., factorizations~\cite{factorized-db}), some of which have been utilized for probabilistic query processing, e.g.,~\cite{jha-12-pdwm}.
Compact representations for which probabilities can be computed in linear time include OBDDs, SDDs, d-DNNFs, and FBDDs.
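To illustrate why such representations admit linear-time probability computation (a sketch assuming independent variables with $\Pr[x] = p_x$; the notation $n_0$, $n_1$ is introduced here only for exposition), consider an OBDD or FBDD node $n$ that tests variable $x$ and has children $n_0$ (for $x = 0$) and $n_1$ (for $x = 1$).
Because $x$ is not tested again below $n$ on any path, Shannon expansion yields
\[
  \Pr[n] = (1 - p_x) \cdot \Pr[n_0] + p_x \cdot \Pr[n_1],
\]
with $\Pr[n] = 0$ and $\Pr[n] = 1$ at the two terminal nodes.
Evaluating this recurrence bottom-up visits every node once, so the cost is linear in the size of the diagram.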
\cite{DM14c} studies circuits for absorptive semirings, while~\cite{S18a} studies circuits that include negation (expressed as the monus operation). Algebraic Decision Diagrams~\cite{bahar-93-al} (ADDs) generalize BDDs from Boolean-valued to arbitrary-valued functions, i.e., terminal nodes may carry values other than $0$ and $1$. Chen et al.~\cite{chen-10-cswssr} introduced the generalized disjunctive normal form.
\noindent \Cref{sec:param-compl} covers more related work on fine-grained complexity.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: