%!TEX root=./main.tex
\section{Related Work}\label{sec:related-work}
\textbf{Probabilistic Databases} (PDBs) have been studied predominantly for set semantics.
Approaches for probabilistic query processing (i.e., computing marginal probabilities of tuples) fall into two broad categories.
\emph{Intensional} (or \emph{grounded}) query evaluation computes the \emph{lineage} of a tuple
and then the probability of the lineage formula.
It has been shown that computing the marginal probability of a tuple is \sharpphard~\cite{valiant-79-cenrp} (by reduction from weighted model counting).
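To illustrate the intensional approach on a small hypothetical example (the variables $X_1, X_2$ and probabilities $p_1, p_2$ are assumptions made here purely for exposition), suppose a result tuple has lineage $X_1 \vee X_2$ over independent Boolean variables with $\Pr[X_i = 1] = p_i$.
The second step then computes
\[
  \Pr[X_1 \vee X_2] \;=\; 1 - (1 - p_1)(1 - p_2) \;=\; p_1 + p_2 - p_1 p_2 .
\]
For lineage formulas in which variables are shared across subformulas, no such direct decomposition is available in general.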
The second category, \emph{extensional} query evaluation,
is in \ptime, but is limited to certain classes of queries.
Dalvi et al.~\cite{DS12} and Olteanu et al.~\cite{FO16} proved dichotomies for UCQs and two classes of queries with negation, respectively.
Amarilli et al. investigated tractable classes of databases for more complex queries~\cite{AB15}.
Another line of work studies which structural properties of lineage formulas lead to tractable cases~\cite{kenig-13-nclexpdc,roy-11-f,sen-10-ronfqevpd}.
In this paper, we focus on intensional query evaluation with polynomials.
Many data models have been proposed for encoding PDBs more compactly than as sets of possible worlds.
These include tuple-independent databases (\tis)~\cite{VS17}, block-independent databases (\bis)~\cite{RS07}, and \emph{PC-tables}~\cite{GT06}.
%
Fink et al.~\cite{FH12} study aggregate queries over \emph{pvc-tables}, a probabilistic version of the extension of K-relations for aggregate queries proposed in~\cite{AD11d}.
Their approach supports bags and has runtime complexity linear in the size of the lineage.
However, this lineage is encoded as a tree; its size (and thus the runtime) is still superlinear in $\qruntime{\query, \tupset, \bound}$.
The runtime bound is also limited to a specific class of (hierarchical) queries, suggesting the possibility of a generalization of the dichotomy result of~\cite{DS12} to \abbrBPDB\xplural for our problem (\cite{https://doi.org/10.48550/arxiv.2201.11524} presents a dichotomy result for a related problem).

Several techniques for approximating tuple probabilities have been proposed~\cite{FH13,heuvel-19-anappdsd,DBLP:conf/icde/OlteanuHK10,DS07}, relying on Monte Carlo sampling (e.g.,~\cite{DS07}) or on a branch-and-bound paradigm~\cite{DBLP:conf/icde/OlteanuHK10}.
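As a generic illustration of sampling-based approximation (a textbook sketch, not the specific procedure of any cited work; the symbols $N$, $w_i$, $\phi$, and $\epsilon$ are assumptions introduced only for this example), one may draw $N$ independent possible worlds $w_1, \ldots, w_N$ and evaluate a Boolean lineage formula $\phi$ in each, which yields the unbiased estimator
\[
  \hat{p} \;=\; \frac{1}{N} \sum_{i=1}^{N} \phi(w_i),
  \qquad
  \Pr\bigl[\,|\hat{p} - \Pr[\phi]| \geq \epsilon\,\bigr] \;\leq\; 2 e^{-2 N \epsilon^{2}},
\]
where the tail bound follows from Hoeffding's inequality because each $\phi(w_i) \in \{0, 1\}$.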
Our approximation algorithm is also based on sampling.

\noindent \textbf{Compressed Encodings} are used for Boolean formulas (e.g., various types of circuits including OBDDs~\cite{jha-12-pdwm}) and polynomials (e.g., factorizations~\cite{factorized-db}), some of which have been utilized for probabilistic query processing, e.g.,~\cite{jha-12-pdwm}.
Compact representations for which probabilities can be computed in linear time include OBDDs, SDDs, d-DNNFs, and FBDDs.
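As a brief reminder of why such representations admit linear-time probability computation (a standard observation, sketched here for d-DNNF over independent variables; the symbols $\alpha$ and $\beta$ are used only for this illustration): if the children of every $\wedge$-node share no variables and the children of every $\vee$-node are mutually exclusive, then
\[
  \Pr[\alpha \wedge \beta] = \Pr[\alpha] \cdot \Pr[\beta]
  \qquad\mbox{and}\qquad
  \Pr[\alpha \vee \beta] = \Pr[\alpha] + \Pr[\beta],
\]
so a single bottom-up pass over the circuit, starting from the literal probabilities at the leaves, suffices.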
\cite{DM14c} studies circuits for absorptive semirings while~\cite{S18a} studies circuits that include negation (expressed as the monus operation). Algebraic Decision Diagrams (ADDs)~\cite{bahar-93-al} generalize BDDs from Boolean-valued functions to functions with more than two output values. Chen et al.~\cite{chen-10-cswssr} introduced the generalized disjunctive normal form.
\Cref{sec:param-compl} covers more related work on fine-grained complexity.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: