%!TEX root=./main.tex
\section{Related Work}\label{sec:related-work}
In addition to probabilistic databases, our work has connections to work on compact representations of polynomials and on fine-grained complexity, which we review in \Cref{sec:compr-repr-polyn,sec:param-compl}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\subsection{Probabilistic Databases}\label{sec:prob-datab}

Probabilistic Databases (PDBs) have been studied predominantly for set semantics.
A multitude of data models have been proposed for encoding a PDB more compactly than as its set of possible worlds. These include tuple-independent databases (\tis)~\cite{VS17}, block-independent databases (\bis)~\cite{RS07}, and \emph{PC-tables}~\cite{GT06}, which pair a C-table % ~\cite{IL84a}
with a probability distribution over its variables.
PC-tables are similar to our $\semNX$-PDBs, but we use polynomials instead of Boolean expressions and only allow constants as attribute values.
% Tuple-independent databases (\tis) consist of a classical database where each tuple associated with a probability and tuples are treated as independent probabilistic events.
% While unable to encode correlations directly, \tis are popular because any finite probabilistic database can be encoded as a \ti and a set of constraints that ``condition'' the \ti~\cite{VS17}.
% Block-independent databases (\bis) generalize \tis by partitioning the input into blocks of disjoint tuples, where blocks are independent~\cite{RS07}. %,BS06
% \emph{PC-tables}~\cite{GT06} pair a C-table % ~\cite{IL84a}
% with probability distribution over its variables. This is similar to our $\semNX$-PDBs, except that we do not allow for variables as attribute values and instead of local conditions (propositional formulas that may contain comparisons), we associate tuples with polynomials $\semNX$.

Approaches for probabilistic query processing (i.e., computing the marginal probability of query result tuples) fall into two broad categories.
\emph{Intensional} (or \emph{grounded}) query evaluation computes the \emph{lineage} of a tuple % (a Boolean formula encoding the provenance of the tuple)
and then the probability of the lineage formula.
In this paper we focus on intensional query evaluation using polynomials instead of Boolean formulas.
It is well known that computing the marginal probability of a tuple is \sharpphard (proven through a reduction from weighted model counting~\cite{valiant-79-cenrp} %provan-83-ccccptg
using the fact that the tuple's marginal probability is the probability of its lineage formula).
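As a small illustration (the variables $X$, $Y$ and the probabilities $p_X$, $p_Y$ are our own and are not taken from the cited works), consider a result tuple derived from two independent input tuples annotated with $X$ and $Y$, which are present with probabilities $p_X$ and $p_Y$, respectively.
Under set semantics the lineage is $X \vee Y$, whereas under bag semantics the annotation is the polynomial $X + Y$, where $X$ and $Y$ are viewed as $\{0,1\}$-valued random variables:
\[
  \Pr[X \vee Y] = 1 - (1 - p_X)(1 - p_Y),
  \qquad
  \mathrm{E}[X + Y] = p_X + p_Y .
\]
The first quantity is the marginal probability of the tuple; the second is its expected multiplicity, which follows from linearity of expectation.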
The second category, \emph{extensional} query evaluation, % avoids calculating the lineage.
% This approach
is in \ptime, but is limited to certain classes of queries.
Dalvi et al.~\cite{DS12} proved a dichotomy for unions of conjunctive queries (UCQs):
for any UCQ, the probabilistic query evaluation problem is either \sharpphard (requires intensional evaluation) or in \ptime (permits extensional evaluation).
Olteanu et al.~\cite{FO16} presented dichotomies for two classes of queries with negation. % R\'e et al~\cite{RS09b} present a trichotomy for HAVING queries.
Amarilli et al. investigated tractable classes of databases for more complex queries~\cite{AB15}. %,AB15c
Another line of work studies which structural properties of lineage formulas lead to tractable cases~\cite{kenig-13-nclexpdc,roy-11-f,sen-10-ronfqevpd}.

Several techniques for approximating tuple probabilities have been proposed~\cite{FH13,heuvel-19-anappdsd,DBLP:conf/icde/OlteanuHK10,DS07}, relying on Monte Carlo sampling, e.g.,~\cite{DS07}, or a branch-and-bound paradigm~\cite{DBLP:conf/icde/OlteanuHK10}.
The approximation algorithm for bag expectation we present in this work is based on sampling.

Fink et al.~\cite{FH12} study aggregate queries over a probabilistic version of the extension of K-relations for aggregate queries proposed in~\cite{AD11d} (this data model is referred to as \emph{pvc-tables}). As an extension of K-relations, this approach supports bags. Probabilities are computed using a decomposition approach~\cite{DBLP:conf/icde/OlteanuHK10}. % over the symbolic expressions that are used as tuple annotations and values in pvc-tables.
% \cite{FH12} identifies a tractable class of queries involving aggregation.
In contrast, we study a less general data model and query class, but provide a linear-time approximation algorithm and new insights into the complexity of computing expectation (while~\cite{FH12} computes probabilities for individual output annotations).
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: