
%!TEX root=./main.tex
\section{Related Work}\label{sec:related-work}

In addition to probabilistic databases, our work has connections to work on compact representations of polynomials and on fine-grained complexity, which we review in \Cref{sec:compr-repr-polyn,sec:param-compl}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\subsection{Probabilistic Databases}\label{sec:prob-datab}

Probabilistic Databases (PDBs) have been studied predominantly for set semantics.
A multitude of data models have been proposed for encoding a PDB more compactly than as its set of possible worlds. These include tuple-independent databases (\tis)~\cite{VS17}, block-independent databases (\bis)~\cite{RS07}, and \emph{PC-tables}~\cite{GT06}, which pair a C-table % ~\cite{IL84a}
with a probability distribution over its variables.
PC-tables are similar to our $\semNX$-PDBs, except that we annotate tuples with polynomials instead of Boolean expressions and only allow constants as attribute values.
% Tuple-independent databases (\tis) consist of a classical database where each tuple associated with a probability and tuples are treated as independent probabilistic events.
% While unable to encode correlations directly, \tis are popular because any finite probabilistic database can be encoded as a \ti and a set of constraints that ``condition'' the \ti~\cite{VS17}.
% Block-independent databases (\bis) generalize \tis by partitioning the input into blocks of disjoint tuples, where blocks are independent~\cite{RS07}. %,BS06
% \emph{PC-tables}~\cite{GT06} pair a C-table % ~\cite{IL84a}
% with probability distribution over its variables. This is similar to our $\semNX$-PDBs, except that we do not allow for variables as attribute values and instead of local conditions (propositional formulas that may contain comparisons), we associate tuples with polynomials $\semNX$.

Approaches for probabilistic query processing (i.e., computing the marginal probabilities of query result tuples) fall into two broad categories.
\emph{Intensional} (or \emph{grounded}) query evaluation computes the \emph{lineage} of a tuple % (a Boolean formula encoding the provenance of the tuple)
and then the probability of the lineage formula.
In this paper, we focus on intensional query evaluation using polynomials instead of Boolean formulas.
It is well known that computing the marginal probability of a tuple is \sharpphard in general; this is shown by a reduction from weighted model counting~\cite{valiant-79-cenrp} %provan-83-ccccptg
that uses the fact that a tuple's marginal probability is the probability of its lineage formula.
The second category, \emph{extensional} query evaluation, % avoids calculating the lineage.
% This approach
is in \ptime, but is limited to certain classes of queries.
Dalvi et al.~\cite{DS12} proved a dichotomy for unions of conjunctive queries (UCQs):
for any UCQ, the probabilistic query evaluation problem is either \sharpphard (requiring intensional evaluation) or in \ptime (permitting extensional evaluation).
Olteanu et al.~\cite{FO16} presented dichotomies for two classes of queries with negation. % R\'e et al~\cite{RS09b} present a trichotomy for HAVING queries.
Amarilli et al. investigated tractable classes of databases for more complex queries~\cite{AB15}. %,AB15c
Another line of work studies which structural properties of lineage formulas lead to tractable cases~\cite{kenig-13-nclexpdc,roy-11-f,sen-10-ronfqevpd}.
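
To illustrate the difference between Boolean lineage and the lineage polynomials we use, consider a small example of our own; the relations $R$ and $S$ and the variables $X$, $Y_1$, $Y_2$ are hypothetical and chosen only for illustration. In a tuple-independent database, $R$ contains a single tuple annotated with $X$, $S$ contains two tuples that join with it, annotated with $Y_1$ and $Y_2$, and each variable independently equals $1$ with probability $p_X$, $p_{Y_1}$, and $p_{Y_2}$, respectively. For the query $\pi_{\emptyset}(R \bowtie S)$, set semantics yields the lineage formula and probability
\[
% Illustrative example only; R, S, X, Y_1, Y_2 are hypothetical.
\Phi_{\mathrm{set}} = (X \wedge Y_1) \vee (X \wedge Y_2), \qquad
\Pr[\Phi_{\mathrm{set}}] = p_X \bigl(p_{Y_1} + p_{Y_2} - p_{Y_1} p_{Y_2}\bigr),
\]
whereas under bag semantics the annotation is the polynomial $\Phi_{\mathrm{bag}} = X Y_1 + X Y_2$, whose expectation follows from linearity of expectation and independence:
\[
\mathrm{E}[\Phi_{\mathrm{bag}}] = p_X\, p_{Y_1} + p_X\, p_{Y_2}.
\]
Here, $\mathrm{E}[\Phi_{\mathrm{bag}}]$ is read off the monomials directly, while $\Pr[\Phi_{\mathrm{set}}]$ must account for the overlap between the two disjuncts.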

Several techniques for approximating tuple probabilities have been proposed~\cite{FH13,heuvel-19-anappdsd,DBLP:conf/icde/OlteanuHK10,DS07}, relying, e.g., on Monte Carlo sampling~\cite{DS07} or on a branch-and-bound paradigm~\cite{DBLP:conf/icde/OlteanuHK10}.
The approximation algorithm for bag expectation we present in this work is based on sampling.
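
As a generic illustration of sampling-based approximation of tuple probabilities (plain Monte Carlo estimation, not the specific algorithm of any cited work or of this paper), one can draw $N$ possible worlds $W_1, \ldots, W_N$ independently from the PDB and estimate the marginal probability $p$ of a result tuple $t$ of a query $Q$ as the fraction of sampled worlds in which $t$ appears. Hoeffding's inequality then bounds the additive error:
\[
% Generic Monte Carlo estimator; N, epsilon, delta are illustrative parameters.
\hat{p} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\bigl[t \in Q(W_i)\bigr], \qquad
\Pr\bigl[|\hat{p} - p| \geq \epsilon\bigr] \leq 2 e^{-2 N \epsilon^2},
\]
so $N \geq \ln(2/\delta) / (2\epsilon^2)$ samples suffice to guarantee $|\hat{p} - p| < \epsilon$ with probability at least $1 - \delta$. This guarantee is additive only; small probabilities require more samples for a comparable relative error.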

Fink et al.~\cite{FH12} study aggregate queries over \emph{pvc-tables}, a probabilistic version of the extension of K-relations for aggregate queries proposed in~\cite{AD11d}. As an extension of K-relations, this approach supports bags. Probabilities are computed using a decomposition approach~\cite{DBLP:conf/icde/OlteanuHK10}. % over the symbolic expressions that are used as tuple annotations and values in pvc-tables.
% \cite{FH12} identifies a tractable class of queries involving aggregation.
In contrast, we study a less general data model and query class, but provide a linear-time approximation algorithm and new insights into the complexity of computing expectations (whereas~\cite{FH12} computes probabilities for individual output annotations).

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: