related + conclusions

master
Boris Glavic 2021-04-08 21:17:57 -05:00
parent 262debf68c
commit ab6c53c52e
2 changed files with 36 additions and 28 deletions


@@ -1,20 +1,24 @@
%!TEX root=./main.tex
\section{Conclusions and Future Work}\label{sec:concl-future-work}
We have studied the problem of calculating the expectation of query polynomials over BIDBs. %random integer variables.
We have studied the problem of calculating the expectation of lineage polynomials over BIDBs. %random integer variables.
This problem has a practical application in probabilistic databases over multisets, where it corresponds to calculating the expected multiplicity of a query result tuple.
It has been studied extensively for sets (lineage formulas), but the bag setting has not received much attention.
While the expectation of a polynomial can be calculated in linear time in the size of polynomials that are in SOP form, the problem is \sharpwonehard for factorized polynomials.
We have proven this claim through a reduction from the problem of counting $k$-matchings.
When only considering polynomials for result tuples of UCQs over TIDBs and BIDBs (under the assumption that there are few cancellations), we prove that it is still possible to approximate the expectation of a polynomial in linear time.
Interesting directions for future work include development of a dichotomy for queries over bag PDBs and approximations for data models beyond what we consider in this paper.
% It has been studied extensively for sets (lineage formulas), but the bag setting has not received much attention.
While the expectation of a polynomial can be calculated in linear time for % in the size of
polynomials % that are
in SOP form, the problem is \sharpwonehard for factorized polynomials (proven through a reduction from the problem of counting $k$-matchings).
%We have proven this claim through a reduction from the problem of counting $k$-matchings.
We prove that it is possible to approximate the expectation of a lineage polynomial in linear time for
% When only considering polynomials for result tuples of
UCQs over TIDBs and BIDBs (under the assumption that there are few cancellations).
Interesting directions for future work include development of a dichotomy for bag PDBs and approximations for more general data models. % beyond what we consider in this paper.
% Furthermore, it would be interesting to see whether our approximation algorithm can be extended to support queries with negations, perhaps using circuits with monus as a representation system.
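
As a small illustration of the SOP case (a sketch under assumptions not stated above: the polynomial is hypothetical, and $X$, $Y$, $Z$ are independent $\{0,1\}$-valued random variables with $P(X = 1) = p_X$, $P(Y = 1) = p_Y$, and $P(Z = 1) = p_Z$, as for a TIDB), linearity of expectation and independence give
\[
  \mathbb{E}[2X^2Y + 3YZ] = 2\,\mathbb{E}[X^2Y] + 3\,\mathbb{E}[YZ] = 2 p_X p_Y + 3 p_Y p_Z,
\]
where the last step uses $X^2 = X$ for a $\{0,1\}$-valued variable.
Each monomial is visited once, so the cost is linear in the size of the SOP representation.
For a factorized polynomial such as $(X + Y)(X + Z)$, the expectation is in general not the product of the expectations of the factors, because both factors share the variable $X$.
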
\BG{I am not sure what interesting future work is here. Some wild guesses, if anybody agrees I'll try to flesh them out:
\textbullet{More queries: what happens with negation? Can circuits with monus be used?}
\textbullet{More databases: can we push beyond BIDBs? E.g., C-tables / aggregate semimodules or just TIDBs where each input tuple is a random variable over $\mathbb{N}$?}
\textbullet{Other results: can we extend the work to approximate $P(R(t) = n)$?}
}
% \BG{I am not sure what interesting future work is here. Some wild guesses, if anybody agrees I'll try to flesh them out:
% \textbullet{More queries: what happens with negation? Can circuits with monus be used?}
% \textbullet{More databases: can we push beyond BIDBs? E.g., C-tables / aggregate semimodules or just TIDBs where each input tuple is a random variable over $\mathbb{N}$?}
% \textbullet{Other results: can we extend the work to approximate $P(R(t) = n)$?}
% }
%%% Local Variables:
%%% mode: latex


@@ -4,35 +4,39 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\subsection{Probabilistic Databases}\label{sec:prob-datab}
\textbf{Probabilistic Databases} (PDBs) have been studied predominantly for set semantics.
Many data models have been proposed for encoding PDBs more compactly than as sets of possible worlds.
These include tuple-independent databases~\cite{VS17} (\tis), block-independent databases (\bis)~\cite{RS07}, and \emph{PC-tables}~\cite{GT06}, which are similar to our $\semNX$-PDBs, but with Boolean expressions instead of polynomials.
Approaches for probabilistic query processing (i.e., computing marginal probabilities for tuples) fall into two broad categories.
\emph{Intensional} (or \emph{grounded}) query evaluation computes the \emph{lineage} of a tuple
Approaches for probabilistic query processing (i.e., computing marginal probabilities of tuples) fall into two broad categories.
\emph{Intensional} (or \emph{grounded}) query evaluation computes the \emph{lineage} of a tuple
and then the probability of the lineage formula.
In this paper we focus on intensional query evaluation with polynomials.
It has been shown that computing the marginal probability of a tuple is \sharpphard~\cite{valiant-79-cenrp} (by reduction from weighted model counting).
The second category, \emph{extensional} query evaluation, % avoids calculating the lineage.
% This approach
is in \ptime, but is limited to certain classes of queries.
Dalvi et al.~\cite{DS12} proved a dichotomy for UCQs:
for any UCQ the probabilistic query evaluation problem is either \sharpphard or \ptime.
Olteanu et al.~\cite{FO16} presented dichotomies for two classes of queries with negation. % R\'e et al~\cite{RS09b} present a trichotomy for HAVING queries.
Dalvi et al.~\cite{DS12} and Olteanu et al.~\cite{FO16} proved dichotomies for UCQs and two classes of queries with negation, respectively.
% Dalvi et al.~\cite{DS12} proved a dichotomy for UCQs:
% for any UCQ the probabilistic query evaluation problem is either \sharpphard or \ptime.
% Olteanu et al.~\cite{FO16} presented dichotomies for two classes of queries with negation.
% R\'e et al~\cite{RS09b} present a trichotomy for HAVING queries.
Amarilli et al. investigated tractable classes of databases for more complex queries~\cite{AB15}. %,AB15c
Another line of work studies which structural properties of lineage formulas lead to tractable cases~\cite{kenig-13-nclexpdc,roy-11-f,sen-10-ronfqevpd}.
In this paper we focus on intensional query evaluation with polynomials.
Many data models have been proposed for encoding PDBs more compactly than as sets of possible worlds.
These include tuple-independent databases~\cite{VS17} (\tis), block-independent databases (\bis)~\cite{RS07}, and \emph{PC-tables}~\cite{GT06}.
%
Fink et al.~\cite{FH12} study aggregate queries over a probabilistic version of the extension of K-relations for aggregate queries proposed in~\cite{AD11d} (\emph{pvc-tables}). As an extension of K-relations, this approach supports bags. % Probabilities are computed using a decomposition approach~\cite{DBLP:conf/icde/OlteanuHK10}.
% over the symbolic expressions that are used as tuple annotations and values in pvc-tables.
% \cite{FH12} identifies a tractable class of queries involving aggregation.
In contrast, we study a less general data model ($\semNX$-PDBs)
and query class, but provide a linear-time approximation algorithm and new insights into the complexity of computing expectations, while~\cite{FH12} computes probabilities for individual output annotations.
Several techniques for approximating tuple probabilities have been proposed in related work~\cite{FH13,heuvel-19-anappdsd,DBLP:conf/icde/OlteanuHK10,DS07}, relying on Monte Carlo sampling, e.g.,~\cite{DS07}, or a branch-and-bound paradigm~\cite{DBLP:conf/icde/OlteanuHK10}.
Our approximation algorithm is also based on sampling.
Fink et al.~\cite{FH12} study aggregate queries over a probabilistic version of the extension of K-relations for aggregate queries proposed in~\cite{AD11d} (this data model is referred to as \emph{pvc-tables}). As an extension of K-relations, this approach supports bags. Probabilities are computed using a decomposition approach~\cite{DBLP:conf/icde/OlteanuHK10}. % over the symbolic expressions that are used as tuple annotations and values in pvc-tables.
% \cite{FH12} identifies a tractable class of queries involving aggregation.
In contrast, we study a less general data model and query class, but provide a linear-time approximation algorithm and new insights into the complexity of computing expectations (while~\cite{FH12} computes probabilities for individual output annotations).
\noindent \textbf{Compressed Encodings} are used for Boolean formulas (e.g., various types of circuits including OBDDs~\cite{jha-12-pdwm}) and polynomials (e.g., factorizations~\cite{factorized-db}), some of which have been utilized for probabilistic query processing, e.g.,~\cite{jha-12-pdwm}.
Compact representations for which probabilities can be computed in linear time include OBDDs, SDDs, d-DNNFs, and FBDDs.
\noindent \textbf{Compressed Encodings} are used for Boolean formulas (e.g., various types of circuits including OBDDs~\cite{jha-12-pdwm}) and polynomials (e.g., factorizations~\cite{factorized-db}), some of which have been utilized for probabilistic query processing, e.g.,~\cite{jha-12-pdwm}.
Compact representations for which probabilities can be computed in linear time include OBDDs, SDDs, d-DNNFs, and FBDDs.
\cite{DM14c} studies circuits for absorptive semirings while~\cite{S18a} studies circuits that include negation (expressed as the monus operation). Algebraic Decision Diagrams~\cite{bahar-93-al} (ADDs) generalize BDDs to variables with more than two values. Chen et al.~\cite{chen-10-cswssr} introduced the generalized disjunctive normal form.
\noindent \Cref{sec:param-compl} covers more related work on fine-grained complexity.
%\noindent
\Cref{sec:param-compl} covers more related work on fine-grained complexity.
%%% Local Variables: