Merge branch 'master' of https://gitlab.odin.cse.buffalo.edu/ahuber/SketchingWorlds
commit
ed16f49249
1
.gitignore
vendored
|
@ -12,3 +12,4 @@
|
|||
*.xoj
|
||||
*.auxlock
|
||||
*.vtc
|
||||
auto
|
|
@ -2,14 +2,24 @@
|
|||
%root: main.tex
|
||||
\section{Introduction (Rewrite - 070921)}\label{sec:intro-rewrite-070921}
|
||||
\input{two-step-model}
|
||||
A probabilistic database (or PDB) $\pdb$ is a pair $\inparen{\idb, \pd}$ such that $\idb$ is a set of deterministic database instances (possible worlds) and $\pd$ is a probability distribution over $\idb$.
|
||||
In bag query semantics the random variable $\query\inparen{\pdb}\inparen{\tup}$ is the multiplicity of its corresponding output tuple $\tup$ (in a random database instance in $\idb$ chosen according to $\pd$).
|
||||
In addition to traditional deterministic query evaluation requirements (for a given query class), the query evaluation problem in bag-\abbrPDB semantics can be formally stated as:
|
||||
\begin{Problem}\label{prob:bag-pdb-query-eval}
|
||||
Given a query $\query$ from the set of positive relational algebra queries\footnote{The class of $\raPlus$ queries consists of all queries that can be composed of the positive (monotonic) relational algebra operators: selection, projection, join, and union (SPJU).} ($\raPlus$), compute the expected\footnote{Unless stated otherwise, we assume the implicit probability distribution $\pd$, and for notational convenience use $\expct\pbox{\cdot}$ instead of $\expct_\pd\pbox{\cdot}$.}
|
||||
multiplicity ($\expct\pbox{\query\inparen{\pdb}\inparen{\tup}}$)
|
||||
of output tuple $\tup$. We are interested in the data complexity of this problem (i.e. we think of $Q$ as being of constant size).
|
||||
A probabilistic database (PDB) $\pdb$ is a tuple $\inparen{\idb, \pd}$ such that $\idb$ is a set of deterministic database instances called possible worlds and $\pd$ is a probability distribution over $\idb$.
|
||||
A commonly studied problem in probabilistic databases is the following: given a query $\query$, a PDB $\pdb$, and a possible query result tuple $\tup$, compute the \textit{marginal probability} that $\tup$ appears in the query's result, i.e., compute the expectation of a Boolean random variable over $\pd$ that is $1$ for every $\db \in \idb$ for which $\tup \in \query(\db)$ and $0$ otherwise. In this work, we are interested in bag semantics, where each tuple $\tup$ is associated with a multiplicity $\db(\tup)$ from $\semN$ in each possible world.\footnote{We find it convenient to use the notation from~\cite{DBLP:conf/pods/GreenKT07}, which models bag relations as functions that map tuples to their multiplicities.}
|
||||
We refer to such a probabilistic database as a bag-probabilistic database or \abbrBPDB for short.
|
||||
The natural generalization of the problem of computing marginal probabilities of query result tuples to bag semantics is to compute the expectation of a random variable over $\pd$ that assigns value $\query(\db)(\tup)$ in world $\db$:
|
||||
|
||||
% In bag query semantics the random variable $\query\inparen{\pdb}\inparen{\tup}$ is the multiplicity of its corresponding output tuple $\tup$ (in a random database instance in $\idb$ chosen according to $\pd$).
|
||||
%In addition to traditional deterministic query evaluation requirements (for a given query class), the query evaluation problem in bag-\abbrPDB semantics can be formally stated as:
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\begin{Problem}[Expected Multiplicity]\label{prob:bag-pdb-query-eval}
|
||||
Given a positive relational algebra query ($\raPlus$)\footnote{The class of $\raPlus$ queries consists of all queries that can be composed of the positive (monotonic) relational algebra operators: selection, projection, join, and union (SPJU).} $\query$, \abbrBPDB $\pdb$, and output tuple $\tup$, compute the expected
|
||||
multiplicity ($\expct_\pd\pbox{\query\inparen{\pdb}\inparen{\tup}}$)
|
||||
of tuple $\tup$.
|
||||
\end{Problem}
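For concreteness, the expectation in \Cref{prob:bag-pdb-query-eval} can be unfolded over the possible worlds. The following is only an illustrative sketch: it assumes that $\idb$ is finite and writes $\pd(\db)$ for the probability of world $\db$; the two worlds $\db_1, \db_2$ and the numbers below are hypothetical and are not taken from this paper.
\begin{align*}
\expct_\pd\pbox{\query\inparen{\pdb}\inparen{\tup}} &= \sum_{\db \in \idb} \pd(\db) \cdot \query(\db)(\tup).
\end{align*}
For instance, if $\idb = \{\db_1, \db_2\}$ with $\pd(\db_1) = 0.25$, $\pd(\db_2) = 0.75$, $\query(\db_1)(\tup) = 2$, and $\query(\db_2)(\tup) = 4$, then the expected multiplicity of $\tup$ is $0.25 \cdot 2 + 0.75 \cdot 4 = 3.5$.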
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
|
||||
We are mostly interested in the data complexity of this problem (i.e., we think of $\query$ as being of constant size). Unless stated otherwise, we implicitly assume the probability distribution $\pd$ and, for notational convenience, use $\expct\pbox{\cdot}$ instead of $\expct_\pd\pbox{\cdot}$. It has been shown that the problem of computing the marginal probability of a query result tuple can be reduced to the problem of computing the probability that the lineage formula of the tuple evaluates to true. The lineage formula of a tuple is a propositional formula over Boolean random variables representing the tuples of $\pdb$. The bag semantics analog of a lineage formula is a provenance polynomial, a polynomial with integer coefficients and exponents over integer random variables (representing the multiplicities of input tuples). We show that \Cref{prob:bag-pdb-query-eval} corresponds to the problem of computing the expectation of such a polynomial. Our main technical focus is on studying the complexity of this problem for various encodings of such polynomials. However, as we will show, these results also have implications for \Cref{prob:bag-pdb-query-eval} when considering the cost of generating polynomials of query result tuples.
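
As a small worked illustration of taking the expectation of such a polynomial, consider the following sketch. It rests on assumptions that are ours and not stated above: the variables $X_1, X_2, X_3$ are independent and $\{0,1\}$-valued with $\Pr\pbox{X_i = 1} = p_i$, and the polynomial $\Phi$ is a hypothetical example rather than one derived from a specific query.
\begin{align*}
\Phi &= X_1 X_2 + X_1^2 X_3, \\
\expct\pbox{\Phi} &= \expct\pbox{X_1 X_2} + \expct\pbox{X_1^2 X_3} && \text{(linearity of expectation)} \\
&= \expct\pbox{X_1 X_2} + \expct\pbox{X_1 X_3} && \text{(since $X_1^2 = X_1$ for $X_1 \in \{0,1\}$)} \\
&= p_1 p_2 + p_1 p_3 && \text{(independence)}.
\end{align*}
Under these assumptions the expectation is obtained monomial by monomial once $\Phi$ is written as a sum of products; whether this remains efficient for other encodings of the polynomial is the question studied in the remainder of the paper.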
|
||||
|
||||
|
||||
Solving~\cref{prob:bag-pdb-query-eval} for arbitrary $\pd$ is hopeless since we need exponential space to represent an arbitrary $\pd$.
|
||||
We initially focus on tuple-independent probabilistic bag-databases (\abbrTIDB), a compressed encoding of probabilistic databases where the presence of each individual tuple (out of a total of $\numvar$ input tuples) in a possible world can be modeled as an independent probabilistic event\footnote{
|
||||
|
@ -245,3 +255,9 @@ To get an $(1\pm \epsilon)$-multiplicative approximation we uniformly sample mon
|
|||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\mypar{Paper Organization} We present relevant background and notation in \Cref{sec:background}. We then prove our main hardness results in \Cref{sec:hard} and present our approximation algorithm in \Cref{sec:algo}. We present some (easy) generalizations of our results in \Cref{sec:gen} and also discuss extensions from computing expectations of polynomials to the expected result multiplicity problem (\Cref{def:the-expected-multipl})\AH{Aren't they the same?}. Finally, we discuss related work in \Cref{sec:related-work} and conclude in \Cref{sec:concl-future-work}.
|
||||
|
||||
|
||||
%%% Local Variables:
|
||||
%%% mode: latex
|
||||
%%% TeX-master: "main"
|
||||
%%% End:
|
||||
|
|
|
@ -124,6 +124,7 @@
|
|||
|
||||
%PDB Abbreviations
|
||||
\newcommand{\abbrPDB}{\textnormal{PDB}\xspace}
|
||||
\newcommand{\abbrBPDB}{\textnormal{BPDB}\xspace}
|
||||
\newcommand{\abbrTIDB}{\textnormal{TIDB}\xspace}%replace \ti with this
|
||||
\newcommand{\abbrBIDB}{\textnormal{BIDB}\xspace}
|
||||
\newcommand{\ti}{TIDB\xspace}
|
||||
|
|