Boris Glavic 2021-08-26 20:06:30 -05:00
parent bd70b6147e
commit 2f88dc8524

@ -2,12 +2,17 @@
%root: main.tex
\section{Introduction (Rewrite - 070921)}\label{sec:intro-rewrite-070921}
\input{two-step-model}
A probabilistic database $\pdb$ is a tuple $\inparen{\idb, \pd}$ such that $\idb$ is a set of deterministic database instances (possible worlds) and $\pd$ is a probability distribution over $\idb$.
A probabilistic database (PDB) $\pdb$ is a tuple $\inparen{\idb, \pd}$ such that $\idb$ is a set of deterministic database instances called possible worlds and $\pd$ is a probability distribution over $\idb$.
A commonly studied problem in probabilistic databases is, given a query $\query$, a PDB $\pdb$, and a possible query result tuple $\tup$, to compute the tuple's \textit{marginal probability} of being in the query's result, i.e., to compute the expectation of a Boolean random variable over $\pd$ that is $1$ for every $\db \in \idb$ for which $\tup \in \query(\db)$ and $0$ otherwise. In this work, we are interested in bag semantics, where each tuple $\tup$ is associated with a multiplicity $\db(\tup)$ from $\semN$ in each possible world.\footnote{We find it convenient to use the notation from~\cite{DBLP:conf/pods/GreenKT07}, which models bag relations as functions that map tuples to their multiplicities.} The natural generalization of the problem of computing marginal probabilities of query result tuples to bag semantics is to compute the expectation of a random variable over $\pd$ that is $m$ for world $\db$ iff $\query(\db)(\tup) = m$.
In bag count-query semantics the random variable $\query\inparen{\pdb}\inparen{\tup}$ computes the multiplicity of its corresponding tuple $\tup$.
In addition to traditional deterministic query evaluation requirements (for a given query class), the count-query evaluation problem in bag-\abbrPDB semantics can be formally stated as:
\begin{Problem}\label{prob:bag-pdb-query-eval}
Given a query $\query$ from the set of positive relational algebra queries ($\raPlus$),\footnote{The class of $\raPlus$ queries consists of all queries that can be composed of the positive (monotonic) relational algebra operators: selection, projection, join, and union (SPJU).} compute the expected multiplicity ($\expct\pbox{\query\inparen{\pdb}\inparen{\tup}}$)\footnote{We implicitly assume the probability distribution $\pd$ and explicitly denote the distribution when it is not implicit.} of output tuple $\tup$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Problem}[Expected Multiplicity]\label{prob:bag-pdb-query-eval}
Given a positive relational algebra ($\raPlus$)\footnote{The class of $\raPlus$ queries consists of all queries that can be composed of the positive (monotonic) relational algebra operators: selection, projection, join, and union (SPJU).} query $\query$ and bag-PDB $\pdb$, compute the expected multiplicity ($\expct\pbox{\query\inparen{\pdb}\inparen{\tup}}$).\footnote{We implicitly assume the probability distribution $\pd$ and explicitly denote the distribution when it is not implicit.}
\end{Problem}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
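To make \Cref{prob:bag-pdb-query-eval} concrete, the expectation can be unfolded over the possible worlds. The following is a minimal sketch that assumes $\idb$ is finite and writes $\pd\inparen{\db}$ for the probability that $\pd$ assigns to world $\db$; the two-world instance and its numbers are hypothetical and chosen only for illustration.
\[
\expct\pbox{\query\inparen{\pdb}\inparen{\tup}} = \sum_{\db \in \idb} \pd\inparen{\db} \cdot \query\inparen{\db}\inparen{\tup}
\]
For instance, suppose $\idb = \{\db_1, \db_2\}$ with $\pd\inparen{\db_1} = 0.4$ and $\pd\inparen{\db_2} = 0.6$, and suppose $\query\inparen{\db_1}\inparen{\tup} = 2$ and $\query\inparen{\db_2}\inparen{\tup} = 1$. Then
\[
\expct\pbox{\query\inparen{\pdb}\inparen{\tup}} = 0.4 \cdot 2 + 0.6 \cdot 1 = 1.4 .
\]
Under set semantics the same instance would yield a marginal probability of $0.4 + 0.6 = 1$, since $\tup$ appears in the result in both worlds; the expected multiplicity additionally accounts for how many copies of $\tup$ each world contributes.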
We initially focus on tuple-independent probabilistic bag-databases (\abbrTIDB), a compressed encoding of probabilistic databases where the presence of each individual copy of a tuple in a possible world can be modeled as an independent probabilistic event\footnote{
This model corresponds to the classical set-relational approach to \abbrTIDB{}s, reducing duplicate tuples to a set-\abbrTIDB by assigning unique keys across all $\tup$ in $\pdb$. This typically has an $\bigO{c}$ increase in size, for $c = \max_{\tup \in \db}\db\inparen{\tup}$, where $\db\inparen{\tup}$ denotes $\tup$'s multiplicity in the encoding.
@ -180,3 +185,9 @@ To get an $(1\pm \epsilon)$-multiplicative approximation we uniformly sample mon
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\mypar{Paper Organization} We present relevant background and notation in \Cref{sec:background}. We then prove our main hardness results in \Cref{sec:hard} and present our approximation algorithm in \Cref{sec:algo}. We present some (easy) generalizations of our results in \Cref{sec:gen} and also discuss extensions from computing expectations of polynomials to the expected result multiplicity problem (\Cref{def:the-expected-multipl})\AH{Aren't they the same?}. Finally, we discuss related work in \Cref{sec:related-work} and conclude in \Cref{sec:concl-future-work}.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: