Changes to S.2 and fig. 1

Aaron Huber 2021-09-07 08:02:00 -04:00
parent 8921da1783
commit c28cc55127
4 changed files with 20 additions and 19 deletions


@ -294,7 +294,7 @@ then we note that $\poly^2\inparen{\vct{\prob}}$ is in the range $[\inparen{p_0}
To get a $(1\pm \epsilon)$-multiplicative approximation we uniformly sample monomials from the \abbrSMB representation of $\Phi$ and `adjust' their contribution to $\widetilde{\Phi}\left(\cdot\right)$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\mypar{Paper Organization} We present relevant background and notation in \Cref{sec:background}. We then prove our main hardness results in \Cref{sec:hard} and present our approximation algorithm in \Cref{sec:algo}. We present some (easy) generalizations of our results in \Cref{sec:gen} and also discuss extensions from computing expectations of polynomials to the expected result multiplicity problem (\Cref{def:the-expected-multipl})\AH{Aren't they the same?}. Finally, we discuss related work in \Cref{sec:related-work} and conclude in \Cref{sec:concl-future-work}.
\mypar{Paper Organization} We present relevant background and notation in \Cref{sec:background}. We then prove our main hardness results in \Cref{sec:hard} and present our approximation algorithm in \Cref{sec:algo}. We present some (easy) generalizations of our results in \Cref{sec:gen} and also discuss extensions from computing expectations of polynomials to the expected result multiplicity problem (\Cref{def:the-expected-multipl}). Finally, we discuss related work in \Cref{sec:related-work} and conclude in \Cref{sec:concl-future-work}. All proofs are in the appendix.
%%% Local Variables:


@ -110,7 +110,8 @@
% Incomplete DB/PDBs %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\idb}{\Omega}
\newcommand{\pd}{\mathcal{P}}%pd for probability distribution
\newcommand{\pd}{{\mathcal{P}_{\idb}}}%pd for probability distribution
\newcommand{\pdassign}{\mathcal{P}}
\newcommand{\pdb}{\mathcal{D}}
\newcommand{\encodedDB}{\textnormal{\db}}
\newcommand{\pxdb}{\pdb_{\semNX}}


@ -5,14 +5,14 @@
\subsection{Probabilistic Databases}
While the setting used in this section is primarily that of a bag-\abbrPDB query with set-\abbrPDB inputs, recall that, as noted in \cref{sec:intro-rewrite-070921}, this is not limiting. All proofs are located in the appendix.
Following typical representation of bags in production databases, for query inputs, we will use \abbrBPDB\xplural with $\{0, 1\}$ input.
An \textit{incomplete database} $\idb$ is a set of deterministic databases $\db$ called possible worlds.
Denote the schema of $\db$ as $\sch(\db)$. A \textit{probabilistic database} $\pdb$ is a pair $(\idb, \pd)$ where $\idb$ is an incomplete database and $\pd$ is a probability distribution over $\idb$. Queries over probabilistic databases are evaluated using the so-called possible world semantics. Under the possible world semantics, the result of a query $\query$ over an incomplete database $\idb$ is the set of query answers produced by evaluating $\query$ over each possible world: $\query(\idb) = \comprehension{\query(\db)}{\db \in \idb}$.
A \textit{probabilistic database} $\pdb$ is a pair $(\idb, \pd)$ where $\idb$ is an incomplete database and $\pd$ is a probability distribution over $\idb$. Queries over probabilistic databases are evaluated using the so-called possible world semantics. Under the possible world semantics, the result of a query $\query$ over an incomplete database $\idb$ is the set of query answers produced by evaluating $\query$ over each possible world: $\query(\idb) = \comprehension{\query(\db)}{\db \in \idb}$.
For a probabilistic database $\pdb = (\idb, \pd)$, the result of a query is the pair $(\query(\idb), \pd')$ where $\pd'$ is a probability distribution over $\query(\idb)$ that assigns to each possible query result the sum of the probabilities of the worlds that produce this answer:
%
\[\forall \db \in \query(\idb): \pd'(\db) = \sum_{\db' \in \idb: \query(\db') = \db} \pd(\db') \]
\[\forall \db' \in \query(\idb): \pd'(\db') = \sum_{\db \in \idb: \query(\db) = \db'} \pd(\db). \]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%NEEDS to be moved to the appendix.
@ -35,16 +35,16 @@ For a probabilistic database $\pdb = (\idb, \pd)$, the result of a query is th
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%END: move to appendix.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Recall \cref{fig:nxDBSemantics} which depicts the semantics for constructing a lineage polynomial $\apolyqdt$ for any $\raPlus$ query. We now make a meaningful connection between possible world semantics and world assignments on the lineage polynomial.
\begin{Proposition}[Expectation of polynomials]\label{prop:expection-of-polynom}
Given an $\semN$-\abbrPDB $\pdb = (\idb,\pd)$ and equivalent polynomial $\polyForTuple$ for an arbitrary tuple $\tup \in \pdb$,%$\semNX$-\abbrPDB $\pxdb = (\idb_{\semNX}',\pd')$ where $\rmod(\pxdb) = \pdb$,
Given a \abbrBPDB $\pdb = (\idb,\pd)$ and lineage polynomial $\apolyqdt$ for an arbitrary output tuple $\tup$, %$\semNX$-\abbrPDB $\pxdb = (\idb_{\semNX}',\pd')$ where $\rmod(\pxdb) = \pdb$,
we have:
$ \expct_{\randDB \sim \pd}[\query(\randDB)(t)] = \expct_{\randWorld\sim \pd'}\pbox{\poly_{\query, \tup}(\randWorld)}. $
\footnote{Although assumed by most prior work on set-probabilistic databases, e.g., as an obvious consequence of~\cite{IL84a}'s Theorem 7.1, we are unaware of any formal proof for bag-probabilistic databases.}
$ \expct_{\randDB \sim \pd}[\query(\randDB)(t)] = \expct_{\randWorld\sim \pdassign}\pbox{\apolyqdt(\randWorld)}. $
\end{Proposition}
\noindent A formal proof of \Cref{prop:expection-of-polynom} is given in \Cref{subsec:expectation-of-polynom-proof}.
This proposition shows that computing expected tuple multiplicities is equivalent to computing the expectation of a polynomial (for that tuple) from a probability distribution over all possible assignments of variables in the polynomial to $\{0,1\}$.
We focus on this problem from now on, assume an implicit result tuple, and so drop the subscript from $\poly_{\query, \tup}$ (i.e., $\poly$ will denote a polynomial).
\noindent A formal proof of \Cref{prop:expection-of-polynom} is given in \Cref{subsec:expectation-of-polynom-proof}.\footnote{Although \Cref{prop:expection-of-polynom} follows, e.g., as an obvious consequence of~\cite{IL84a}'s Theorem 7.1, we are unaware of any formal proof for bag-probabilistic databases.}
%This proposition shows that computing expected tuple multiplicities is equivalent to computing the expectation of a polynomial (for that tuple) from a probability distribution over all possible assignments of variables in the polynomial to $\{0,1\}$.
We focus on the problem of computing $\expct_{\pdassign}\pbox{\apolyqdt\inparen{\randWorld}}$ from now on, assume an implicit result tuple, and so drop the subscript from $\apolyqdt$ (i.e., $\poly$ will denote a polynomial).
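\noindent As a small illustration of \Cref{prop:expection-of-polynom} (a sketch under assumptions that the text above does not fix: the probabilities are hypothetical and the two variables are taken to be independent), consider the lineage polynomial $AX$ over $\{0,1\}$-valued variables with $\Pr[A = 1] = 0.5$ and $\Pr[X = 1] = 0.4$. Summing over the four assignments $(A, X) \in \{0,1\}^2$, in the order $(0,0), (0,1), (1,0), (1,1)$, gives
\[
\expct\pbox{AX} = 0.3 \cdot (0 \cdot 0) + 0.2 \cdot (0 \cdot 1) + 0.3 \cdot (1 \cdot 0) + 0.2 \cdot (1 \cdot 1) = 0.2.
\]
Only the assignment $A = X = 1$ contributes, so under these assumptions the expected multiplicity of the corresponding output tuple is $0.5 \cdot 0.4 = 0.2$.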
\subsubsection{\tis and \bis}
\label{subsec:tidbs-and-bidbs}


@ -58,8 +58,8 @@
Buffalo & $AX$ &\resizebox{!}{10mm}{
\begin{tikzpicture}[thick]
\node[gen_tree_node](sink) at (0.5, 0.8){$\boldsymbol{\circmult}$};
\node[gen_tree_node](source1) at (0, 0){$L_a$};
\node[gen_tree_node](source2) at (1, 0){$R_a$};
\node[gen_tree_node](source1) at (0, 0){$A$};
\node[gen_tree_node](source2) at (1, 0){$X$};
\draw[->](source1)--(sink);
\draw[->] (source2)--(sink);
\end{tikzpicture}% & $0.5 \cdot 1.0 + 0.5 \cdot 1.0 = 1.0$
@ -67,10 +67,10 @@
Chicago & $B(Y + Z)$\newline \text{Or}\newline $BY+ BZ$&
\resizebox{!}{16mm} {
\begin{tikzpicture}[thick]
\node[gen_tree_node] (a1) at (1, 0){$R_b$};
\node[gen_tree_node] (b1) at (2, 0){$R_c$};
\node[gen_tree_node] (a1) at (1, 0){$Y$};
\node[gen_tree_node] (b1) at (2, 0){$Z$};
%level 1
\node[gen_tree_node] (a2) at (0.75, 0.8){$L_b$};
\node[gen_tree_node] (a2) at (0.75, 0.8){$B$};
\node[gen_tree_node] (b2) at (1.5, 0.8){$\boldsymbol{\circplus}$};
%level 0
\node[gen_tree_node] (a3) at (1.1, 1.6){$\boldsymbol{\circmult}$};
@ -86,9 +86,9 @@
%%%%%%%%%%%
\resizebox{!}{16mm} {
\begin{tikzpicture}[thick]
\node[gen_tree_node] (a2) at (0, 0){$R_b$};
\node[gen_tree_node] (b2) at (1, 0){$L_b$};
\node[gen_tree_node] (c2) at (2, 0){$R_c$};
\node[gen_tree_node] (a2) at (0, 0){$Y$};
\node[gen_tree_node] (b2) at (1, 0){$B$};
\node[gen_tree_node] (c2) at (2, 0){$Z$};
%level 1
\node[gen_tree_node] (a1) at (0.5, 0.8){$\boldsymbol{\circmult}$};
\node[gen_tree_node] (b1) at (1.5, 0.8){$\boldsymbol{\circmult}$};