Changes to S.2 and fig. 1

2021-09-07 08:02:00 -04:00 · 2021-09-07 08:02:00 -04:00 · c28cc55127
parent 8921da1783
commit c28cc55127
4 changed files with 20 additions and 19 deletions
--- a/intro-rewrite-070921.tex
+++ b/intro-rewrite-070921.tex
@@ -294,7 +294,7 @@ then we note that $\poly^2\inparen{\vct{\prob}}$ is in the range $[\inparen{p_0}
 To get a $(1\pm \epsilon)$-multiplicative approximation, we uniformly sample monomials from the \abbrSMB representation of $\Phi$ and `adjust' their contribution to $\widetilde{\Phi}\left(\cdot\right)$.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\mypar{Paper Organization} We present relevant background and notation in \Cref{sec:background}. We then prove our main hardness results in \Cref{sec:hard} and present our approximation algorithm in \Cref{sec:algo}. We present some (easy) generalizations of our results in \Cref{sec:gen} and also discuss extensions from computing expectations of polynomials to the expected result multiplicity problem (\Cref{def:the-expected-multipl})\AH{Aren't they the same?}. Finally, we discuss related work in \Cref{sec:related-work} and conclude in \Cref{sec:concl-future-work}.
+\mypar{Paper Organization} We present relevant background and notation in \Cref{sec:background}. We then prove our main hardness results in \Cref{sec:hard} and present our approximation algorithm in \Cref{sec:algo}. We present some (easy) generalizations of our results in \Cref{sec:gen} and also discuss extensions from computing expectations of polynomials to the expected result multiplicity problem (\Cref{def:the-expected-multipl}). Finally, we discuss related work in \Cref{sec:related-work} and conclude in \Cref{sec:concl-future-work}.  All proofs are in the appendix.


 %%% Local Variables:
--- a/macros.tex
+++ b/macros.tex
@@ -110,7 +110,8 @@
 % Incomplete DB/PDBs           															      %
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \newcommand{\idb}{\Omega}
-\newcommand{\pd}{\mathcal{P}}%pd for probability distribution
+\newcommand{\pd}{{\mathcal{P}_{\idb}}}%pd for probability distribution
+\newcommand{\pdassign}{\mathcal{P}}
 \newcommand{\pdb}{\mathcal{D}}
 \newcommand{\encodedDB}{\textnormal{\db}}
 \newcommand{\pxdb}{\pdb_{\semNX}}
--- a/ra-to-poly.tex
+++ b/ra-to-poly.tex
@@ -5,14 +5,14 @@

 \subsection{Probabilistic Databases}

-While the setting used in this section is primarily that of a bag-\abbrPDB query with set-\abbrPDB inputs, recall, as noted in \cref{sec:intro-rewrite-070921}, this is not limiting.  All proofs are located in the appendix.
+Following the typical representation of bags in production databases, we assume that query inputs are \abbrBPDB\xplural in which every tuple has multiplicity in $\{0, 1\}$.

 An \textit{incomplete database} $\idb$ is a set of deterministic databases $\db$ called possible worlds.
-Denote the schema of $\db$ as $\sch(\db)$. A \textit{probabilistic database} $\pdb$ is a pair $(\idb, \pd)$ where $\idb$ is an incomplete database and $\pd$ is a probability distribution over $\idb$. Queries over probabilistic databases are evaluated using the so-called possible world semantics. Under the possible world semantics, the result of a query $\query$ over an incomplete database $\idb$ is the set of query answers produced by evaluating $\query$ over each possible world: $\query(\idb) = \comprehension{\query(\db)}{\db \in \idb}$.
+A \textit{probabilistic database} $\pdb$ is a pair $(\idb, \pd)$ where $\idb$ is an incomplete database and $\pd$ is a probability distribution over $\idb$. Queries over probabilistic databases are evaluated using the so-called possible world semantics. Under the possible world semantics, the result of a query $\query$ over an incomplete database $\idb$ is the set of query answers produced by evaluating $\query$ over each possible world: $\query(\idb) = \comprehension{\query(\db)}{\db \in \idb}$.

 For a probabilistic  database $\pdb = (\idb, \pd)$,  the result of a query is the pair $(\query(\idb), \pd')$ where $\pd'$ is a probability distribution over $\query(\idb)$  that assigns to each possible query result the sum of the probabilities of the worlds that produce this answer:
 %
-\[\forall \db \in \query(\idb): \pd'(\db) = \sum_{\db' \in \idb: \query(\db') = \db} \pd(\db') \]
+\[\forall \db' \in \query(\idb): \pd'(\db') = \sum_{\db \in \idb: \query(\db) = \db'} \pd(\db). \]

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %NEEDS to be moved to the appendix.
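The query-result distribution $\pd'$ defined above can be sketched by brute force: enumerate the possible worlds, evaluate the query in each one, and add each world's probability to the answer it produces. The relations, join keys, tuple probabilities, and the query $\pi_{city}(L \bowtie R)$ below are hypothetical. The input is assumed to be tuple-independent with multiplicities in $\{0,1\}$, so a possible world is simply a $\{0,1\}$ assignment to the tuple variables; the names $A, B, X, Y, Z$ are borrowed from the figure's lineage circuits.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tuple-independent input: each tuple is annotated with a variable,
# and each variable is 1 (tuple present) independently with the given probability.
L = {("Buffalo", 1): "A", ("Chicago", 2): "B"}
R = {(1, "u"): "X", (2, "v"): "Y", (2, "w"): "Z"}
PROB = {"A": Fraction(1, 2), "B": Fraction(1, 3), "X": Fraction(1, 4),
        "Y": Fraction(2, 3), "Z": Fraction(1, 5)}
VARS = sorted(PROB)


def possible_worlds():
    """Yield (world, probability); a world is a {0,1} assignment to VARS."""
    for bits in product((0, 1), repeat=len(VARS)):
        world = dict(zip(VARS, bits))
        p = Fraction(1)
        for v in VARS:
            p *= PROB[v] if world[v] else 1 - PROB[v]
        yield world, p


def query(world):
    """Hypothetical query pi_city(L join R) under bag semantics, in one world."""
    result = {}
    for (city, key), lvar in L.items():
        for (rkey, _), rvar in R.items():
            if key == rkey and world[lvar] and world[rvar]:
                result[city] = result.get(city, 0) + 1
    # Freeze the bag {city: multiplicity} so it can serve as a dictionary key.
    return tuple(sorted(result.items()))


def result_distribution():
    """P'(D') = sum of P(D) over all worlds D with Q(D) = D'."""
    dist = {}
    for world, p in possible_worlds():
        answer = query(world)
        dist[answer] = dist.get(answer, Fraction(0)) + p
    return dist


if __name__ == "__main__":
    for answer, p in sorted(result_distribution().items()):
        print(answer, p)
```

Several worlds can map to the same query answer (for instance, every world with $B = 0$ and $\neg(A \wedge X)$ yields the empty result), which is exactly why $\pd'$ sums over the preimage of each answer.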
@@ -35,16 +35,16 @@ For a probabilistic  database $\pdb = (\idb, \pd)$,  the result of a query is th
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %END: move to appendix.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+Recall \cref{fig:nxDBSemantics}, which depicts the semantics for constructing a lineage polynomial $\apolyqdt$ for any $\raPlus$ query.  We now relate possible world semantics to world assignments of the variables in the lineage polynomial.

 \begin{Proposition}[Expectation of polynomials]\label{prop:expection-of-polynom}
-  Given an $\semN$-\abbrPDB $\pdb = (\idb,\pd)$ and equivalent polynomial $\polyForTuple$ for aribitrary tuple $\tup \in \pdb$,%$\semNX$-\abbrPDB $\pxdb = (\idb_{\semNX}',\pd')$ where $\rmod(\pxdb) = \pdb$, 
+Given a \abbrBPDB $\pdb = (\idb,\pd)$ and the lineage polynomial $\apolyqdt$ for an arbitrary output tuple $\tup$, %$\semNX$-\abbrPDB $\pxdb = (\idb_{\semNX}',\pd')$ where $\rmod(\pxdb) = \pdb$, 
 we have:
-  $ \expct_{\randDB \sim \pd}[\query(\randDB)(t)] = \expct_{\randWorld\sim \pd'}\pbox{\poly_{\query, \tup}(\randWorld)}. $
-  \footnote{Although assumed by most prior work on set-probabilistic databases, e.g., as an obvious consequence of~\cite{IL84a}'s Theorem 7.1, we are unaware of any formal proof for bag-probabilistic databases.}
+  $ \expct_{\randDB \sim \pd}[\query(\randDB)(\tup)] = \expct_{\randWorld\sim \pdassign}\pbox{\apolyqdt(\randWorld)}. $
 \end{Proposition}
-\noindent A formal proof of \Cref{prop:expection-of-polynom} is given in \Cref{subsec:expectation-of-polynom-proof}.
-This proposition shows that computing expected tuple multiplicities is equivalent to computing the expectation of a polynomial (for that tuple) from a probability distribution over all possible assignments of variables in the polynomial to $\{0,1\}$.
-We focus on this problem from now on, assume an implicit result tuple, and so drop the subscript from $\poly_{\query, \tup}$ (i.e., $\poly$ will denote a polynomial).
+\noindent A formal proof of \Cref{prop:expection-of-polynom} is given in \Cref{subsec:expectation-of-polynom-proof}.\footnote{Although \Cref{prop:expection-of-polynom} follows as an obvious consequence of, e.g., Theorem 7.1 of~\cite{IL84a}, we are unaware of any formal proof for bag-probabilistic databases.}
+%This proposition shows that computing expected tuple multiplicities is equivalent to computing the expectation of a polynomial (for that tuple) from a probability distribution over all possible assignments of variables in the polynomial to $\{0,1\}$.
+From now on, we focus on the problem of computing $\expct_{\randWorld\sim \pdassign}\pbox{\apolyqdt\inparen{\randWorld}}$, assume an implicit result tuple, and so drop the subscript from $\apolyqdt$ (i.e., $\poly$ will denote a polynomial).

 \subsubsection{\tis and \bis}
 \label{subsec:tidbs-and-bidbs}
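As a sanity check of the Proposition (Expectation of polynomials) on a toy instance, the Python sketch below computes both sides for each output tuple: the expected multiplicity over possible worlds, and the expectation of the tuple's lineage polynomial over $\{0,1\}$ assignments. The input relations, join keys, probabilities, and the query $\pi_{city}(L \bowtie R)$ are hypothetical; the lineage polynomials $AX$ (Buffalo) and $B(Y+Z)$ (Chicago) are the ones drawn in the figure. Tuple independence is an assumption of this sketch, not of the proposition.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tuple-independent input: relations, join keys and probabilities are
# made up for illustration; only the variable names A, B, X, Y, Z follow the figure.
L = {("Buffalo", 1): "A", ("Chicago", 2): "B"}
R = {(1, "u"): "X", (2, "v"): "Y", (2, "w"): "Z"}
PROB = {"A": Fraction(1, 2), "B": Fraction(1, 3), "X": Fraction(1, 4),
        "Y": Fraction(2, 3), "Z": Fraction(1, 5)}
VARS = sorted(PROB)

# Lineage polynomials of the two output tuples (AX and B(Y + Z), as in the figure).
LINEAGE = {
    "Buffalo": lambda w: w["A"] * w["X"],
    "Chicago": lambda w: w["B"] * (w["Y"] + w["Z"]),
}


def assignments():
    """Yield every {0,1} assignment to VARS with its probability under independence."""
    for bits in product((0, 1), repeat=len(VARS)):
        w = dict(zip(VARS, bits))
        p = Fraction(1)
        for v in VARS:
            p *= PROB[v] if w[v] else 1 - PROB[v]
        yield w, p


def multiplicity(world, city):
    """Bag multiplicity of <city> in the hypothetical query pi_city(L join R)."""
    return sum(1 for (c, key), lvar in L.items()
               for (rkey, _), rvar in R.items()
               if c == city and key == rkey and world[lvar] and world[rvar])


def expected_multiplicity(city):
    """Left-hand side: expectation over possible worlds of Q(D)(t)."""
    return sum(p * multiplicity(w, city) for w, p in assignments())


def expected_lineage(city):
    """Right-hand side: expectation of the lineage polynomial over assignments."""
    return sum(p * LINEAGE[city](w) for w, p in assignments())


if __name__ == "__main__":
    for city in LINEAGE:
        print(city, expected_multiplicity(city), expected_lineage(city))
```

Under these assumptions, linearity of expectation and independence give the Chicago value directly as $p_B(p_Y + p_Z)$ without enumerating all $2^5$ assignments; the enumeration here only serves to compare the two sides of the proposition.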
--- a/two-step-model.tex
+++ b/two-step-model.tex
@@ -58,8 +58,8 @@
 	                 Buffalo & $AX$ &\resizebox{!}{10mm}{
 	                       \begin{tikzpicture}[thick]
 	                       		\node[gen_tree_node](sink) at (0.5, 0.8){$\boldsymbol{\circmult}$};
-	                       		\node[gen_tree_node](source1) at (0, 0){$L_a$};
-	                       		\node[gen_tree_node](source2) at (1, 0){$R_a$};
+	                       		\node[gen_tree_node](source1) at (0, 0){$A$};
+	                       		\node[gen_tree_node](source2) at (1, 0){$X$};
 	                       		\draw[->](source1)--(sink);
 	                       		\draw[->] (source2)--(sink);
 					\end{tikzpicture}% & $0.5 \cdot 1.0 + 0.5 \cdot 1.0 = 1.0$   
@@ -67,10 +67,10 @@
 	                       Chicago & $B(Y + Z)$\newline \text{Or}\newline $BY+ BZ$&
 	                       \resizebox{!}{16mm} {
 						\begin{tikzpicture}[thick]
-							\node[gen_tree_node] (a1) at (1, 0){$R_b$};
-							\node[gen_tree_node] (b1) at (2, 0){$R_c$};
+							\node[gen_tree_node] (a1) at (1, 0){$Y$};
+							\node[gen_tree_node] (b1) at (2, 0){$Z$};
 							%level 1
-							\node[gen_tree_node] (a2) at (0.75, 0.8){$L_b$};
+							\node[gen_tree_node] (a2) at (0.75, 0.8){$B$};
 							\node[gen_tree_node] (b2) at (1.5, 0.8){$\boldsymbol{\circplus}$};
 							%level 0
 							\node[gen_tree_node] (a3) at (1.1, 1.6){$\boldsymbol{\circmult}$};
@@ -86,9 +86,9 @@
 					%%%%%%%%%%%
 	                       \resizebox{!}{16mm} {
 					\begin{tikzpicture}[thick]
-						\node[gen_tree_node] (a2) at (0, 0){$R_b$};
-						\node[gen_tree_node] (b2) at (1, 0){$L_b$};
-						\node[gen_tree_node] (c2) at (2, 0){$R_c$};
+						\node[gen_tree_node] (a2) at (0, 0){$Y$};
+						\node[gen_tree_node] (b2) at (1, 0){$B$};
+						\node[gen_tree_node] (c2) at (2, 0){$Z$};
 						%level 1
 						\node[gen_tree_node] (a1) at (0.5, 0.8){$\boldsymbol{\circmult}$};
 						\node[gen_tree_node] (b1) at (1.5, 0.8){$\boldsymbol{\circmult}$};