Moved S2 proofs into Appendix

master
Aaron Huber 2020-12-17 17:08:48 -05:00
parent aa8bac2075
commit e9d13722af
3 changed files with 62 additions and 50 deletions

View File

@ -1,4 +1,50 @@
\section{Missing details from Section~\ref{sec:background}}
\section{Missing details from Section~\ref{sec:background}}\label{sec:proofs-background}
\AH{I made small changes to the proof, noteably the summation, the variable definition and the world subscript, the latter of which I am not sure if it is the best notation or not.}
\subsection{Proof of~\Cref{prop:semnx-pdbs-are-a-}}
To prove that $\semNX$-PDBs are complete consider the following construction that for any $\semN$-PDB $\pdb = (\idb, \pd)$ produces an $\semNX$-PDB $\pxdb = (\db, \pd')$ such that $\rmod(\pxdb) = \pdb$. Let $\idb = \{D_1, \ldots, D_{\abs{\idb}}\}$ and let $max(D_i)$ denote $max_{\tup} D_i(\tup)$. For each world $D_i$ we create a corresponding variable $X_i$.
%variables $X_{i1}$, \ldots, $X_{im}$ where $m = max(D_i)$.
In $\db$ we assign each tuple $\tup$ the polynomial:
%
\[
\db(\tup) = \sum_{i=1}^{\abs{\idb}} D_i(\tup)\cdot X_{i}
\]
The probability distribution $\pd'$ assigns all world vectors zero probability except for $\abs{\idb}$ world vectors (representing the possible worlds) $\vct{w_i}$. All elements of $\vct{w_i}$ are zero except for the positions corresponding to variables $X_{ij}$ for $j \in \{1, \ldots \}$ which are set to $1$. Unfolding definitions it is trivial to show that $\rmod(\pxdb) = \pdb$. Thus, $\semNX$ are a complete representation system.
The closure under $\raPlus$ queries follows from the fact that an assignment $\vct{X} \to \{0,1\}$ is a semiring homomorphism and that semiring homomorphisms commute with queries over $\semK$-relations.
\subsection{Proof of~\Cref{prop:expection-of-polynom}}
\BG{TODO}
\subsection{Proof for Proposition ~\ref{proposition:q-qtilde}}
Note that any $\poly$ in factorized form is equivalent to its \abbrSMB expansion. For each term in the expanded form, further note that for all $b \in \{0, 1\}$ and all $e \geq 1$, $b^e = b$. \qed
\subsection{Proof for Lemma ~\ref{lem:exp-poly-rpoly}}
Let $\poly$ be the generalized polynomial, i.e., the polynomial of $\numvar$ variables with highest degree $= B$: %, in which every possible monomial permutation appears,
\[\poly(X_1,\ldots, X_\numvar) = \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar X_i^{d_i}\].
Then, assigning $\vct{w}$ to $\vct{X}$, for expectation we have
\begin{align}
\expct_{\vct{w}}\pbox{\poly(\vct{w})} &= \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \expct_{\vct{w}}\pbox{\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar w_i^{d_i}}\label{p1-s1}\\
&= \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \expct_{\vct{w}}\pbox{w_i^{d_i}}\label{p1-s2}\\
&= \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \expct_{\vct{w}}\pbox{w_i}\label{p1-s3}\\
&= \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \prob_i\label{p1-s4}\\
&= \rpoly(\prob_1,\ldots, \prob_\numvar)\label{p1-s5}
\end{align}
In steps \cref{p1-s1} and \cref{p1-s2}, by linearity of expectation (recall the variables are independent), the expecation can be pushed all the way inside of the product. In \cref{p1-s3}, note that $w_i \in \{0, 1\}$ which further implies that for any exponent $e \geq 1$, $w_i^e = w_i$. Next, in \cref{p1-s4} the expectation of a tuple is indeed its probability.
Finally, observe \Cref{p1-s5} by construction in \Cref{lem:pre-poly-rpoly}, that $\rpoly(\prob_1,\ldots, \prob_\numvar)$ is exactly the product of probabilities of each variable in each monomial across the entire sum.
\subsection{Proof For Corollary ~\ref{cor:expct-sop}}
Note that \cref{lem:exp-poly-rpoly} shows that $\expct\pbox{\poly} =$ $\rpoly(\prob_1,\ldots, \prob_\numvar)$. Therefore, if $\poly$ is already in \abbrSMB form, one only needs to compute $\poly(\prob_1,\ldots, \prob_\numvar)$ ignoring exponent terms (note that such a polynomial is $\rpoly(\prob_1,\ldots, \prob_\numvar)$), which indeed has $O(\smbOf{|\poly|})$ computations.\qed
\section{Missing details from Section~\ref{sec:hard}}
\label{app:hard}

View File

@ -109,6 +109,7 @@ The usefulness of this will reduction become clear in \Cref{lem:exp-poly-rpoly}.
\begin{Lemma}\label{lem:pre-poly-rpoly}
When $\poly(X_1,\ldots, X_\numvar) = \sum\limits_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}} \cdot \prod\limits_{\substack{i = 1\\s.t. d_i\geq 1}}^{\numvar}X_i^{d_i}$, we have then that $\rpoly(X_1,\ldots, X_\numvar) = \sum\limits_{\vct{d} \in \{0,\ldots, B\}^\numvar} q_{\vct{d}}\cdot\prod\limits_{\substack{i = 1\\s.t. d_i\geq 1}}^{\numvar}X_i$.
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}
Follows by the construction of $\rpoly$ in \cref{def:reduced-bi-poly}. \qed
@ -123,10 +124,10 @@ Note the following fact:
\poly(\vct{w}) = \rpoly(\vct{w}).
\]
\end{Proposition}
The proof is in~\Cref{sec:proofs-background}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}[Proof for Proposition ~\ref{proposition:q-qtilde}]
Note that any $\poly$ in factorized form is equivalent to its \abbrSMB expansion. For each term in the expanded form, further note that for all $b \in \{0, 1\}$ and all $e \geq 1$, $b^e = b$. \qed
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -146,38 +147,12 @@ Note that in the preceding lemma, we have assigned $\vct{p}$
%(introduced in \Cref{subsec:def-data})
to the variables $\vct{X}$. Intuitively, \Cref{lem:exp-poly-rpoly} states that when we replace each variable $X_i$ with its probability $\prob_i$ in the reduced form of a \bi-lineage polynomial and evaluate the resulting expression in $\mathbb{R}$, then the result is the expectation of the polynomial.
The proof for~\Cref{lem:exp-poly-rpoly} can be seen in~\Cref{sec:proofs-background}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}[Proof for Lemma ~\ref{lem:exp-poly-rpoly}]
%Using the fact above, we need to compute \[\sum_{(\wbit_1,\ldots, \wbit_\numvar) \in \{0, 1\}}\rpoly(\wbit_1,\ldots, \wbit_\numvar)\]. We therefore argue that
%\[\sum_{(\wbit_1,\ldots, \wbit_\numvar) \in \{0, 1\}}\rpoly(\wbit_1,\ldots, \wbit_\numvar) = 2^\numvar \cdot \rpoly(\frac{1}{2},\ldots, \frac{1}{2}).\]
Let $\poly$ be the generalized polynomial, i.e., the polynomial of $\numvar$ variables with highest degree $= B$: %, in which every possible monomial permutation appears,
\[\poly(X_1,\ldots, X_\numvar) = \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar X_i^{d_i}\].
Then, assigning $\vct{w}$ to $\vct{X}$, for expectation we have
\begin{align}
\expct_{\vct{w}}\pbox{\poly(\vct{w})} &= \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \expct_{\vct{w}}\pbox{\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar w_i^{d_i}}\label{p1-s1}\\
&= \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \expct_{\vct{w}}\pbox{w_i^{d_i}}\label{p1-s2}\\
&= \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \expct_{\vct{w}}\pbox{w_i}\label{p1-s3}\\
&= \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \prob_i\label{p1-s4}\\
&= \rpoly(\prob_1,\ldots, \prob_\numvar)\label{p1-s5}
\end{align}
In steps \cref{p1-s1} and \cref{p1-s2}, by linearity of expectation (recall the variables are independent), the expecation can be pushed all the way inside of the product. In \cref{p1-s3}, note that $w_i \in \{0, 1\}$ which further implies that for any exponent $e \geq 1$, $w_i^e = w_i$. Next, in \cref{p1-s4} the expectation of a tuple is indeed its probability.
%\OK{
% You don't need to tie this to TI-DBs if you define the variables ($X_i$) to be independent.
% Annotations
% Boolean expressions over uncorrelated boolean variables are sufficient to model TI-, BI-, and
% PC-Tables. This should still hold for arithmetic over the naturals.
%}
Finally, observe \Cref{p1-s5} by construction in \Cref{lem:pre-poly-rpoly}, that $\rpoly(\prob_1,\ldots, \prob_\numvar)$ is exactly the product of probabilities of each variable in each monomial across the entire sum.
\qed
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -185,10 +160,10 @@ Finally, observe \Cref{p1-s5} by construction in \Cref{lem:pre-poly-rpoly}, that
If $\poly$ is a \bi-lineage polynomial, then the expectation of $\poly$, i.e., $\expct\pbox{\poly} = \rpoly\left(\prob_1,\ldots, \prob_\numvar\right)$ can be computed in $O(|\smbOf{\poly}|)$, where $|\poly|$ denotes the total number of multiplication/addition operators in $\poly$.
\end{Corollary}
\AH{What if $\poly$ is not in \abbrSMB form?}
We defer the proof for~\Cref{cor:expct-sop} in~\Cref{sec:proofs-background}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}[Proof For Corollary ~\ref{cor:expct-sop}]
Note that \cref{lem:exp-poly-rpoly} shows that $\expct\pbox{\poly} =$ $\rpoly(\prob_1,\ldots, \prob_\numvar)$. Therefore, if $\poly$ is already in \abbrSMB form, one only needs to compute $\poly(\prob_1,\ldots, \prob_\numvar)$ ignoring exponent terms (note that such a polynomial is $\rpoly(\prob_1,\ldots, \prob_\numvar)$), which indeed has $O(\smbOf{|\poly|})$ computations.\qed
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Local Variables:

View File

@ -90,19 +90,10 @@ Importantly, as the following proposition shows, any finite $\semN$-PDB can be e
\begin{Proposition}\label{prop:semnx-pdbs-are-a-}
$\semNX$-PDBs are a complete representation system for $\semN$-PDBs that is closed under $\raPlus$ queries.
\end{Proposition}
The proof can be seen in~\Cref{sec:proofs-background}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\AH{I made small changes to the proof, noteably the summation, the variable definition and the world subscript, the latter of which I am not sure if it is the best notation or not.}
\begin{proof}
To prove that $\semNX$-PDBs are complete consider the following construction that for any $\semN$-PDB $\pdb = (\idb, \pd)$ produces an $\semNX$-PDB $\pxdb = (\db, \pd')$ such that $\rmod(\pxdb) = \pdb$. Let $\idb = \{D_1, \ldots, D_{\abs{\idb}}\}$ and let $max(D_i)$ denote $max_{\tup} D_i(\tup)$. For each world $D_i$ we create a corresponding variable $X_i$.
%variables $X_{i1}$, \ldots, $X_{im}$ where $m = max(D_i)$.
In $\db$ we assign each tuple $\tup$ the polynomial:
%
\[
\db(\tup) = \sum_{i=1}^{\abs{\idb}} D_i(\tup)\cdot X_{i}
\]
The probability distribution $\pd'$ assigns all world vectors zero probability except for $\abs{\idb}$ world vectors (representing the possible worlds) $\vct{w_i}$. All elements of $\vct{w_i}$ are zero except for the positions corresponding to variables $X_{ij}$ for $j \in \{1, \ldots \}$ which are set to $1$. Unfolding definitions it is trivial to show that $\rmod(\pxdb) = \pdb$. Thus, $\semNX$ are a complete representation system.
The closure under $\raPlus$ queries follows from the fact that an assignment $\vct{X} \to \{0,1\}$ is a semiring homomorphism and that semiring homomorphisms commute with queries over $\semK$-relations.
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Now let us consider computing the expected multiplicity of a tuple $\tup$ in the result of a query $\query$ over an $\semN$-PDB $\pdb$ using the annotation of $\tup$ in the result of evaluating $\query$ over an $\semNX$-PDB $\pxdb$ for which $\rmod(\pxdb) = \pdb$. The expectation of the polynomial $\poly = \query(\pxdb)(\tup)$ based on the probability distribution of $\pxdb$ over the variables in $\pxdb$ is:
@ -118,10 +109,10 @@ Since $\semNX$-PDBs $\pxdb$ are a complete representation system for $\semN$-PDB
Given an $\semN$-PDB $\pdb = (\idb,\pd)$ and $\semNX$-PDB $\pxdb = (\db,\pd')$ such that $\rmod(\pxdb) = \pdb$, we have:
\[ \expct_{\idb \sim \pd}[\query(\db)(t)] = \expct_{\vct{X} \sim \pd'}\pbox{\poly(\vct{X})} \]
\end{Proposition}
The proof can be seen in~\Cref{sec:proofs-background}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}
\BG{TODO}
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Two important subclasses of $\semNX$-PDBs that are of interest to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\prob(\tup)$. The set of possible worlds represented by a \ti $\db$ is all subsets of $\db$. The probability of each world is the product of the probabilities of all tuples that exist with one minus the probability of all tuples of $\db$ that are not part of this world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. Note then that the tuples sharing the same block are disjoint, and the sum of the probabilitites of all the tuples in the same block $\block$ is $1$. The probability of such a world is the product of the probabilities of all tuples present in the world. %and one minus the sum of the probabilities of all tuples from blocks for which no tuple is present in the world.