More changes to Intro changing def 1.4 and example.

master
Aaron Huber 2022-02-02 10:42:44 -05:00
parent bc1066e7dd
commit f08e94482d
2 changed files with 21 additions and 20 deletions


@ -145,7 +145,7 @@ Finally, note that there are exactly three cases where the expectation of a mono
\begin{proof}
Let $\poly$ be a polynomial of $\numvar$ variables with highest degree $= B$, defined as follows: %, in which every possible monomial permutation appears,
\[\poly(X_1,\ldots, X_\numvar) = \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar X_i^{d_i}.\]
Note that replacing the variables $X_1,\ldots, X_\numvar$ with $\inset{X_{i, j}~|~ i\in [\numvar], j\in[\bound]}$ and converting to \abbrSMB produces a polynomial that satisfies the above definition. Let the boolean function $\isInd{\cdot}$ take $\vct{d}$ as input and return true if there does not exist any dependent variables in $\vct{d}$, i.e., $\not\exists ~\block, i\neq j\suchthat d_{\block, i}, d_{\block, j} \geq 1$.\footnote{This \abbrBIDB notation is used and discussed in \cref{subsec:tidbs-and-bidbs}}.
Note that replacing the variables $X_1,\ldots, X_{\abs{\tupset}}$ with $\inset{j\cdot X_{\tup, j}~|~ \tup\in \tupset, j\in[\bound]}$ (i.e., replacing each variable with a polynomial) and converting to \abbrSMB produces a polynomial that satisfies the above definition (with $\numvar = \bound\cdot\abs{\tupset}$). Let the boolean function $\isInd{\cdot}$ take $\vct{d}$ as input and return true if there do not exist any dependent variables in $\vct{d}$, i.e., $\not\exists ~\block, i\neq j\suchthat d_{\block, i}, d_{\block, j} \geq 1$.\footnote{This \abbrBIDB notation is used and discussed in \cref{subsec:tidbs-and-bidbs}.}
Then in expectation we have
\begin{align}
\expct_{\vct{\randWorld}}\pbox{\poly(\vct{\randWorld})} &= \expct_{\vct{\randWorld}}\pbox{\sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \randWorld_i^{d_i} + \sum_{\substack{\vct{d} \in \{0,\ldots, B\}^\numvar\\\wedge ~\neg\isInd{\vct{d}}}} c_{\vct{d}}\cdot\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar\randWorld_i^{d_i}}\label{p1-s1a}\\


@ -223,19 +223,20 @@ The lineage polynomial for $Q_1^2$ is given by $\poly^2\inparen{A, B, C, E, X, Y
$$
=A^2X^2B^2 + B^2Y^2E^2 + B^2Z^2C^2 + 2AXB^2YE + 2AXB^2ZC + 2B^2YEZC.
$$
By linearity of expectation we can push the expectation through each summand. Let us then focus on the summand $A^2X^2B^2$. Let $\randWorld_A$ be the random variable corresponding to a lineage variable $A$. Because the distinct variables in the product are independent, we can push expectation through them yielding $\expct\pbox{\randWorld_A^2\randWorld_X^2\randWorld_B^2}=\expct\pbox{\randWorld_A^2}\expct\pbox{\randWorld_X^2}\expct\pbox{\randWorld_B^2}$. Since $\randWorld_A, \randWorld_B\in \inset{0, 1}$ we can further derive $\expct\pbox{\randWorld_A}\expct\pbox{\randWorld_X^2}\expct\pbox{\randWorld_B}$ by the fact that for any $W\in \inset{0, 1}$, $W^2 = W$. However, we get stuck with $\expct\pbox{\randWorld_X^2}$, since $\randWorld_X\in\inset{0, 1, 2}$ and for $\randWorld_X \gets 2$, $\randWorld_X^2 \neq \randWorld_X$.
To compute $\expct\pbox{\poly^2}$ we can use linearity of expectation and push the expectation through each summand. To keep things simple, let us focus on the summand $A^2X^2B^2$ as the procedure is the same for all other summands of $\poly^2$. Let $\randWorld_X$ be the random variable corresponding to a lineage variable $X$. Because the distinct variables in the product are independent, we can push expectation through them yielding $\expct\pbox{\randWorld_A^2\randWorld_X^2\randWorld_B^2}=\expct\pbox{\randWorld_A^2}\expct\pbox{\randWorld_X^2}\expct\pbox{\randWorld_B^2}$. Since $\randWorld_A, \randWorld_B\in \inset{0, 1}$ we can further derive $\expct\pbox{\randWorld_A}\expct\pbox{\randWorld_X^2}\expct\pbox{\randWorld_B}$ by the fact that for any $W\in \inset{0, 1}$, $W^2 = W$. However, we get stuck with $\expct\pbox{\randWorld_X^2}$, since $\randWorld_X\in\inset{0, 1, 2}$ and for $\randWorld_X \gets 2$, $\randWorld_X^2 \neq \randWorld_X$.
%the expectation is $\expct\pbox{A^2X^2B^2} = A\cdot\prob_A\cdot\inparen{\sum\limits_{i \in [2]}X_i\cdot \prob_{X, i}}\cdot B\prob_B$ for $X \in \inset{0, 1, 2}$.
An equivalent representation of the expectation exists when we think of having a separate variable for each multiplicity value $m>0$, such that the original `base' variable is equal to the sum of the multiplicity variables. For this example, the set of variables could be $\inset{A, X_1, X_2, B}$, where $X$ now equals $X_1 + X_2$ and each variable takes values from the set $\inset{0, 1}$. In this setting we can then derive
\begin{footnotesize}
\begin{align*}
&\expct\pbox{\randWorld_A^2\randWorld_X^2\randWorld_B^2} = \expct\pbox{\randWorld_A^2}\expct\pbox{\inparen{\randWorld_{X_1} + \randWorld_{X_2}}^2}\expct\pbox{\randWorld_B^2} = \expct\pbox{\randWorld_A}\expct\pbox{\randWorld_{X_1}^2 + 2\randWorld_{X_1}\randWorld_{X_2} + \randWorld_{X_2}^2}\expct\pbox{\randWorld_B} =\\
&\expct\pbox{\randWorld_A}\inparen{\expct\pbox{\randWorld_{X_1}^2}+\expct\pbox{2\randWorld_{X_1}\randWorld_{X_2}}+\expct\pbox{\randWorld_{X_2}^2}}\expct\pbox{\randWorld_B} = \expct\pbox{\randWorld_A}\inparen{\expct\pbox{\randWorld_{X_1}} + \expct\pbox{2\randWorld_{X_1}\randWorld_{X_2}} + \expct\pbox{\randWorld_{X_2}}}\expct\pbox{\randWorld_B} = \\
&\expct\pbox{\randWorld_A}\inparen{\sum\limits_{i \in \pbox{\bound}}\expct\pbox{\randWorld_{X_i}}}\expct\pbox{\randWorld_B}.
\end{align*}
\end{footnotesize}
We can drop the term $\expct\pbox{2\randWorld_{X_1}\randWorld_{X_2}}$ since by definition a tuple can only have one multiplicity value in a possible world, thus always making $\randWorld_{X_1}\cdot \randWorld_{X_2} = 0$. Another subtlety to note is that for any $i\in \pbox{\bound}$, $\expct\pbox{\randWorld_{X_i}} = i\cdot\prob_{X, i}$.
An equivalent representation of $\poly^2$ can be derived by introducing a separate product $j\cdot X_j$ for each multiplicity value $j\in\pbox{\bound}$, such that the original `base' variable $X$ is equal to the sum of these products. For this example, the set of variables could be $\inset{A_1,\ldots,A_4, X_1,\ldots,X_4, B_1,\ldots,B_4}$, where, e.g., $X$ now equals $\sum_{j\in\pbox{4}}j\cdot X_j$ and each variable takes values from the set $\inset{0, 1}$. The reformulated summand is then $\poly_R^2 = \inparen{\sum_{j_1\in\pbox{\bound}}j_1A_{j_1}}^2$ $\inparen{\sum_{j_2\in\pbox{\bound}}j_2X_{j_2}}^2$ $\inparen{\sum_{j_3\in\pbox{\bound}}j_3B_{j_3}}^2$. Since a tuple can only have one multiplicity value in a possible world, all cross terms within each factor evaluate to $0$ and can be dropped, which gives $\poly_R^2 = \sum_{j_1, j_2, j_3 \in \pbox{\bound}}j_1^2A^2_{j_1}j_2^2X_{j_2}^2j_3^2B^2_{j_3}$. With the reframed polynomial, the expectation is $\expct\pbox{\poly_R^2}=\sum_{j_1,j_2,j_3\in\pbox{\bound}}j_1^2j_2^2j_3^2\expct\pbox{\randWorld_{A_{j_1}}}\expct\pbox{\randWorld_{X_{j_2}}}\expct\pbox{\randWorld_{B_{j_3}}}$, since all of the random variables now take values in $\inset{0, 1}$, e.g., $\randWorld_{X_j}\in\inset{0, 1}$.
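As a sanity check, consider only the factor for $X$. The following is a sketch under the assumption that the greatest multiplicity of the tuple annotated with $X$ is $2$ (so that $X = X_1 + 2X_2$), writing $\prob_{X, j}$ for the probability that $\randWorld_{X_j} = 1$; this notation is assumed here for illustration only.
\begin{align*}
\expct\pbox{\randWorld_X^2} &= \expct\pbox{\inparen{\randWorld_{X_1} + 2\randWorld_{X_2}}^2} = \expct\pbox{\randWorld_{X_1}^2} + 4\expct\pbox{\randWorld_{X_1}\randWorld_{X_2}} + 4\expct\pbox{\randWorld_{X_2}^2}\\
&= \expct\pbox{\randWorld_{X_1}} + 4\expct\pbox{\randWorld_{X_2}} = \prob_{X, 1} + 4\prob_{X, 2}.
\end{align*}
The cross term vanishes because $\randWorld_{X_1}\cdot\randWorld_{X_2} = 0$ in every possible world, and the result agrees with computing $\expct\pbox{\randWorld_X^2} = 1^2\cdot\prob_{X, 1} + 2^2\cdot\prob_{X, 2}$ directly from the distribution of $\randWorld_X\in\inset{0, 1, 2}$.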
% \begin{footnotesize}
% \begin{align*}
% &\expct\pbox{\randWorld_A^2\randWorld_X^2\randWorld_B^2} = \expct\pbox{\randWorld_A^2}\expct\pbox{\inparen{\randWorld_{X_1} + \randWorld_{X_2}}^2}\expct\pbox{\randWorld_B^2} = \expct\pbox{\randWorld_A}\expct\pbox{\randWorld_{X_1}^2 + 2\randWorld_{X_1}\randWorld_{X_2} + \randWorld_{X_2}^2}\expct\pbox{\randWorld_B} =\\
% &\expct\pbox{\randWorld_A}\inparen{\expct\pbox{\randWorld_{X_1}^2}+\expct\pbox{2\randWorld_{X_1}\randWorld_{X_2}}+\expct\pbox{\randWorld_{X_2}^2}}\expct\pbox{\randWorld_B} = \expct\pbox{\randWorld_A}\inparen{\expct\pbox{\randWorld_{X_1}} + \expct\pbox{2\randWorld_{X_1}\randWorld_{X_2}} + \expct\pbox{\randWorld_{X_2}}}\expct\pbox{\randWorld_B} = \\
% &\expct\pbox{\randWorld_A}\inparen{\sum\limits_{j \in \pbox{\bound}}\expct\pbox{j\cdot\randWorld_{X_j}}}\expct\pbox{\randWorld_B}.
% \end{align*}
% \end{footnotesize}
%We can drop the term $\expct\pbox{2\randWorld_{X_1}\randWorld_{X_2}}$ since by definition a tuple can only have one multiplicity value in a possible world, thus always making $\randWorld_{X_1}\cdot \randWorld_{X_2} = 0$.
%Another subtlety to note is that for any $i\in \pbox{\bound}$, $\expct\pbox{\randWorld_{X_i}} = i\cdot\prob_{X, i}$.
This reformulation of the problem leads us to consider a structure related to the lineage polynomial.
%By exploiting linearity of expectation, further pushing expectation through independent variables and observing that for any $\randWorld\in\{0, 1\}$, we have $\randWorld^2=\randWorld$, the expectation is
@ -260,29 +261,29 @@ This reformulation of the problem leads us to consider a structure related to th
%\end{footnotesize}
%\noindent This property leads us to consider a structure related to the lineage polynomial.
\begin{Definition}\label{def:reduced-poly}
For any polynomial $\poly(\vct{X})$ define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by i) replacing all monomials with a sum of random variables $\sum_{i \in \pbox{c}}X_i$ such that each variable has a domain of $\inset{0, 1}$, ii) produce the intermediate polynomial formed by step i, iii) convert the intermediate polynomial into the standard monomial basis (\abbrSMB)
For any polynomial $\poly(\vct{X})$ define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by i) replacing each $X_\tup \in \vct{X}$ for $\tup \in \tupset$ with $\sum_{j\in\pbox{\bound}}j\cdot X_{\tup, j}$, i.e., $\rpoly\inparen{\vct{X}}$ has variables $X_{\tup, j}$ for $j \in \pbox{\bound}$ such that $X_{\tup, j} \in \inset{0, 1}$, and ii) converting the reformulated polynomial into the standard monomial basis (\abbrSMB)
\footnote{
This is the representation, typically used in set-\abbrPDB\xplural, where the polynomial is represented as a sum of `pure' products. See \Cref{def:smb} for a formal definition.
}
and iv) setting all exponents $e > 1$ to $1$.
while setting all \emph{variable} exponents $e > 1$ to $1$.
\end{Definition}
With $\poly^2\inparen{A, B, C, E, X_1, X_2, Y, Z}$ as an example, we have:
Continuing with the example $\poly^2\inparen{A, B, C, E, X_1, X_2, Y, Z}$, to reduce clutter we i) do not show the full expansion for variables whose greatest multiplicity is $1$, since, e.g., for variable $A$ the sum of products itself evaluates to $1^2\cdot A^2 = A$, and ii) for $\sum_{j\in\pbox{\bound}}j^2\cdot X_j$, omit the summands encoding multiplicities $> 2$: the greatest multiplicity of the tuple annotated with $X$ is $2$, so those summands always evaluate to $0$.
\begin{multline*}
\rpoly^2(A, B, C, E, X_1, X_2, Y, Z) = \\
A\inparen{\sum\limits_{i\in\pbox{\bound}}X_i}B + BYE + BZC + 2A\inparen{\sum\limits_{i\in\pbox{\bound}}X_i}BYE + 2A\inparen{\sum\limits_{i\in\pbox{\bound}}X_i}BZC + 2BYEZC =\\
ABX_1 + ABX_2 + BYE + BZC + 2AX_1BYE + 2AX_2BYE + 2AX_1BZC + 2AX_2BZC + 2BYEZC.
A\inparen{\sum\limits_{j\in\pbox{\bound}}j^2X_j}B + BYE + BZC + 2A\inparen{\sum\limits_{j\in\pbox{\bound}}j^2X_j}BYE + 2A\inparen{\sum\limits_{j\in\pbox{\bound}}j^2X_j}BZC + 2BYEZC =\\
ABX_1 + AB\inparen{2}^2X_2 + BYE + BZC + 2AX_1BYE + 2A\inparen{2}^2X_2BYE + 2AX_1BZC + 2A\inparen{2}^2X_2BZC + 2BYEZC.
%&\; = AXB + BYD + BZC + 2AXBYD + 2AXBZC + 2BYDZC
\end{multline*}
Note that we have argued that for our specific example the expectation that we want is $\widetilde{\poly^2}(\probOf\inparen{A=1},$ $\probOf\inparen{B=1}, \probOf\inparen{C=1}, \probOf\inparen{E=1}, \probOf\inparen{X_1=1}, \probOf\inparen{X_2=1}, \probOf\inparen{Y=1}, \probOf\inparen{Z=1})$.
%It can be verified that the reduced polynomial parameterized with each variable's respective marginal probability is a closed form of the expected count (i.e., $\expct\limits_{\vct{\randWorld}\sim\pd}\pbox{\Phi^2\inparen{\vct{X}}} = \widetilde{\Phi^2}(\probOf\pbox{A=1},$ $\probOf\pbox{B=1}, \probOf\pbox{C=1}), \probOf\pbox{D=1}, \probOf\pbox{X=1}, \probOf\pbox{Y=1}, \probOf\pbox{Z=1})$).
\Cref{lem:tidb-reduce-poly} generalizes the equivalence to {\em all} $\raPlus$ queries on \abbrCTIDB\xplural (proof in \Cref{subsec:proof-exp-poly-rpoly}).
\begin{Lemma}\label{lem:tidb-reduce-poly}
For any \abbrCTIDB $\pdb$, $\raPlus$ query.$\query$, and lineage polynomial
For any \abbrCTIDB $\pdb$, $\raPlus$ query $\query$, and lineage polynomial
%\BG{Term has not been introduced yet.}
%Atri: fixed
$\poly\inparen{\vct{X}}=\apolyqdt(\vct{X})$, it holds that $
\expct_{\vct{W} \sim \pdassign}\pbox{\poly\inparen{\vct{W}}} = \rpoly\inparen{\probAllTup}.
$
$\poly\inparen{\vct{X}}=\poly\pbox{\query,\tupset,\tup}\inparen{\vct{X}}$, it holds that $
\expct_{\vct{W} \sim \pdassign}\pbox{\poly\inparen{\vct{W}}} = \rpoly\inparen{\probAllTup}
$, where $\probAllTup = \inparen{\prob_{1, 1},\ldots,\prob_{\abs{\tupset}, \bound}}$ is defined by $\bpd$.
\end{Lemma}
\AH{Here is where I stopped.}