% -*- root: main.tex -*-
\pagebreak
\section{POS Queries}
The following lemma is used in subsequent proofs for bounding various queries.
\begin{Lemma}\label{lem:exp-sine}
$\forall \wElem \in \wSet$,\newline
$\ex{\sine(\wElem)^i} = \begin{cases}
0 & 1 \leq i < \prodsize\\
1 & \text{otherwise}.
\end{cases}$
\end{Lemma}
\begin{proof}
Notice that, $\forall i \in [1, \prodsize]$, $\ex{\sine(\wElem)^i} = \frac{\sum\limits_{\omega \in \Omega} \omega^i}{\prodsize} = \frac{\sum\limits_{l = 0}^{\prodsize - 1} (\omega^i)^l}{\prodsize}$, where in the last expression $\omega$ denotes a fixed primitive $\prodsize^{th}$ root of unity whose powers $\omega^0, \ldots, \omega^{\prodsize - 1}$ enumerate $\Omega$. To prove the lemma then, one needs only to prove that $\sum\limits_{l = 0}^{\prodsize - 1} (\omega^i)^l = \begin{cases} 0 & 1 \leq i < \prodsize\\ \prodsize & i = \prodsize. \end{cases}$
For the case of $i = \prodsize$,
\begin{equation}
\frac{\sum\limits_{l = 0}^{\prodsize - 1} (\omega^\prodsize)^l}{\prodsize} = \frac{\sum\limits_{l = 0}^{\prodsize - 1} 1^l}{\prodsize} = \frac{\prodsize}{\prodsize} = 1.
\end{equation}
For $i \in [1, \prodsize - 1]$, we have $\omega^i \neq 1$, so the geometric series formula gives
\begin{equation}
\sum_{l = 0}^{\prodsize - 1} (\omega^i)^l = \frac{(\omega^i)^\prodsize - 1}{\omega^i - 1} = \frac{1 - 1}{\omega^i - 1} = 0.
\end{equation}
\qed
\end{proof}
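As a quick sanity check of Lemma \ref{lem:exp-sine}, the following worked instance is illustrative only: the choice $\prodsize = 4$ and the primitive root $\omega = \mathrm{i}$ (the imaginary unit) are assumptions made for the example and are not taken from the development above. The four roots of unity are $\Omega = \{1, \mathrm{i}, -1, -\mathrm{i}\}$, each assumed equally likely, so
\begin{align*}
\ex{\sine(\wElem)^1} &= \tfrac{1}{4}\left(1 + \mathrm{i} - 1 - \mathrm{i}\right) = 0,\\
\ex{\sine(\wElem)^2} &= \tfrac{1}{4}\left(1 - 1 + 1 - 1\right) = 0,\\
\ex{\sine(\wElem)^3} &= \tfrac{1}{4}\left(1 - \mathrm{i} - 1 + \mathrm{i}\right) = 0,\\
\ex{\sine(\wElem)^4} &= \tfrac{1}{4}\left(1 + 1 + 1 + 1\right) = 1.
\end{align*}
Only the exponent equal to $\prodsize$ yields a non-zero expectation, in agreement with the two cases of the lemma.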
We target queries for which it is optimal to push down projections below join operators. Such a query is a product of sums ($\pos$). To show that our scheme works in this setting, we first compute the expectation of a $\pos$~query over sketch annotations, i.e., $\pos = \sum_{\buck = 1}^{\sketchCols}\left(\sum_{i \in \prodsize'}\sk^{\vect_i}\left[\buck\right]\right)\left(\sum_{i' \in \prodsize''}\sk^{\vect_{i'}}\left[\buck\right]\right)$, where $\prodsize'$ and $\prodsize''$ denote the sets of matching projected tuples from each input. Note that we denote the $i^{th}$ vector as $\vect_i$ and the sketch of the $i^{th}$ vector as $\sk^{\vect_i}$.
\begin{align}
&\ex{\sum_{\buck = 1}^{\sketchCols}\left(\sum_{i \in \prodsize'}\sk^{\vect_i}\left[\buck\right]\right)\left(\sum_{i' \in \prodsize''}\sk^{\vect_{i'}}\left[\buck\right]\right)}\nonumber\\
=&\ex{\sum_{\buck = 1}^{\sketchCols}\left(\sum_{i \in \prodsize'}\sum_{\wElem \in \wSet}\vect_i(\wElem)\ind{\hfunc(\wElem) = \buck}\sine(\wElem)\right)\left(\sum_{i' \in \prodsize''}\sum_{\wElem' \in \wSet}\vect_{i'}(\wElem')\ind{\hfunc(\wElem') = \buck}\sine(\wElem')\right)}\label{eq:exp-pos1}\\
=&\ex{\sum_{\buck = 1}^{\sketchCols}\left(\sum_{\wElem \in \wSet}\ind{\hfunc(\wElem) = \buck}\left(\sum_{i \in \prodsize'}\vect_i(\wElem)\right)\sine(\wElem)\right)\left(\sum_{\wElem' \in \wSet}\ind{\hfunc(\wElem') = \buck}\left(\sum_{i' \in \prodsize''}\vect_{i'}(\wElem')\right)\sine(\wElem')\right)}\label{eq:exp-pos2}\\
=&\ex{\sum_{\buck = 1}^{\sketchCols}\left(\sum_{\wElem \in \wSet}\ind{\hfunc(\wElem) = \buck}\left(\sum_{i \in \prodsize'}\vect_i(\wElem)\right)\left(\sum_{i' \in \prodsize''}\vect_{i'}(\wElem)\right)\underbrace{\sine(\wElem)^{2}}_{2 = \prodsize} + \sum_{\substack{\wElem, \wElem' \in \wSet,\\ \wElem \neq \wElem'}}\ind{\hfunc(\wElem) = \buck}\ind{\hfunc(\wElem') = \buck}\left(\sum_{i \in \prodsize'}\vect_i(\wElem)\right)\sine(\wElem)\left(\sum_{i' \in \prodsize''}\vect_{i'}(\wElem')\right)\sine(\wElem')\right)}\label{eq:exp-pos3}\\
=&\sum_{\buck = 1}^{\sketchCols}\sum_{\wElem \in \wSet}\ind{\hfunc(\wElem) = \buck}\left(\sum_{i \in \prodsize'}\vect_i(\wElem)\right)\left(\sum_{i' \in \prodsize''}\vect_{i'}(\wElem)\right)\label{eq:exp-pos4}\\
=&\sum_{\wElem \in \wSet}\left(\sum_{i \in \prodsize'}\vect_i(\wElem)\right)\left(\sum_{i' \in \prodsize''}\vect_{i'}(\wElem)\right)\label{eq:exp-pos5}
\end{align}
\qed\newline
Equation \eqref{eq:exp-pos1} follows from expanding the definition of $\sk^{\vect_i}$. Equation \eqref{eq:exp-pos2} follows from the associativity and commutativity of addition and the distributivity of multiplication over addition. Equation \eqref{eq:exp-pos3} uses the same properties to split the product into the terms with $\wElem = \wElem'$ and the cross terms with $\wElem \neq \wElem'$. Equation \eqref{eq:exp-pos4} results from Lemma \ref{lem:exp-sine}: for $\wElem \neq \wElem'$ we have $\ex{\sine(\wElem)\sine(\wElem')} = 0$ (taking the values of $\sine$ at distinct elements to be independent), which eliminates the right-hand term. The left-hand term stays, since by Lemma \ref{lem:exp-sine} we know that $\ex{\sine(\wElem)^\prodsize} = 1$. Finally, equation \eqref{eq:exp-pos5} follows from the construction of $\sk$, since each $\wElem$ is hashed to exactly one bucket.
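To make the cancellation of the cross terms concrete, consider a hypothetical toy instance (an assumption made for illustration, not part of the analysis above): a single bucket ($\sketchCols = 1$), a domain $\wSet = \{a, b\}$, and $\sine$ taking values in $\{-1, +1\}$ independently and uniformly, which is the case $\prodsize = 2$. Writing $x = \sum_{i \in \prodsize'}\vect_i$ and $y = \sum_{i' \in \prodsize''}\vect_{i'}$ as shorthand for the aggregated vectors (the names $x$ and $y$ are introduced only for this example),
\begin{align*}
\ex{\pos} &= \ex{\left(x(a)\sine(a) + x(b)\sine(b)\right)\left(y(a)\sine(a) + y(b)\sine(b)\right)}\\
&= x(a)y(a)\ex{\sine(a)^2} + x(b)y(b)\ex{\sine(b)^2} + \left(x(a)y(b) + x(b)y(a)\right)\ex{\sine(a)\sine(b)}\\
&= x(a)y(a) + x(b)y(b).
\end{align*}
The squared terms each contribute expectation $1$ and the cross term contributes $0$, so the estimate is unbiased for $\sum_{\wElem \in \wSet} x(\wElem)y(\wElem)$ in this instance.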
We now move to computing the variance of a $\pos$~query. Note that the use of complex numbers requires the variance formula $\var = \ex{\pos \cdot \conj{\pos}} - \ex{\pos}\ex{\conj{\pos}}$.
To make this easier to present and digest, we start by focusing on the first term, $T_1 = \ex{\pos \cdot \conj{\pos}}$.
\begin { align}
&\ex{\sum_{\buck = 1}^{\sketchCols}\left(\sum_{i_1 \in \prodsize'}\sk^{\vect_{i_1}}[\buck]\right)\left(\sum_{i_1' \in \prodsize''}\sk^{\vect_{i_1'}}[\buck]\right) \cdot
\conj{\sum_{\buck' = 1}^{\sketchCols}\left(\sum_{i_2 \in \prodsize'}\sk^{\vect_{i_2}}[\buck']\right)\left(\sum_{i_2' \in \prodsize''}\sk^{\vect_{i_2'}}[\buck']\right)}}\\
&=\ex{\sum_{\buck = 1}^{\sketchCols}\left(\sum_{i_1 \in \prodsize'}\sum_{\wElem_1 \in \wSet}\ind{\hfunc(\wElem_1) = \buck}\vect_{i_1}(\wElem_1)\sine(\wElem_1)\right)\left(\sum_{i_1' \in \prodsize''}\sum_{\wElem_1' \in \wSet}\ind{\hfunc(\wElem_1') = \buck}\vect_{i_1'}(\wElem_1')\sine(\wElem_1')\right)
\conj{\sum_{\buck' = 1}^{\sketchCols}\left(\sum_{i_2 \in \prodsize'}\sum_{\wElem_2 \in \wSet}\ind{\hfunc(\wElem_2) = \buck'}\vect_{i_2}(\wElem_2)\sine(\wElem_2)\right)\left(\sum_{i_2' \in \prodsize''}\sum_{\wElem_2' \in \wSet}\ind{\hfunc(\wElem_2') = \buck'}\vect_{i_2'}(\wElem_2')\sine(\wElem_2')\right)}}\label{eq:var-pos1}\\
&=\ex{\sum_{\buck = 1}^{\sketchCols}\left(\sum_{i_1 \in \prodsize'}\sum_{\wElem_1 \in \wSet}\ind{\hfunc(\wElem_1) = \buck}\vect_{i_1}(\wElem_1)\sine(\wElem_1)\right)\left(\sum_{i_1' \in \prodsize''}\sum_{\wElem_1' \in \wSet}\ind{\hfunc(\wElem_1') = \buck}\vect_{i_1'}(\wElem_1')\sine(\wElem_1')\right)
\sum_{\buck' = 1}^{\sketchCols}\left(\sum_{i_2 \in \prodsize'}\sum_{\wElem_2 \in \wSet}\ind{\hfunc(\wElem_2) = \buck'}\vect_{i_2}(\wElem_2)\conj{\sine(\wElem_2)}\right)\left(\sum_{i_2' \in \prodsize''}\sum_{\wElem_2' \in \wSet}\ind{\hfunc(\wElem_2') = \buck'}\vect_{i_2'}(\wElem_2')\conj{\sine(\wElem_2')}\right)}\label{eq:var-pos2}\\
%
& =\mathbb { E} \left [\sum _ { \buck = 1} ^ { \sketchCols } \left (\sum _ { \wElem _ 1 \in \wSet } \ind { \hfunc (\wElem _ 1) = \buck } \left (\sum _ { i_ 1 \in \prodsize '} \vect _ { i_ 1} (\wElem _ 1)\right )\sine (\wElem _ 1)\right )\left (\sum _ { \wElem _ 1' \in \wSet } \ind { \hfunc (\wElem _ 1') = \buck } \left (\sum _ { i_ 1' \in \prodsize ''} \vect _ { i_ 1'} (\wElem _ 1')\right )\sine (\wElem _ 1')\right )\right .\nonumber \\
& \left .\qquad \qquad \qquad \sum _ { \buck ' = 1} ^ { \sketchCols } \left (\sum _ { \wElem _ 2 \in \wSet } \ind { \hfunc (\wElem _ 2) = \buck '} \left (\sum _ { i_ 2 \in \prodsize '} \vect _ { i_ 2} (\wElem _ 2)\right )\conj { \sine (\wElem _ 2)} \right )\left (\sum _ { \wElem _ 2' \in \wSet } \ind { \hfunc (\wElem _ 2') = \buck '} \left (\sum _ { i_ 2' \in \prodsize ''} \vect _ { i_ 2'} (\wElem _ 2')\right )\conj { \sine (\wElem _ 2')} \right )\right ]\label { eq:var-pos3} \\
%
& =\ex { \sum _ { \buck = 1} ^ { \sketchCols } \sum _ { \wElem _ 1, \wElem _ 1' \in \wSet } \ind { \hfunc (\wElem _ 1) = \buck } \ind { \hfunc (\wElem _ 1') = \buck } \left (\sum _ { i_ 1 \in \prodsize '} \vect _ { i_ 1} (\wElem _ 1)\right )\sine (\wElem _ 1)\left (\sum _ { i_ 1' \in \prodsize ''} \vect _ { i_ 1'} (\wElem _ 1')\right )\sine (\wElem _ 1')\cdot
\sum _ { \buck ' = 1} ^ { \sketchCols } \sum _ { \wElem _ 2, \wElem _ 2' \in \wSet } \ind { \hfunc (\wElem _ 2) = \buck '} \ind { \hfunc (\wElem _ 2') = \buck '} \left (\sum _ { i_ 2 \in \prodsize '} \vect _ { i_ 2} (\wElem _ 2)\right )\conj { \sine (\wElem _ 2)} \left (\sum _ { i_ 2' \in \prodsize ''} \vect _ { i_ 2'} (\wElem _ 2')\right )\conj { \sine (\wElem _ 2')} } \label { eq:var-pos4} \\
%
& =\ex { \sum _ { \buck , \buck ' \in [\sketchCols ]} \sum _ { \substack { \wElem _ 1, \wElem _ 1',\\ \wElem _ 2, \wElem _ 2'\\ \in \wSet } } \ind { \hfunc (\wElem _ 1) = \buck } \ind { \hfunc (\wElem _ 1') = \buck } \ind { \hfunc (\wElem _ 2) = \buck '} \ind { \hfunc (\wElem _ 2') = \buck '} \left (\sum _ { i_ 1 \in \prodsize '} \vect _ { i_ 1} (\wElem _ 1)\right )\sine (\wElem _ 1)\left (\sum _ { i_ 1' \in \prodsize ''} \vect _ { i_ 1'} (\wElem _ 1')\right )\sine (\wElem _ 1')\left (\sum _ { i_ 2 \in \prodsize '} \vect _ { i_ 2} (\wElem _ 2)\right )\conj { \sine (\wElem _ 2)} \left (\sum _ { i_ 2' \in \prodsize ''} \vect _ { i_ 2'} (\wElem _ 2')\right )\conj { \sine (\wElem _ 2')} } \label { eq:var-pos5} \\
%
& =\sum _ { \buck , \buck ' \in [\sketchCols ]} \sum _ { \substack { \wElem _ 1, \wElem _ 1',\\ \wElem _ 2, \wElem _ 2'\\ \in \wSet } } \sum _ { \substack { i_ 1, i_ 2 \in \prodsize ',\\ i_ 1', i_ 2' \in \prodsize ''} } \vect _ { i_ 1} (\wElem _ 1)\vect _ { i_ 1'} (\wElem _ 1')\vect _ { i_ 2} (\wElem _ 2)\vect _ { i_ 2'} (\wElem _ 2')\ex { \ind { \hfunc (\wElem _ 1) = \buck } \ind { \hfunc (\wElem _ 1') = \buck } \ind { \hfunc (\wElem _ 2) = \buck '} \ind { \hfunc (\wElem _ 2') = \buck '} \sine (\wElem _ 1)\sine (\wElem _ 1')\conj { \sine (\wElem _ 2)} \conj { \sine (\wElem _ 2')} } \label { eq:var-pos6}
%--Below is part of the derivation without using the indicator variables. Only saving for short term...
%&=\ex{\sum_{\buck = 1}^{\sketchCols}\left(\sum_{\wElem_1 \in \wSet_j}\left(\sum_{i \in \kvec'}\vect_i(\wElem_1)\right)\sine(\wElem_1)\right) \left(\sum_{\wElem_2 \in \wSet_j}\left(\sum_{i' \in \kvec''}\vect_{i'}(\wElem_2)\right)\conj{\sine(\wElem_2)}\right) \cdot \sum_{\buck' = 1}^{\sketchCols}\left(\sum_{\wElem_3 \in \wSet_{j'}}\left(\sum_{i \in \kvec'}\vect_i(\wElem_3)\right)\conj{\sine(\wElem_3)}\right) \left(\sum_{\wElem_4 \in \wSet_{j'}}\left(\sum_{i' \in \kvec''}\vect_{i'}(\wElem_4)\right)\conj{\sine(\wElem_4)}\right)}\label{eq:var-pos1}\\
%=&\ex{\sum_{\buck, \buck' \in \sketchCols}\left(\sum_{\wElem_1 \in \wSet_j}\left(\sum_{i \in \kvec'}\vect_i(\wElem_1)\right)\sine(\wElem_1)\right) \left(\sum_{\wElem_2 \in \wSet_j}\left(\sum_{i' \in \kvec''}\vect_{i'}(\wElem_2)\right)\conj{\sine(\wElem_2)}\right) \cdot \left(\sum_{\wElem_3 \in \wSet_{j'}}\left(\sum_{i \in \kvec'}\vect_i(\wElem_3)\right)\conj{\sine(\wElem_3)}\right) \left(\sum_{\wElem_4 \in \wSet_{j'}}\left(\sum_{i' \in \kvec''}\vect_{i'}(\wElem_4)\right)\conj{\sine(\wElem_4)}\right)}\label{eq:var-pos2}\\
%=&\sum_{\buck, \buck' \in \sketchCols}\sum_{\wElem_1 \in \wSet_j}\left(\sum_{i \in \kvec'}\vect_i(\wElem_1)\right)\sum_{\wElem_2 \in \wSet_j}\left(\sum_{i' \in \kvec''}\vect_{i'}(\wElem_2)\right) \sum_{\wElem_3 \in \wSet_{j'}}\left(\sum_{i \in \kvec'}\vect_i(\wElem_3)\right) \sum_{\wElem_4 \in \wSet_{j'}}\left(\sum_{i' \in \kvec''}\vect_{i'}(\wElem_4)\right)\ex{\sine(\wElem_1)\cdot \conj{\sine(\wElem_2)}\cdot\conj{\sine(\wElem_3)}\cdot \conj{\sine(\wElem_4)}}\label{eq:var-pos3}
\end { align}
Equation \eqref{eq:var-pos1} follows from expanding the definition of a sketch $\sk$.
Equation \eqref{eq:var-pos2} uses the fact that the conjugate of a sum (product) is equal to the sum (product) of the conjugates.
Equation \eqref{eq:var-pos3} results from reordering the summations using the associativity and commutativity of addition, and then applying the distributivity of multiplication over addition.
Equations \eqref{eq:var-pos4} and \eqref{eq:var-pos5} again rewrite the summations using the distributivity of multiplication over addition.
Equation \eqref{eq:var-pos6} is the result of factoring the non-random terms out of the expectation, by linearity of expectation.\newline
When considering the terms that survive the expectation in \eqref{eq:var-pos6}, recall that every $\prodsize^{th}$ root of unity has modulus one, so multiplying any element of the set of $\prodsize^{th}$ roots of unity ($R^\prodsize$) by its conjugate gives one. Formally:
\begin{equation*}
\forall c \in \mathbb{C} \text{ s.t. } c \in R^\prodsize, c \cdot \conj{c} = 1.
\end{equation*}
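As a small illustration (with $\prodsize = 4$ chosen only as an example, and $\mathrm{i}$ denoting the imaginary unit), take $c = \mathrm{i}$, whose conjugate is $\conj{c} = -\mathrm{i}$, so that $c \cdot \conj{c} = -\mathrm{i}^2 = 1$. More generally, writing a $\prodsize^{th}$ root of unity as $c = e^{2\pi \mathrm{i} k/\prodsize}$ for some integer $k$ (a parametrization assumed here for the example),
\begin{equation*}
c \cdot \conj{c} = e^{2\pi \mathrm{i} k/\prodsize} \cdot e^{-2\pi \mathrm{i} k/\prodsize} = e^{0} = 1.
\end{equation*}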
Combining this result with Lemma \ref{lem:exp-sine}, one can see that only two possible cases of terms survive the expectation in \eqref{eq:var-pos6}.
First, by Lemma \ref{lem:exp-sine},
%labels not compiling
\begin{align}
&\text{\emph{case 1}}\nonumber\\
&\qquad \text{a: } \wElem_1 = \wElem_1' = \wElem_2 = \wElem_2'\label{this-1}\\ %\label{var:pos-case-1a}
&\qquad \text{b: } \wElem_1 = \wElem_1' \neq \wElem_2 = \wElem_2'\label{this-2} %\label{var:pos-case-1b}
\end{align}
Second, by the law of conjugates,
\begin{align}
&\text{\emph{case 2}}\nonumber\\
&\qquad \text{a: } \wElem_1 = \wElem_2 \neq \wElem_1' = \wElem_2'\label{joe-a}\\ %\label{var:pos-Case-2a}
&\qquad \text{b: } \wElem_1 = \wElem_2' \neq \wElem_1' = \wElem_2\label{joe-b} %\label{var:pos-Case-2b}
\end{align}
Next, we show that the second term, $T_2 = \ex{\pos}\ex{\conj{\pos}}$, has the same deterministic factor as $T_1$ once the non-random terms are factored out of the expectations.
\begin { align}
&\ex{\sum_{\buck = 1}^{\sketchCols}\left(\sum_{i_1 \in \prodsize'}\sk^{\vect_{i_1}}[\buck]\right)\left(\sum_{i_1' \in \prodsize''}\sk^{\vect_{i_1'}}[\buck]\right)}
\ex{\conj{\sum_{\buck' = 1}^{\sketchCols}\left(\sum_{i_2 \in \prodsize'}\sk^{\vect_{i_2}}[\buck']\right)\left(\sum_{i_2' \in \prodsize''}\sk^{\vect_{i_2'}}[\buck']\right)}}\label{eq:var-t2-pos1}\\
%
& \ex { \sum _ { \buck = 1} ^ { \sketchCols } \left (\sum _ { i_ 1 \in \prodsize '} \sum _ { \wElem _ 1 \in \wSet } \ind { \hfunc (\wElem _ 1) = \buck } \vect _ { i_ 1} (\wElem _ 1)\sine (\wElem _ 1)\right )\left (\sum _ { i_ 1' \in \prodsize ''} \sum _ { \wElem _ 1' \in \wSet } \ind { \hfunc (\wElem _ 1') = \buck } \vect _ { i_ 1'} (\wElem _ 1')\sine (\wElem _ 1')\right )} \ex { \sum _ { \buck ' = 1} ^ { \sketchCols } \left (\sum _ { i_ 2 \in \prodsize '} \sum _ { \wElem _ 2 \in \wSet } \ind { \hfunc (\wElem _ 2) = \buck '} \vect _ { i_ 2} (\wElem _ 2)\conj { \sine (\wElem _ 2)} \right )\left (\sum _ { i_ 2' \in \prodsize ''} \sum _ { \wElem _ 2' \in \wSet } \ind { \hfunc (\wElem _ 2') = \buck '} \vect _ { i_ 2'} (\wElem _ 2')\conj { \sine (\wElem _ 2')} \right )} \label { eq:var-t2-pos2} \\
%
& \ex { \sum _ { \buck = 1} ^ { \sketchCols } \left (\sum _ { \wElem _ 1 \in \wSet } \ind { \hfunc (\wElem _ 1) = \buck } \left (\sum _ { i_ 1 \in \prodsize '} \vect _ { i_ 1} (\wElem _ 1)\right )\sine (\wElem _ 1)\right )\left (\sum _ { \wElem _ 1' \in \wSet } \ind { \hfunc (\wElem _ 1') = \buck } \left (\sum _ { i_ 1' \in \prodsize ''} \vect _ { i_ 1'} (\wElem _ 1')\right )\sine (\wElem _ 1')\right )} \nonumber \\
& \qquad \qquad \qquad \ex { \sum _ { \buck ' = 1} ^ { \sketchCols } \left (\sum _ { \wElem _ 2 \in \wSet } \ind { \hfunc (\wElem _ 2) = \buck '} \left (\sum _ { i_ 2 \in \prodsize '} \vect _ { i_ 2} (\wElem _ 2)\right )\conj { \sine (\wElem _ 2)} \right )\left (\sum _ { \wElem _ 2' \in \wSet } \ind { \hfunc (\wElem _ 2') = \buck '} \left (\sum _ { i_ 2' \in \prodsize ''} \vect _ { i_ 2'} (\wElem _ 2')\right )\conj { \sine (\wElem _ 2')} \right )} \label { eq:var-t2-pos3} \\
%
& \ex { \sum _ { \buck = 1} ^ { \sketchCols } \sum _ { \wElem _ 1, \wElem _ 1' \in \wSet } \ind { \hfunc (\wElem _ 1) = \buck } \ind { \hfunc (\wElem _ 1') = \buck } \left (\sum _ { i_ 1 \in \prodsize '} \vect _ { i_ 1} (\wElem _ 1)\right )\sine (\wElem _ 1)\left (\sum _ { i_ 1' \in \prodsize ''} \vect _ { i_ 1'} (\wElem _ 1')\right )\sine (\wElem _ 1')} \ex { \sum _ { \buck ' = 1} ^ { \sketchCols } \sum _ { \wElem _ 2, \wElem _ 2' \in \wSet } \ind { \hfunc (\wElem _ 2) = \buck '} \ind { \hfunc (\wElem _ 2') = \buck '} \left (\sum _ { i_ 2 \in \prodsize '} \vect _ { i_ 2} (\wElem _ 2)\right )\conj { \sine (\wElem _ 2)} \left (\sum _ { i_ 2' \in \prodsize ''} \vect _ { i_ 2'} (\wElem _ 2')\right )\conj { \sine (\wElem _ 2')} } \label { eq:var-t2-pos4} \\
%
& \sum _ { \buck = 1} ^ { \sketchCols } \sum _ { \wElem _ 1, \wElem _ 1' \in \wSet } \left (\sum _ { i_ 1 \in \prodsize '} \vect _ { i_ 1} (\wElem _ 1)\right )\left (\sum _ { i_ 1' \in \prodsize ''} \vect _ { i_ 1'} (\wElem _ 1')\right )\ex { \ind { \hfunc (\wElem _ 1) = \buck } \ind { \hfunc (\wElem _ 1') = \buck } \sine (\wElem _ 1)\sine (\wElem _ 1')} \sum _ { \buck ' = 1} ^ { \sketchCols } \sum _ { \wElem _ 2, \wElem _ 2' \in \wSet } \left (\sum _ { i_ 2 \in \prodsize '} \vect _ { i_ 2} (\wElem _ 2)\right )\left (\sum _ { i_ 2' \in \prodsize ''} \vect _ { i_ 2'} (\wElem _ 2')\right )\ex { \ind { \hfunc (\wElem _ 2) = \buck '} \ind { \hfunc (\wElem _ 2') = \buck '} \conj { \sine (\wElem _ 2)} \conj { \sine (\wElem _ 2')} } \label { eq:var-t2-pos5} \\
%
&\sum_{\buck, \buck' \in [\sketchCols]}\sum_{\substack{\wElem_1, \wElem_1',\\ \wElem_2, \wElem_2' \in \wSet}}\left(\sum_{\substack{i_1, i_2 \in \prodsize',\\ i_1', i_2' \in \prodsize''}}\vect_{i_1}(\wElem_1)\vect_{i_1'}(\wElem_1')\vect_{i_2}(\wElem_2)\vect_{i_2'}(\wElem_2')\right)\left(\ex{\ind{\hfunc(\wElem_1) = \buck}\ind{\hfunc(\wElem_1') = \buck}\sine(\wElem_1)\sine(\wElem_1')}\ex{\ind{\hfunc(\wElem_2) = \buck'}\ind{\hfunc(\wElem_2') = \buck'}\conj{\sine(\wElem_2)}\conj{\sine(\wElem_2')}}\right)\label{eq:var-t2-pos6}
%
%&\ex{\sum_{\buck = 1}^{\sketchCols}\left(\sum_{\wElem_1 \in \wSet_j}\left(\sum_{i \in \kvec'}\vect_i(\wElem_1)\right)\sine(\wElem_1)\right) \left(\sum_{\wElem_2 \in \wSet_j}\left(\sum_{i' \in \kvec''}\vect_{i'}(\wElem_2)\right)\conj{\sine(\wElem_2)}\right)} \cdot \ex{\sum_{\buck' = 1}^{\sketchCols}\left(\sum_{\wElem_3 \in \wSet_{j'}}\left(\sum_{i \in \kvec'}\vect_i(\wElem_3)\right)\conj{\sine(\wElem_3)}\right) \left(\sum_{\wElem_4 \in \wSet_{j'}}\left(\sum_{i' \in \kvec''}\vect_{i'}(\wElem_4)\right)\conj{\sine(\wElem_4)}\right)}\label{eq:var-t2-pos1}\\
%=&\sum_{\buck, \buck' \in \sketchCols}\sum_{\wElem_1 \in \wSet_j}\left(\sum_{i \in \kvec'}\vect_i(\wElem_1)\right)\sum_{\wElem_2 \in \wSet_j}\left(\sum_{i' \in \kvec''}\vect_{i'}(\wElem_2)\right) \sum_{\wElem_3 \in \wSet_{j'}}\left(\sum_{i \in \kvec'}\vect_i(\wElem_3)\right) \sum_{\wElem_4 \in \wSet_{j'}}\left(\sum_{i' \in \kvec''}\vect_{i'}(\wElem_4)\right)\ex{\sine(\wElem_1)\cdot \conj{\sine(\wElem_2)}}\ex{\cdot\conj{\sine(\wElem_3)}\cdot \conj{\sine(\wElem_4)}}\label{eq:var-t2-pos2}
\end { align}
The justification of the steps is almost identical to the one used in the derivation of $T_1$.
Equation \eqref{eq:var-t2-pos2} expands the definition of $\sk$, and also uses the fact that the conjugate of a sum (product) is equal to the sum (product) of the conjugates.
Equations \eqref{eq:var-t2-pos3} and \eqref{eq:var-t2-pos4} rely on the associativity of addition and the distributivity of multiplication over addition.
Equation \eqref{eq:var-t2-pos5} factors the non-random terms out of the expectations.
The final line uses the distributivity of multiplication over addition, along with the commutativity and associativity of multiplication.
Notice that both $T_1$ and $T_2$ have the same left-hand factor, so $\var$ can be written as
\begin{align}
&\sum_{\buck, \buck' \in [\sketchCols]}\sum_{\substack{\wElem_1, \wElem_1',\\ \wElem_2, \wElem_2' \in \wSet}}\left(\sum_{\substack{i_1, i_2 \in \prodsize',\\ i_1', i_2' \in \prodsize''}}\vect_{i_1}(\wElem_1)\vect_{i_1'}(\wElem_1')\vect_{i_2}(\wElem_2)\vect_{i_2'}(\wElem_2')\right)\left(\ex{\ind{\hfunc(\wElem_1) = \buck}\ind{\hfunc(\wElem_1') = \buck}\ind{\hfunc(\wElem_2) = \buck'}\ind{\hfunc(\wElem_2') = \buck'}\sine(\wElem_1)\sine(\wElem_1')\conj{\sine(\wElem_2)}\conj{\sine(\wElem_2')}}\right.\nonumber\\
&\left.\qquad\qquad\qquad - \ex{\ind{\hfunc(\wElem_1) = \buck}\ind{\hfunc(\wElem_1') = \buck}\sine(\wElem_1)\sine(\wElem_1')}\ex{\ind{\hfunc(\wElem_2) = \buck'}\ind{\hfunc(\wElem_2') = \buck'}\conj{\sine(\wElem_2)}\conj{\sine(\wElem_2')}}\right)\label{eq:var-t1-t2}
\end{align}
Notice that the expectation terms coming from $T_2$ cancel out case 1, leaving the two possibilities of case 2, \eqref{joe-a} and \eqref{joe-b}, as the surviving terms in $\var$. Note that both \eqref{joe-a} and \eqref{joe-b} have all their variables coming from the same $\buck^{th}$ bucket because of the equalities among cross terms. The equalities also make two of the four indicator variables redundant, since each of them duplicates one of the other two.
Thus,
\begin{equation}
\var\left[\pos\right] = \sum_{\buck}\sum_{\wElem, \wElem'}\frac{1}{\sketchCols^2}\left(\sum_{\substack{i \in \prodsize',\\ i' \in \prodsize''}}\vect_i(\wElem)^2\vect_{i'}(\wElem')^2 + \vect_i(\wElem)\vect_{i'}(\wElem)\vect_i(\wElem')\vect_{i'}(\wElem')\right)
\end{equation}
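To see why a case 2 term survives while a case 1 term cancels, the following sketch ignores the indicator factors and assumes that $\sine$ takes independent values at distinct elements; both simplifications are assumptions made purely for illustration. For case 2a, with $\wElem_1 = \wElem_2 = \wElem$ and $\wElem_1' = \wElem_2' = \wElem'$, where $\wElem \neq \wElem'$,
\begin{align*}
\ex{\sine(\wElem)\sine(\wElem')\conj{\sine(\wElem)}\conj{\sine(\wElem')}} &= \ex{\left(\sine(\wElem)\conj{\sine(\wElem)}\right)\left(\sine(\wElem')\conj{\sine(\wElem')}\right)} = 1,\\
\ex{\sine(\wElem)\sine(\wElem')}\ex{\conj{\sine(\wElem)}\conj{\sine(\wElem')}} &= 0 \cdot 0 = 0,
\end{align*}
so the difference of the $T_1$ and $T_2$ expectations is $1$ and the term remains in $\var$. In contrast, for case 1a with $\prodsize = 2$ (so that $\sine(\wElem) \in \{-1, +1\}$ is real), both $\ex{\sine(\wElem)^2\conj{\sine(\wElem)}^2}$ and $\ex{\sine(\wElem)^2}\ex{\conj{\sine(\wElem)}^2}$ equal $1$, and the difference vanishes.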
%Putting things together we have,
%\begin{align}
%&\sum_{\buck, \buck' \in \sketchCols}\sum_{\wElem_1 \in \wSet_j}\left(\sum_{i \in \kvec'}\vect_i(\wElem_1)\right)\sum_{\wElem_2 \in \wSet_j}\left(\sum_{i' \in \kvec''}\vect_{i'}(\wElem_2)\right) \sum_{\wElem_3 \in \wSet_{j'}}\left(\sum_{i \in \kvec'}\vect_i(\wElem_3)\right) \sum_{\wElem_4 \in \wSet_{j'}}\left(\sum_{i' \in \kvec''}\vect_{i'}(\wElem_4)\right)\left(\ex{\sine(\wElem_1) \conj{\sine(\wElem_2)}\conj{\sine(\wElem_3)}\cdot \conj{\sine(\wElem_4)}}-\ex{\sine(\wElem_1) \conj{\sine(\wElem_2)}}\ex{\conj{\sine(\wElem_3)}\cdot \conj{\sine(\wElem_4)}}\right)\label{eq:var-both-pos1}\\
%=&\sum_{\buck}\sum_{\wElem \neq \wElem' \in \wSet}\left(\sum_{i \in \prodsize'}\vect_i(\wElem)\right)^2\left(\sum_{i' \in \prodsize''}\vect_{i'}(\wElem')\right)^2 + \left(\sum_{i \in \prodsize'}\vect_i(\wElem)\right)\left(\sum_{i' \in \prodsize''}\vect_{i'}(\wElem)\right)\left(\sum_{i' \in \prodsize''}\vect_{i'}(\wElem')\right) \left(\sum_{i \in \prodsize'}\vect_i(\wElem')\right)\label{eq:var-both-pos2}\\
%\leq&\norm{\sum_{i \in \prodsize'}\vect_i}_2^2\cdot\norm{\sum_{i' \in \prodsize''}\vect_{i'}}_2^2 + \norm{\sum_{i \in \prodsize'}\vect_i \had \sum_{i' \in \prodsize''}\vect_{i'}}_2^2\label{eq:var-both-pos3}
%\end{align}
%\qed
%
%Equation \eqref{eq:var-both-pos2} relies on the fact that the difference in expectation will only be non-zero when $\wElem_1 = \wElem_3 \neq \wElem_2 = \wElem_4$ or $\wElem_1 = \wElem_4 \neq \wElem_2 = \wElem_3$.