Rewritten proof for \lambda(j, j'), j \neq j'

master
Aaron Huber 2020-04-20 12:37:19 -04:00
parent 73fbc8f6a0
commit 2cb84b6204
3 changed files with 57 additions and 19 deletions


@ -211,8 +211,8 @@
%
%many of these are outdated and need to be cleaned up
%
\newcommand{\startOld}[1]{\textcolor{purple}{\newline-------------------------\newline\textbf{Old Content:\newline-------------------------\newline} #1}\newline}
\newcommand{\finOld}{\newline\textcolor{purple}{------------------------------\newline\textbf{END} Old Content\newline ------------------------------\newline}}
\newcommand{\startOld}[1]{\textcolor{purple}{-------------------------\newline\textbf{Old}\textit{ #1 \newline} }------------------------------\newline}
\newcommand{\finOld}{\newline\textcolor{purple}{------------------------------\newline\textbf{END}\text{ Old}\newline ------------------------------\newline}}
%\newcommand{\comment}[1]{}


@ -1,9 +1,8 @@
% -*- root: main.tex -*-
\pagebreak
\section{POS Queries}
\AH{The following lemma will probably be moved later on.}
The following property of the sine function $\sine$ is used in $\ex{\pos}$ derivation.
The following lemma is used in subsequent proofs for bounding various queries.
\begin{Lemma}\label{lem:exp-sine}
$\forall \wElem \in \wSet$,\newline
$\ex{\sine(\wElem)^i} = \begin{cases}
@ -11,6 +10,8 @@ $\ex{\sine(\wElem)^i} = \begin{cases}
1 &\text{otherwise}.
\end{cases}$
\end{Lemma}
\begin{proof}
Notice that, $\forall i \in [1, \prodsize]$, $\ex{\sine(\wElem)^i} = \frac{\sum\limits_{\omega \in \Omega}\omega^i}{\prodsize} = \frac{\sum\limits_{l = 0}^{\prodsize - 1}(\omega^i)^l}{\prodsize}$, where $\omega$ in the last sum denotes a primitive $\prodsize$-th root of unity generating $\Omega$. To prove the lemma, then, one needs only to prove that $\sum\limits_{l = 0}^{\prodsize - 1}(\omega^i)^l = \begin{cases}0&1 \leq i < \prodsize\\\prodsize&\text{otherwise}.\end{cases}$
For the case of $i = \prodsize$,
\begin{equation}
@ -21,6 +22,7 @@ For $i \in [1, \prodsize - 1]$, we can show by geometric sum series that
\sum_{l = 0}^{\prodsize - 1}(\omega^i)^l = \frac{(\omega^i)^\prodsize - 1}{\omega^i - 1} = \frac{1 - 1}{\omega^i - 1} = 0.
\end{equation}
\qed
\end{proof}
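As a sanity check of \cref{lem:exp-sine}, consider the following worked instance. It assumes, as the proof above does, that $\Omega$ is the set of $\prodsize$-th roots of unity; the choices $\prodsize = 3$ and $\omega = e^{2\pi\sqrt{-1}/3}$ are made purely for illustration.
\begin{align*}
i = 1:&\quad \sum_{l = 0}^{2}\omega^{l} = 1 + \omega + \omega^2 = \frac{\omega^3 - 1}{\omega - 1} = 0,\\
i = 2:&\quad \sum_{l = 0}^{2}(\omega^{2})^{l} = 1 + \omega^2 + \omega^4 = 1 + \omega^2 + \omega = 0,\\
i = 3:&\quad \sum_{l = 0}^{2}(\omega^{3})^{l} = 1 + 1 + 1 = 3.
\end{align*}
Dividing each sum by $\prodsize = 3$ gives expectations $0$, $0$, and $1$, which matches the two cases of the lemma.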
We target the specific query where it is optimal to push down projections below join operators. Such a query is a product of sums ($\pos$). To show that our scheme works in this setting, we first compute the expectation of a $\pos$~ query over sketch annotations, i.e. $\pos$ = $\sum_{\buck = 1}^{\sketchCols}\left(\sum_{i \in \kvec'}\sk^{\vect_i}\left[\buck\right]\right) \left(\sum_{i' \in \kvec''}\sk^{\vect_{i'}}\left[\buck\right]\right)$, for the set of matching projected tuples from each input, denoted $\prodsize', \prodsize''$. Note that we denote the $i^{th}$ vector as $\vect_i$ and the sketch of the $i^{th}$ vector $\sk^{\vect_i}$.
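To make the shape of a $\pos$ query concrete, the following small instance fixes a single bucket $\buck$ and assumes, purely for illustration, the index sets $\kvec' = \{1, 2\}$ and $\kvec'' = \{3, 4\}$:
\begin{equation*}
\left(\sk^{\vect_1}[\buck] + \sk^{\vect_2}[\buck]\right)\left(\sk^{\vect_3}[\buck] + \sk^{\vect_4}[\buck]\right) = \sk^{\vect_1}[\buck]\sk^{\vect_3}[\buck] + \sk^{\vect_1}[\buck]\sk^{\vect_4}[\buck] + \sk^{\vect_2}[\buck]\sk^{\vect_3}[\buck] + \sk^{\vect_2}[\buck]\sk^{\vect_4}[\buck].
\end{equation*}
The left-hand side needs one multiplication per bucket, whereas the expanded right-hand side needs one per pair of matching tuples. This is one way to see why evaluating the sums (projections) before the product (join) is attractive in this setting.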

66
sop.tex

@ -35,6 +35,11 @@ Fix the variables $\wElem_1,\ldots, \wElem_{\prodsize}$. Define $\dist$ to be th
1 & \dist = 1.
\end{cases}
\end{align*}
\AH{Oliver had suggested that we change the proof to lemma 1...I think he wanted to allow for values of $i$ to be within the set of integers, not just restricted to $0 < i \leq k$.
Even if we don't change the lemma, I think the proof itself is inaccurate and needs to be rewritten.}
We obtain the final equality by \cref{lem:exp-sine}, which states that the only way in expectation that $\sine(\wElem_{\ell})^{e_{\ell}}$ can be something other than $0$ is when $e_{\ell} = \prodsize$. It can further be seen that the only way this can happen is when $\dist = \prodsize$.
Notice that the above leaves us with the only remaining condition that $\forall i, j \in [\prodsize], \wElem_i = \wElem_j$,
@ -59,17 +64,9 @@ Substituting in the definition of variance for complex numbers,
&= \sum_j \sigsq_j + \sum_{j \neq j'}\cvar{j, j'} \label{eq:sigsq}
\end{align}
Notice that assuming independence of $\sigsq_j ~\forall j \in \sketchCols$, we can push the variance through the sum and obtain the result
\begin{align*}
&\sigsq - \sum_j \sigsq_j = \cvar{j, j'}\\
&\implies \cvar{j, j'} \leq 0.
\end{align*}
\AH{The implication above was discussed months ago, but I don't see how it's true. Is it true?}
One can see that \cref{eq:sigsq} is composed of two addends. We now bound each of them separately.
\subsection{Bounding $\cvar{j, j'}$}
\AR{You need to re-write the stuff below. First in the 2nd equality suddenly the sum on $j\ne j'$ has vanished. Also I think you should first analyze $\lambda(j,j')$ for both $j=j'$ and $j\ne j'$ for as long as you can. Only when it is needed should you divide into the two cases-- do not do the division up front.}
Notice we have two cases of $\cvar{j, j'}$, the first is when $j = j'$, i.e. $(\sigsq_j)$, and the second when $j \neq j'$.
\begin{align}
\cvar{j, j'} &= \ex{\est_j \cdot \conj{\est_{j'}}} - \ex{\est_j}\cdot\ex{\conj{\est_{j'}}}\nonumber\\
@ -79,17 +76,56 @@ Notice we have two cases of $\cvar{j, j'}$, the first is when $j = j'$, i.e. $(\
&= \sum_{\substack{\wElem_1,\cdots,\wElem_\prodsize,\\\wElem'_1,\cdots,\wElem'_\prodsize\\\in W}}\prod_{i = 1}^{\prodsize}v_i(\wElem_i)v_i(\wElem'_i)\ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)\conj{s(\wElem'_i)}\ind{h(\wElem_i) = j}\ind{h(\wElem'_i) = j'}} - \prod_{i = 1}^{\prodsize}v_i(\wElem_i)\ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)\ind{h(\wElem_i) = j}}\cdot \prod_{i = 1}^{\prodsize}v_i(\wElem'_i)\ex{\prod_{i = 1}^{\prodsize}\conj{s(\wElem'_i)}\ind{h(\wElem_i') = j'}}\nonumber\\
&= \sum_{\substack{\wElem_1,\cdots,\wElem_\prodsize,\\\wElem'_1,\cdots,\wElem'_\prodsize\\\in W}}\prod_{i = 1}^{\prodsize}v_i(\wElem_i)v_i(\wElem'_i)\left(\ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)\conj{s(\wElem'_i)}\ind{h(\wElem_i) = j}\ind{h(\wElem'_i) = j'}} - \ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)\ind{h(\wElem_i) = j}}\cdot\ex{\prod_{i = 1}^{\prodsize}\conj{s(\wElem'_i)}\ind{h(\wElem_i') = j'}} \right).\label{eq:var-lambda-j-j'}
\end{align}
\AH{How can I present the derivation of the bounds below in a \textit{better} way?}
\AH{Have subsections for $j = j'$ and $j \neq j'$}
\AH{A better argument might be: Let us assume that there $i \neq i'$, $w_i \neq w'_i$ which is $0$ in expectation. Don't forget parameters for $\term_1$ and change this notation. The cardinal rule is: if you don't need it, then don't use it (notation). Define $\term_1$ once and then use later.}
Equation ~\eqref{eq:var-lambda-j-j'} for $j \neq j'$ bounds to the rightmost sum of \cref{eq:sigsq}. For $\term_1^{\cvar{j, j'}} = \ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)s(\wElem'_i)\ind{h(\wElem_i) = j}\ind{h(\wElem'_i) = j'}}$, because hash function $h$ cannot bucket the same world to two different buckets, the only instance $\term_1^{\cvar{j, j'}} = 1$ occurs when there is no overlap between the $\wElem_i$ and $\wElem'_i$ variables. Given the condition of no overlap, $\term_1^{\cvar{j, j'}} = 1$ only with the further condition that $\forall i \in [\prodsize], \wElem_i = \wElem, \wElem'_i = \wElem', \wElem \neq \wElem'$. Notice, however, given the conditions, the product of the remaining expectations will cancel this out. Looking at the remaining two expectations $\term_2^{\cvar{j, j'}} = \ex{\prod_{i = 1}^{\prodsize}\sine(\wElem_i) \ind{\hfunc(\wElem_i) = j}} \cdot \ex{\prod_{i = 1}^{\prodsize}\conj{\sine(\wElem'_i)} \ind{\hfunc(\wElem'_i) = j'}}$, that $\term_2^{\cvar{j, j'}} = 1$ only when $\forall i \in [\prodsize], \wElem_i = \wElem, \wElem'_i = \wElem'$. Taken together, the constraints leave us with only one possible case for $\term_1^{\cvar{j, j'}} - \term_2^{\cvar{j, j'}} \neq 0$, when all variables are the same world. Thus,
\subsection{$\cvar{j, j'}~|~j \neq j'$}
For notational convenience set
\begin{align*}
\term_1\left(\wElem_1,\ldots, \wElem_{\prodsize}\right) = &\ex{\prod_{i = 1}^{\prodsize}\sine(\wElem_i)\conj{\sine(\wElem'_i)}\ind{\hfunc(\wElem_i) = j} \ind{\hfunc(\wElem'_i) = j'}}\\
\term_2\left(\wElem_1,\ldots, \wElem_{\prodsize}\right) = &\ex{\prod_{i = 1}^{\prodsize}\sine(\wElem_i)\ind{\hfunc(\wElem_i) = j}} \cdot \ex{\prod_{i = 1}^{\prodsize}\conj{\sine(\wElem'_i)}\ind{\hfunc(\wElem'_i) = j'}}
\end{align*}
Focusing on $\term_1$, observe that $\term_1 \neq 0$ if and only if all the $\wElem_i$'s are equal, all the $\wElem'_i$'s are equal, and the two groups of variables do not equal each other. In that case each of the two indicators contributes a factor of $\frac{1}{\sketchCols}$ (assuming, as in \cref{eq:cvar-bound}, that $\hfunc$ places a world in a given bucket with probability $\frac{1}{\sketchCols}$),
\begin{equation*}
\term_1\left(\wElem_1,\ldots, \wElem_{\prodsize}\right) =
\begin{cases}
\frac{1}{\sketchCols^2} &\text{if } \forall i, j \in [\prodsize], \wElem_i \neq \wElem'_j, \wElem_i = \wElem_j, \wElem'_i = \wElem'_j\\
0 &\text{otherwise}.
\end{cases}
\end{equation*}
Focusing on $\term_2$, it can be seen that $\term_2 \neq 0$ exactly when all $\wElem_i$'s are equal and all $\wElem'_i$'s are equal, whether or not the two groups coincide,
\begin{equation*}
\term_2\left(\wElem_1,\ldots, \wElem_{\prodsize}\right) =
\begin{cases}
\frac{1}{\sketchCols^2} &\text{if } \forall i, j \in [\prodsize], \wElem_i = \wElem_j, \wElem'_i = \wElem'_j\\
0 &\text{otherwise}.
\end{cases}
\end{equation*}
For $j \neq j'$, \cref{eq:var-lambda-j-j'} gives the summands of the rightmost sum of \cref{eq:sigsq}.\newline
\underline{Case 1:}
Assume that $\exists i, j \in [\prodsize] ~|~ \wElem_i = \wElem'_j$. Then $\term_1 = 0$. If all $\wElem_i$ are equal and all $\wElem'_i$ are equal, then $\term_2 = \frac{1}{\sketchCols^2}$; otherwise $\term_2 = 0$. \newline
\underline{Case 2:}
Alternatively, assume that $\forall i \in [\prodsize] ~\nexists j ~|~ \wElem_i = \wElem'_j$. Then $\term_1 = \frac{1}{\sketchCols^2}$ if $\forall i, j \in [\prodsize]$, $\wElem_i = \wElem_j$ and $\wElem'_i = \wElem'_j$, and $\term_1 = 0$ otherwise. Whenever $\term_1 = \frac{1}{\sketchCols^2}$, also $\term_2 = \frac{1}{\sketchCols^2}$; whenever $\term_1 = 0$, also $\term_2 = 0$. \newline
Thus, the only time that $\term_1 - \term_2 \neq 0$ is in Case 1, when all $\wElem_i = \wElem'_i = \wElem$, where $\term_1 - \term_2 = -\frac{1}{\sketchCols^2}$. \newline
%$\startOld{'Proof'/reasoning}$
%For $\term_1^{\cvar{j, j'}} = \ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)s(\wElem'_i)\ind{h(\wElem_i) = j}\ind{h(\wElem'_i) = j'}}$, because hash function $h$ cannot bucket the same world to two different buckets, the only instance $\term_1^{\cvar{j, j'}} = 1$ occurs when there is no overlap between the $\wElem_i$ and $\wElem'_i$ variables. Given the condition of no overlap, $\term_1^{\cvar{j, j'}} = 1$ only with the further condition that $\forall i \in [\prodsize], \wElem_i = \wElem, \wElem'_i = \wElem', \wElem \neq \wElem'$. Notice, however, given the conditions, the product of the remaining expectations will cancel this out. Looking at the remaining two expectations $\term_2^{\cvar{j, j'}} = \ex{\prod_{i = 1}^{\prodsize}\sine(\wElem_i) \ind{\hfunc(\wElem_i) = j}} \cdot \ex{\prod_{i = 1}^{\prodsize}\conj{\sine(\wElem'_i)} \ind{\hfunc(\wElem'_i) = j'}}$, that $\term_2^{\cvar{j, j'}} = 1$ only when $\forall i \in [\prodsize], \wElem_i = \wElem, \wElem'_i = \wElem'$. Taken together, the constraints leave us with only one possible case for $\term_1^{\cvar{j, j'}} - \term_2^{\cvar{j, j'}} \neq 0$, when all variables are the same world.
%$\finOld$
Therefore,
\begin{align}
&\sum_{j \neq j'}\cvar{j, j'} = - \frac{1}{B^2}\sum_{\wElem \in W}\prod_{i = 1}^{\prodsize}v_i^2(\wElem)\label{eq:cvar-bound}.
\underset{j \neq j'}{\cvar{j, j'}} = &\sum_{\wElem \in \wSet}\prod_{i = 1}^{\prodsize}\vect_i^2(\wElem)\left(\term_1 - \term_2\right)\nonumber\\
= &\sum_{\wElem \in \wSet}\prod_{i = 1}^{\prodsize} \vect_i^2(\wElem)\left(0 - \frac{1}{\sketchCols^2}\right)\nonumber\\
= &- \frac{1}{\sketchCols^2}\sum_{\wElem \in \wSet}\prod_{i = 1}^{\prodsize}\vect_i^2(\wElem)\label{eq:cvar-bound}.
\end{align}
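The factor $-\frac{1}{\sketchCols^2}$ in \cref{eq:cvar-bound} can be traced through a short calculation. The sketch below assumes that $\sine$ and $\hfunc$ are independent, that $\hfunc$ maps each world uniformly to one of $\sketchCols$ buckets, and that $\sine(\wElem)$ is a $\prodsize$-th root of unity (so that $\sine(\wElem)\conj{\sine(\wElem)} = 1$); these are assumptions about the setup rather than statements made in this section. With all variables equal to a single world $\wElem$ and $j \neq j'$,
\begin{align*}
\term_1 &= \ex{\left(\sine(\wElem)\conj{\sine(\wElem)}\right)^{\prodsize}\ind{\hfunc(\wElem) = j}\ind{\hfunc(\wElem) = j'}} = 0,\\
\term_2 &= \ex{\sine(\wElem)^{\prodsize}\ind{\hfunc(\wElem) = j}}\cdot\ex{\conj{\sine(\wElem)}^{\prodsize}\ind{\hfunc(\wElem) = j'}} = \frac{1}{\sketchCols}\cdot\frac{1}{\sketchCols},\\
\term_1 - \term_2 &= -\frac{1}{\sketchCols^2}.
\end{align*}
The first line vanishes because $\hfunc(\wElem)$ cannot equal both $j$ and $j'$; the second line uses \cref{lem:exp-sine} with $i = \prodsize$.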
Since the right-hand side of \cref{eq:cvar-bound} is nonpositive for real-valued $\vect_i$, combining it with \cref{eq:sigsq} gives
\begin{align*}
&\sigsq - \sum_j \sigsq_j = \sum_{j \neq j'}\cvar{j, j'} \leq 0\\
&\implies \sigsq \leq \sum_j \sigsq_j.
\end{align*}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{$\cvar{j, j'}~|~j = j'$}
Taking a look at the leftmost term of \cref{eq:sigsq}, we bound the variance of the $j^{\text{th}}$ bucket of a $\prodsize$-way join. Note that in this case \cref{eq:var-lambda-j-j'} has $j = j'$ and can be written in the following way,
\begin{align}
@ -106,7 +142,7 @@ Taking a look at the leftmost term of \cref{eq:sigsq}, we establish bounds the v
\end{align}
Before proceeding, we introduce some notation and terminology that will aid in communicating the bounds we are about to establish. We refer to the leftmost expectation of \cref{eq:sig-j-last} in the following way:
\AR{dangling eq ref}\AH{I don't see one}
\[\term_1\left(\wElem_1,\ldots,\wElem_\prodsize, \wElem_1',\ldots, \wElem_\prodsize'\right) = \ex{\prod_{i = 1}^\prodsize s(w_i)\overline{s(w'_i)}\ind{h(w_i) = j}\ind{h(w'_i) = j}}.%\text{, and}
\]
%\[\term_2\left(\wElem_1,\ldots,\wElem_\prodsize, \wElem_1',\ldots, \wElem_\prodsize'\right) = \ex{\prod_{i = 1}^ks(w_i)\ind{h(w_i) = j}}\cdot \ex{\prod_{i = 1}^\prodsize\overline{s(w'_i)}\ind{h(w'_i) = j}}. \]