Rewritten proof for $\lambda(j, j')$, $j \neq j'$ (commit 2cb84b6204, parent 73fbc8f6a0)
@ -211,8 +211,8 @@
%
%many of these are outdated and need to be cleaned up
%
\newcommand{\startOld}[1]{\textcolor{purple}{\newline-------------------------\newline\textbf{Old Content:\newline-------------------------\newline} #1}\newline}
\newcommand{\finOld}{\newline\textcolor{purple}{------------------------------\newline\textbf{END} Old Content\newline ------------------------------\newline}}
\newcommand{\startOld}[1]{\textcolor{purple}{-------------------------\newline\textbf{Old}\textit{ #1 \newline} }------------------------------\newline}
\newcommand{\finOld}{\newline\textcolor{purple}{------------------------------\newline\textbf{END}\text{ Old}\newline ------------------------------\newline}}
%\newcommand{\comment}[1]{}
pos.tex
@ -1,9 +1,8 @@
% -*- root: main.tex -*-
\pagebreak
\section{POS Queries}
\AH{The following lemma will probably be moved later on.}
The following property of $\sine$ is used in the derivation of $\ex{\pos}$.
|
The following lemma is used in subsequent proofs for bounding various queries.
\begin{Lemma}\label{lem:exp-sine}
$\forall \wElem \in \wSet$,\newline
$\ex{\sine(\wElem)^i} = \begin{cases}
@ -11,6 +10,8 @@ $\ex{\sine(\wElem)^i} = \begin{cases}
1 &\text{otherwise}.
\end{cases}$
\end{Lemma}
|
\begin{proof}
Let $\omega$ be a primitive $\prodsize$-th root of unity, so that $\Omega = \{\omega^l \mid 0 \leq l \leq \prodsize - 1\}$. Since $\sine(\wElem)$ is uniform over $\Omega$, for every $i \in [1, \prodsize]$ we have $\ex{\sine(\wElem)^i} = \frac{1}{\prodsize}\sum\limits_{\tau \in \Omega}\tau^i = \frac{1}{\prodsize}\sum\limits_{l = 0}^{\prodsize - 1}(\omega^i)^l$. To prove the lemma, it therefore suffices to show that $\sum\limits_{l = 0}^{\prodsize - 1}(\omega^i)^l = \begin{cases}0&1 \leq i < \prodsize\\\prodsize& i = \prodsize.\end{cases}$
For the case of $i = \prodsize$,
\begin{equation}
@ -21,6 +22,7 @@ For $i \in [1, \prodsize - 1]$, we can show by geometric sum series that
\sum_{l = 0}^{\prodsize - 1}(\omega^i)^l = \frac{(\omega^i)^\prodsize - 1}{\omega^i - 1} = \frac{1 - 1}{\omega^i - 1} = 0.
\end{equation}
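The division is well defined: it requires $\omega^i \neq 1$, which holds for $1 \leq i < \prodsize$ because $\omega$ is a primitive $\prodsize$-th root of unity.
\qed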
\end{proof}
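As a numerical sanity check of \cref{lem:exp-sine}, the Python sketch below assumes, as the proof does, that $\sine(\wElem)$ is uniform over the $\prodsize$-th roots of unity, and evaluates $\ex{\sine(\wElem)^i}$ exactly as the average of $\tau^i$ over all roots $\tau$. The helper names \texttt{roots\_of\_unity} and \texttt{expected\_power} are hypothetical and serve only this illustration.

```python
import cmath


def roots_of_unity(k: int) -> list[complex]:
    """Return omega**l for l = 0, ..., k - 1, where omega is a primitive k-th root of unity."""
    omega = cmath.exp(2j * cmath.pi / k)
    return [omega ** l for l in range(k)]


def expected_power(k: int, i: int) -> complex:
    """E[s**i] for s drawn uniformly from the k-th roots of unity (assumed distribution)."""
    return sum(tau ** i for tau in roots_of_unity(k)) / k


# Magnitudes of E[s**i] for i = 1, ..., k, rounded to suppress floating-point noise.
for k in range(1, 6):
    print(k, [round(abs(expected_power(k, i)), 9) for i in range(1, k + 1)])
```

For each $\prodsize$ tried, the printed magnitudes are $0$ for $1 \leq i < \prodsize$ and $1$ for $i = \prodsize$, matching the two cases of the lemma.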
|
We target the specific query for which it is optimal to push projections down below join operators. Such a query is a product of sums ($\pos$). To show that our scheme works in this setting, we first compute the expectation of a $\pos$ query over sketch annotations, i.e.\ $\pos = \sum_{\buck = 1}^{\sketchCols}\left(\sum_{i \in \kvec'}\sk^{\vect_i}\left[\buck\right]\right) \left(\sum_{i' \in \kvec''}\sk^{\vect_{i'}}\left[\buck\right]\right)$, where $\kvec'$ and $\kvec''$ denote the sets of matching projected tuples from the two inputs. We denote the $i^{\text{th}}$ vector by $\vect_i$ and its sketch by $\sk^{\vect_i}$.
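The $\pos$ expression can be evaluated as written, with $O(\sketchCols(|\kvec'| + |\kvec''|))$ bucket operations, and by distributivity it equals the expanded sum of products, which needs one bucket-wise inner product per pair $(i, i')$, i.e.\ $O(\sketchCols|\kvec'||\kvec''|)$ operations. The Python sketch below treats each sketch $\sk^{\vect_i}$ as a plain list of $\sketchCols$ bucket values, uses made-up values, and checks that the two evaluation orders agree; the names \texttt{pos} and \texttt{sop} are hypothetical.

```python
from typing import Sequence


def pos(left: Sequence[Sequence[complex]], right: Sequence[Sequence[complex]]) -> complex:
    """Product of sums: sum over buckets b of (sum_i left_i[b]) * (sum_i' right_i'[b])."""
    total = 0j
    for b in range(len(left[0])):
        total += sum(s[b] for s in left) * sum(s[b] for s in right)
    return total


def sop(left: Sequence[Sequence[complex]], right: Sequence[Sequence[complex]]) -> complex:
    """Sum of products: one bucket-wise inner product per pair (i, i') after expanding."""
    total = 0j
    for sl in left:
        for sr in right:
            total += sum(a * b for a, b in zip(sl, sr))
    return total


# Hypothetical sketches over 3 buckets: two from the first input, three from the second.
left = [[1, -2, 0], [3, 1, -1]]
right = [[2, 0, 1], [-1, 4, 2], [0, 1, 1]]
print(pos(left, right), sop(left, right))  # prints (-5+0j) (-5+0j)
```

Both functions return the same value; the first simply sums each input's sketches bucket by bucket before multiplying.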
sop.tex
@ -35,6 +35,11 @@ Fix the variables $\wElem_1,\ldots, \wElem_{\prodsize}$. Define $\dist$ to be th
1 & \dist = 1.
\end{cases}
\end{align*}
|
\AH{Oliver had suggested that we change the proof to lemma 1...I think he wanted to allow for values of $i$ to be within the set of integers, not just restricted to $0 < i \leq k$.
Even if we don't change the lemma, I think the proof itself is inaccurate and needs to be rewritten.}
|
We obtain the final equality by \cref{lem:exp-sine}, which states that $\ex{\sine(\wElem_{\ell})^{e_{\ell}}}$ can be nonzero only when $e_{\ell} = \prodsize$. Since the exponents $e_{\ell}$ of the $\dist$ distinct worlds sum to $\prodsize$, this can happen only when a single world accounts for all $\prodsize$ factors, i.e., when $\dist = 1$.
|
Notice that the above leaves us with the only remaining condition that $\forall i, j \in [\prodsize], \wElem_i = \wElem_j$,
@ -59,17 +64,9 @@ Substituting in the definition of variance for complex numbers,
&= \sum_j \sigsq_j + \sum_{j \neq j'}\cvar{j, j'} \label{eq:sigsq}
\end{align}
|
Notice that, assuming the bucket estimates $\est_j$ are independent for all $j \in [\sketchCols]$, we can push the variance through the sum and obtain the result
\begin{align*}
&\sigsq - \sum_j \sigsq_j = \sum_{j \neq j'}\cvar{j, j'}\\
&\implies \sum_{j \neq j'}\cvar{j, j'} \leq 0.
\end{align*}
\AH{The implication above was discussed months ago, but I don't see how it's true. Is it true?}
|
The right-hand side of \cref{eq:sigsq} consists of two sums, which we now bound separately.
\subsection{Bounding $\cvar{j, j'}$}
|
\AR{You need to re-write the stuff below. First in the 2nd equality suddenly the sum on $j\ne j'$ has vanished. Also I think you should first analyze $\lambda(j,j')$ for both $j=j'$ and $j\ne j'$ for as long as you can. Only when it is needed should you divide into the two cases-- do not do the division up front.}
There are two cases for $\cvar{j, j'}$: the first is $j = j'$, in which case $\cvar{j, j} = \sigsq_j$, and the second is $j \neq j'$.
\begin{align}
\cvar{j, j'} &= \ex{\est_j \cdot \conj{\est_{j'}}} - \ex{\est_j}\cdot\ex{\conj{\est_{j'}}}\nonumber\\
@ -79,17 +76,56 @@ Notice we have two cases of $\cvar{j, j'}$, the first is when $j = j'$, i.e. $(\
&= \sum_{\substack{\wElem_1,\cdots,\wElem_\prodsize,\\\wElem'_1,\cdots,\wElem'_\prodsize\\\in W}}\prod_{i = 1}^{\prodsize}v_i(\wElem_i)v_i(\wElem'_i)\ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)\conj{s(\wElem'_i)}\ind{h(\wElem_i) = j}\ind{h(\wElem'_i) = j'}} - \prod_{i = 1}^{\prodsize}v_i(\wElem_i)\ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)\ind{h(\wElem_i) = j}}\cdot \prod_{i = 1}^{\prodsize}v_i(\wElem'_i)\ex{\prod_{i = 1}^{\prodsize}\conj{s(\wElem'_i)}\ind{h(\wElem_i') = j'}}\nonumber\\
&= \sum_{\substack{\wElem_1,\cdots,\wElem_\prodsize,\\\wElem'_1,\cdots,\wElem'_\prodsize\\\in W}}\prod_{i = 1}^{\prodsize}v_i(\wElem_i)v_i(\wElem'_i)\left(\ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)\conj{s(\wElem'_i)}\ind{h(\wElem_i) = j}\ind{h(\wElem'_i) = j'}} - \ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)\ind{h(\wElem_i) = j}}\cdot\ex{\prod_{i = 1}^{\prodsize}\conj{s(\wElem'_i)}\ind{h(\wElem_i') = j'}} \right).\label{eq:var-lambda-j-j'}
\end{align}
\AH{How can I present the derivation of the bounds below in a \textit{better} way?}
\AH{Have subsections for $j = j'$ and $j \neq j'$}
\AH{A better argument might be: Let us assume that there $i \neq i'$, $w_i \neq w'_i$ which is $0$ in expectation. Don't forget parameters for $\term_1$ and change this notation. The cardinal rule is: if you don't need it, then don't use it (notation). Define $\term_1$ once and then use later.}
For $j \neq j'$, \cref{eq:var-lambda-j-j'} gives the summands of the rightmost sum of \cref{eq:sigsq}. Consider $\term_1^{\cvar{j, j'}} = \ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)\conj{s(\wElem'_i)}\ind{h(\wElem_i) = j}\ind{h(\wElem'_i) = j'}}$. Because the hash function $h$ cannot map the same world to two different buckets, $\term_1^{\cvar{j, j'}}$ can be nonzero only when there is no overlap between the $\wElem_i$ and $\wElem'_i$ variables. Given no overlap, $\term_1^{\cvar{j, j'}}$ is nonzero only under the further condition that $\forall i \in [\prodsize], \wElem_i = \wElem, \wElem'_i = \wElem', \wElem \neq \wElem'$. Notice, however, that under these conditions the product of the remaining expectations cancels this term. The remaining two expectations, $\term_2^{\cvar{j, j'}} = \ex{\prod_{i = 1}^{\prodsize}\sine(\wElem_i) \ind{\hfunc(\wElem_i) = j}} \cdot \ex{\prod_{i = 1}^{\prodsize}\conj{\sine(\wElem'_i)} \ind{\hfunc(\wElem'_i) = j'}}$, are nonzero only when $\forall i \in [\prodsize], \wElem_i = \wElem, \wElem'_i = \wElem'$. Taken together, these constraints leave only one case in which $\term_1^{\cvar{j, j'}} - \term_2^{\cvar{j, j'}} \neq 0$: when all variables are the same world. Thus,
|
\subsection{$\cvar{j, j'}~|~j \neq j'$}
For notational convenience, define
\begin{align*}
\term_1\left(\wElem_1,\ldots, \wElem_{\prodsize}, \wElem'_1,\ldots, \wElem'_{\prodsize}\right) = &\ex{\prod_{i = 1}^{\prodsize}\sine(\wElem_i)\conj{\sine(\wElem'_i)}\ind{\hfunc(\wElem_i) = j} \ind{\hfunc(\wElem'_i) = j'}}\\
\term_2\left(\wElem_1,\ldots, \wElem_{\prodsize}, \wElem'_1,\ldots, \wElem'_{\prodsize}\right) = &\ex{\prod_{i = 1}^{\prodsize}\sine(\wElem_i)\ind{\hfunc(\wElem_i) = j}} \cdot \ex{\prod_{i = 1}^{\prodsize}\conj{\sine(\wElem'_i)}\ind{\hfunc(\wElem'_i) = j'}}
\end{align*}
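Focusing on $\term_1$, observe that $\term_1 \neq 0$ if and only if all the $\wElem_i$'s are equal, all the $\wElem'_i$'s are equal, and the two groups of variables differ from each other: a world shared by the two groups would have to be hashed to both $j$ and $j'$, and, by \cref{lem:exp-sine}, a group containing two or more distinct worlds makes the expectation vanish. Assuming that $\hfunc$ is pairwise independent, uniform over $[\sketchCols]$, and independent of $\sine$, the nonzero value is $\frac{1}{\sketchCols^2}$,
\begin{equation*}
\term_1\left(\wElem_1,\ldots, \wElem_{\prodsize}, \wElem'_1,\ldots, \wElem'_{\prodsize}\right) =
\begin{cases}
\frac{1}{\sketchCols^2} &\text{if } \forall i, i' \in [\prodsize], \wElem_i \neq \wElem'_{i'}, \wElem_i = \wElem_{i'}, \wElem'_i = \wElem'_{i'}\\
0 &\text{otherwise}.
\end{cases}
\end{equation*}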
|
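Focusing on $\term_2$, each of its two factors is nonzero only when the corresponding group of variables consists of a single world, in which case, assuming $\hfunc$ is uniform over $[\sketchCols]$ and independent of $\sine$, the factor equals $\frac{1}{\sketchCols}$. Hence $\term_2 \neq 0$ exactly when all the $\wElem_i$'s are equal and all the $\wElem'_i$'s are equal,
\begin{equation*}
\term_2\left(\wElem_1,\ldots, \wElem_{\prodsize}, \wElem'_1,\ldots, \wElem'_{\prodsize}\right) =
\begin{cases}
\frac{1}{\sketchCols^2} &\text{if } \forall i, i' \in [\prodsize], \wElem_i = \wElem_{i'}, \wElem'_i = \wElem'_{i'}\\
0 &\text{otherwise}.
\end{cases}
\end{equation*}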
\underline{Case 1:}
For $j \neq j'$, \cref{eq:var-lambda-j-j'} gives the summands of the rightmost sum of \cref{eq:sigsq}.\newline
Assume that $\wElem_i = \wElem'_{i'}$ for some $i, i' \in [\prodsize]$. Then $\hfunc$ would have to map this world to both $j$ and $j'$, so $\term_1 = 0$. Moreover, $\term_2 \neq 0$ if all $\wElem_i$ are equal and all $\wElem'_i$ are equal, which together with the overlap means that all $2\prodsize$ variables are the same world, and $\term_2 = 0$ otherwise. \newline
\underline{Case 2:}
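Alternatively, assume that $\wElem_i \neq \wElem'_{i'}$ for all $i, i' \in [\prodsize]$. Then $\term_1 \neq 0$ if and only if all the $\wElem_i$ are equal and all the $\wElem'_i$ are equal, which is exactly the condition under which $\term_2 \neq 0$. In that case, since $\sine(\cdot)^{\prodsize} = 1$ and assuming $\hfunc$ is independent of $\sine$, $\term_1 = \Pr[\hfunc(\wElem_1) = j, \hfunc(\wElem'_1) = j']$ and $\term_2 = \Pr[\hfunc(\wElem_1) = j]\Pr[\hfunc(\wElem'_1) = j']$, which coincide when $\hfunc$ is pairwise independent, so $\term_1 = \term_2$.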
|
Thus, $\term_1 - \term_2 \neq 0$ only when all the variables are the same world, i.e., $\wElem_i = \wElem'_i = \wElem$ for every $i \in [\prodsize]$. \newline
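The case analysis for $j \neq j'$ can also be checked by brute force on a tiny instance. The Python sketch below takes the worlds to be $\{0, \ldots, m - 1\}$ and, as a simplifying assumption that may be stronger than what the analysis needs, draws $\sine$ uniformly from the $\prodsize$-th roots of unity and $\hfunc$ uniformly from $\sketchCols$ buckets, fully independently across worlds. It computes $\term_1 - \term_2$ exactly, by averaging over every possible pair $(\sine, \hfunc)$, and checks that the difference is $-\frac{1}{\sketchCols^2}$ (the value used in the derivation that follows) when all $2\prodsize$ worlds coincide and $0$ otherwise. All helper names are hypothetical.

```python
import cmath
import itertools


def all_outcomes(m, k, num_buckets):
    """Every (s, h): s gives each of m worlds a k-th root of unity, h gives it a bucket.

    Each pair is equally likely, i.e. s and h are uniform and fully independent
    (an assumption made for this check only)."""
    roots = [cmath.exp(2j * cmath.pi * l / k) for l in range(k)]
    return [
        (s, h)
        for s in itertools.product(roots, repeat=m)
        for h in itertools.product(range(num_buckets), repeat=m)
    ]


def t1_minus_t2(w, w_prime, j, j_prime, outcomes):
    """Exact T1 - T2 for the tuples w = (w_1, ..., w_k) and w' = (w'_1, ..., w'_k)."""
    n = len(outcomes)
    t1 = e_first = e_second = 0j
    for s, h in outcomes:
        first = second = 1 + 0j
        for wi, wpi in zip(w, w_prime):
            first *= s[wi] * (h[wi] == j)                       # s(w_i) [h(w_i) = j]
            second *= s[wpi].conjugate() * (h[wpi] == j_prime)  # conj(s(w'_i)) [h(w'_i) = j']
        t1 += first * second  # integrand of T1
        e_first += first      # integrand of the first factor of T2
        e_second += second    # integrand of the second factor of T2
    return t1 / n - (e_first / n) * (e_second / n)


def check_case_analysis(m, k, num_buckets, j=0, j_prime=1, tol=1e-9):
    """True iff T1 - T2 equals -1/B^2 when all 2k worlds coincide and 0 otherwise."""
    if j == j_prime:
        raise ValueError("this check is for the case j != j'")
    outcomes = all_outcomes(m, k, num_buckets)
    for w in itertools.product(range(m), repeat=k):
        for w_prime in itertools.product(range(m), repeat=k):
            diff = t1_minus_t2(w, w_prime, j, j_prime, outcomes)
            single_world = len(set(w) | set(w_prime)) == 1
            expected = -1 / num_buckets**2 if single_world else 0
            if abs(diff - expected) > tol:
                return False
    return True


print(check_case_analysis(m=3, k=2, num_buckets=2))
```

Exhaustive enumeration is only feasible for very small $m$, $\prodsize$ and $\sketchCols$, since there are $(\prodsize\sketchCols)^m$ outcomes and $m^{2\prodsize}$ tuples.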
%$\startOld{'Proof'/reasoning}$
%For $\term_1^{\cvar{j, j'}} = \ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)s(\wElem'_i)\ind{h(\wElem_i) = j}\ind{h(\wElem'_i) = j'}}$, because hash function $h$ cannot bucket the same world to two different buckets, the only instance $\term_1^{\cvar{j, j'}} = 1$ occurs when there is no overlap between the $\wElem_i$ and $\wElem'_i$ variables. Given the condition of no overlap, $\term_1^{\cvar{j, j'}} = 1$ only with the further condition that $\forall i \in [\prodsize], \wElem_i = \wElem, \wElem'_i = \wElem', \wElem \neq \wElem'$. Notice, however, given the conditions, the product of the remaining expectations will cancel this out. Looking at the remaining two expectations $\term_2^{\cvar{j, j'}} = \ex{\prod_{i = 1}^{\prodsize}\sine(\wElem_i) \ind{\hfunc(\wElem_i) = j}} \cdot \ex{\prod_{i = 1}^{\prodsize}\conj{\sine(\wElem'_i)} \ind{\hfunc(\wElem'_i) = j'}}$, that $\term_2^{\cvar{j, j'}} = 1$ only when $\forall i \in [\prodsize], \wElem_i = \wElem, \wElem'_i = \wElem'$. Taken together, the constraints leave us with only one possible case for $\term_1^{\cvar{j, j'}} - \term_2^{\cvar{j, j'}} \neq 0$, when all variables are the same world.
%$\finOld$
|
Therefore,
\begin{align}
\underset{j \neq j'}{\cvar{j, j'}} = &\sum_{\wElem \in \wSet}\prod_{i = 1}^{\prodsize}\vect_i^2(\wElem)\left(\term_1 - \term_2\right)\nonumber\\
= &\sum_{\wElem \in \wSet}\prod_{i = 1}^{\prodsize} \vect_i^2(\wElem)\left(0 - \frac{1}{\sketchCols^2}\right)\nonumber\\
= &- \frac{1}{\sketchCols^2}\sum_{\wElem \in \wSet}\prod_{i = 1}^{\prodsize}\vect_i^2(\wElem)\label{eq:cvar-bound}.
\end{align}
Summing over the $\sketchCols(\sketchCols - 1)$ ordered pairs $j \neq j'$ gives $\sum_{j \neq j'}\cvar{j, j'} = -\frac{\sketchCols - 1}{\sketchCols}\sum_{\wElem \in \wSet}\prod_{i = 1}^{\prodsize}\vect_i^2(\wElem)$.
By \cref{eq:cvar-bound}, every $\cvar{j, j'}$ with $j \neq j'$ is nonpositive, since $\vect_i^2(\wElem) \geq 0$. Substituting into \cref{eq:sigsq}, we deduce the following,
\begin{align*}
&\sigsq - \sum_j \sigsq_j = \sum_{j \neq j'}\cvar{j, j'} \leq 0\\
&\implies \sigsq \leq \sum_j \sigsq_j.
\end{align*}
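The per-pair value in \cref{eq:cvar-bound} and the inequality $\sigsq \leq \sum_j \sigsq_j$ can be checked on a toy instance. The Python sketch below is a hypothetical illustration: it assumes real-valued vectors $\vect_i$, takes $\est_j = \prod_{i = 1}^{\prodsize}\sum_{\wElem}\vect_i(\wElem)\sine(\wElem)\ind{\hfunc(\wElem) = j}$, consistent with the expansion in \cref{eq:var-lambda-j-j'}, and, as a simplifying assumption, draws $\sine$ and $\hfunc$ uniformly and fully independently, so that every expectation is an exact average over all assignments. The helper names and the vector values are made up for this example.

```python
import cmath
import itertools
import math


def bucket_estimates(vectors, s, h, num_buckets):
    """est_j = prod_i sum_w v_i(w) s(w) [h(w) = j] for every bucket j."""
    estimates = []
    for j in range(num_buckets):
        est = 1 + 0j
        for v in vectors:
            est *= sum(v[w] * s[w] for w in range(len(v)) if h[w] == j)
        estimates.append(est)
    return estimates


def exact_moments(vectors, num_buckets):
    """Exact lambda(j, j') for all bucket pairs and Var(sum_j est_j).

    Assumes s (k-th roots of unity) and h (buckets) are uniform and fully
    independent across worlds, so expectations are averages over all assignments."""
    k, m = len(vectors), len(vectors[0])
    roots = [cmath.exp(2j * cmath.pi * l / k) for l in range(k)]
    samples = [
        bucket_estimates(vectors, s, h, num_buckets)
        for s in itertools.product(roots, repeat=m)
        for h in itertools.product(range(num_buckets), repeat=m)
    ]
    n = len(samples)
    mean = [sum(x[j] for x in samples) / n for j in range(num_buckets)]
    lam = [
        [sum(x[a] * x[b].conjugate() for x in samples) / n - mean[a] * mean[b].conjugate()
         for b in range(num_buckets)]
        for a in range(num_buckets)
    ]
    sigma_sq = sum(abs(sum(x)) ** 2 for x in samples) / n - abs(sum(mean)) ** 2
    return lam, sigma_sq


vectors = [[1.0, 2.0, -1.0], [0.5, -1.0, 3.0]]  # hypothetical v_1, v_2 over 3 worlds
B = 2
lam, sigma_sq = exact_moments(vectors, B)
predicted = -sum(math.prod(x * x for x in col) for col in zip(*vectors)) / B**2
print(lam[0][1], predicted)
print(sigma_sq, sum(lam[j][j].real for j in range(B)))
```

With these inputs, \texttt{lam[0][1]} agrees with \texttt{predicted} $= -13.25/4$ up to floating-point error, and \texttt{sigma\_sq} does not exceed the sum of the per-bucket variances.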
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{$\cvar{j, j'}~|~j = j'$}
We now turn to the leftmost sum of \cref{eq:sigsq} and bound the variance of the $j^{\text{th}}$ bucket of a $\prodsize$-way join. In this case $j = j'$, so \cref{eq:var-lambda-j-j'} can be written as follows,
|
\begin{align}
@ -106,7 +142,7 @@ Taking a look at the leftmost term of \cref{eq:sigsq}, we establish bounds the v
\end{align}
|
Before proceeding, we introduce some notation and terminology that will help in presenting the bounds we are about to establish. We refer to the leftmost expectation of \cref{eq:sig-j-last} as follows:
\AR{dangling eq ref}\AH{I don't see one}
|
\[\term_1\left(\wElem_1,\ldots,\wElem_\prodsize, \wElem_1',\ldots, \wElem_\prodsize'\right) = \ex{\prod_{i = 1}^\prodsize s(w_i)\overline{s(w'_i)}\ind{h(w_i) = j}\ind{h(w'_i) = j}}.%\text{, and}
\]
%\[\term_2\left(\wElem_1,\ldots,\wElem_\prodsize, \wElem_1',\ldots, \wElem_\prodsize'\right) = \ex{\prod_{i = 1}^ks(w_i)\ind{h(w_i) = j}}\cdot \ex{\prod_{i = 1}^\prodsize\overline{s(w'_i)}\ind{h(w'_i) = j}}. \]