Changes per 032720 meeting.

Aaron Huber 2020-03-31 11:52:00 -04:00
parent f542d1daf8
commit fcd84c8c2c

46
sop.tex

@ -14,18 +14,19 @@ We now seek to bound the variance of a k-way join.
\ex{\prod_{i = 1}^ks(w_i)\ind{h(w_i) = j}}\cdot \ex{\prod_{i = 1}^k\overline{s(w'_i)}\ind{h(w'_i) = j}} \right)\label{eq:sig-j-last}.
\end{align}
Before proceeding, we introduce some notation that will aid in communicating the bounds we are about to establish. First, note that the only terms that survive the expectation above are mappings of $w_i = w'_j = w$ for $i, j \in [k]$, such that each $w_i$ has a match, i.e., no $w_i$ or $w'_j$ stands alone without a matching world in its complementary set. To help describe all possible matchings we use m-tuples and functions $f$ and $f'$.
Before proceeding, we introduce some notation that will aid in communicating the bounds we are about to establish. First, note that the only terms that survive the expectation above are mappings of $w_i = w'_j = w$ for $i, j \in [k]$, such that each $w_i$ has a match, i.e., no $w_i$ or $w'_j$ stands alone without a matching world in its complementary set.
\subsection{M-tuples}
\begin{Definition}
Given a $k$-way join, define $m \in [k]$. An m-tuple then is a set of tuples, each tuple containing $m$ elements, such that the values of each tuple sum up to $m$, i.e. $\forall i \in [m], \sum_j m_{t_{i, j}} = m$, where $i$ is the $i^{th}$ tuple in $m_t$, and $j$ is the $j^{th}$ index of that tuple $t$. The set consists of each unique sum up to symmetry, meaning a tuple with the same elements only reversed is disallowed.
\end{Definition}
For example, when $k = 4$, $m = 2$, the m-tuple, denoted $m_2$, would be $\left\{\left(1, 3\right), \left(2, 2\right)\right\}$. Here, $m_{2_{1, 1}} = 1$, and while the tuple $\left(3, 1\right)$ sums up to $k = 4$, we do not include it since we have its symmetrical term $\left(1, 3\right)$.
\AR{Why is the definition of M-tuples needed? From what I understand you need this to define what kinds of $f$ and $f'$ are allowed but in that case why not state those properties directly in terms of $f$ and $f'$? Actually after reading the next section, I do not see why these properties are needed at all..}
\AH{I use the m-tuples to explain 1) what kind of matchings survive and 2) that $f, f'$ must only cross product from within the matchings of the same tuple. Maybe there is an easier way to do this.}
%\subsection{M-tuples}
%\begin{Definition}
%Given a $k$-way join, define $m \in [k]$. An m-tuple then is a set of tuples, each tuple containing $m$ elements, such that the values of each tuple sum up to $m$, i.e. $\forall i \in [m], \sum_j m_{t_{i, j}} = m$, where $i$ is the $i^{th}$ tuple in $m_t$, and $j$ is the $j^{th}$ index of that tuple $t$. The set consists of each unique sum up to symmetry, meaning a tuple with the same elements only reversed is disallowed.
%\end{Definition}
%For example, when $k = 4$, $m = 2$, the m-tuple, denoted $m_2$, would be $\left\{\left(1, 3\right), \left(2, 2\right)\right\}$. Here, $m_{2_{1, 1}} = 1$, and while the tuple $\left(3, 1\right)$ sums up to $k = 4$, we do not include it since we have its symmetrical term $\left(1, 3\right)$.
%
%\AR{Why is the definition of M-tuples needed? From what I understand you need this to define what kinds of $f$ and $f'$ are allowed but in that case why not state those properties directly in terms of $f$ and $f'$? Actually after reading the next section, I do not see why these properties are needed at all..}
%\AH{I use the m-tuples to explain 1) what kind of matchings survive and 2) that $f, f'$ must only cross product from within the matchings of the same tuple. Maybe there is an easier way to do this.}
\subsection{f, f'}
To help describe all possible matchings we introduce functions $f$ and $f'$.
\begin{Definition}
Functions $f, f'$ are surjective mappings from $k$ to $m$ elements: $f: [k] \rightarrow [m].$
\end{Definition}
@ -39,21 +40,42 @@ Functions f, f' are the set of surjective mappings from $k$ to $m$ elements: $f:
%\end{equation*}
The functions $f, f'$ are used to produce the mappings $w_i \mapsto \widetilde{w_{f(i)}}$.
\begin{Definition}
Functions $f:[k]\rightarrow [m]$ and $f':[k]\rightarrow [m']$ are said to be matching if and only if
\begin{enumerate}
\item $m = m'$,
\item $\forall i \in [m], |f^{-1}(i)| = |f'^{-1}(i)|$, or a symmetrical mapping exists, i.e., there is a bijection $i \mapsto i'$ on $[m]$ such that $\forall i \in [m], |f^{-1}(i)| = |f'^{-1}(i')|$.
\end{enumerate}
\end{Definition}
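As a small illustration of the matching condition (the specific functions below are chosen only for this example), take $k = 4$ and $m = m' = 2$, with $f(1) = 1$, $f(2) = f(3) = f(4) = 2$ and $f'(3) = 1$, $f'(1) = f'(2) = f'(4) = 2$. Then
\begin{equation*}
|f^{-1}(1)| = |f'^{-1}(1)| = 1, \qquad |f^{-1}(2)| = |f'^{-1}(2)| = 3,
\end{equation*}
so $f$ and $f'$ are matching. Alternatively, if $f'(1) = f'(2) = f'(3) = 1$ and $f'(4) = 2$, the preimage sizes are swapped between the two values, which we read as the symmetrical case. By contrast, a function $g$ with $g(1) = g(2) = 1$ and $g(3) = g(4) = 2$ has preimage sizes $(2, 2)$ and therefore does not match $f$.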
We rewrite equation \eqref{eq:sig-j-last} in terms of $m$ distinct worlds, with $f, f'$ mappings.
\begin{equation*}
\sum_{m \in [k]}\sum_{m' \in [k]}\sum_{f, f'}\sum_{\substack{\widetilde{\wElem_1}, \cdots,\widetilde{\wElem_m},\\\widetilde{\wElem'_1},\cdots,\widetilde{\wElem'_{m'}}\\ \in W}}\prod_{i = 1}^{k}\vect_i(\widetilde{\wElem_{f(i)}})\vect_i(\widetilde{\wElem'_{f'(i)}})\cdot\left( \ex{\prod_{i = 1}^k \sine(\widetilde{\wElem_{f(i)}})\conj{\sine(\widetilde{\wElem'_{f'(i)}})}\ind{h(\widetilde{\wElem_{f(i)}}) = j}\ind{h(\widetilde{\wElem'_{f'(i)}}) = j}} -
\ex{\prod_{i = 1}^k \sine(\widetilde{\wElem_{f(i)}})\ind{h(\widetilde{\wElem_{f(i)}}) = j}}\cdot \ex{\prod_{i = 1}^k\conj{\sine(\widetilde{\wElem'_{f'(i)}})}\ind{h(\widetilde{\wElem'_{f'(i)}}) = j}} \right)\label{eq:sig-j-distinct}
\end{equation*}
\begin{Lemma}
The only terms surviving the expectation of equation \eqref{eq:sig-j-last} are those with $f, f'$ matching, where $\forall j \in[m], \widetilde{\wElem_j} = \widetilde{\wElem'_j}$.
\end{Lemma}
The proof is immediate and follows from the fact that the random $\sine$ functions are only guaranteed to produce a product of one under one of two possible conditions:
\begin{enumerate}
\item $\sine(\wElem)^k = 1$,
\item $\sine(\wElem) \conj{\sine(\wElem)} = 1$.\qed
\end{enumerate}
In particular, $f$ and $f'$ are machinery for mapping $k$ $\wElem$-world variables to $m$ distinct values. Note that for a given $m$, we may have several ways to map $k$ worlds to $m$ distinct values.
\AH{Here is where I have attempted to use prose to discuss the restrictions on $f$ and $f'$, rather than the use of m-tuples. Maybe there is a better, cleaner formal way?}
For example, for $k = 4, m = 2$, a mapping could send one $\wElem_i$ to one distinct value and the other three $\wElem_i$ to the other distinct value. Alternatively, two $\wElem_i$ could map to one distinct value while the other two $\wElem_i$ map to a separate distinct value. The expectations of equation \eqref{eq:sig-j-last} restrict $f$ and $f'$ to the same class of $m$-mapping: if the mapping $f$ for $k = 4, m = 2$ sends one world to one distinct value and three worlds to the other, then $f'$ must come from that same class of mappings, and not from another class, such as the one where two $\wElem_i$ map to one distinct value while the other two map to a separate distinct value.
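To make the classes concrete, one can count them under the surjectivity requirement on $f$ (this count is added for illustration only). For $k = 4, m = 2$ there are
\begin{equation*}
2^4 - 2 = 14 = \underbrace{2\binom{4}{1}}_{\text{one and three}} + \underbrace{\binom{4}{2}}_{\text{two and two}} = 8 + 6
\end{equation*}
surjective mappings. If $f$ and $f'$ must come from the same class, with the symmetric assignment counted in the same class, this leaves $8 \cdot 8 + 6 \cdot 6 = 100$ admissible pairs $(f, f')$ rather than $14 \cdot 14 = 196$.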
\AH{Here is the use of m-tuples to explain the same thing.}
In the example above, $f$ mappings for $m_{2_1}$ may only cross product with $f'$ mappings for $m_{2_1}$ and not with those for $m_{2_2}$. Likewise for $f, f'$ mappings of $m_{2_2}$.
% In the example above, $f$ mappings for $m_{2_1}$ may only cross product with $f'$ mappings for $m_{2_1}$ and not with those for $m_{2_2}$. Likewise for $f, f'$ mappings of $m_{2_2}$.
Using the above definitions, we can now present the variance bounds for $\sigsq_j$ based on \eqref{eq:sig-j-last}.
By the fact that the expectations cancel when $\forall i, i', j, j'\in [k], \wElem_i = \wElem_j =/\neq \wElem_{i'}' = \wElem_{j'}'$, we can rid ourselves only one distinct world value. We then need to sum up all the $m$ distinct world value possibilities for $m \in [2, k]$. From equation \eqref{eq:sig-j-last}, starting with $m$ = 2, for one $f$ and one $f'$ from the same class of mappings, we get
Because the expectations cancel when $\forall i, i', j, j'\in [k], \wElem_i = \wElem_j =/\neq \wElem_{i'}' = \wElem_{j'}'$, we can rid ourselves of the case when there exists only one distinct world value. We then need to sum up all the $m$ distinct world value possibilities for $m \in [2, k]$. From equation \eqref{eq:sig-j-last}, starting with $m = 2$, for one $f$ and one $f'$ from the same class of mappings, we get
\begin{equation*}
\frac{1}{\sketchCols^2}\sum_{\widetilde{\wElem_1}, \widetilde{\wElem_2}}\prod_{i = 1}^{k}\vect_i(\widetilde{\wElem_{f(i)}})\vect_i(\widetilde{\wElem_{f'(i)}}).
\end{equation*}
This is because we know that the expectation from \eqref{eq:sig-j-last} will survive when we have mappings that produce pairs of the form $\sine(\wElem)\conj{\sine(\wElem)}$. With two distinct variables, the indicator variables in the expectation yield $\frac{1}{\sketchCols}\cdot \frac{1}{\sketchCols}$.
This is because the expectation from \eqref{eq:sig-j-last} survives when the mappings produce pairs of the form $\sine(\wElem)\conj{\sine(\wElem)}$. Moreover, because of the randomized hashing, with two distinct variables the indicator variables in the expectation yield $\frac{1}{\sketchCols}\cdot \frac{1}{\sketchCols}$.
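The hashing factor can be spelled out as follows, assuming (this is not stated explicitly above) that $h$ maps each world uniformly into $[\sketchCols]$ and is at least pairwise independent. For distinct $\widetilde{\wElem_1} \neq \widetilde{\wElem_2}$,
\begin{equation*}
\ex{\ind{h(\widetilde{\wElem_1}) = j}\ind{h(\widetilde{\wElem_2}) = j}} = \Pr\left[h(\widetilde{\wElem_1}) = j\right]\cdot\Pr\left[h(\widetilde{\wElem_2}) = j\right] = \frac{1}{\sketchCols}\cdot\frac{1}{\sketchCols}.
\end{equation*}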
We need to sum over all mappings for each case (c) when the number of distinct values is $m = 2$, resulting in
\begin{equation*}