Ordering requirement to omit symmetric functions

Aaron Huber 2020-04-09 10:43:06 -04:00
parent 257aa9dbdb
commit e78cb83be7
2 changed files with 34 additions and 2 deletions


@ -65,6 +65,7 @@
\newcommand{\dist}{m}
\newcommand{\dupSize}{j}
\newcommand{\dMap}[1]{\widetilde{#1}}
\newcommand{\order}{O}
%
%number of joins/products

sop.tex

@ -30,6 +30,10 @@ We will use the vocabulary 'term' to denote $\prod_{i = 1}^{\prodsize}\vect_i(\w
%\AH{I use the \dist-tuples to explain 1) what kind of matchings survive and 2) that $f, f'$ must only cross product from within the matchings of the same tuple. Maybe there is an easier way to do this.}
\subsection{$f, f'$}
\begin{Definition}
Fix the lexicographical ordering of the distinct world elements to be the total order on the elements of $[\dist]$ such that $\forall i < j \in [\dist], \dMap{\wElem_i} < \dMap{\wElem_j}$.
%Given a fixed order $\wSet_{\order}: \left(\wSet, \wSet\right)\mapsto \mathbb{B}$ of possible worlds, define the lexicographical order of distinct worlds $\wSet_\dist$ to be the ordering which complies with the identity mapping of elements in $[\prodsize]$ to elements in $[\dist]$ up to $\dist$. In other words, $\forall \wElem, \wElem' \in \wSet_\dist, \widetilde{\wElem} < \wElem' \leftrightarrow \wSet_{\order}\left(\wElem, \wElem'\right) = T$.
\end{Definition}
To help describe all possible matchings we introduce functions $f$ and $f'$.
\begin{Definition}
The functions $f, f'$ range over the surjective mappings from $[\prodsize]$ onto $[\dist]$ and $[\dist']$, respectively: $f: [\prodsize] \rightarrow [\dist], f': [\prodsize] \rightarrow [\dist'].$
@ -49,8 +53,9 @@ We rewrite equation \eqref{eq:sig-j-last} in terms of $\dist$ distinct worlds, w
\sum_{\dist \in [\prodsize]}\sum_{\dist' \in [\prodsize]}\sum_{f, f'}\sum_{\substack{\dMap{\wElem_1}, \ldots,\dMap{\wElem_\dist},\\\dMap{\wElem'_1},\ldots,\dMap{\wElem'_{\dist'}}\\ \in W}}\prod_{i = 1}^{\prodsize}\vect_i(\dMap{\wElem_{f(i)}})\vect_i(\dMap{\wElem'_{f'(i)}})\cdot\left( \ex{\prod_{i = 1}^\prodsize \sine(\dMap{\wElem_{f(i)}})\conj{\sine(\dMap{\wElem'_{f'(i)}})}\ind{h(\dMap{\wElem_{f(i)}}) = j}\ind{h(\dMap{\wElem'_{f'(i)}}) = j}} -
\ex{\prod_{i = 1}^\prodsize \sine(\dMap{\wElem_{f(i)}})\ind{h(\dMap{\wElem_{f(i)}}) = j}}\cdot \ex{\prod_{i = 1}^\prodsize\conj{\sine(\dMap{\wElem'_{f'(i)}})}\ind{h(\dMap{\wElem'_{f'(i)}}) = j}} \right)\label{eq:sig-j-distinct}
\end{equation}
\cref{eq:sig-j-last} $\equiv$ \cref{eq:sig-j-distinct} holds because the only surviving terms in $\term_1 - \term_2$ are bijective mappings of $\dist \leq \prodsize$ distinct pairs between $\wElem_1\ldots\wElem_\prodsize$ and $\wElem_1'\ldots\wElem_\prodsize'$. Equivalently, the only surviving terms of $\term_1 - \term_2$ are those with $\dist$ distinct world values such that, for every $i \in [\dist]$, the number of variables among $\wElem_1\ldots\wElem_\prodsize$ mapped to the distinct world $\dMap{\wElem_i}$ equals the number of variables among $\wElem_1'\ldots\wElem_\prodsize'$ mapped to it.\newline
Note that for a given $\dist$, there may be several ways to map $\prodsize$ worlds to $\dist$ distinct values, so we need to define what it means for $f$ and $f'$ to be matching.
The fact that \cref{eq:sig-j-last} $\equiv$ \cref{eq:sig-j-distinct} follows since \cref{eq:sig-j-distinct} is simply a rearrangement of the addends in the sum.
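As a counting check on this rearrangement, consider one side of the sum and assume that the distinct values $\dMap{\wElem_1}, \ldots, \dMap{\wElem_\dist}$ range over pairwise distinct elements of $W$ and that each partition of $[\prodsize]$ into $\dist$ blocks contributes a single $f$. Every tuple $(\wElem_1, \ldots, \wElem_\prodsize) \in W^\prodsize$ with exactly $\dist$ distinct values then corresponds to exactly one such partition together with one injective assignment of values to its blocks. Counting both sides gives the standard identity below, where $S(\prodsize, \dist)$ (the Stirling number of the second kind) and the falling factorial $(|W|)_\dist$ are notation introduced only for this check:
\begin{equation*}
|W|^{\prodsize} = \sum_{\dist = 1}^{\prodsize} S(\prodsize, \dist)\,(|W|)_\dist, \qquad (|W|)_\dist = |W|(|W|-1)\cdots(|W|-\dist+1).
\end{equation*}
For example, with $\prodsize = 3$ and $|W| = 2$, the right-hand side is $1\cdot 2 + 3\cdot 2 + 1\cdot 0 = 8 = 2^3$, so no addend is lost or counted twice.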
%\cref{eq:sig-j-last} $\equiv$ \cref{eq:sig-j-distinct} holds because the only surviving terms in $\term_1 - \term_2$ are bijective mappings of $\dist \leq \prodsize$ distinct pairs between $\wElem_1\ldots\wElem_\prodsize$ and $\wElem_1'\ldots\wElem_\prodsize'$. Equivalently, the only surviving terms of $\term_1 - \term_2$ are those with $\dist$ distinct world values such that, for every $i \in [\dist]$, the number of variables among $\wElem_1\ldots\wElem_\prodsize$ mapped to the distinct world $\dMap{\wElem_i}$ equals the number of variables among $\wElem_1'\ldots\wElem_\prodsize'$ mapped to it.\newline
%Note that for a given $\dist$, there may be several ways to map $\prodsize$ worlds to $\dist$ distinct values, so we need to define what it means for $f$ and $f'$ to be matching.
\begin{Definition}
Functions $f:[\prodsize]\mapsto [\dist], f':[\prodsize]\mapsto [\dist']$ are said to be matching, denoted $\match{f}{f'}$, if and only if
@ -61,6 +66,30 @@ Functions $f:[\prodsize]\mapsto [\dist], f':[\prodsize]\mapsto [\dist']$ are sai
\end{enumerate}
\end{Definition}
To avoid double counting, we impose an ordering on the set of functions $f, f'$ to omit symmetrical mappings.
\begin{Definition}
For all $i, j \in [\dist]$ with $i < j$, the numerical value of the concatenation of the elements of $f^{-1}(i)$, listed in increasing order, is less than the numerical value of the concatenation of the elements of $f^{-1}(j)$, listed in increasing order, where $<$ is the usual order on the natural numbers. The same condition is imposed on $f'$ over $[\dist']$.
\end{Definition}
We illustrate with an example. Consider a join of $\prodsize = 3$ tuples with $\dist = 2$, where $|f^{-1}(1)| = 1$ and $|f^{-1}(2)| = 2$. Imposing the above ordering yields the following set of unique functions:
\begin{align*}
f_1 = \begin{cases}
1 \mapsto 1 &\implies\wElem_1 \mapsto \dMap{\wElem_1}\\
2, 3 \mapsto 2 &\implies\wElem_2, \wElem_3 \mapsto \dMap{\wElem_2}
\end{cases}\\
f_2 = \begin{cases}
2 \mapsto 1 &\implies\wElem_2 \mapsto \dMap{\wElem_1}\\
1, 3 \mapsto 2 &\implies\wElem_1, \wElem_3 \mapsto \dMap{\wElem_2}
\end{cases}\\
f_3 = \begin{cases}
3 \mapsto1 &\implies\wElem_3 \mapsto \dMap{\wElem_1}\\
1, 2 \mapsto 2 &\implies\wElem_1, \wElem_2 \mapsto \dMap{\wElem_2}
\end{cases}
\end{align*}
These mappings satisfy the ordering constraint: for $f_1$ we have $1 < 23$, for $f_2$ we have $2 < 13$, and for $f_3$ we have $3 < 12$.
Note that the above mappings share no symmetry. Their symmetric counterparts, i.e., the mappings for the case $|f^{-1}(1)| = 2$ and $|f^{-1}(2)| = 1$, violate the ordering requirement (e.g., $23 < 1$ fails) and are therefore disallowed, which avoids double counting. In other words, the preimage sizes follow the natural order of their respective images. When two preimages have the same size, we break the tie with the same idea as in the definition, by comparing the numerical values of their concatenated elements.
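The ordering constraint and the example above can be checked mechanically. The following Python sketch is illustrative only: the helpers \texttt{preimages}, \texttt{concat\_value}, \texttt{is\_ordered}, \texttt{ordered\_surjections} and \texttt{stirling2} are hypothetical names that do not appear in the text. It encodes $f$ as the tuple $(f(1), \ldots, f(\prodsize))$, applies the concatenation rule literally (assuming single-digit indices, i.e., $\prodsize \leq 9$, so the concatenation is unambiguous), and compares the number of surviving functions with $S(\prodsize, \dist)$, the number of partitions of $[\prodsize]$ into $\dist$ blocks, which is what one expects if exactly one labeling per partition survives.

```python
from itertools import product
from math import comb, factorial


def preimages(f, m):
    """Return [f^{-1}(1), ..., f^{-1}(m)], each as an increasing tuple.

    f is encoded as the tuple (f(1), ..., f(k)) with values in 1..m.
    """
    return [tuple(i for i, v in enumerate(f, start=1) if v == j)
            for j in range(1, m + 1)]


def concat_value(block):
    """Numerical value of the concatenated elements, e.g. (1, 3) -> 13.

    Assumes single-digit indices (k <= 9) so the concatenation is unambiguous.
    """
    return int("".join(str(i) for i in block))


def is_ordered(f, m):
    """Ordering constraint: for i < j, value(f^{-1}(i)) < value(f^{-1}(j))."""
    values = [concat_value(b) for b in preimages(f, m)]
    # Strictly increasing consecutive values imply the condition for all i < j.
    return all(a < b for a, b in zip(values, values[1:]))


def ordered_surjections(k, m):
    """All surjections [k] -> [m] that satisfy the ordering constraint."""
    return [f for f in product(range(1, m + 1), repeat=k)
            # Check surjectivity first so that no preimage is empty.
            if len(set(f)) == m and is_ordered(f, m)]


def stirling2(k, m):
    """Stirling number of the second kind S(k, m), via inclusion-exclusion."""
    total = sum((-1) ** j * comb(m, j) * (m - j) ** k for j in range(m + 1))
    return total // factorial(m)


if __name__ == "__main__":
    # The example with k = 3, m = 2: prints f_1, f_2, f_3 with their preimages.
    for f in ordered_surjections(3, 2):
        print(f, preimages(f, 2))
```

For $\prodsize = 3$, $\dist = 2$ this returns the tuples $(1,2,2)$, $(2,1,2)$, $(2,2,1)$, i.e., $f_1, f_2, f_3$. For $\prodsize \leq 9$, comparing blocks first by size and then by their sorted elements gives the same order as the concatenation rule.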
\begin{Lemma}\label{lem:sig-j-survive}
The only terms surviving $\term_1 - \term_2$ are those for which $f, f'$ are matching and $\forall j \in[\dist], \dMap{\wElem_j} = \dMap{\wElem'_j}$.
\end{Lemma}
@ -153,6 +182,8 @@ By the same arguments as before, we have at least one distinct world value in ea
%\AH{Here is the use of \dist-tuples to explain the same thing.}
% In the example above, $f$ mappings for $\dist_{2_1}$ may only cross product with $f'$ mappings for $\dist_{2_1}$ and not with those for $\dist_{2_2}$. Likewise for $f, f'$ mappings of $\dist_{2_2}$.
We now seek to show that when $f, f'$ are matching, $\term_1 - \term_2$ always equals 1.
Using the above definitions, we can now present the variance bounds for $\sigsq_j$ based on \eqref{eq:sig-j-distinct}.
Since the expectations cancel when $\forall i, i', j, j'\in [\prodsize], \wElem_i = \wElem_j = \wElem_{i'}' = \wElem_{j'}' = \wElem$, we can discard the case in which there is only one distinct world value. We then sum over all possibilities of $\dist$ distinct world values for $\dist \in [2, \prodsize]$. Note that the number $\dist$ of distinct values affects the randomness contributed by the hash function $\hfunc$; e.g., $\dist = 2$ distinct values yield a factor of $\frac{1}{\sketchCols} \cdot \frac{1}{\sketchCols} = \frac{1}{\sketchCols^2} = \frac{1}{\sketchCols^\dist}$. By \cref{lem:sig-j-survive} and \cref{eq:sig-j-distinct} we get