Started to add comments on SOP section.

Atri Rudra 2020-04-10 10:58:08 -04:00
parent e78cb83be7
commit 0691cc8298

sop.tex

@ -1,5 +1,7 @@
%root--main.tex
\section{Sum of Products Analysis}
\AR{You should do the analysis for $\lambda(j,j')$ instead of just $\sigma_j^2=\lambda(j,j)$. The former is not that different from what you have below and you'll need to do it when computing $\sigma^2$ in any case.}
We now seek to bound the variance of a $\prodsize$-way join.
\begin{align}
&\sigsq_j = \ex{\est_j \cdot \overline{\est_j}} - \ex{\est_j} \cdot \ex{\overline{\est_j}} \nonumber\\
@ -18,7 +20,10 @@ Before proceeding, we introduce some notation and terminology that will aid in
\[\term_1\left(\wElem_1,\ldots,\wElem_\prodsize, \wElem_1',\ldots, \wElem_\prodsize'\right) = \ex{\prod_{i = 1}^\prodsize s(w_i)\overline{s(w'_i)}\ind{h(w_i) = j}\ind{h(w'_i) = j}} \text{, and}\]
\[\term_2\left(\wElem_1,\ldots,\wElem_\prodsize, \wElem_1',\ldots, \wElem_\prodsize'\right) = \ex{\prod_{i = 1}^\prodsize s(w_i)\ind{h(w_i) = j}}\cdot \ex{\prod_{i = 1}^\prodsize\overline{s(w'_i)}\ind{h(w'_i) = j}}. \]
We will use the vocabulary 'term' to denote $\prod_{i = 1}^{\prodsize}\vect_i(\wElem_i)\vect_i(\wElem_i') \cdot\left(\term_1 - \term_2\right)$ given a specific set of world values. To say that a term survives the expectation is to mean that $\term_1 - \term_2 \neq 0$. Note that the only terms that survive the expectation above are mappings of $w_i = w'_j = w$ for $i, j \in [\prodsize]$, such that each $w_i$ has a match, i.e., no $w_i$ or $w'_j$ stands alone without a matching world in its complementary set. In other words, the set of values in $\wElem_1,\ldots,\wElem_k$ has a bijective mapping to the set of values in $\wElem'_1,\ldots,\wElem'_k$.
\AR{Sorry I missed this earlier but I do not think you need the $T_2$ term since you ``cancel" out their sum from the corresponding terms in the sum of the $T_1$ terms.}
We will use the vocabulary 'term' to denote $\prod_{i = 1}^{\prodsize}\vect_i(\wElem_i)\vect_i(\wElem_i') \cdot\left(\term_1 - \term_2\right)$ given a specific set of world values. To say that a term survives \AR{You should not care about whether the $T_1$ term survives or not. See the above comment on why.} the expectation is to mean that $\term_1 - \term_2 \neq 0$. Note that the only terms that survive the expectation above are mappings of $w_i = w'_j = w$ for $i, j \in [\prodsize]$, such that each $w_i$ has a match, i.e., no $w_i$ or $w'_j$ stands alone without a matching world in its complementary set. In other words, the set of values in $\wElem_1,\ldots,\wElem_k$ has a bijective mapping to the set of values in $\wElem'_1,\ldots,\wElem'_k$.
\AR{I am not sure whether this last sentence is needed here. I think it is probably more confusing without the details that are forthcoming. I think you can just give a forward pointer to Lemma~\ref{lem:sig-j-survive} here.}
%\subsection{M-tuples}
%\begin{Definition}
@ -53,7 +58,9 @@ We rewrite equation \eqref{eq:sig-j-last} in terms of $\dist$ distinct worlds, w
\sum_{\dist \in [\prodsize]}\sum_{\dist' \in [\prodsize]}\sum_{f, f'}\sum_{\substack{\dMap{\wElem_1}, \ldots,\dMap{\wElem_\dist},\\\dMap{\wElem'_1},\ldots,\dMap{\wElem'_{\dist'}}\\ \in W}}\prod_{i = 1}^{\prodsize}\vect_i(\dMap{\wElem_{f(i)}})\vect_i(\dMap{\wElem'_{f'(i)}})\cdot\left( \ex{\prod_{i = 1}^\prodsize \sine(\dMap{\wElem_{f(i)}})\conj{\sine(\dMap{\wElem'_{f'(i)}})}\ind{h(\dMap{\wElem_{f(i)}}) = j}\ind{h(\dMap{\wElem'_{f'(i)}}) = j}} -
\ex{\prod_{i = 1}^\prodsize \sine(\dMap{\wElem_{f(i)}})\ind{h(\dMap{\wElem_{f(i)}}) = j}}\cdot \ex{\prod_{i = 1}^\prodsize\conj{\sine(\dMap{\wElem'_{f'(i)}})}\ind{h(\dMap{\wElem'_{f'(i)}}) = j}} \right)\label{eq:sig-j-distinct}
\end{equation}
\AR{Use the $T_1$ notation that you have already defined.}
The fact that \cref{eq:sig-j-last} $\equiv$ \cref{eq:sig-j-distinct} follows since \cref{eq:sig-j-distinct} is simply a rearrangement of the addends in the sum.
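As an illustration of this rearrangement, consider the smallest nontrivial case $\prodsize = 2$, restricted to the unprimed variables, with a generic summand $g$ (a hypothetical placeholder, not notation used elsewhere in this section). Assuming that the $\dMap{\wElem_i}$ range over pairwise distinct worlds, the sum splits according to the number $\dist$ of distinct worlds:
\begin{align*}
\sum_{\wElem_1, \wElem_2 \in W} g(\wElem_1, \wElem_2)
= \underbrace{\sum_{\dMap{\wElem_1} \in W} g(\dMap{\wElem_1}, \dMap{\wElem_1})}_{\dist = 1,\ f(1) = f(2) = 1}
+ \underbrace{\sum_{\substack{\dMap{\wElem_1}, \dMap{\wElem_2} \in W\\ \dMap{\wElem_1} \neq \dMap{\wElem_2}}} g(\dMap{\wElem_1}, \dMap{\wElem_2})}_{\dist = 2,\ f(1) = 1,\ f(2) = 2}.
\end{align*}
Every pair $(\wElem_1, \wElem_2)$ appears in exactly one of the two sums, so no addend is lost or duplicated. Allowing $f(1) = 2$, $f(2) = 1$ as well would count each pair with distinct entries twice, which is why an ordering on the functions $f$ is needed.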
\AR{I think more details here would be good. Among others this would also show why the functions $f$ and $f'$ make ``sense."}
%The reason \cref{eq:sig-j-last} $\equiv$ \cref{eq:sig-j-distinct} is because the only surviving terms in $\term_1 - \term_2$ are bijective mappings of $\dist < \prodsize$ distinct pairs between $\wElem_1\ldots\wElem_\prodsize$ and $\wElem_1'\ldots\wElem_\prodsize'$. Another way of saying this is that the only surviving terms of $\term_1 - \term_2$ are those for which we have $\dist$ distinct world values such that the cardinality of variables in $\wElem_1\ldots\wElem_\prodsize$ that are mapped to distinct world $\wElem _i$ $\left(\forall i \in [\dist]\right)$ is the same as the cardinality of variables mapped from $\wElem_1'\ldots\wElem_\prodsize'$.\newline
%Note that for a given $\dist$, we may have several ways to map $\prodsize$ worlds to $\dist$ distinct values. We need to define what it means for $f$ and $f'$ to be matching.
@ -72,6 +79,8 @@ To avoid double counting, we impose an ordering on the set of functions $f, f'$
For every $i, j \in [\dist]$ with $i < j$, the numerical value of the concatenation of the numerically ordered elements of $f^{-1}(i)$ is less than the numerical value of the concatenation of the numerically ordered elements of $f^{-1}(j)$, where $<$ is the order of the natural numbers.
\end{Definition}
\AR{As I mentioned in my email, this definition is too confusing. If you need like 1/4th of a page to explain a definition you should re-consider if you should use it in the first place. I think putting a lex order on $\tilde{w}_1\prec \tilde{w}_2\prec \cdots \prec \tilde{w}_m$ is a better way to go.}
We illustrate with an example. Consider a join of $k = 3$ tuples, where $\dist = 2$, and we have that $|f^{-1}(1)| = 1$ and $|f^{-1}(2)| = 2$. Imposing the above ordering yields the following set of unique functions:
\begin{align*}
f_1 = \begin{cases}
@ -93,12 +102,16 @@ Note that above orderings share no symmetry, while the symmetrical versions of t
\begin{Lemma}\label{lem:sig-j-survive}
The only terms surviving $\term_1 - \term_2$ are those with $f, f'$ matching, where $\forall j \in[\dist], \dMap{\wElem_j} = \dMap{\wElem'_j}$.
\end{Lemma}
\AR{As I had mentioned last time, you should have the lemma state exactly what the value is when the ``matching condition" is satisfied as well-- i.e. the non-zero value is $1$.}
In proving \cref{lem:sig-j-survive}, we introduce another fact.
\begin{Lemma}\label{lem:exp-prod-rand-roots}
Given a $\prodsize^{th}$ root of unity $\rou$, the expectation of the product of $\rou^i \cdot \rou^j$ for $i, j \in [\prodsize]$ is zero.
\end{Lemma}
\AR{The lemma should be stated and proved for both the case of $w=w'$ and $w\ne w'$.}
\AR{You should be using \texttt{proof} environment to put your proofs in: i.e. \texttt{\textbackslash begin\{proof\} Blah \textbackslash end\{proof\}}.}
\begin{align*}
&\ex{\sine(\wElem)^i \conj{\sine(w')}^j}\\
@ -106,6 +119,16 @@ Given a $\prodsize^{th}$ root of unity $\rou$, the expectation of the product of
= &0
\end{align*}
In the above, since we have more than pairwise independence for $\wElem \neq \wElem'$, we can push the expectation into the product. Then by \cref{lem:exp-sine} we get 0 for both expectations.\newline
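As a sanity check, the following sketch works out the smallest nontrivial case. It assumes, beyond what is stated in this section, that $\sine(\wElem)$ is uniformly distributed over the $\prodsize$-th roots of unity, and it takes $\prodsize = 3$ with $\rou = e^{2\pi\sqrt{-1}/3}$ as an illustrative choice:
\begin{align*}
\ex{\sine(\wElem)} &= \frac{1}{3}\left(1 + \rou + \rou^2\right) = \frac{1}{3}\cdot\frac{1 - \rou^3}{1 - \rou} = 0,\\
\ex{\sine(\wElem)^2} &= \frac{1}{3}\left(1 + \rou^2 + \rou^4\right) = \frac{1}{3}\left(1 + \rou^2 + \rou\right) = 0,\\
\ex{\sine(\wElem)^3} &= \frac{1}{3}\left(1 + 1 + 1\right) = 1.
\end{align*}
Under that assumption, a single power $\sine(\wElem)^i$ has expectation zero exactly when $i$ is not a multiple of $\prodsize$, so the range of the exponents matters for the claim.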
\AR{Proof environment for \cref{lem:sig-j-survive} should start here.}
\AR{Um, the proof below is a bit of a mess. This needs to be re-written. Below are some suggestions.}
\AR{First some typos/things that are incorrect below-- note this is \textbf{not} an exhaustive list. (1) In the proof below the $w_i$ and $w'_i$ should be $\tilde{w}_i$ and $\tilde{w'}_i$ respectively. (2) The expression for $T_1$ below is incorrect since it seems to assume that all the pre-image sizes are $1$-- the expression for $T_2$ is fine except the $j_i$ terms are not defined. However, ``taking out" one term for $\tilde{w'}_{m'}$ for $T_2$ is incorrect since e.g. we could have the pre-image of $m'$ have size $>1$. (3) The proof below never explicitly argues why the condition $\dMap{\wElem_j} = \dMap{\wElem'_j}$ is needed.}
\AR{Here is how I recommend that you re-write the proof. First, as mentioned earlier, you should only consider the $T_1$ terms (as you account for the $T_2$ terms later on). Second, you should first start off by re-stating the $T_1$ term like so. Consider the ``generic term"--
\[T_1(\tilde{w}_{f(1)},\dots, \tilde{w}_{f(m)}, \tilde{w'}_{f'(1)},\dots, \tilde{w'}_{f'(m')}).\]
Then re-write what the above term is based on the exact definition (BTW I'm dropping the $\mathbf{E}$ terms for convenience but they should all be there below.) In particular, the above term by definition is exactly
\[\prod_{i=1}^k s(\tilde{w}_{f(i)})\cdot s(\tilde{w'}_{f'(i)}).\]
}
To prove \cref{lem:sig-j-survive}, consider what the expectation looks like when $f, f'$ are not matching. Looking at the first condition for $f, f'$ to be matching, when $\dist \neq \dist'$ one set of variables has at least one more distinct world than the other set of variables. To be explicit, $\wElem_1\ldots\wElem_\dist, \wElem_1'\ldots\wElem_{\dist'}'$ are distinct world values such that $\forall i \neq j \in [\dist], \wElem_i = \wElem_i' \neq \wElem_j = \wElem_j'$. To make things easier, assume that $\dist < \dist'$; the opposite case of $\dist > \dist'$ has a symmetrical proof. Fixing variables $\wElem_1\ldots\wElem_\dist, \wElem_1'\ldots\wElem_\dist'$, in both $\term_1$ and $\term_2$ we have one extra distinct value, $\wElem_{\dist'}'$. This distinct term cancels out all the other values in the expectations.
\begin{align}