Made pass till argument for (93)

Atri Rudra 2020-04-16 22:57:03 -04:00
parent 3e99b18c77
commit 4b5edc2920

11
sop.tex

@@ -22,7 +22,7 @@ Let us show first that the expectation of the estimate does in fact yield the va
= &\ex{\sum_{\substack{\wElem_1,\ldots, \wElem_{\prodsize}\\ \in \wSet_j}} \prod_{i = 1}^{\prodsize}\vect_i(\wElem_i)\prod_{i = 1}^{\prodsize}\sine(\wElem_i)}\\
= &\sum_{\substack{\wElem_1,\ldots, \wElem_{\prodsize}\\ \in \wSet_j}} \prod_{i = 1}^{\prodsize}\vect_i(\wElem_i)\ex{\prod_{i = 1}^{\prodsize}\sine(\wElem_i)}
\end{align*}
Fix the variables $\wElem_1,\ldots, \wElem_{\prodsize}$. Define $\dist$ to be the number of distinct worlds in $\wElem_1,\ldots, \wElem_{\prodsize}$ and $e_l$ to be the number of repetitions of the $l_{th}$ \AR{General typesetting comments. (1) You should always use $\ell$ instead of $l$. (2) Typeset $l_{th}$ as $\ell^{\text{th}}$; note that ``th'' is in superscript and not in math mode.} distinct world value. For $\term_1^{\est_j} = \ex{\prod_{i = 1}^{\prodsize} \sine(\wElem_i)}$, \AR{Why are you defining the new notation $\term_1^{\est_j}$? You should always be wary of introducing new notation since it makes things hard to read.} we get
\begin{align*}
\term_1^{\est_j} = &\ex{\prod_{i = 1}^{\prodsize}\sine(\wElem_i)}\\
= &\ex{\prod_{l = 1}^{\dist} \sine(\wElem_l)^{e_l}}\\
@@ -65,6 +65,8 @@ Recall that we started this section out by seeking to prove \cref{eq:var-to-prov
One can see that \cref{eq:sigsq-jneqj} is composed of two addends. We now bound each of them separately.
\subsection{Bounding $\sum_{j \neq j'}\cvar{j, j'}$}
\AR{You need to rewrite the material below. First, in the second equality the sum over $j\ne j'$ suddenly vanishes. Also, I think you should first analyze $\lambda(j,j')$ for both $j=j'$ and $j\ne j'$ for as long as you can. Only when it is needed should you divide into the two cases; do not do the division up front.}
\begin{align*}
\sum_{j \neq j'}\cvar{j, j'} &= \sum_{j \neq j'} \ex{\est_j \cdot \conj{\est_{j'}}} - \ex{\est_j}\cdot\ex{\conj{\est_{j'}}}\\
&=\ex{\prod_{i = 1}^{\prodsize}\sum_{\wElem \in W}v_i(\wElem)s(\wElem)\ind{h(\wElem) = j}\cdot \prod_{i = 1}^{\prodsize}\sum_{\wElem' \in W}v_i(\wElem')\conj{s(\wElem')}\ind{h(\wElem') = j'}} - \ex{\prod_{i = 1}^{\prodsize}\sum_{\wElem \in W}v_i(\wElem)s(\wElem)\ind{h(\wElem) = j}}\cdot \ex{\prod_{i = 1}^{\prodsize}\sum_{\wElem' \in W}v_i(\wElem')\conj{s(\wElem')}\ind{h(\wElem') = j'}}\\
@@ -74,6 +76,7 @@ One can see that \cref{eq:sigsq-jneqj} is composed of two addends. We now bound
&= \sum_{\substack{\wElem_1,\cdots,\wElem_\prodsize,\\\wElem'_1,\cdots,\wElem'_\prodsize\\\in W}}\prod_{i = 1}^{\prodsize}v_i(\wElem_i)v_i(\wElem'_i)\left(\ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)\conj{s(\wElem'_i)}\ind{h(\wElem_i) = j}\ind{h(\wElem'_i) = j'}} - \ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)\ind{h(\wElem_i) = j}}\cdot\ex{\prod_{i = 1}^{\prodsize}\conj{s(\wElem'_i)}\ind{h(\wElem'_i) = j'}} \right).
\end{align*}
\AH{Perhaps a formal proof is necessary below.}
\AR{Most definitely.}
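As a hedged sanity check of the single surviving case identified below, consider the bracketed difference in the last line of the preceding display. Evaluate it on the diagonal tuple $\wElem_1=\cdots=\wElem_\prodsize=\wElem'_1=\cdots=\wElem'_\prodsize=\wElem$ for a fixed pair $j \neq j'$. This computation relies on two assumptions that are not stated in this excerpt. First, $s(\wElem)$ always takes values among the $\prodsize^{\text{th}}$ roots of unity, so $s(\wElem)^{\prodsize} = 1$ and $s(\wElem)\conj{s(\wElem)} = 1$. Second, $h(\wElem)$ is uniform on $[B]$. Then
\begin{align*}
\ex{\prod_{i = 1}^{\prodsize}s(\wElem)\conj{s(\wElem)}\ind{h(\wElem) = j}\ind{h(\wElem) = j'}} &= \ex{\ind{h(\wElem) = j}\ind{h(\wElem) = j'}} = 0,\\
\ex{\prod_{i = 1}^{\prodsize}s(\wElem)\ind{h(\wElem) = j}}\cdot\ex{\prod_{i = 1}^{\prodsize}\conj{s(\wElem)}\ind{h(\wElem) = j'}} &= \Pr[h(\wElem) = j]\cdot\Pr[h(\wElem) = j'] = \frac{1}{B^2}.
\end{align*}
So the bracket equals $-\frac{1}{B^2}$ on this tuple. Its coefficient is $\prod_{i = 1}^{\prodsize}v_i(\wElem)v_i(\wElem) = \prod_{i = 1}^{\prodsize}v_i^2(\wElem)$, which has the same form as the summand in \cref{eq:cvar-bound}.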
For $\term_1^{\cvar{j, j'}} = \ex{\prod_{i = 1}^{\prodsize}s(\wElem_i)\conj{s(\wElem'_i)}\ind{h(\wElem_i) = j}\ind{h(\wElem'_i) = j'}}$, note that the hash function $h$ cannot map the same world to two different buckets. Hence $\term_1^{\cvar{j, j'}}$ can be nonzero only when no world appears among both the $\wElem_i$ and the $\wElem'_i$ variables. Given that there is no overlap, $\term_1^{\cvar{j, j'}}$ is nonzero only under the further condition that $\forall i \in [\prodsize], \wElem_i = \wElem, \wElem'_i = \wElem'$ with $\wElem \neq \wElem'$. In that case, however, the product of the remaining two expectations cancels it out. Indeed, the remaining product $\term_2^{\cvar{j, j'}} = \ex{\prod_{i = 1}^{\prodsize}\sine(\wElem_i) \ind{\hfunc(\wElem_i) = j}} \cdot \ex{\prod_{i = 1}^{\prodsize}\conj{\sine(\wElem'_i)} \ind{\hfunc(\wElem'_i) = j'}}$ is nonzero only when $\forall i \in [\prodsize], \wElem_i = \wElem, \wElem'_i = \wElem'$. Taken together, these constraints leave exactly one case in which $\term_1^{\cvar{j, j'}} - \term_2^{\cvar{j, j'}} \neq 0$: all $2\prodsize$ variables are the same world $\wElem$. In that case $\term_1^{\cvar{j, j'}} = 0$, while $\term_2^{\cvar{j, j'}}$ does not vanish. Thus,
\begin{align}
&\sum_{j \neq j'}\cvar{j, j'} = - \frac{1}{B^2}\sum_{\wElem \in W}\prod_{i = 1}^{\prodsize}v_i^2(\wElem)\label{eq:cvar-bound}.
@@ -97,6 +100,7 @@ We now move on to bound the variance of a $\prodsize$-way join.
\end{align}
Before proceeding, we introduce notation that will simplify the statement of the bounds we are about to establish. We denote the leftmost expectation of \cref{eq:sig-j-last} by
\AR{dangling eq ref}
\[\term_1\left(\wElem_1,\ldots,\wElem_\prodsize, \wElem_1',\ldots, \wElem_\prodsize'\right) = \ex{\prod_{i = 1}^\prodsize s(\wElem_i)\conj{s(\wElem'_i)}\ind{h(\wElem_i) = j}\ind{h(\wElem'_i) = j}}.%\text{, and}
\]
%\[\term_2\left(\wElem_1,\ldots,\wElem_\prodsize, \wElem_1',\ldots, \wElem_\prodsize'\right) = \ex{\prod_{i = 1}^ks(w_i)\ind{h(w_i) = j}}\cdot \ex{\prod_{i = 1}^\prodsize\overline{s(w'_i)}\ind{h(w'_i) = j}}. \]
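For intuition, $\term_1$ can be evaluated by hand on two simple tuples. The first is built from a single world $\wElem$; the second uses two distinct worlds $\wElem \neq \wElem'$. The computation is a sketch under assumptions that are not stated in this excerpt. First, $s(\wElem)$ always takes values among the $\prodsize^{\text{th}}$ roots of unity, so $s(\wElem)^{\prodsize} = 1$ and $s(\wElem)\conj{s(\wElem)} = 1$. Second, $h$ is pairwise independent and uniform on $[B]$. Under these assumptions,
\begin{align*}
\term_1\left(\wElem,\ldots,\wElem, \wElem,\ldots,\wElem\right) &= \ex{\left(s(\wElem)\conj{s(\wElem)}\right)^{\prodsize}\ind{h(\wElem) = j}} = \Pr[h(\wElem) = j] = \frac{1}{B},\\
\term_1\left(\wElem,\ldots,\wElem, \wElem',\ldots,\wElem'\right) &= \ex{s(\wElem)^{\prodsize}\,\conj{s(\wElem')}^{\prodsize}\ind{h(\wElem) = j}\ind{h(\wElem') = j}} = \Pr[h(\wElem) = j,\ h(\wElem') = j] = \frac{1}{B^2}.
\end{align*}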
@@ -118,6 +122,7 @@ We next describe the nonzero terms of \cref{eq:sig-j-last}.
Define and then fix a total ordering of the $\dist$ distinct world elements to follow the total order of the natural numbers in $[\dist]$, such that $\forall i, j \in [\dist], i < j \implies \dw_i < \dw_j$, i.e., $\wElem_1 \prec\cdots\prec\wElem_\prodsize$.
%Given a fixed order $\wSet_{\order}: \left(\wSet, \wSet\right)\mapsto \mathbb{B}$ of possible worlds, define the lexicographical order of distinct worlds $\wSet_\dist$ to be the ordering which complies with the identity mapping of elements in $[\prodsize]$ to elements in $[\dist]$ up to $\dist$, such that . In other words, $\forall \wElem, \wElem' \in \wSet_\dist, \dw < \wElem' \leftrightarrow \wSet_{\order}\left(\wElem, \wElem'\right) = T$.
\end{Definition}
\AR{NO. The ordering $\prec$ has nothing to do with $m$. It is just ordering all the worlds in $W$.}
To help describe all possible matchings of world values, we introduce functions $f$ and $f'$.
\begin{Definition}
The functions $f$ and $f'$ range over all surjective mappings $f: [\prodsize] \rightarrow [\dist]$ and $f': [\prodsize] \rightarrow [\dist']$, respectively.
@@ -137,7 +142,9 @@ We rewrite equation \eqref{eq:sig-j-last} in terms of $\dist$ distinct worlds, w
\sum_{\dist = 2}^{\prodsize}\sum_{\dist' = 2}^{\prodsize}\sum_{f, f'}\sum_{\substack{\dw_1, \ldots,\dw_\dist,\\ \dw'_{1},\ldots,\dw'_{\dist'}\\ \in W}}\prod_{i = 1}^{\prodsize}\vect_i(\dw_{f(i)})\vect_i(\dw'_{f'(i)})\cdot \term_1\left(\dw_{f(1)},\ldots,\dw_{f(\prodsize)}, \dw'_{f'(1)},\ldots, \dw'_{f'(\prodsize)}\right)
\label{eq:sig-j-distinct}
\end{equation}
\AR{Three comments on the above: (1) Why do the sums over $m$ and $m'$ start with $2$ and not $1$? (2) Also, $\tilde{w}_1,\dots,\tilde{w}_m\in W$ should be replaced by $\tilde{w}_1\prec \cdots\prec \tilde{w}_m \in W$, and similarly for the $\tilde{w'}_i$s. (3) Use $\widetilde{w_i}$ instead of $\tilde{w}_i$; I had used the latter in my notes due to laziness.}
Observe that the Cartesian product of world values assigned to $\wElem_1,\ldots,\wElem_\prodsize$ throughout the summation can be rearranged into groups of world variables with distinct world values, one group for each $\dist \in [\prodsize]$. For each $\dist \in [\prodsize]$, all possible combinations of $\dist$ world values can be equivalently modeled by taking the set of surjective functions $f:[\prodsize]\mapsto [\dist]$ and computing all world value combinations based on the total ordering of $\dw_{f(1)}\prec\cdots\prec\dw_{f(m)}$.\AR{Again, the total ordering is on worlds in $W$; $\dw_{f(1)}\prec\cdots\prec\dw_{f(m)}$ does not make sense since some of these world values could be the same.} For any $\dist$, all surjective mappings $f$ constitute all unique mappings with their symmetrical counterparts \AR{I do not see what the ``symmetrical counterparts'' comment adds here. Just remove it}. Combining that with the total order over $\dw_{f(1)},\ldots,\dw_{f(\dist)}$ yields exactly the world value combinations containing $\dist$ distinct values which appear in the Cartesian product of the sum, without double counting \AR{Again, not sure the ``double counting'' comment adds anything here}.
\AR{Overall comments: (1) The main thing missing is explicitly stating that $(w_1,\dots,w_k)\mapsto (\dw_{f(1)},\ldots,\dw_{f(\prodsize)})$. (2) After stating the map, you should argue in words why all tuples with exactly $m$ distinct world values are covered.} In short, the whole argument amounts to a rearrangement of the addends in the sum.
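To make the map $(\wElem_1,\ldots,\wElem_\prodsize)\mapsto(f, \dw_1\prec\cdots\prec\dw_\dist)$ concrete, here is a small illustrative example. The worlds $a \prec b$ are hypothetical elements of $W$ chosen only for this example. Take $\prodsize = 3$ and the tuple $(\wElem_1,\wElem_2,\wElem_3) = (b, a, b)$. Its distinct values, listed in $\prec$ order, are $\dw_1 = a$ and $\dw_2 = b$, so $\dist = 2$, and $f(i)$ is the position of $\wElem_i$ in that list:
\[
f(1) = 2,\quad f(2) = 1,\quad f(3) = 2,\qquad\text{so that}\qquad \wElem_i = \dw_{f(i)} \text{ for all } i\in[3].
\]
Conversely, take any $\dist$ worlds $\dw_1\prec\cdots\prec\dw_\dist$ in $W$ and any surjection $f:[\prodsize]\to[\dist]$. Together they yield the tuple $(\dw_{f(1)},\ldots,\dw_{f(\prodsize)})$, which has exactly $\dist$ distinct values, and the two constructions are inverse to each other. A quick count is consistent with this. There are $1$, $6$, and $6$ surjections from $[3]$ onto $[1]$, $[2]$, and $[3]$ respectively, and for every $n = |W|$
\[
n^3 = 1\cdot\binom{n}{1} + 6\binom{n}{2} + 6\binom{n}{3}.
\]
For example, $2 + 6 + 0 = 8 = 2^3$ for $n = 2$, and $3 + 18 + 6 = 27 = 3^3$ for $n = 3$.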
\begin{Definition}
Functions $f:[\prodsize]\mapsto [\dist], f':[\prodsize]\mapsto [\dist']$ are said to be matching, denoted $\match{f}{f'}$, if and only if