Rewrote expectation and variance proofs for generalized v_t

2019-08-03 10:25:25 -04:00 · 2019-08-03 10:25:25 -04:00 · 503913f385
parent 448ed0ffef
commit 503913f385
2 changed files with 80 additions and 30 deletions
--- a/analysis.tex
+++ b/analysis.tex
@@ -3,7 +3,8 @@
 \label{sec:analysis}
 We begin the analysis by showing that with high probability an estimate is approximately $\numWorldsP$, where $p$ is a tuple's probability measure for a given TIPDB.  Note that 
 \begin{equation}
-\gVt{k\cdot}\numWorldsP = \numWorldsSum\label{eq:mu}.
+%\gVt{k\cdot}
+\numWorldsP = \numWorldsSum\label{eq:mu}.
 \end{equation}

 We claim that the expectation of the estimate of a tuple $t$'s membership across all worlds is $\sum\limits_{\wVec \in \pw}\kMapParam{\wVec}$; formally,
@@ -14,37 +15,50 @@ To verify this claim, we argue that the expectation of the estimate of a tuple's
 \begin{equation}
 \expect{\sketchJParam{\sketchHashParam{\wVec}}\cdot \sketchPolarParam{\wVec}} = \kMapParam{\wVec} \label{eq:single-est}.
 \end{equation}
-
-%\AR{While the analysis below is correct, the way it is stated it seems to `come out of the blue.' I would recommend that you re-structure the argument below as follows. First argue that $\expect{\sketch[i][\sketchHash[\wVec]]\cdot s_i[\wVec]}=v_t[\wVec]$. From this the claim below just follows by linearity of expectation but this result is a good thing for the reader to realize. Also instead of summing over $j\in [B],\wVec|h_i[\wVec]=j,\wVec'|h_i[\wVec']=j$ it would be better to just write it as sum over all $\wVec,\wVec'\in W\text{ s.t. }h_i[\wVec]=h_i[\wVec']$-- the latter is bit more compact and it is easier to comprehend as well.}
-%\AH{Proof changed as suggested above.  I aired on the verbose side for the sake of clarity.}
 For a given $\wVec \in \pw$, substituting definitions we have
-\begin{align*}
+\setcounter{equation}{2}
+\begin{subequations}
+\begin{align}
 &\expect{\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}} = \nonumber\\
 &\phantom{{}\sketchJParam{\sketchHashParam{\wVec}}}\expect{\big(\sum_{\substack{\wVecPrime \in \pw \st \\
-														\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\kMapParam{\wVecPrime} \cdot \sketchPolarParam{\wVecPrime}\big) \cdot \sketchPolarParam{\wVec} }.
-\end{align*}
-Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \st \wVecPrime = \wVec$.  This yields 
-\[
-\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
+														\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\kMapParam{\wVecPrime} \cdot \sketchPolarParam{\wVecPrime}\big) \cdot \sketchPolarParam{\wVec} }\label{eq:step-one}\\
+%\end{align}
+%Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \st \wVecPrime = \wVec$.  This yields 
+%\[
+=&~\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
 						\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
 						\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2 +
 	\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
 						\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
-						\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}
-\] which can be written as
-\[
-\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
+						\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}\label{eq:step-two}\\
+%\] which can be written as
+%\[
+=&~\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
 						\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
 						\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2} +
 \expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
+						\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
+						\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}\label{eq:step-three}\\
+%\] from which the last term evaluates to $0$ and we have
+%\[
+=&~\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
 						\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
-						\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}
-\] from which the last term evaluates to $0$ and we have
-\[
-\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
-						\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
-						\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2}
-\] which in turn
+						\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2}\label{eq:step-four}\\
+%\]
+=&~\kMapParam{\wVec}\label{eq:step-five}
+\end{align} 
+\end{subequations}
+\begin{Justification}
+\hfill
+	\begin{itemize}
+		\item \eq{\eqref{eq:step-one}} is a substitution of the definition of $\sketch$.
+		\item \eq{\eqref{eq:step-two}} uses the commutativity of addition to rearrange the sum.
+		\item \eq{\eqref{eq:step-three}} uses linearity of expectation to reduce the large expectation into smaller expectations.
+		\item \eq{\eqref{eq:step-four}} follows from the second term of \eq{\eqref{eq:step-three}} evaluating to zero.  This assumes pairwise independence of $\sketchPolar$.
+		\item \eq{\eqref{eq:step-five}} follows because $\sketchPolarParam{\wVec}^2$ always evaluates to 1, and exactly one $\wVecPrime$ in the summation equals $\wVec$.
+	\end{itemize}
+\end{Justification}
+ %which in turn
 %\begin{multline*}
 %\mathbb{E}\big[\kMapParam{\wVecPrime_0}\cdot \sketchPolarParam{\wVecPrime_0} + \cdots \\
 %+\kMapParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}+ \cdots \\
@@ -52,11 +66,19 @@ Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \
 %\end{multline*}
 %\AH{break it up into w' and w}
 %Due to the uniformity of $\sketchPolar$, we have
-\begin{equation*}
-= \kMapParam{\wVec},
-\end{equation*}
+%\begin{equation*}
+%= \kMapParam{\wVec},
+%\end{equation*}
 thus verifying \eqref{eq:single-est}.  

+\begin{Assumption}
+\hfill
+	\begin{itemize}
+		\item \eq{\eqref{eq:step-four}} assumes that $\sketchPolar$ is pairwise independent.
+		%\item $\sketchHash$ is uniformly distributed.
+	\end{itemize} 
+\end{Assumption}
+
 Since \eqref{eq:single-est} holds, by linearity of expectation, \eqref{eq:allWorlds-est} also must hold.
 %We can now take \eqref{eq:single-est}, substitute it in for \eqref{eq:allWorlds-est} and show by linearity of expectation that \eqref{eq:allWorlds-est} holds.
 %\begin{align}
@@ -93,13 +115,13 @@ Since \eqref{eq:single-est} holds, by linearity of expectation, \eqref{eq:allWor
 %\AH{Thank you for clarifying this, as I have always wondered what the convention was for display equations.  Hopefully, I haven't missed any end display equations in this paper, and have them all fixed properly.}

 For the next step, we show that the variance of an estimate is small.%$$\varParam{\estimate}$$
-
+\begin{subequations}
 \begin{align}
-&\varParam{\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\nonumber\\
-=~&\varParam{\sum_{\wVec \in \pw}\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \sum_{\substack{\wVecPrime \in \pw \st\\ 												\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\sketchPolarParam{\wVecPrime}}\nonumber\\%\estExpOne}\\
+&\varParam{\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\\%\nonumber\\
+=~&\varParam{\sum_{\wVec \in \pw}\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \sum_{\substack{\wVecPrime \in \pw \st\\ 												\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\sketchPolarParam{\wVecPrime}}\label{eq:var_step-one}\\%\nonumber\\%\estExpOne}\\
 =~& \mathbb{E}\big[\big(\sum_{\substack{ \wVec, \wVecPrime \in \pw \st \\
 			 \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}\nonumber\\
-&\qquad - \expect{\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\big)^2\big]\nonumber\\
+&\qquad - \expect{\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\big)^2\big]\label{eq:var_step-two}\\%\nonumber\\
 =~&\mathbb{E}\big[\sum_{\substack{
 		\wVec_1, \wVec_2,\\
 		 \wVecPrime_1, \wVecPrime_2 \in \pw,\\
@@ -108,6 +130,23 @@ For the next step, we show that the variance of an estimate is small.%$$\varPara
 		 }}\kMapParam{\wVec_1}  \kMapParam{\wVec_2}\sketchPolarParam{\wVec_1}\sketchPolarParam{\wVec_2}\sketchPolarParam{\wVecPrime_1}\sketchPolarParam{\wVecPrime_2}\big]\nonumber\\
 &\qquad - \left(\sum_{\wVec \in \pw}\kMapParam{\wVec}\right)^2 \label{eq:var-sum-w}.
 \end{align}
+\end{subequations}
+
+\begin{Justification}
+\hfill
+	\begin{itemize}
+		\item \eq{\eqref{eq:var_step-one}} follows from substituting the definition of $\sketch$ and from the commutativity of addition.  The constraint that $\sketchHash$ maps $\wVec$ and $\wVecPrime$ to the same bucket comes directly from the definition of $\sketch$.  Commutativity then lets us regroup the sum so that each $\kMapParam{\wVec}$ is multiplied by $\sketchPolarParam{\wVec}$ and by every $\sketchPolarParam{\wVecPrime}$ mapped to its bucket.
+		\item \eq{\eqref{eq:var_step-two}} follows by substituting the definition of variance.
+		\item \eq{\eqref{eq:var-sum-w}} results from the further evaluation of \eqref{eq:var_step-two}.
+	\end{itemize} 
+\end{Justification}
+\begin{Assumption}
+\hfill
+	\begin{itemize}
+		\item The subsequent evaluations of expectation assume 4-wise independence of $\sketchPolar$.
+	\end{itemize}
+\end{Assumption}
+Testing: $\norm{\kMap{t}}^2_2$.
 %\AR{The $-\mu^2$ term is missing in the above.}
 %\AH{$\mu^2$ added.}

@@ -236,10 +275,12 @@ With only \eqref{eq:variantTwo} and \eqref{eq:variantThree} remaining, we have
 Our current analysis is limited to TIPDBs, where the annotations are in the Boolean set $\mathbb{B}$.  Consequently, the square of any element is the element itself.  Computing each term separately we have
 \begin{align}
 &\expect{\sum_{\substack{\wVec, \wVecPrime \in \pw \st\\
-					 \wVec \neq \wVecPrime}}| \{\wVecPrime \st \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \kMapParam{\wVec}^2} =\numWorldsP \cdot \frac{\numWorlds}{\sketchCols} - 1\label{eq:spaceOne}\\
+					 \wVec \neq \wVecPrime}}| \{\wVecPrime \st \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \kMapParam{\wVec}^2} =%\numWorldsP
+\norm{\kMap{t}}^2_2\cdot \frac{\numWorlds}{\sketchCols} - 1\label{eq:spaceOne}\\
 &\expect{ \sum_{\substack{\wVec, \wVecPrime \in \pw \st \\
 					\wVec \neq \wVecPrime,\\
-					\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\kMapParam{\wVec}\cdot\kMapParam{\wVecPrime}} = \numWorldsP \cdot  \frac{\numWorldsP - 1}{\sketchCols}\label{eq:spaceTwo}.
+					\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\kMapParam{\wVec}\cdot\kMapParam{\wVecPrime}} = %\numWorldsP \cdot  \frac{\numWorldsP - 1}{\sketchCols}\label{eq:spaceTwo}.
+\norm{\kMap{t}} \cdot \frac{\norm{\kMap{t}}\prob - \frac{\norm{\kMap{t}}}{\numWorlds}}{\sketchCols}\label{eq:spaceTwo}.
 \end{align}
 In both equations, the sum of $\kMapParam{\wVec}$ over all $\wVec \in \pw$ is $\numWorldsP$ since as noted in equation \eqref{eq:mu} we are summing the number of worlds a tuple $t$ appears in, and for a TIPDB, that is exactly 2 to the power of the number of tuples in the TIPDB (due to the independence of tuples) times tuple $t$'s probability. 
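
Note on the expectation proof in this file: the chain from step-one to step-five can be sanity-checked on the smallest non-trivial case, a bucket holding exactly two worlds. The sketch below uses illustrative symbols only ($v, v'$ for the two values and $s, s'$ for their polarities); these are not macros from macros.tex. It assumes the polarities are uniform over $\{-1,+1\}$ and pairwise independent, as the Justification block states.

```latex
% Illustrative symbols (assumptions, not macros from macros.tex):
%   v, v'  : values of two worlds w, w' that hash to the same bucket
%   s, s'  : their polarities, uniform over {-1,+1} and pairwise independent
% The bucket stores v s + v' s'; the estimate for w multiplies it by s.
\begin{align*}
\mathbb{E}\big[(v\,s + v'\,s')\cdot s\big]
  &= \mathbb{E}\big[v\,s^2\big] + \mathbb{E}\big[v'\,s'\,s\big]
     && \text{linearity of expectation} \\
  &= v + v'\,\mathbb{E}[s']\,\mathbb{E}[s]
     && \text{$s^2 = 1$; pairwise independence} \\
  &= v
     && \text{$\mathbb{E}[s] = \mathbb{E}[s'] = 0$.}
\end{align*}
```

The cross term vanishes only because each polarity has mean zero and the pair is independent, which is the same assumption the general argument relies on when it drops the $\wVecPrime \neq \wVec$ sum.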

--- a/macros.tex
+++ b/macros.tex
@@ -160,7 +160,16 @@
 \newtheorem{Corollary}{Corollary}
 \newtheorem{Example}{Example}
 \newtheorem{Axiom}{Axiom}
+\definecolor{db}{RGB}{23,20,119}
+\definecolor{dg}{RGB}{2,101,15}
+\newtheoremstyle{assumption}{}{}{\color{blue}\itshape}{}{\color{blue}\bfseries}{:}{\newline}{}
+\theoremstyle{assumption}
 \newtheorem{Assumption}{Assumption}
+\newtheoremstyle{justification}{}{}{\color{green}\itshape}{}{\color{green}\bfseries}{:}{\newline}{}
+\theoremstyle{justification}
+\newtheorem{Justification}{Justification}
+\newcommand{\eq}[1]{Equation {#1}}
+\newcommand{\norm}[1]{\|{#1}\|}


 \newcommand{\proofpara}[1]{\medskip\noindent\underline{{#1}:}}
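
Note on the new macros: a minimal standalone document along the following lines may help when trying out the `Justification` environment, `\eq`, and `\norm` outside the paper. It is only a sketch; it assumes `amsmath`, `amsthm`, and `xcolor` are loaded (the paper's actual preamble is not shown in this diff), and the equation and label `eq:demo` are hypothetical.

```latex
\documentclass{article}
\usepackage{amsmath,amsthm,xcolor} % assumed packages; the paper's preamble may differ

% Definitions copied from the macros.tex hunk above
\newtheoremstyle{justification}{}{}{\color{green}\itshape}{}{\color{green}\bfseries}{:}{\newline}{}
\theoremstyle{justification}
\newtheorem{Justification}{Justification}
\newcommand{\eq}[1]{Equation {#1}}
\newcommand{\norm}[1]{\|{#1}\|}

\begin{document}

% Hypothetical equation, used only to have something to reference
\begin{equation}
\norm{v}_2^2 = \sum_i v_i^2 \label{eq:demo}
\end{equation}

\begin{Justification}
\hfill
  \begin{itemize}
    % \eq takes an already-rendered reference, so wrap the label in \eqref
    \item \eq{\eqref{eq:demo}} is the definition of the squared $\ell_2$ norm.
  \end{itemize}
\end{Justification}

\end{document}
```

Because `\eq` simply prefixes its argument with the word "Equation", passing a bare label such as `\eq{eq:demo}` would typeset the label text itself rather than the equation number.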