Rewritten expectation and variance proofs for generalized v_t

Aaron Huber 2019-08-03 10:25:25 -04:00
parent 448ed0ffef
commit 503913f385
2 changed files with 80 additions and 30 deletions

@ -3,7 +3,8 @@
\label{sec:analysis}
We begin the analysis by showing that, with high probability, an estimate is approximately $\numWorldsP$, where $p$ is the tuple's probability in a given TIPDB. Note that
\begin{equation}
%\gVt{k\cdot}
\numWorldsP = \numWorldsSum.\label{eq:mu}
\end{equation}
We first claim that the expectation of the estimate of tuple $t$'s membership, summed across all worlds, is $\sum\limits_{\wVec \in \pw}\kMapParam{\wVec}$; formally,
@ -14,37 +15,50 @@ To verify this claim, we argue that the expectation of the estimate of a tuple's
\begin{equation}
\expect{\sketchJParam{\sketchHashParam{\wVec}}\cdot \sketchPolarParam{\wVec}} = \kMapParam{\wVec} \label{eq:single-est}.
\end{equation}
%\AR{While the analysis below is correct, the way it is stated, it seems to `come out of the blue.' I would recommend that you re-structure the argument below as follows. First argue that $\expect{\sketch[i][\sketchHash[\wVec]]\cdot s_i[\wVec]}=v_t[\wVec]$. From this the claim below just follows by linearity of expectation but this result is a good thing for the reader to realize. Also instead of summing over $j\in [B],\wVec|h_i[\wVec]=j,\wVec'|h_i[\wVec']=j$ it would be better to just write it as sum over all $\wVec,\wVec'\in W\text{ s.t. }h_i[\wVec]=h_i[\wVec']$-- the latter is a bit more compact and it is easier to comprehend as well.}
%\AH{Proof changed as suggested above. I erred on the side of verbosity for the sake of clarity.}
For a given $\wVec \in \pw$, substituting the definition of $\sketch$, we have
\setcounter{equation}{2}
\begin{subequations}
\begin{align}
&\expect{\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\nonumber\\
=&~\expect{\big(\sum_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\kMapParam{\wVecPrime} \cdot \sketchPolarParam{\wVecPrime}\big) \cdot \sketchPolarParam{\wVec}}\label{eq:step-one}\\
=&~\expect{\sum\limits_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2 +
\sum\limits_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}\label{eq:step-two}\\
=&~\expect{\sum\limits_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2} +
\expect{\sum\limits_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}\label{eq:step-three}\\
=&~\expect{\sum\limits_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2}\label{eq:step-four}\\
=&~\kMapParam{\wVec}\label{eq:step-five}
\end{align}
\end{subequations}
\begin{Justification}
\hfill
\begin{itemize}
\item \eq{\eqref{eq:step-one}} substitutes the definition of $\sketch$: the bucket entry $\sketchJParam{\sketchHashParam{\wVec}}$ accumulates $\kMapParam{\wVecPrime} \cdot \sketchPolarParam{\wVecPrime}$ over every world $\wVecPrime$ hashed to the same bucket as $\wVec$.
\item \eq{\eqref{eq:step-two}} distributes $\sketchPolarParam{\wVec}$ over the sum and splits the sum into the term with $\wVecPrime = \wVec$ and the terms with $\wVecPrime \neq \wVec$.
\item \eq{\eqref{eq:step-three}} uses linearity of expectation to split the single expectation into two.
\item \eq{\eqref{eq:step-four}} follows because the second term of \eq{\eqref{eq:step-three}} evaluates to zero. This assumes pairwise independence of $\sketchPolar$ (and that $\sketchHash$ is chosen independently of $\sketchPolar$): for $\wVecPrime \neq \wVec$, $\expect{\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}} = \expect{\sketchPolarParam{\wVecPrime}}\expect{\sketchPolarParam{\wVec}} = 0$, since each sign is uniform over $\{-1, 1\}$ and thus has mean zero.
\item \eq{\eqref{eq:step-five}} follows because $\sketchPolarParam{\wVec} \in \{-1, 1\}$, so its square always evaluates to $1$; moreover, the remaining sum trivially contains exactly one term, namely $\wVecPrime = \wVec$.
\end{itemize}
\end{Justification}
%which in turn
%\begin{multline*}
%\mathbb{E}\big[\kMapParam{\wVecPrime_0}\cdot \sketchPolarParam{\wVecPrime_0} + \cdots \\
%+\kMapParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}+ \cdots \\
@ -52,11 +66,19 @@ Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \
%\end{multline*}
%\AH{break it up into w' and w}
%Due to the uniformity of $\sketchPolar$, we have
%\begin{equation*}
%= \kMapParam{\wVec},
%\end{equation*}
This verifies \eqref{eq:single-est}.
\begin{Assumption}
\hfill
\begin{itemize}
\item \eq{\eqref{eq:step-four}} assumes that $\sketchPolar$ is pairwise independent.
%\item $\sketchHash$ is uniformly distributed.
\end{itemize}
\end{Assumption}
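As a sanity check of \eqref{eq:single-est}, consider a hypothetical toy instance (the two worlds and their collision below are illustrative choices, not part of the construction), assuming the signs are uniform over $\{-1, 1\}$:
\begin{Example}
Suppose $\kMapParam{\wVec_1} = \kMapParam{\wVec_2} = 1$ and $\sketchHashParam{\wVec_1} = \sketchHashParam{\wVec_2}$, with no other world hashed to that bucket. Then
\begin{equation*}
\sketchJParam{\sketchHashParam{\wVec_1}} \cdot \sketchPolarParam{\wVec_1} = \big(\sketchPolarParam{\wVec_1} + \sketchPolarParam{\wVec_2}\big)\sketchPolarParam{\wVec_1} = 1 + \sketchPolarParam{\wVec_1}\sketchPolarParam{\wVec_2}.
\end{equation*}
Under pairwise independence, the four sign pairs $(\pm 1, \pm 1)$ are equally likely, so $\sketchPolarParam{\wVec_1}\sketchPolarParam{\wVec_2}$ equals $+1$ and $-1$ with probability $1/2$ each. The expectation is therefore $1 = \kMapParam{\wVec_1}$, even though every individual estimate is either $0$ or $2$.
\end{Example}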
Since \eqref{eq:single-est} holds for every $\wVec \in \pw$, \eqref{eq:allWorlds-est} follows by linearity of expectation.
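Written out with the notation above (no additional assumptions), the linearity step is
\begin{equation*}
\expect{\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}
= \sum_{\wVec \in \pw}\expect{\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}
= \sum_{\wVec \in \pw}\kMapParam{\wVec},
\end{equation*}
where the last equality applies \eqref{eq:single-est} to each world separately.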
%We can now take \eqref{eq:single-est}, substitute it in for \eqref{eq:allWorlds-est} and show by linearity of expectation that \eqref{eq:allWorlds-est} holds.
%\begin{align}
@ -93,13 +115,13 @@ Since \eqref{eq:single-est} holds, by linearity of expectation, \eqref{eq:allWor
%\AH{Thank you for clarifying this, as I have always wondered what the convention was for display equations. Hopefully, I haven't missed any display-equation endings in this paper and have fixed them all properly.}
Next, we show that the variance of the estimate is small.%$$\varParam{\estimate}$$
\begin{subequations}
\begin{align}
&\varParam{\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\nonumber\\
=~&\varParam{\sum_{\wVec \in \pw}\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\sketchPolarParam{\wVecPrime}}\label{eq:var_step-one}\\
=~& \mathbb{E}\big[\big(\sum_{\substack{ \wVec, \wVecPrime \in \pw \st \\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}\nonumber\\
&\qquad - \expect{\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\big)^2\big]\label{eq:var_step-two}\\
=~&\mathbb{E}\big[\sum_{\substack{
\wVec_1, \wVec_2,\\
\wVecPrime_1, \wVecPrime_2 \in \pw,\\
@ -108,6 +130,23 @@ For the next step, we show that the variance of an estimate is small.%$$\varPara
}}\kMapParam{\wVec_1} \kMapParam{\wVec_2}\sketchPolarParam{\wVec_1}\sketchPolarParam{\wVec_2}\sketchPolarParam{\wVecPrime_1}\sketchPolarParam{\wVecPrime_2}\big]\nonumber\\
&\qquad - \left(\sum_{\wVec \in \pw}\kMapParam{\wVec}\right)^2 \label{eq:var-sum-w}.
\end{align}
\end{subequations}
\begin{Justification}
\hfill
\begin{itemize}
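\item \eq{\eqref{eq:var_step-one}} follows from substituting the definition of $\sketch$; the constraint $\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}$ comes from that definition, since a bucket only accumulates the worlds hashed to it. By the commutativity of addition, the resulting double sum can then be regrouped so that each $\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec}$ is multiplied by the sum of the signs $\sketchPolarParam{\wVecPrime}$ of all worlds $\wVecPrime$ that share its bucket.
\item \eq{\eqref{eq:var_step-two}} follows by substituting the definition of variance, $\varParam{X} = \expect{(X - \expect{X})^2}$ for a random variable $X$, and writing the estimate as a single sum over all pairs $\wVec, \wVecPrime \in \pw$ that hash to the same bucket.
\item \eq{\eqref{eq:var-sum-w}} follows by expanding the square: $\expect{(X - \mu)^2} = \expect{X^2} - \mu^2$ for $\mu = \expect{X}$, where $\mu = \sum_{\wVec \in \pw}\kMapParam{\wVec}$ by \eqref{eq:allWorlds-est}, and $X^2$ is written as a sum over two pairs of colliding worlds.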
\end{itemize}
\end{Justification}
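To see the regrouping in \eqref{eq:var_step-one} on a hypothetical instance, take $\pw = \{\wVec_1, \wVec_2, \wVec_3\}$ with a fixed hash under which only $\wVec_1$ and $\wVec_2$ share a bucket, and abbreviate $v_i = \kMapParam{\wVec_i}$ and $s_i = \sketchPolarParam{\wVec_i}$ (shorthand introduced only for this example). Then
\begin{align*}
\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}
&= (v_1 s_1 + v_2 s_2)s_1 + (v_1 s_1 + v_2 s_2)s_2 + v_3 s_3 s_3\\
&= v_1 + v_2 + v_3 + (v_1 + v_2)s_1 s_2,
\end{align*}
using $s_i^2 = 1$. Only the colliding pair contributes randomness: with uniform, pairwise independent signs, $\expect{s_1 s_2} = 0$ and $(s_1 s_2)^2 = 1$, so the variance of this estimate is $(v_1 + v_2)^2$.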
\begin{Assumption}
\hfill
\begin{itemize}
\item The subsequent evaluations of expectation assume 4-wise independence of $\sketchPolar$.
\end{itemize}
\end{Assumption}
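The role of 4-wise independence in the expectation in \eqref{eq:var-sum-w} can be made explicit with the following standard fact, stated here for signs uniform over $\{-1, 1\}$ (consistent with $\sketchPolarParam{\wVec}^2 = 1$ above):
\begin{equation*}
\expect{\sketchPolarParam{\wVec_1}\sketchPolarParam{\wVec_2}\sketchPolarParam{\wVecPrime_1}\sketchPolarParam{\wVecPrime_2}} =
\begin{cases}
1 & \text{if every world occurs an even number of times among } \wVec_1, \wVec_2, \wVecPrime_1, \wVecPrime_2,\\
0 & \text{otherwise.}
\end{cases}
\end{equation*}
For instance, a term with $\wVec_1 = \wVecPrime_1 \neq \wVec_2 = \wVecPrime_2$ contributes $\expect{\sketchPolarParam{\wVec_1}^2}\,\expect{\sketchPolarParam{\wVec_2}^2} = 1$, whereas any term in which some world appears exactly once vanishes, because that sign is independent of the (at most three) other signs and has mean zero.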
%\AR{The $-\mu^2$ term is missing in the above.}
%\AH{$\mu^2$ added.}
@ -236,10 +275,12 @@ With only \eqref{eq:variantTwo} and \eqref{eq:variantThree} remaining, we have
Our current analysis is limited to TIPDBs, whose annotations lie in the Boolean set $\mathbb{B} = \{0, 1\}$; consequently, every annotation equals its own square. Computing each term separately, we have
\begin{align}
&\expect{\sum_{\substack{\wVec, \wVecPrime \in \pw \st\\
\wVec \neq \wVecPrime}}| \{\wVecPrime \st \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \kMapParam{\wVec}^2} =%\numWorldsP
\norm{\kMap{t}}^2_2\cdot \frac{\numWorlds}{\sketchCols} - 1\label{eq:spaceOne}\\
&\expect{ \sum_{\substack{\wVec, \wVecPrime \in \pw \st \\
\wVec \neq \wVecPrime,\\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\kMapParam{\wVec}\cdot\kMapParam{\wVecPrime}} = %\numWorldsP \cdot \frac{\numWorldsP - 1}{\sketchCols}
\norm{\kMap{t}} \cdot \frac{\norm{\kMap{t}}\prob - \frac{\norm{\kMap{t}}}{\numWorlds}}{\sketchCols}\label{eq:spaceTwo}.
\end{align}
In both equations, $\sum_{\wVec \in \pw}\kMapParam{\wVec} = \numWorldsP$: as noted in \eqref{eq:mu}, this sum counts the worlds in which tuple $t$ appears, which for a TIPDB equals $2$ raised to the number of tuples in the TIPDB, times $t$'s probability, due to the independence of tuples.
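Because the annotations are Boolean, the squared norm $\norm{\kMap{t}}^2_2$ used above is simply a count of worlds. For example, for a hypothetical $\kMap{t}$ with entries $(1, 0, 1, 1)$ over four worlds,
\begin{equation*}
\norm{\kMap{t}}^2_2 = \sum_{\wVec \in \pw}\kMapParam{\wVec}^2 = \sum_{\wVec \in \pw}\kMapParam{\wVec} = 1 + 0 + 1 + 1 = 3.
\end{equation*}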

@ -160,7 +160,16 @@
\newtheorem{Corollary}{Corollary}
\newtheorem{Example}{Example}
\newtheorem{Axiom}{Axiom}
\definecolor{db}{RGB}{23,20,119}
\definecolor{dg}{RGB}{2,101,15}
\newtheoremstyle{assumption}{}{}{\color{blue}\itshape}{}{\color{blue}\bfseries}{:}{\newline}{}
\theoremstyle{assumption}
\newtheorem{Assumption}{Assumption}
\newtheoremstyle{justification}{}{}{\color{green}\itshape}{}{\color{green}\bfseries}{:}{\newline}{}
\theoremstyle{justification}
\newtheorem{Justification}{Justification}
\newcommand{\eq}[1]{Equation {#1}}
\newcommand{\norm}[1]{\|{#1}\|}
\newcommand{\proofpara}[1]{\medskip\noindent\underline{{#1}:}}