Rewritten expectation, variance proofs for generalized v_t

Aaron Huber 2019-08-03 10:25:25 -04:00
parent 448ed0ffef
commit 503913f385
2 changed files with 80 additions and 30 deletions

@ -3,7 +3,8 @@
\label{sec:analysis} \label{sec:analysis}
We begin the analysis by showing that with high probability an estimate is approximately $\numWorldsP$, where $p$ is a tuple's probability measure for a given TIPD. Note that We begin the analysis by showing that with high probability an estimate is approximately $\numWorldsP$, where $p$ is a tuple's probability measure for a given TIPD. Note that
\begin{equation} \begin{equation}
\gVt{k\cdot}\numWorldsP = \numWorldsSum\label{eq:mu}. %\gVt{k\cdot}
\numWorldsP = \numWorldsSum\label{eq:mu}.
\end{equation} \end{equation}
We claim that the expectation of the estimate of a tuple $t$'s membership across all worlds is $\sum\limits_{\wVec \in \pw}\kMapParam{\wVec}$, formally We claim that the expectation of the estimate of a tuple $t$'s membership across all worlds is $\sum\limits_{\wVec \in \pw}\kMapParam{\wVec}$, formally
@ -14,37 +15,50 @@ To verify this claim, we argue that the expectation of the estimate of a tuple's
\begin{equation} \begin{equation}
\expect{\sketchJParam{\sketchHashParam{\wVec}}\cdot \sketchPolarParam{\wVec}} = \kMapParam{\wVec} \label{eq:single-est}. \expect{\sketchJParam{\sketchHashParam{\wVec}}\cdot \sketchPolarParam{\wVec}} = \kMapParam{\wVec} \label{eq:single-est}.
\end{equation} \end{equation}
%\AR{While the analysis below is correct, the way it is stated it seems to `come out of the blue.' I would recommend that you re-structure the argument below as follows. First argue that $\expect{\sketch[i][\sketchHash[\wVec]]\cdot s_i[\wVec]}=v_t[\wVec]$. From this the claim below just follows by linearity of expectation but this result is a good thing for the reader to realize. Also instead of summing over $j\in [B],\wVec|h_i[\wVec]=j,\wVec'|h_i[\wVec']=j$ it would be better to just write it as sum over all $\wVec,\wVec'\in W\text{ s.t. }h_i[\wVec]=h_i[\wVec']$-- the latter is bit more compact and it is easier to comprehend as well.}
%\AH{Proof changed as suggested above. I erred on the verbose side for the sake of clarity.}
For a given $\wVec \in \pw$, substituting definitions we have For a given $\wVec \in \pw$, substituting definitions we have
\begin{align*} \setcounter{equation}{2}
\begin{subequations}
\begin{align}
&\expect{\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}} = \nonumber\\ &\expect{\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}} = \nonumber\\
&\phantom{{}\sketchJParam{\sketchHashParam{\wVec}}}\expect{\big(\sum_{\substack{\wVecPrime \in \pw \st \\ &\phantom{{}\sketchJParam{\sketchHashParam{\wVec}}}\expect{\big(\sum_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\kMapParam{\wVecPrime} \cdot \sketchPolarParam{\wVecPrime}\big) \cdot \sketchPolarParam{\wVec} }. \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\kMapParam{\wVecPrime} \cdot \sketchPolarParam{\wVecPrime}\big) \cdot \sketchPolarParam{\wVec} }\label{eq:step-one}\\
\end{align*} %\end{align}
Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \st \wVecPrime = \wVec$. This yields %Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \st \wVecPrime = \wVec$. This yields
\[ %\[
\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ =&~\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2 + \wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2 +
\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ \sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}} \wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}\label{eq:step-two}\\
\] which can be written as %\] which can be written as
\[ %\[
\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ =&~\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2} + \wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2} +
\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ \expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}\label{eq:step-three}\\
%\] from which the last term evaluates to $0$ and we have
%\[
=&~\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}} \wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2}\label{eq:step-four}\\
\] from which the last term evaluates to $0$ and we have %\]
\[ =&~\kMapParam{\wVec}\label{eq:step-five}
\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ \end{align}
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\ \end{subequations}
\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2} \begin{Justification}
\] which in turn \hfill
\begin{itemize}
\item \eq{\eqref{eq:step-one}} is a substitution of the definition of $\sketch$.
\item \eq{\eqref{eq:step-two}} uses the commutativity of addition to split the sum into the term with $\wVecPrime = \wVec$ and the terms with $\wVecPrime \neq \wVec$.
\item \eq{\eqref{eq:step-three}} uses linearity of expectation to reduce the large expectation into smaller expectations.
\item \eq{\eqref{eq:step-four}} follows because the second term of \eq{\eqref{eq:step-three}} evaluates to zero. This assumes pairwise independence of $\sketchPolar$.
\item \eq{\eqref{eq:step-five}} follows because $\sketchPolarParam{\wVec}^2$ always evaluates to 1, and exactly one $\wVecPrime$ in the summation equals $\wVec$.
\end{itemize}
\end{Justification}
%which in turn
%\begin{multline*} %\begin{multline*}
%\mathbb{E}\big[\kMapParam{\wVecPrime_0}\cdot \sketchPolarParam{\wVecPrime_0} + \cdots \\ %\mathbb{E}\big[\kMapParam{\wVecPrime_0}\cdot \sketchPolarParam{\wVecPrime_0} + \cdots \\
%+\kMapParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}+ \cdots \\ %+\kMapParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}+ \cdots \\
@ -52,11 +66,19 @@ Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \
%\end{multline*} %\end{multline*}
%\AH{break it up into w' and w} %\AH{break it up into w' and w}
%Due to the uniformity of $\sketchPolar$, we have %Due to the uniformity of $\sketchPolar$, we have
\begin{equation*} %\begin{equation*}
= \kMapParam{\wVec}, %= \kMapParam{\wVec},
\end{equation*} %\end{equation*}
thus verifying \eqref{eq:single-est}. thus verifying \eqref{eq:single-est}.
\begin{Assumption}
\hfill
\begin{itemize}
\item \eq{\eqref{eq:step-four}} assumes that $\sketchPolar$ is pairwise independent.
%\item $\sketchHash$ is uniformly distributed.
\end{itemize}
\end{Assumption}
Since \eqref{eq:single-est} holds, by linearity of expectation, \eqref{eq:allWorlds-est} also must hold. Since \eqref{eq:single-est} holds, by linearity of expectation, \eqref{eq:allWorlds-est} also must hold.
%We can now take \eqref{eq:single-est}, substitute it in for \eqref{eq:allWorlds-est} and show by linearity of expectation that \eqref{eq:allWorlds-est} holds. %We can now take \eqref{eq:single-est}, substitute it in for \eqref{eq:allWorlds-est} and show by linearity of expectation that \eqref{eq:allWorlds-est} holds.
%\begin{align} %\begin{align}
@ -93,13 +115,13 @@ Since \eqref{eq:single-est} holds, by linearity of expectation, \eqref{eq:allWor
%\AH{Thank you for clarifying this, as I have always wondered what the convention was for display equations. Hopefully, I haven't missed any end display equations in this paper, and have them all fixed properly.} %\AH{Thank you for clarifying this, as I have always wondered what the convention was for display equations. Hopefully, I haven't missed any end display equations in this paper, and have them all fixed properly.}
For the next step, we show that the variance of an estimate is small.%$$\varParam{\estimate}$$ For the next step, we show that the variance of an estimate is small.%$$\varParam{\estimate}$$
\begin{subequations}
\begin{align} \begin{align}
&\varParam{\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\nonumber\\ &\varParam{\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\\%\nonumber\\
=~&\varParam{\sum_{\wVec \in \pw}\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\sketchPolarParam{\wVecPrime}}\nonumber\\%\estExpOne}\\ =~&\varParam{\sum_{\wVec \in \pw}\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\sketchPolarParam{\wVecPrime}}\label{eq:var_step-one}\\%\nonumber\\%\estExpOne}\\
=~& \mathbb{E}\big[\big(\sum_{\substack{ \wVec, \wVecPrime \in \pw \st \\ =~& \mathbb{E}\big[\big(\sum_{\substack{ \wVec, \wVecPrime \in \pw \st \\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}\nonumber\\ \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}\nonumber\\
&\qquad - \expect{\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\big)^2\big]\nonumber\\ &\qquad - \expect{\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\big)^2\big]\label{eq:var_step-two}\\%\nonumber\\
=~&\mathbb{E}\big[\sum_{\substack{ =~&\mathbb{E}\big[\sum_{\substack{
\wVec_1, \wVec_2,\\ \wVec_1, \wVec_2,\\
\wVecPrime_1, \wVecPrime_2 \in \pw,\\ \wVecPrime_1, \wVecPrime_2 \in \pw,\\
@ -108,6 +130,23 @@ For the next step, we show that the variance of an estimate is small.%$$\varPara
}}\kMapParam{\wVec_1} \kMapParam{\wVec_2}\sketchPolarParam{\wVec_1}\sketchPolarParam{\wVec_2}\sketchPolarParam{\wVecPrime_1}\sketchPolarParam{\wVecPrime_2}\big]\nonumber\\ }}\kMapParam{\wVec_1} \kMapParam{\wVec_2}\sketchPolarParam{\wVec_1}\sketchPolarParam{\wVec_2}\sketchPolarParam{\wVecPrime_1}\sketchPolarParam{\wVecPrime_2}\big]\nonumber\\
&\qquad - \left(\sum_{\wVec \in \pw}\kMapParam{\wVec}\right)^2 \label{eq:var-sum-w}. &\qquad - \left(\sum_{\wVec \in \pw}\kMapParam{\wVec}\right)^2 \label{eq:var-sum-w}.
\end{align} \end{align}
\end{subequations}
\begin{Justification}
\hfill
\begin{itemize}
\item \eq{\eqref{eq:var_step-one}} follows from substituting the definition of $\sketch$ and rearranging the sum by the commutativity of addition. The constraint that $\wVec$ and $\wVecPrime$ hash to the same bucket comes from the definition of $\sketch$. The rearrangement pairs each term $\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec}$ with every $\sketchPolarParam{\wVecPrime}$ whose world maps to the same bucket.
\item \eq{\eqref{eq:var_step-two}} follows by substituting the definition of variance.
\item \eq{\eqref{eq:var-sum-w}} results from expanding the square in \eq{\eqref{eq:var_step-two}} and substituting \eqref{eq:allWorlds-est} for the expectation of the estimate, which gives the subtracted $\left(\sum_{\wVec \in \pw}\kMapParam{\wVec}\right)^2$ term.
\end{itemize}
\end{Justification}
\begin{Assumption}
\hfill
\begin{itemize}
\item The subsequent evaluations of expectation assume 4-wise independence of $\sketchPolar$.
\end{itemize}
\end{Assumption}
Testing: $\norm{\kMap{t}}^2_2$.
%\AR{The $-\mu^2$ term is missing in the above.} %\AR{The $-\mu^2$ term is missing in the above.}
%\AH{$\mu^2$ added.} %\AH{$\mu^2$ added.}
@ -236,10 +275,12 @@ With only \eqref{eq:variantTwo} and \eqref{eq:variantThree} remaining, we have
Our current analysis is limited to TIPDBs, where the annotations are in the boolean $\mathbb{B}$ set. Because this is the case, the square of any element is itself. Computing each term separately we have Our current analysis is limited to TIPDBs, where the annotations are in the boolean $\mathbb{B}$ set. Because this is the case, the square of any element is itself. Computing each term separately we have
\begin{align} \begin{align}
&\expect{\sum_{\substack{\wVec, \wVecPrime \in \pw \st\\ &\expect{\sum_{\substack{\wVec, \wVecPrime \in \pw \st\\
\wVec \neq \wVecPrime}}| \{\wVecPrime \st \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \kMapParam{\wVec}^2} =\numWorldsP \cdot \frac{\numWorlds}{\sketchCols} - 1\label{eq:spaceOne}\\ \wVec \neq \wVecPrime}}| \{\wVecPrime \st \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \kMapParam{\wVec}^2} =%\numWorldsP
\norm{\kMap{t}}^2_2\cdot \frac{\numWorlds}{\sketchCols} - 1\label{eq:spaceOne}\\
&\expect{ \sum_{\substack{\wVec, \wVecPrime \in \pw \st \\ &\expect{ \sum_{\substack{\wVec, \wVecPrime \in \pw \st \\
\wVec \neq \wVecPrime,\\ \wVec \neq \wVecPrime,\\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\kMapParam{\wVec}\cdot\kMapParam{\wVecPrime}} = \numWorldsP \cdot \frac{\numWorldsP - 1}{\sketchCols}\label{eq:spaceTwo}. \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\kMapParam{\wVec}\cdot\kMapParam{\wVecPrime}} = %\numWorldsP \cdot \frac{\numWorldsP - 1}{\sketchCols}\label{eq:spaceTwo}.
\norm{\kMap{t}} \cdot \frac{\norm{\kMap{t}}\prob - \frac{\norm{\kMap{t}}}{\numWorlds}}{\sketchCols}\label{eq:spaceTwo}.
\end{align} \end{align}
In both equations, the sum of $\kMapParam{\wVec}$ over all $\wVec \in \pw$ is $\numWorldsP$ since as noted in equation \eqref{eq:mu} we are summing the number of worlds a tuple $t$ appears in, and for a TIPDB, that is exactly 2 to the power of the number of tuples in the TIPDB (due to the independence of tuples) times tuple $t$'s probability. In both equations, the sum of $\kMapParam{\wVec}$ over all $\wVec \in \pw$ is $\numWorldsP$ since as noted in equation \eqref{eq:mu} we are summing the number of worlds a tuple $t$ appears in, and for a TIPDB, that is exactly 2 to the power of the number of tuples in the TIPDB (due to the independence of tuples) times tuple $t$'s probability.
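
The expectation claim in \eqref{eq:single-est} can be sanity-checked numerically on a toy instance. The sketch below is only an illustration under stated assumptions, not the implementation used in this paper: the names `sketch_row` and `expected_estimate`, the four-world vector `v`, and the bucket assignment `h` are all hypothetical. It enumerates every sign assignment in $\{-1, 1\}^4$, which is fully independent and therefore a stronger condition than the pairwise independence the proof requires.

```python
from fractions import Fraction
from itertools import product


def sketch_row(v, h, s, num_buckets):
    """Build one sketch row: bucket j holds the sum of v[w] * s[w] over worlds w with h[w] == j."""
    row = [0] * num_buckets
    for w, value in enumerate(v):
        row[h[w]] += value * s[w]
    return row


def expected_estimate(v, h, num_buckets, w):
    """Exact expectation of row[h[w]] * s[w], averaged over all sign vectors in {-1, 1}^n."""
    total = Fraction(0)
    count = 0
    for s in product((-1, 1), repeat=len(v)):
        row = sketch_row(v, h, s, num_buckets)
        total += row[h[w]] * s[w]
        count += 1
    return total / count


# Hypothetical toy instance: 4 worlds, 2 buckets, Boolean annotations.
v = [1, 0, 1, 1]   # v[w] = 1 if the tuple appears in world w (assumed values)
h = [0, 1, 0, 1]   # fixed bucket assignment (assumed values)
estimates = [expected_estimate(v, h, 2, w) for w in range(len(v))]
print(estimates)
```

For each world, the cross terms cancel across the enumeration and only the squared-sign term survives, so each entry of `estimates` equals the corresponding entry of `v`, and their sum equals the sum of `v`, matching the linearity-of-expectation step.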

@ -160,7 +160,16 @@
\newtheorem{Corollary}{Corollary} \newtheorem{Corollary}{Corollary}
\newtheorem{Example}{Example} \newtheorem{Example}{Example}
\newtheorem{Axiom}{Axiom} \newtheorem{Axiom}{Axiom}
\definecolor{db}{RGB}{23,20,119}
\definecolor{dg}{RGB}{2,101,15}
\newtheoremstyle{assumption}{}{}{\color{blue}\itshape}{}{\color{blue}\bfseries}{:}{\newline}{}
\theoremstyle{assumption}
\newtheorem{Assumption}{Assumption} \newtheorem{Assumption}{Assumption}
\newtheoremstyle{justification}{}{}{\color{green}\itshape}{}{\color{green}\bfseries}{:}{\newline}{}
\theoremstyle{justification}
\newtheorem{Justification}{Justification}
\newcommand{\eq}[1]{Equation {#1}}
\newcommand{\norm}[1]{\|{#1}\|}
\newcommand{\proofpara}[1]{\medskip\noindent\underline{{#1}:}} \newcommand{\proofpara}[1]{\medskip\noindent\underline{{#1}:}}