Rewritten expectation, variance proofs for generalized v_t
parent 448ed0ffef
commit 503913f385
101 analysis.tex
@ -3,7 +3,8 @@
\label{sec:analysis}
We begin the analysis by showing that, with high probability, an estimate is approximately $\numWorldsP$, where $p$ is a tuple's probability measure for a given TIPD. Note that
\begin{equation}
%\gVt{k\cdot}
\numWorldsP = \numWorldsSum.\label{eq:mu}
\end{equation}

We first claim that the expectation of the estimate of a tuple $t$'s membership across all worlds is $\sum\limits_{\wVec \in \pw}\kMapParam{\wVec}$; formally,

@ -14,37 +15,50 @@ To verify this claim, we argue that the expectation of the estimate of a tuple's
\begin{equation}
\expect{\sketchJParam{\sketchHashParam{\wVec}}\cdot \sketchPolarParam{\wVec}} = \kMapParam{\wVec}. \label{eq:single-est}
\end{equation}

%\AR{While the analysis below is correct, the way it is stated it seems to `come out of the blue.' I would recommend that you re-structure the argument below as follows. First argue that $\expect{\sketch[i][\sketchHash[\wVec]]\cdot s_i[\wVec]}=v_t[\wVec]$. From this the claim below just follows by linearity of expectation but this result is a good thing for the reader to realize. Also instead of summing over $j\in [B],\wVec|h_i[\wVec]=j,\wVec'|h_i[\wVec']=j$ it would be better to just write it as sum over all $\wVec,\wVec'\in W\text{ s.t. }h_i[\wVec]=h_i[\wVec']$-- the latter is bit more compact and it is easier to comprehend as well.}
%\AH{Proof changed as suggested above. I erred on the verbose side for the sake of clarity.}
For a given $\wVec \in \pw$, substituting definitions, we have
\setcounter{equation}{2}
\begin{subequations}
\begin{align}
&\expect{\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\nonumber\\
=&~\expect{\big(\sum_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\kMapParam{\wVecPrime} \cdot \sketchPolarParam{\wVecPrime}\big) \cdot \sketchPolarParam{\wVec}}\label{eq:step-one}\\
%\end{align}
%Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \st \wVecPrime = \wVec$. This yields
%\[
=&~\expect{\sum\limits_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2 +
\sum\limits_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}\label{eq:step-two}\\
%\] which can be written as
%\[
=&~\expect{\sum\limits_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2} +
\expect{\sum\limits_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}\label{eq:step-three}\\
%\] from which the last term evaluates to $0$ and we have
%\[
=&~\expect{\sum\limits_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2}\label{eq:step-four}\\
%\]
=&~\kMapParam{\wVec}\label{eq:step-five}
\end{align}
\end{subequations}
\begin{Justification}
\hfill
\begin{itemize}
\item \eq{\eqref{eq:step-one}} substitutes the definition of $\sketch$.
\item \eq{\eqref{eq:step-two}} uses the commutativity of addition to split the sum into the single term with $\wVecPrime = \wVec$ and the remaining terms with $\wVecPrime \neq \wVec$.
\item \eq{\eqref{eq:step-three}} uses linearity of expectation to turn the expectation of the sum into a sum of expectations.
\item \eq{\eqref{eq:step-four}} follows because the second term of \eqref{eq:step-three} evaluates to zero. For $\wVecPrime \neq \wVec$, pairwise independence of $\sketchPolar$ gives $\expect{\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}} = \expect{\sketchPolarParam{\wVecPrime}} \cdot \expect{\sketchPolarParam{\wVec}} = 0$, since each sign is $+1$ or $-1$ with equal probability. This step also uses that $\sketchHash$ is chosen independently of $\sketchPolar$.
\item \eq{\eqref{eq:step-five}} holds because the sum in \eqref{eq:step-four} has exactly one term, namely $\wVecPrime = \wVec$, and $\sketchPolarParam{\wVec}^2 = 1$ for either sign.
\end{itemize}
\end{Justification}
|
||||
%which in turn
|
||||
%\begin{multline*}
|
||||
%\mathbb{E}\big[\kMapParam{\wVecPrime_0}\cdot \sketchPolarParam{\wVecPrime_0} + \cdots \\
|
||||
%+\kMapParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}+ \cdots \\
|
||||
|
@ -52,11 +66,19 @@ Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \
%\end{multline*}
%\AH{break it up into w' and w}
%Due to the uniformity of $\sketchPolar$, we have
%\begin{equation*}
%= \kMapParam{\wVec},
%\end{equation*}
This verifies \eqref{eq:single-est}.

\begin{Assumption}
\hfill
\begin{itemize}
\item \eq{\eqref{eq:step-four}} assumes that $\sketchPolar$ is pairwise independent.
%\item $\sketchHash$ is uniformly distributed.
\end{itemize}
\end{Assumption}
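As a sanity check on \eqref{eq:single-est}, the Python sketch below computes $\expect{\sketchJParam{\sketchHashParam{\wVec}}\cdot\sketchPolarParam{\wVec}}$ exactly on a tiny hypothetical instance by enumerating every outcome. It rests on several illustrative assumptions that are not taken from the text. Worlds are modeled as bit vectors in $\{0,1\}^2$. $\sketchPolar$ is drawn from the pairwise-independent family $s_{a,b}(\wVec) = (-1)^{\langle a, \wVec\rangle \oplus b}$. $\sketchHash$ is a fully random map into $B = 2$ buckets, chosen independently of $\sketchPolar$. The names \texttt{sign\_family}, \texttt{hash\_family}, and \texttt{exact\_expectation} are likewise hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tiny instance: worlds are bit vectors in {0,1}^N_BITS.
N_BITS = 2
B = 2  # number of sketch buckets (illustrative choice)
WORLDS = list(product((0, 1), repeat=N_BITS))


def sign_family():
    """Yield every member of the pairwise-independent family
    s_{a,b}(w) = (-1)^(<a, w> xor b), with a in {0,1}^N_BITS and b in {0,1}."""
    for a in product((0, 1), repeat=N_BITS):
        for b in (0, 1):
            def s(w, a=a, b=b):
                parity = (sum(ai * wi for ai, wi in zip(a, w)) + b) % 2
                return -1 if parity else 1
            yield s


def hash_family():
    """Yield every function h: WORLDS -> range(B); each is equally likely."""
    for buckets in product(range(B), repeat=len(WORLDS)):
        yield dict(zip(WORLDS, buckets))


def exact_expectation(v, w):
    """E[C[h(w)] * s(w)], averaged over all (h, s) pairs with h independent of s."""
    signs = list(sign_family())
    total, count = Fraction(0), 0
    for h in hash_family():
        for s in signs:
            # Count sketch: C[j] = sum of v(w') * s(w') over worlds w' with h(w') = j.
            C = [0] * B
            for wp in WORLDS:
                C[h[wp]] += v[wp] * s(wp)
            total += C[h[w]] * s(w)
            count += 1
    return total / count


if __name__ == "__main__":
    # v_t(w) = 1 iff the first tuple is present in world w (Boolean annotation).
    v = {w: w[0] for w in WORLDS}
    for w in WORLDS:
        print(w, exact_expectation(v, w))
```

The expectation is taken over the explicit product of the two families. The check therefore returns $\kMapParam{\wVec}$ exactly, as a \texttt{Fraction}, for Boolean and non-Boolean annotations alike. This matches the fact that the argument above uses only pairwise independence of the signs.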

Since \eqref{eq:single-est} holds for every $\wVec \in \pw$, summing it over $\pw$ and applying linearity of expectation shows that \eqref{eq:allWorlds-est} holds as well.
%We can now take \eqref{eq:single-est}, substitute it in for \eqref{eq:allWorlds-est} and show by linearity of expectation that \eqref{eq:allWorlds-est} holds.
%\begin{align}

@ -93,13 +115,13 @@ Since \eqref{eq:single-est} holds, by linearity of expectation, \eqref{eq:allWor
%\AH{Thank you for clarifying this, as I have always wondered what the convention was for display equations. Hopefully, I haven't missed any end display equations in this paper, and have them all fixed properly.}

Next, we show that the variance of the estimate is small.%$$\varParam{\estimate}$$

\begin{subequations}
\begin{align}
&\varParam{\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\nonumber\\
=~&\varParam{\sum_{\wVec \in \pw}\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\sketchPolarParam{\wVecPrime}}\label{eq:var_step-one}\\%\estExpOne}\\
=~& \mathbb{E}\big[\big(\sum_{\substack{ \wVec, \wVecPrime \in \pw \st \\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}\nonumber\\
&\qquad - \expect{\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\big)^2\big]\label{eq:var_step-two}\\
=~&\mathbb{E}\big[\sum_{\substack{
\wVec_1, \wVec_2,\\
\wVecPrime_1, \wVecPrime_2 \in \pw,\\

@ -108,6 +130,23 @@ For the next step, we show that the variance of an estimate is small.%$$\varPara
}}\kMapParam{\wVec_1} \kMapParam{\wVec_2}\sketchPolarParam{\wVec_1}\sketchPolarParam{\wVec_2}\sketchPolarParam{\wVecPrime_1}\sketchPolarParam{\wVecPrime_2}\big]\nonumber\\
&\qquad - \left(\sum_{\wVec \in \pw}\kMapParam{\wVec}\right)^2. \label{eq:var-sum-w}
\end{align}
\end{subequations}

\begin{Justification}
\hfill
\begin{itemize}
\item \eq{\eqref{eq:var_step-one}} follows from substituting the definition of $\sketch$. Here $\sketchJParam{\sketchHashParam{\wVec}}$ is the sum of $\kMapParam{\wVecPrime} \cdot \sketchPolarParam{\wVecPrime}$ over all $\wVecPrime$ that $\sketchHash$ maps to the same bucket as $\wVec$, which is where the constraint $\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}$ comes from. By commutativity of addition, the resulting double sum can then be regrouped so that each product $\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec}$ is multiplied by the sum of the signs $\sketchPolarParam{\wVecPrime}$ of all worlds sharing its bucket.
\item \eq{\eqref{eq:var_step-two}} follows by substituting the definition of variance, $\varParam{X} = \expect{(X - \expect{X})^2}$.
\item \eq{\eqref{eq:var-sum-w}} follows from expanding the square in \eqref{eq:var_step-two} and using \eqref{eq:allWorlds-est} for the expectation of the estimate.
\end{itemize}
\end{Justification}
\begin{Assumption}
\hfill
\begin{itemize}
\item The subsequent evaluations of expectations assume that $\sketchPolar$ is 4-wise independent.
\end{itemize}
\end{Assumption}
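The variance can be checked the same way on a tiny hypothetical instance. The Python sketch below enumerates every hash function $\sketchHash : \pw \to [\sketchCols]$ and every sign vector $\sketchPolar \in \{-1,+1\}^{\pw}$. These signs are fully independent, and therefore in particular 4-wise independent. It then computes the exact mean and variance of $\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}}\cdot\sketchPolarParam{\wVec}$. The variance is compared with $\frac{1}{\sketchCols}\big((\numWorlds - 1)\norm{\kMap{t}}^2_2 + (\sum_{\wVec \in \pw}\kMapParam{\wVec})^2 - \norm{\kMap{t}}^2_2\big)$. That closed form is derived here under these assumptions, from the fact that only colliding pairs $\wVec \neq \wVecPrime$ contribute; it is not stated in the text. The instance sizes ($\numWorlds = 4$, $\sketchCols = 2$, written \texttt{B} in the code) and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tiny instance: |W| = 4 worlds, B = 2 buckets.
N_BITS = 2
B = 2
WORLDS = list(product((0, 1), repeat=N_BITS))


def estimator(v, h, s):
    """X = sum_w C[h(w)] * s(w) for one fixed draw of the hash h and signs s."""
    C = [0] * B
    for w in WORLDS:
        C[h[w]] += v[w] * s[w]
    return sum(C[h[w]] * s[w] for w in WORLDS)


def exact_mean_and_variance(v):
    """Exact mean and variance of X over all hashes h and all sign vectors s.

    Enumerating every sign vector makes the signs fully independent,
    which in particular satisfies the 4-wise independence assumption."""
    draws = []
    for buckets in product(range(B), repeat=len(WORLDS)):
        h = dict(zip(WORLDS, buckets))
        for signs in product((-1, 1), repeat=len(WORLDS)):
            s = dict(zip(WORLDS, signs))
            draws.append(estimator(v, h, s))
    n = len(draws)
    mean = Fraction(sum(draws), n)
    variance = Fraction(sum(x * x for x in draws), n) - mean ** 2
    return mean, variance


def predicted_variance(v):
    """((|W| - 1) * ||v||_2^2 + (sum_w v(w))^2 - ||v||_2^2) / B."""
    l2_sq = sum(x * x for x in v.values())
    total = sum(v.values())
    return Fraction((len(WORLDS) - 1) * l2_sq + total ** 2 - l2_sq, B)


if __name__ == "__main__":
    v = {w: w[0] for w in WORLDS}  # Boolean annotation of a single tuple
    mean, variance = exact_mean_and_variance(v)
    print(mean, variance, predicted_variance(v))
```

For Boolean annotations, $\norm{\kMap{t}}^2_2 = \sum_{\wVec}\kMapParam{\wVec}$, so the same comparison specializes directly to the TIPDB setting considered below.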
%\AR{The $-\mu^2$ term is missing in the above.}
%\AH{$\mu^2$ added.}

@ -236,10 +275,12 @@ With only \eqref{eq:variantTwo} and \eqref{eq:variantThree} remaining, we have
Our current analysis is limited to TIPDBs, whose annotations lie in the Boolean set $\mathbb{B}$. Consequently, squaring an annotation leaves it unchanged, i.e., $\kMapParam{\wVec}^2 = \kMapParam{\wVec}$. Computing each term separately, we have
\begin{align}
&\expect{\sum_{\substack{\wVec, \wVecPrime \in \pw \st\\
\wVec \neq \wVecPrime}}| \{\wVecPrime \st \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \kMapParam{\wVec}^2} =%\numWorldsP
\norm{\kMap{t}}^2_2\cdot \frac{\numWorlds - 1}{\sketchCols}\label{eq:spaceOne}\\
&\expect{ \sum_{\substack{\wVec, \wVecPrime \in \pw \st \\
\wVec \neq \wVecPrime,\\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\kMapParam{\wVec}\cdot\kMapParam{\wVecPrime}} = %\numWorldsP \cdot \frac{\numWorldsP - 1}{\sketchCols}\label{eq:spaceTwo}.
\norm{\kMap{t}}^2_2 \cdot \frac{\norm{\kMap{t}}^2_2 - 1}{\sketchCols}.\label{eq:spaceTwo}
\end{align}
In both equations we use the fact that, for Boolean annotations, $\norm{\kMap{t}}^2_2 = \sum_{\wVec \in \pw}\kMapParam{\wVec}$, and that this sum equals $\numWorldsP$. As noted in \eqref{eq:mu}, the sum counts the worlds in which tuple $t$ appears. For a TIPDB, this count is exactly 2 raised to the number of tuples in the TIPDB (by the independence of tuples), times tuple $t$'s probability.
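The two collision sums in \eqref{eq:spaceOne} and \eqref{eq:spaceTwo} depend only on $\sketchHash$, so their expectations can be computed exactly by enumerating every hash function on a tiny hypothetical instance. The instance has $\numWorlds = 8$ and $\sketchCols = 2$, and a fully random $\sketchHash$ stands in for a uniform, pairwise-independent one. The Python sketch below does this for a Boolean annotation and compares the results with $k(\numWorlds - 1)/\sketchCols$ and $k(k-1)/\sketchCols$, where $k = \sum_{\wVec \in \pw}\kMapParam{\wVec} = \norm{\kMap{t}}^2_2$. Reading the first sum as $\sum_{\wVec \in \pw}|\{\wVecPrime \neq \wVec \st \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\}| \cdot \kMapParam{\wVec}^2$ is an interpretive assumption. The instance and the function name \texttt{exact\_collision\_sums} are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tiny instance: |W| = 8 worlds (3 tuples), B = 2 buckets.
N_BITS = 3
B = 2
WORLDS = list(product((0, 1), repeat=N_BITS))


def exact_collision_sums(v):
    """Exact expectations, over every h: W -> range(B), of
       one = sum_w |{w' != w : h(w) = h(w')}| * v(w)^2
       two = sum over ordered pairs w != w' with h(w) = h(w') of v(w) * v(w')."""
    one_total, two_total, n_hashes = 0, 0, 0
    for buckets in product(range(B), repeat=len(WORLDS)):
        h = dict(zip(WORLDS, buckets))
        n_hashes += 1
        for w in WORLDS:
            for wp in WORLDS:
                if wp != w and h[w] == h[wp]:
                    one_total += v[w] ** 2
                    two_total += v[w] * v[wp]
    return Fraction(one_total, n_hashes), Fraction(two_total, n_hashes)


if __name__ == "__main__":
    v = {w: w[0] for w in WORLDS}  # tuple 1 is present in half of the worlds
    k = sum(v.values())
    one, two = exact_collision_sums(v)
    print(one, Fraction(k * (len(WORLDS) - 1), B))
    print(two, Fraction(k * (k - 1), B))
```

Both comparisons agree exactly here, because each ordered pair of distinct worlds collides under exactly a $1/\sketchCols$ fraction of the enumerated hash functions.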

@ -160,7 +160,16 @@
\newtheorem{Corollary}{Corollary}
\newtheorem{Example}{Example}
\newtheorem{Axiom}{Axiom}
\definecolor{db}{RGB}{23,20,119}
\definecolor{dg}{RGB}{2,101,15}
\newtheoremstyle{assumption}{}{}{\color{blue}\itshape}{}{\color{blue}\bfseries}{:}{\newline}{}
\theoremstyle{assumption}
\newtheorem{Assumption}{Assumption}
\newtheoremstyle{justification}{}{}{\color{green}\itshape}{}{\color{green}\bfseries}{:}{\newline}{}
\theoremstyle{justification}
\newtheorem{Justification}{Justification}
\newcommand{\eq}[1]{Equation {#1}}
\newcommand{\norm}[1]{\|{#1}\|}


\newcommand{\proofpara}[1]{\medskip\noindent\underline{{#1}:}}