diff --git a/analysis.tex b/analysis.tex index 4a99b9a..76dbb14 100644 --- a/analysis.tex +++ b/analysis.tex @@ -3,7 +3,8 @@ \label{sec:analysis} We begin the analysis by showing that with high probability an estimate is approximately $\numWorldsP$, where $p$ is a tuple's probability measure for a given TIPD. Note that \begin{equation} -\gVt{k\cdot}\numWorldsP = \numWorldsSum\label{eq:mu}. +%\gVt{k\cdot} +\numWorldsP = \numWorldsSum\label{eq:mu}. \end{equation} We begin by making the claim that the expectation of the estimate of a tuple t's membership across all worlds is $\sum\limits_{\wVec \in \pw}\kMapParam{\wVec}$, formally @@ -14,37 +15,50 @@ To verify this claim, we argue that the expectation of the estimate of a tuple's \begin{equation} \expect{\sketchJParam{\sketchHashParam{\wVec}}\cdot \sketchPolarParam{\wVec}} = \kMapParam{\wVec} \label{eq:single-est}. \end{equation} - -%\AR{While the analysis below is correct, the way it is stated it seems to `come out of the blue.' I would recommend that you re-structure the argument below as follows. First argue that $\expect{\sketch[i][\sketchHash[\wVec]]\cdot s_i[\wVec]}=v_t[\wVec]$. From this the claim below just follows by linearity of expectation but this result is a good thing for the reader to realize. Also instead of summing over $j\in [B],\wVec|h_i[\wVec]=j,\wVec'|h_i[\wVec']=j$ it would be better to just write it as sum over all $\wVec,\wVec'\in W\text{ s.t. }h_i[\wVec]=h_i[\wVec']$-- the latter is bit more compact and it is easier to comprehend as well.} -%\AH{Proof changed as suggested above. 
I aired on the verbose side for the sake of clarity.} For a given $\wVec \in \pw$, substituting definitions we have -\begin{align*} +\setcounter{equation}{2} +\begin{subequations} +\begin{align} &\expect{\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}} = \nonumber\\ &\phantom{{}\sketchJParam{\sketchHashParam{\wVec}}}\expect{\big(\sum_{\substack{\wVecPrime \in \pw \st \\ - \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\kMapParam{\wVecPrime} \cdot \sketchPolarParam{\wVecPrime}\big) \cdot \sketchPolarParam{\wVec} }. -\end{align*} -Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \st \wVecPrime = \wVec$. This yields -\[ -\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ + \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\kMapParam{\wVecPrime} \cdot \sketchPolarParam{\wVecPrime}\big) \cdot \sketchPolarParam{\wVec} }\label{eq:step-one}\\ +%\end{align} +%Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \st \wVecPrime = \wVec$.
This yields +%\[ +=&~\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\ \wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2 + \sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\ - \wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}} -\] which can be written as -\[ -\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ + \wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}\label{eq:step-two}\\ +%\] which can be written as +%\[ +=&~\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\ \wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2} + \expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ + \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec} \\ + \wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}\label{eq:step-three}\\ +%\] from which the last term evaluates to $0$ and we have +%\[ +=&~\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\ - \wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}} -\] from which the last term evaluates to $0$ and we have -\[ -\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\ - \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\ - \wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2} -\] which in turn + \wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2}\label{eq:step-four}\\ +%\] +=&~\kMapParam{\wVec}\label{eq:step-five} +\end{align} +\end{subequations} +\begin{Justification} +\hfill + \begin{itemize} + \item \eq{\eqref{eq:step-one}} is a substitution of 
the definition of $\sketch$. + \item \eq{\eqref{eq:step-two}} distributes $\sketchPolarParam{\wVec}$ over the sum and splits it into the term with $\wVecPrime = \wVec$ and the terms with $\wVecPrime \neq \wVec$. + \item \eq{\eqref{eq:step-three}} uses linearity of expectation to split the expectation of the sum into a sum of two expectations. + \item \eq{\eqref{eq:step-four}} follows from the second term of \eq{\eqref{eq:step-three}} evaluating to zero. This assumes pairwise independence and uniformity of $\sketchPolar$. + \item \eq{\eqref{eq:step-five}} follows because $\sketchPolarParam{\wVec}^2$ always evaluates to 1. Keep in mind that exactly one $\wVecPrime$ in the summation equals $\wVec$. + \end{itemize} +\end{Justification} + %which in turn %\begin{multline*} %\mathbb{E}\big[\kMapParam{\wVecPrime_0}\cdot \sketchPolarParam{\wVecPrime_0} + \cdots \\ %+\kMapParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}+ \cdots \\ @@ -52,11 +66,19 @@ Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \ %\end{multline*} %\AH{break it up into w' and w} %Due to the uniformity of $\sketchPolar$, we have -\begin{equation*} -= \kMapParam{\wVec}, -\end{equation*} +%\begin{equation*} +%= \kMapParam{\wVec}, +%\end{equation*} thus verifying \eqref{eq:single-est}. +\begin{Assumption} +\hfill + \begin{itemize} + \item \eq{\eqref{eq:step-four}} assumes that $\sketchPolar$ is pairwise independent. + %\item $\sketchHash$ is uniformly distributed. + \end{itemize} +\end{Assumption} + Since \eqref{eq:single-est} holds, by linearity of expectation, \eqref{eq:allWorlds-est} also must hold. %We can now take \eqref{eq:single-est}, substitute it in for \eqref{eq:allWorlds-est} and show by linearity of expectation that \eqref{eq:allWorlds-est} holds. %\begin{align} @@ -93,13 +115,13 @@ Since \eqref{eq:single-est} holds, by linearity of expectation, \eqref{eq:allWor %\AH{Thank you for clarifying this, as I have always wondered what the convention was for display equations.
Hopefully, I haven't missed any end display equations in this paper, and have them all fixed properly.} For the next step, we show that the variance of an estimate is small.%$$\varParam{\estimate}$$ - +\begin{subequations} \begin{align} -&\varParam{\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\nonumber\\ -=~&\varParam{\sum_{\wVec \in \pw}\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\sketchPolarParam{\wVecPrime}}\nonumber\\%\estExpOne}\\ +&\varParam{\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\\%\nonumber\\ +=~&\varParam{\sum_{\wVec \in \pw}\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\sketchPolarParam{\wVecPrime}}\label{eq:var_step-one}\\%\nonumber\\%\estExpOne}\\ =~& \mathbb{E}\big[\big(\sum_{\substack{ \wVec, \wVecPrime \in \pw \st \\ \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}\nonumber\\ -&\qquad - \expect{\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\big)^2\big]\nonumber\\ +&\qquad - \expect{\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\big)^2\big]\label{eq:var_step-two}\\%\nonumber\\ =~&\mathbb{E}\big[\sum_{\substack{ \wVec_1, \wVec_2,\\ \wVecPrime_1, \wVecPrime_2 \in \pw,\\ @@ -108,6 +130,23 @@ For the next step, we show that the variance of an estimate is small.%$$\varPara }}\kMapParam{\wVec_1} \kMapParam{\wVec_2}\sketchPolarParam{\wVec_1}\sketchPolarParam{\wVec_2}\sketchPolarParam{\wVecPrime_1}\sketchPolarParam{\wVecPrime_2}\big]\nonumber\\ &\qquad - \left(\sum_{\wVec \in \pw}\kMapParam{\wVec}\right)^2 \label{eq:var-sum-w}. 
\end{align} +\end{subequations} + +\begin{Justification} +\hfill + \begin{itemize} + \item \eq{\eqref{eq:var_step-one}} follows from substituting the definition of $\sketch$ and rearranging the sum. Each bucket of $\sketch$ holds the sum of the terms hashed to it, which is where the constraint $\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}$ comes from. By the commutativity of addition, the double sum can then be regrouped so that each $\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec}$ is multiplied by the sum of the $\sketchPolar$ values of all worlds sharing its bucket. + \item \eq{\eqref{eq:var_step-two}} follows by substituting the definition of variance. + \item \eq{\eqref{eq:var-sum-w}} results from expanding the square in \eqref{eq:var_step-two} and substituting \eqref{eq:allWorlds-est} for the expectation of the estimate. + \end{itemize} +\end{Justification} +\begin{Assumption} +\hfill + \begin{itemize} + \item The subsequent evaluations of expectation assume 4-wise independence of $\sketchPolar$. + \end{itemize} +\end{Assumption} +Testing: $\norm{\kMap{t}}^2_2$. %\AR{The $-\mu^2$ term is missing in the above.} %\AH{$\mu^2$ added.} @@ -236,10 +275,12 @@ With only \eqref{eq:variantTwo} and \eqref{eq:variantThree} remaining, we have Our current analysis is limited to TIPDBs, where the annotations are in the boolean $\mathbb{B}$ set. Because this is the case, the square of any element is itself.
Computing each term separately we have \begin{align} &\expect{\sum_{\substack{\wVec, \wVecPrime \in \pw \st\\ - \wVec \neq \wVecPrime}}| \{\wVecPrime \st \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \kMapParam{\wVec}^2} =\numWorldsP \cdot \frac{\numWorlds}{\sketchCols} - 1\label{eq:spaceOne}\\ + \wVec \neq \wVecPrime}}| \{\wVecPrime \st \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \kMapParam{\wVec}^2} =%\numWorldsP +\norm{\kMap{t}}^2_2\cdot \frac{\numWorlds}{\sketchCols} - 1\label{eq:spaceOne}\\ &\expect{ \sum_{\substack{\wVec, \wVecPrime \in \pw \st \\ \wVec \neq \wVecPrime,\\ - \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\kMapParam{\wVec}\cdot\kMapParam{\wVecPrime}} = \numWorldsP \cdot \frac{\numWorldsP - 1}{\sketchCols}\label{eq:spaceTwo}. + \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\kMapParam{\wVec}\cdot\kMapParam{\wVecPrime}} = %\numWorldsP \cdot \frac{\numWorldsP - 1}{\sketchCols}\label{eq:spaceTwo}. +\norm{\kMap{t}} \cdot \frac{\norm{\kMap{t}}\prob - \frac{\norm{\kMap{t}}}{\numWorlds}}{\sketchCols}\label{eq:spaceTwo}. \end{align} In both equations, the sum of $\kMapParam{\wVec}$ over all $\wVec \in \pw$ is $\numWorldsP$ since as noted in equation \eqref{eq:mu} we are summing the number of worlds a tuple $t$ appears in, and for a TIPDB, that is exactly 2 to the power of the number of tuples in the TIPDB (due to the independence of tuples) times tuple $t$'s probability. 
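As a sanity check on the derivation of \eqref{eq:single-est} in this hunk, the smallest colliding case can be worked by hand. The sketch below is illustrative only: the shorthand $v$, $s$, $h$, and $C$ stands in for \kMapParam, \sketchPolarParam, \sketchHashParam, and a sketch bucket, and is an assumption introduced here rather than notation from macros.tex. It also assumes $s$ takes values in $\{-1,+1\}$ uniformly and is pairwise independent.

```latex
% Two worlds w_1, w_2 that hash to the same bucket: h(w_1) = h(w_2) = j.
% Shorthand (assumed, not from macros.tex): v = membership value, s = polarity in {-1,+1}.
\begin{align*}
C[j] &= v(w_1)\,s(w_1) + v(w_2)\,s(w_2), \\
\mathbb{E}\big[C[j]\cdot s(w_1)\big]
  &= v(w_1)\,\mathbb{E}\big[s(w_1)^2\big] + v(w_2)\,\mathbb{E}\big[s(w_1)\,s(w_2)\big] \\
  &= v(w_1)\cdot 1 + v(w_2)\cdot \mathbb{E}[s(w_1)]\,\mathbb{E}[s(w_2)] \\
  &= v(w_1).
\end{align*}
```

The cross term vanishes only because the two polarities are independent and each has mean zero, which is the assumption the Justification list attaches to the step where the second expectation is dropped.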
diff --git a/macros.tex b/macros.tex index 50677f2..e8476ca 100644 --- a/macros.tex +++ b/macros.tex @@ -160,7 +160,16 @@ \newtheorem{Corollary}{Corollary} \newtheorem{Example}{Example} \newtheorem{Axiom}{Axiom} +\definecolor{db}{RGB}{23,20,119} +\definecolor{dg}{RGB}{2,101,15} +\newtheoremstyle{assumption}{}{}{\color{db}\itshape}{}{\color{db}\bfseries}{:}{\newline}{} +\theoremstyle{assumption} \newtheorem{Assumption}{Assumption} +\newtheoremstyle{justification}{}{}{\color{dg}\itshape}{}{\color{dg}\bfseries}{:}{\newline}{} +\theoremstyle{justification} +\newtheorem{Justification}{Justification} +\newcommand{\eq}[1]{Equation {#1}} +\newcommand{\norm}[1]{\|{#1}\|} \newcommand{\proofpara}[1]{\medskip\noindent\underline{{#1}:}}
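To see how the macros added in this hunk behave in isolation, a stripped-down test document along the following lines may help. It is a minimal sketch, not the paper's preamble: color is omitted from the theorem style to keep the package list short, and the label `eq:demo` and the vector $v$ are hypothetical names used only for illustration.

```latex
\documentclass{article}
\usepackage{amsmath,amsthm}

% Uncolored variant of the justification style (color dropped for brevity).
\newtheoremstyle{justification}{}{}{\itshape}{}{\bfseries}{:}{\newline}{}
\theoremstyle{justification}
\newtheorem{Justification}{Justification}

% Helper macros as added in macros.tex.
\newcommand{\eq}[1]{Equation {#1}}
\newcommand{\norm}[1]{\|{#1}\|}

\begin{document}
\begin{equation}
\norm{v}_2^2 = \sum_i v_i^2 \label{eq:demo}
\end{equation}

\begin{Justification}
\hfill
\begin{itemize}
  \item \eq{\eqref{eq:demo}} is the definition of the squared Euclidean norm.
\end{itemize}
\end{Justification}
\end{document}
```

Note that `\eq` expects its argument to already be a typeset reference, so it should be called as `\eq{\eqref{...}}`; passing a bare label name would print the label text instead of the equation number.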