% -*- root: main.tex -*-
\section{Analysis}
\label{sec:analysis}

\AR{This is a notational nitpick, but I would prefer it if this section were written for a function $v: W\to K$ and not necessarily the special case of $v=v_t$. In particular, there is no notion of probability $p$. At some point, we'll have to revisit this, but I think it would be good to have the analysis in this section be for an arbitrary function $v$ and not the specific one from the TIDB. Note that this means that you should not have the first two equations in this section.}

We begin the analysis by showing that, with high probability, an estimate is approximately $\numWorldsP$, where $p$ is a tuple's probability measure for a given TIDB. Note that
\begin{equation}
%\gVt{k\cdot}
\numWorldsP = \numWorldsSum\label{eq:mu}.
\end{equation}
Furthermore, when $\kMap{t}$ is generalized to have elements in the range $\left[0, \infty\right]$, we obtain the result
\begin{equation}
\norm{\kMap{t}}\prob = \numWorldsSum\label{eq:gen-mu}.
\end{equation}

We start off by claiming that the expectation of the estimate of a tuple $t$'s membership across all worlds is $\sum\limits_{\wVec \in \pw}\kMapParam{\wVec}$; formally,
\begin{equation}
\expect{\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}} = \sum_{\wVec \in \pw}\kMapParam{\wVec}\label{eq:allWorlds-est}.
\end{equation}
To verify this claim, we argue that the expectation of the estimate of a tuple's appearance in a single world is its annotation,\AR{Again this claim should be for every $\mathbf{w}\in W$ and not related to whether $t$ appears in a world or not.} i.e.,
\begin{equation}
\expect{\sketchJParam{\sketchHashParam{\wVec}}\cdot \sketchPolarParam{\wVec}} = \kMapParam{\wVec} \label{eq:single-est}.
\end{equation}
For a given $\wVec \in \pw$, substituting definitions we have
\setcounter{equation}{2}
\begin{subequations}
\begin{align}
&\expect{\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}} = \nonumber\\
&\phantom{{}\sketchJParam{\sketchHashParam{\wVec}}}\expect{\big(\sum_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\kMapParam{\wVecPrime} \cdot \sketchPolarParam{\wVecPrime}\big) \cdot \sketchPolarParam{\wVec} }\label{eq:step-one}\\
%\end{align}
%Since $\wVec \in \pw$, we know that for $\wVecPrime\in \pw, \exists \wVecPrime \st \wVecPrime = \wVec$. This yields
%\[
=&~\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2 +
\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}\label{eq:step-two}\\
%\] which can be written as
%\[
=&~\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2} +
\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime \neq \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}}\label{eq:step-three}\\
%\] from which the last term evaluates to $0$ and we have
%\[
=&~\expect{\sum\limits_{\substack{\wVecPrime, \wVec \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},\\
\wVecPrime = \wVec}}\kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}^2}\label{eq:step-four}\\
%\]
=&~\kMapParam{\wVec}\label{eq:step-five}
\end{align}
\end{subequations}

\AR{The numbering of the equations above is a bit off: you go from (4) to (3a) and so on. Also, for the case when $\mathbf{w}=\mathbf{w'}$ there is no need to sum over $\mathbf{w},\mathbf{w'}\in W$; it just makes things confusing. Just sum over $\mathbf{w}'\in W$.}
\begin{Justification}
\hfill
\begin{itemize}
\item \eq{\eqref{eq:step-one}} is a substitution of the definition of $\sketch$.
\item \eq{\eqref{eq:step-two}} uses the commutativity of addition to rearrange the sum. \AR{Technically this is using associativity, but this is a nitpick.}
\item \eq{\eqref{eq:step-three}} uses linearity of expectation to split the large expectation into smaller expectations. \AR{I would push the expectation further in so that they only deal with the $s_i$ terms.}
\item \eq{\eqref{eq:step-four}} follows from the second term of \eq{\eqref{eq:step-three}} evaluating to zero. This assumes pairwise independence of $\sketchPolar$.
\item \eq{\eqref{eq:step-five}} follows because the squared term $\sketchPolarParam{\wVec}^2$ always evaluates to 1. Keep in mind that the summation trivially contains exactly one $\wVecPrime$ equal to $\wVec$.
\end{itemize}
\end{Justification}
%which in turn
%\begin{multline*}
%\mathbb{E}\big[\kMapParam{\wVecPrime_0}\cdot \sketchPolarParam{\wVecPrime_0} + \cdots \\
%+\kMapParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}\cdot \sketchPolarParam{\wVecPrime_j}+ \cdots \\
%+ \kMapParam{\wVecPrime_n}\sketchPolarParam{\wVecPrime_n}\big]
%\end{multline*}
%\AH{break it up into w' and w}
%Due to the uniformity of $\sketchPolar$, we have
%\begin{equation*}
%= \kMapParam{\wVec},
%\end{equation*}
thus verifying \eqref{eq:single-est}.
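
As a sanity check of the steps above, consider a hypothetical bucket that holds exactly two worlds $\wVec \neq \wVecPrime$. This is an illustrative assumption rather than a case singled out by the analysis, and it further assumes that each value of $\sketchPolar$ is uniform on $\{-1, 1\}$ and that the values are pairwise independent. Under these assumptions the estimate for $\wVec$ has expectation
\begin{align*}
&\expect{\big(\kMapParam{\wVec}\sketchPolarParam{\wVec} + \kMapParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\big)\cdot\sketchPolarParam{\wVec}}\\
=&~\kMapParam{\wVec}\cdot\expect{\sketchPolarParam{\wVec}^2} + \kMapParam{\wVecPrime}\cdot\expect{\sketchPolarParam{\wVecPrime}}\cdot\expect{\sketchPolarParam{\wVec}}\\
=&~\kMapParam{\wVec}\cdot 1 + \kMapParam{\wVecPrime}\cdot 0 \cdot 0
=~\kMapParam{\wVec}.
\end{align*}
The colliding world $\wVecPrime$ contributes nothing in expectation, which mirrors the step from \eqref{eq:step-three} to \eqref{eq:step-four}.
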

\begin{Assumption}
\hfill
\begin{itemize}
\item \eq{\eqref{eq:step-four}} assumes that $\sketchPolar$ is pairwise independent.
%\item $\sketchHash$ is uniformly distributed.
\end{itemize}
\end{Assumption}

Since \eqref{eq:single-est} holds, by linearity of expectation, \eqref{eq:allWorlds-est} also must hold.
%We can now take \eqref{eq:single-est}, substitute it in for \eqref{eq:allWorlds-est} and show by linearity of expectation that \eqref{eq:allWorlds-est} holds.
%\begin{align}
%&\expect{\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}} \nonumber\\
%&= \expect{\sum_{\wVecPrime \in \pw}\kMapParam{\wVecPrime} \cdot \sketchPolarParam{\wVecPrime} \cdot \sum_{\substack{\wVec \in \pw \st \\
%\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\sketchPolarParam{\wVec}}\nonumber\\
%&= \sum_{\wVec \in \pw} \expect{\left( \sum_{\substack{\wVecPrime \in \pw \st \\
%\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\kMapParam{\wVecPrime}\cdot\sketchPolarParam{\wVecPrime}\right) \cdot \sketchPolarParam{\wVec}}\nonumber\\
%&= \sum_{\wVec \in \pw}\kMapParam{\wVec}\label{eq:estExpect}.
%\end{align}

%\begin{align}
%&\expect{\estimate}\\
%=&\expect{\estExpOne}\\
%=&\expect{\sum_{\substack{j \in [B],\\
% \wVec \in \pw~|~ \sketchHash{i}[\wVec] = j,\\
% \wVec[w']\in \pw~|~ \sketchHash{i}[\wVec[w']] = j} } v_t[\wVec] \cdot s_i[\wVec] \cdot s_i[\wVec[w']]}\\
%=&\multLineExpect\big[\sum_{\substack{j \in [B],\\
% \wVec~|~\sketchHashParam{\wVec}= j,\\
% \wVecPrime~|~\sketchHashParam{\wVecPrime} = j,\\
% \wVec = \wVecPrime}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime} + \nonumber \\
%&\phantom{{}\kMapParam{\wVec}}\sum_{\substack{j \in [B], \\
% \wVec~|~\sketchHashParam{\wVec} = j,\\
% \wVecPrime ~|~ \sketchHashParam{\wVecPrime} = j,\\ \wVec \neq \wVecPrime}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot\sketchPolarParam{\wVecPrime}\big]\textit{(by linearity of expectation)}\\
%=&\expect{\sum_{\substack{j \in [B],\\
% \wVec~|~\sketchHashParam{\wVec}= j,\\
% \wVecPrime~|~\sketchHashParam{\wVecPrime} = j,\\
% \wVec = \wVecPrime}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}} \nonumber \\
%&\phantom{{}\big[}\textit{(by uniform distribution in the second summation)}\\
%=& \estExp \label{eq:estExpect}
%\end{align}

%\AR{A general comment: The last display equation should have a period at the end. The idea is that display equations are considered part of a sentence and every sentence should end with a period.}
%\AH{Thank you for clarifying this, as I have always wondered what the convention was for display equations. Hopefully, I haven't missed any end display equations in this paper, and have them all fixed properly.}

For the next step, we show that the variance of an estimate is small.%$$\varParam{\estimate}$$
\begin{subequations}
\begin{align}
&\varParam{\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\\%\nonumber\\
=~&\varParam{\sum_{\wVec \in \pw}\kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\sketchPolarParam{\wVecPrime}}\label{eq:var_step-one}\\%\nonumber\\%\estExpOne}\\
=~& \mathbb{E}\big[\big(\sum_{\substack{ \wVec, \wVecPrime \in \pw \st \\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}\nonumber\\
&\qquad - \expect{\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}}\big)^2\big]\label{eq:var_step-two}\\%\nonumber\\
=~&\mathbb{E}\big[\sum_{\substack{
\wVec_1, \wVec_2,\\
\wVecPrime_1, \wVecPrime_2 \in \pw,\\
\sketchHashParam{\wVec_1} = \sketchHashParam{\wVecPrime_1},\\
\sketchHashParam{\wVec_2} = \sketchHashParam{\wVecPrime_2}
}}\kMapParam{\wVec_1} \kMapParam{\wVec_2}\sketchPolarParam{\wVec_1}\sketchPolarParam{\wVec_2}\sketchPolarParam{\wVecPrime_1}\sketchPolarParam{\wVecPrime_2}\big]\nonumber\\
&\qquad - \left(\sum_{\wVec \in \pw}\kMapParam{\wVec}\right)^2 \label{eq:var-sum-w}.
\end{align}
\end{subequations}

\begin{Justification}
\hfill
\begin{itemize}
\item \eq{\eqref{eq:var_step-one}} follows from substituting the definition of $\sketch$ and the commutativity of addition. The constraint that $\sketchHash$ maps both worlds to the same bucket comes from the definition of $\sketch$. Using the commutativity of addition, the sum is rearranged so that each summand of a bucket is multiplied by every $\sketchPolar$ value mapped to that bucket.
\item \eq{\eqref{eq:var_step-two}} follows by substituting the definition of variance.
\item \eq{\eqref{eq:var-sum-w}} results from the further evaluation of \eqref{eq:var_step-two}.
\end{itemize}
\end{Justification}
\begin{Assumption}
\hfill
\begin{itemize}
\item The subsequent evaluations of expectation assume 4-wise independence of $\sketchPolar$.
\end{itemize}
\end{Assumption}

Note that four-wise independence is assumed across all four random variables of \eqref{eq:var-sum-w}. Zooming in on the products of the $\sketchPolar$ functions,
\begin{equation}
\sketchPolarParam{\wa}\cdot\sketchPolarParam{\wb}\cdot\sketchPolarParam{\wc}\cdot\sketchPolarParam{\wVecD} \label{eq:polar-product}
\end{equation}
we see that %it can be seen that for $\wOne, \wOneP \in \pw$ and $\wTwo, \wTwoP \in \pw'$, all four random variables in \eqref{eq:polar-product} take their values from $\pw$, although we have iteration over two separate sets $\pw$.
there are five possible sets of $\wVec$ variable combinations, namely, for pairwise distinct $a, b, c, d \in \{1, 1', 2, 2'\}$:
\AR{This confused me a lot to start off with. I think it is better to use $a,b,c,d$ only in the definitions of $S_1$ to $S_5$ where it is needed. In particular, it is not the case in $S_1$ to $S_3$ that you look at all possible assignments of $a, b, c, d \in \{1, 1', 2, 2'\}$.}
\begin{align*}
&\distPattern{1}:&\forElems{\cOne}\\
&\distPattern{2}:&\forElems{\cTwo}\\
&\distPattern{3}:&\forElems{\cThree}\\
&\distPattern{4}:&\forElems{\cFour}\\
&\distPattern{5}:&\forElems{\cFive}
\end{align*}
\AR{I think the definitions above need more work and/or there needs to be a justification for why $S_1$ to $S_5$ partition all the possibilities.}
Note that every $\wVec$ is an argument to the same $\sketchPolar$ function, meaning that equal worlds are mapped to the same element in the image of $\sketchPolar$. \AR{I am not sure what the sentence above is saying.}

We are interested in those particular cases whose expectation does not equal zero, since a term with expectation zero does not contribute to the summation of \eqref{eq:var-sum-w}. In expectation we have that
\begin{align}
\forAllW{\distPattern{1}}&\rightarrow\expect{%\sum_{\substack{\elems \\
%\st \cOne}}
\polarProdEq} = 1 \label{eq:polar-prod-all}
\end{align}
since the same element of the image of $\sketchPolar$ is multiplied by itself an even number of times. Similarly,
\begin{align}
\forAllW{\distPattern{2}}&\rightarrow\expect{%\sum_{\substack{\elems \\
%\st \cTwo}}
\polarProdEq} = 1 \label{eq:polar-prod-two-and-two}
\end{align}
because, for each equality, the same element of the image of $\sketchPolar$ is multiplied by itself, producing a polarity of 1 for each equality and hence a final product of 1. For $\distPattern{3}, \distPattern{4}, \distPattern{5}$, we have a final product of two, three or four independent variables in $\{-1, 1\}$, thus producing the following results:
\begin{align}
\forAllW{\distPattern{3}}&\rightarrow\expect{%\sum_{\substack{\elems \\
%\st \cThree}}
\polarProdEq} = 0 \nonumber
\end{align}
\begin{align}
\forAllW{\distPattern{4}}&\rightarrow\expect{%\sum_{\substack{\elems \\
%\st \cFour}}
\polarProdEq} = 0 \nonumber
\end{align}
\begin{align}
\forAllW{\distPattern{5}}&\rightarrow\expect{%\sum_{\substack{\elems \\
%\st \cFive}}
\polarProdEq} = 0. \nonumber
\end{align}
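
To see why a pattern with an unpaired polarity vanishes, here is a short worked case. It assumes, purely for illustration, that $\wa = \wb = \wc \neq \wVecD$ (one way in which three of the four worlds can coincide; the patterns defined above may order the indices differently) and that each value of $\sketchPolar$ is uniform on $\{-1, 1\}$:
\begin{align*}
\expect{\sketchPolarParam{\wa}\cdot\sketchPolarParam{\wb}\cdot\sketchPolarParam{\wc}\cdot\sketchPolarParam{\wVecD}}
&= \expect{\sketchPolarParam{\wa}^3\cdot\sketchPolarParam{\wVecD}}\\
&= \expect{\sketchPolarParam{\wa}\cdot\sketchPolarParam{\wVecD}}\\
&= \expect{\sketchPolarParam{\wa}}\cdot\expect{\sketchPolarParam{\wVecD}} = 0.
\end{align*}
The second line uses that the cube of a value in $\{-1, 1\}$ is the value itself, and the third uses independence. The same argument applies whenever at least one world appears an odd number of times in the product.
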

Only equations \eqref{eq:polar-prod-all} and \eqref{eq:polar-prod-two-and-two} influence the $\var$ computation.
Considering $\distPattern{1}$, the variance results in
\begin{equation}
\distPatOne\label{eq:distPatOne}.
\end{equation}
This is the case because
\begin{align*}
&\sum_{\substack{\wOne, \wOneP, \wTwo, \wTwoP \in \pw \st \\
\wOne = \wTwo = \wOneP = \wTwoP = \wVec}}
\kMapParam{\wVec} \cdot \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVec}\\
= &\sum_{\wVec \in \pw} \kMapParam{\wVec}\cdot \kMapParam{\wVec}\\
= &\sum_{\wVec \in \pw} \kMapParam{\wVec}^2.
\end{align*}

For the distribution pattern $\cTwo$, we have three subsets $\distPattern{21}, \distPattern{22}, \distPattern{23} \subseteq \distPattern{2}$ to consider.
\begin{align*}
&\distPattern{21}:&\cTwoV{\wOne}{\wOneP}{\wTwo}{\wTwoP} \\
&\distPattern{22}:&\cTwoV{\wOne}{\wTwo}{\wOneP}{\wTwoP}\\
&\distPattern{23}:&\cTwoV{\wOne}{\wTwoP}{\wOneP}{\wTwo}
\end{align*}

Considered separately, the subsets contribute the following terms to $\var$.
\begin{align}
&\wOne = \wOneP \neq \wTwo =\wTwoP \rightarrow\nonumber\\
&\qquad = \sum_{\substack{\wOne, \wOneP, \wTwo, \wTwoP \in \pw \st \\
\wOne = \wOneP = \wVec \neq\\
\wTwo = \wTwoP = \wVecPrime}}\kMapParam{\wVec}\kMapParam{\wVecPrime}\sketchPolarParam{\wVec}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVecPrime} \nonumber\\
&\qquad = \sum_{\wVec, \wVecPrime \in \pw \st \wVec \neq \wVecPrime}\kMapParam{\wVec}\kMapParam{\wVecPrime}\label{eq:variantOne}\\
&\wOne = \wTwo \neq \wOneP = \wTwoP \rightarrow\nonumber\\
&\qquad = \sum_{\substack{\wOne, \wOneP, \wTwo, \wTwoP \in \pw \st \\
\wOne = \wTwo = \wVec \neq\\
\wOneP = \wTwoP = \wVecPrime,\\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}} \kMapParam{\wVec}\kMapParam{\wVec}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}\nonumber \\
&\qquad = \sum_{\wVec \in \pw}| \{\wVecPrime \st \wVecPrime \neq \wVec, \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \kMapParam{\wVec}^2\label{eq:variantTwo} \\
&\wOne = \wTwoP \neq \wOneP =\wTwo \rightarrow \nonumber \\
&\qquad = \sum_{\substack{\wOne, \wOneP, \wTwo, \wTwoP \in \pw \st \\
\wOne = \wTwoP = \wVec \neq \\
\wOneP = \wTwo = \wVecPrime,\\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}} \kMapParam{\wVec} \kMapParam{\wVecPrime}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec} \nonumber \\
&\qquad = \sum_{\substack{\wVec, \wVecPrime \in \pw \st \\
\wVec \neq \wVecPrime,\\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\kMapParam{\wVec}\cdot\kMapParam{\wVecPrime}\label{eq:variantThree}
\end{align}
Note that for $\distPattern{22}$, the number of other worlds in the same bucket appears as a multiplicative factor on each squared annotation. This follows from the constraint $\wOne \neq \wOneP$ coupled with the additional constraint $\sketchHashParam{\wOne} = \sketchHashParam{\wOneP}$: since $\wOneP$ must belong to the same bucket as $\wOne$ without being equal to $\wOne$, the sum contains one copy of the squared annotation for each $\wOneP$ that shares a bucket with $\wOne$ but differs from it.

Looking at $\distPattern{23}$, we have a case similar to $\distPattern{22}$, but this time there is no multiplicative factor, since $\wOneP$ and $\wTwoP$ are constrained to equal their opposite $\wVec$ counterparts, which are the arguments of both $\kMap{t}$ terms.
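
As a concrete illustration of the difference between the two cases, suppose (hypothetically) that some bucket holds exactly three worlds $\wVec_1, \wVec_2, \wVec_3$. Each of these worlds then has two other worlds in its bucket, so the $\distPattern{22}$ contribution of that bucket is
\begin{equation*}
2\cdot\kMapParam{\wVec_1}^2 + 2\cdot\kMapParam{\wVec_2}^2 + 2\cdot\kMapParam{\wVec_3}^2,
\end{equation*}
while the $\distPattern{23}$ contribution of the same bucket is the sum of the six ordered cross products $\kMapParam{\wVec_i}\kMapParam{\wVec_j}$ with $i \neq j$. The bucket size and the world names here are assumptions chosen only to make the counting visible.
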

Notice that the second term (the squared expectation) of the $\var$ calculation is cancelled out by \eqref{eq:distPatOne} and \eqref{eq:variantOne}, since
\begin{equation*}
\big(\sum_{\wVec \in \pw}\kMapParam{\wVec}\big)^2 = \sum_{\wVec \in \pw}\kMapParam{\wVec}^2 +
\sum_{\substack{\wVec, \wVecPrime \in \pw \st\\
\wVec \neq \wVecPrime}}\kMapParam{\wVec}\kMapParam{\wVecPrime}.%\distPatOne + \variantOne.
\end{equation*}
\begin{Justification}
\hfill
\begin{itemize}
\item The LHS is the expectation squared. We obtain the RHS by expanding the square and then, using the commutative property of addition, grouping the terms with $\wVec = \wVecPrime$ separately from those with $\wVec \neq \wVecPrime$.
\end{itemize}
\end{Justification}
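
For instance, in a hypothetical toy case with only two worlds $\wVec_1$ and $\wVec_2$, the identity reads
\begin{equation*}
\big(\kMapParam{\wVec_1} + \kMapParam{\wVec_2}\big)^2 = \kMapParam{\wVec_1}^2 + \kMapParam{\wVec_2}^2 + \kMapParam{\wVec_1}\kMapParam{\wVec_2} + \kMapParam{\wVec_2}\kMapParam{\wVec_1},
\end{equation*}
where the first two terms form the part with equal worlds and the last two are the ordered pairs of distinct worlds.
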
With only \eqref{eq:variantTwo} and \eqref{eq:variantThree} remaining, we have
\begin{multline*}
\varParam{\estimate} = \\
\expect{\sum_{\wVec \in \pw}| \{\wVecPrime \st \wVecPrime \neq \wVec, \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \kMapParam{\wVec}^2} ~+ \\
\expect{\sum_{\substack{\wVec, \wVecPrime \in \pw \st \\
\wVec \neq \wVecPrime,\\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\kMapParam{\wVec}\cdot\kMapParam{\wVecPrime}}.
\end{multline*}

%Our current analysis is limited to TIPDBs, where the annotations are in the boolean $\mathbb{B}$ set. Because this is the case, the square of any element is itself.

Computing each term separately gives
\begin{align}
&\expect{\sum_{\wVec \in \pw}| \{\wVecPrime \st \wVecPrime \neq \wVec, \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \kMapParam{\wVec}^2} =%\numWorldsP
\norm{\kMap{t}}^2_2\prob\cdot \left(\frac{\numWorlds}{\sketchCols} - 1\right)\label{eq:spaceOne}\\
&\expect{ \sum_{\substack{\wVec, \wVecPrime \in \pw \st \\
\wVec \neq \wVecPrime,\\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\kMapParam{\wVec}\cdot\kMapParam{\wVecPrime}} = %\numWorldsP \cdot \frac{\numWorldsP - 1}{\sketchCols}\label{eq:spaceTwo}.
\norm{\kMap{t}}\prob \cdot \frac{\norm{\kMap{t}}\prob - \frac{\norm{\kMap{t}}}{\numWorlds}}{\sketchCols}\label{eq:spaceTwo}.
\end{align}
%In both equations, the sum of $\kMapParam{\wVec}$ over all $\wVec \in \pw$ is $\numWorldsP$ since as noted in equation \eqref{eq:mu} we are summing the number of worlds a tuple $t$ appears in, and for a TIPDB, that is exactly 2 to the power of the number of tuples in the TIPDB (due to the independence of tuples) times tuple $t$'s probability.
\AR{The above two need more work. Let's discuss more in the Aug 7 meeting.}

In equation \eqref{eq:spaceOne}, the multiplicative factor is, in expectation, the number of worlds $\numWorlds$ divided evenly across the $\sketchCols$ buckets, minus the one world ($\wVec$ itself) that $\wVecPrime$ cannot be. This factor multiplies the sum of squares over each of the $\numWorldsP$ worlds that $t$ appears in.

Equation \eqref{eq:spaceTwo} pairs each of the $\numWorldsP$ worlds with all the other worlds in the same bucket in which tuple $t$ appears. This factor is represented by $\frac{\numWorldsP - 1}{\sketchCols}$, i.e., for a world in a given bucket $j$ in which tuple $t$ appears, we sum its products with the other worlds in bucket $j$ in which $t$ is present.
%\AR{Again, argue why the above claims are true.}
%\AH{All my arguing is plain English. Is there a better way to go about this?}
Equations \eqref{eq:spaceOne} and \eqref{eq:spaceTwo} further reduce to
\begin{equation}
%\frac{2^{2N}(\prob + \prob^2)}{\sketchCols} - \numWorlds(\frac{\prob}{\sketchCols} + \prob)\label{eq:variance}
\norm{\kMap{t}}^2_2\prob\left(\numWorlds - 1\right) + \norm{\kMap{t}}\left(\norm{\kMap{t}} - \frac{\sketchCols}{\numWorlds}\right)\label{eq:variance}.
\end{equation}
By \eqref{eq:variance} we then have
\begin{align*}
%\varSym &< 2^{2N}\big(\frac{2\prob}{\sketchCols}\big) \\
\varSym &< \norm{\kMap{t}}^2_2\prob(\numWorlds - 1) + (\norm{\kMap{t}}\prob)^2 \\
%\sd &<\sdEq\\
\sd &< \sqrt{\norm{\kMap{t}}^2_2\prob(\numWorlds - 1) + (\norm{\kMap{t}}\prob)^2} \\
%\sdRel& < \sqrt{\frac{2}{\sketchCols\prob}}.
\sdRel &< \frac{\sqrt{\norm{\kMap{t}}^2_2\prob(\numWorlds - 1) + (\norm{\kMap{t}}\prob)^2} }{\norm{\kMap{t}}\prob}.
\end{align*}
Recall that $\sdRel = \frac{\sd}{\mu}$, where $\mu$ is defined as $\numWorldsP$ in \eqref{eq:mu} for a TIDB and as $\norm{\kMap{t}}\prob$ for general $\kMap{t}$ in \eqref{eq:gen-mu}.

Since the sketch has multiple trials, it suffices for a single trial to exceed the error bound $\errB$ with probability smaller than one half: given sufficiently many independent trials, the median of all trials is then within the error bound with high probability. We therefore require
\begin{equation*}
\Pr\left[~|X - \mu|~> \Delta\right] < \frac{1}{3}.
%\cheby.
\end{equation*}
Chebyshev's inequality bounds the probability of deviating from $\mu$ by at least $k\sigma$ by $\frac{1}{k^2}$. Substituting $\Delta = k\sigma$, i.e., $k^2 = \frac{\Delta^2}{\sigma^2}$, we have
\begin{equation*}
\Pr\left[~|X - \mu|~> \Delta~\right] \leq \frac{\sigma^2}{\Delta^2}.
\end{equation*}
%\AR{It would be better to state the deviation as say $\Delta$ instead of $\epsilon\mu$. Then derive the expression for $B$ in terms of $N,p,\Delta$. Then you can state as consequences what values of $B$ you get for the special cases of $\Delta=\epsilon\cdot 2^N$ and $\Delta=\epsilon\mu$.}
%\AH{Done.}
For the case when $\Delta = \mu\epsilon$, setting the Chebyshev bound equal to the target failure probability of $\frac{1}{3}$, simplifying, and solving for $\sketchCols$ results in
\begin{align*}
\frac{\sigma^2}{\Delta^2} &= \frac{1}{3}\\
\frac{ 2^{2N}\big(\frac{2\prob}{\sketchCols}\big)}{\mu^2\epsilon^2} &= \frac{1}{3}\\
\frac{2^{2N + 1}\prob}{\mu^2\epsilon^2\sketchCols} &= \frac{1}{3}\\
\frac{6 \cdot 2^{2N}\prob}{\mu^2\epsilon^2} &= \sketchCols \\
\frac{6}{\prob\epsilon^2} &= \sketchCols.
\end{align*}
In the above, recall that $\mu$, the expectation of an estimate, is $\numWorldsP$, as seen in equations \eqref{eq:mu} and \eqref{eq:allWorlds-est}.

Setting $\Delta = \epsilon\numWorlds$ gives
\begin{align*}
\frac{ 2^{2N}\big(\frac{2\prob}{\sketchCols}\big)}{\epsilon^22^{2N}} &= \frac{1}{3}\\
\frac{2^{2N+ 1}\prob}{\epsilon^22^{2N}\sketchCols} &= \frac{1}{3}\\
\frac{6 \cdot 2^{2N}\prob}{\epsilon^22^{2N}} &= \sketchCols \\
\frac{6\prob}{\epsilon^2} &= \sketchCols.
\end{align*}

Other cases for $\Delta$ can be solved similarly.
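
For reference, the two cases above can be read as instances of a single expression. The following is a sketch under the same assumptions used in both derivations, namely the variance bound $\sigma^2 = 2^{2N}\big(\frac{2\prob}{\sketchCols}\big)$ and a target failure probability of $\frac{1}{3}$, with $\Delta$ left as a free parameter:
\begin{align*}
\frac{2^{2N + 1}\prob}{\Delta^2\sketchCols} &= \frac{1}{3}\\
\frac{6 \cdot 2^{2N}\prob}{\Delta^2} &= \sketchCols.
\end{align*}
Taking $\mu = 2^N\prob$ as in the first derivation, the choice $\Delta = \epsilon\mu$ gives $\Delta^2 = \epsilon^2 2^{2N}\prob^2$ and recovers $\sketchCols = \frac{6}{\prob\epsilon^2}$, while $\Delta = \epsilon \cdot 2^N$ gives $\Delta^2 = \epsilon^2 2^{2N}$ and recovers $\sketchCols = \frac{6\prob}{\epsilon^2}$.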