diff --git a/analysis.tex b/analysis.tex
index abb7930..eb66246 100644
--- a/analysis.tex
+++ b/analysis.tex
@@ -1,6 +1,9 @@
 % -*- root: main.tex -*-
 \section{Analysis}
 \label{sec:analysis}
+
+\AR{This is a notational nitpick but I would prefer it if this section were written for a function $v: W\to K$ and not necessarily the special case of $v=v_t$. In particular, there is no notion of probability $p$. At some point, we'll have to revisit this but I think it would be good to have the analysis in this section be for an arbitrary function $v$ and not the specific one from the TIDB. Note that this means that you should not have the first two equations in this section.}
+
 We begin the analysis by showing that with high probability an estimate is approximately $\numWorldsP$, where $p$ is a tuple's probability measure for a given TIPD. Note that
 \begin{equation}
 %\gVt{k\cdot}
@@ -15,7 +18,7 @@ We start off by making the claim that the expectation of the estimate of a tuple
 \begin{equation}
 \expect{\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}} = \sum_{\wVec \in \pw}\kMapParam{\wVec}\label{eq:allWorlds-est}.
 \end{equation}
-To verify this claim, we argue that the expectation of the estimate of a tuple's appearance in single world is its annotation, i.e.
+To verify this claim, we argue that the expectation of the estimate of a tuple's appearance in single world is its annotation,\AR{Again, this claim should be for every $\mathbf{w}\in W$ and not related to whether $t$ appears in a world or not.} i.e.
 \begin{equation}
 \expect{\sketchJParam{\sketchHashParam{\wVec}}\cdot \sketchPolarParam{\wVec}} = \kMapParam{\wVec} \label{eq:single-est}.
 \end{equation}
@@ -52,12 +55,14 @@ For a given $\wVec \in \pw$, substituting definitions we have
  =&~\kMapParam{\wVec}\label{eq:step-five}
 \end{align}
 \end{subequations}
+
+\AR{The numbering of the equations above is a bit off: you go from (4) to (3a) and so on. Also, for the case when $\mathbf{w}=\mathbf{w'}$ there is no need to sum over $\mathbf{w},\mathbf{w'}\in W$ (it just makes things confusing); just sum over $\mathbf{w}'\in W$.}
 \begin{Justification}
 \hfill
 \begin{itemize}
  \item \eq{\eqref{eq:step-one}} is a substitution of the definition of $\sketch$.
- \item \eq{\eqref{eq:step-two}} uses the commutativity of addition to rearrange the sum.
- \item \eq{\eqref{eq:step-three}} uses linearity of expectation to reduce the large expectation into smaller expectations.
+ \item \eq{\eqref{eq:step-two}} uses the commutativity of addition to rearrange the sum. \AR{Technically this is using associativity, but this is a nitpick.}
+ \item \eq{\eqref{eq:step-three}} uses linearity of expectation to reduce the large expectation into smaller expectations. \AR{I would push the expectations further in so that they only deal with the $s_i$ terms.}
  \item \eq{\eqref{eq:step-four}} follows from the second term of \eq{eq:step-three} evaluating to zero. This assumes pairwise independence of $\sketchPolar.$
  \item \eq{\eqref{eq:step-five}} follows from the squaring of the $\sketchPolarParam{\wVec}$ term, which will always evaluate to 1. Keep in mind that in the summation we trivially have only 1 $\wVecPrime$ which equals $\wVec$.
 \end{itemize}
@@ -157,6 +162,7 @@ Note that four-wise independence is assumed across all four random variables of
 \end{equation}
 we see that %it can be seen that for $\wOne, \wOneP \in \pw$ and $\wTwo, \wTwoP \in \pw'$, all four random variables in \eqref{eq:polar-product} take their values from $\pw$, although we have iteration over two separate sets $\pw$.
 there are five possible sets of $\wVec$ variable combinations, namely for $a, b, c, d \in \{1, 1', 2, 2'\} \st a \neq b \neq c \neq d$:
+\AR{This confused me a lot to start off with. I think it is better to use $a,b,c,d$ only in the definitions of $S_1$ to $S_5$ where they are needed. In particular, it is not the case in $S_1$ to $S_3$ that you look at all possible assignments of $a, b, c, d \in \{1, 1', 2, 2'\}$.}
 \begin{align*}
 &\distPattern{1}:&\forElems{\cOne}\\
 &\distPattern{2}:&\forElems{\cTwo}\\
@@ -164,7 +170,8 @@ there are five possible sets of $\wVec$ variable combinations, namely for $a, b,
 &\distPattern{4}:&\forElems{\cFour}\\
 &\distPattern{5}:&\forElems{\cFive}
 \end{align*}
-Note that each $\wVec$ is the preimage of the same $\sketchPolar$ function, meaning, that equal worlds produce the same element in the image of $\sketchPolar$.
+\AR{I think the definitions above need more work and/or there needs to be a justification for why $S_1$ to $S_5$ partition all the possibilities.}
+Note that each $\wVec$ is the preimage of the same $\sketchPolar$ function, meaning, that equal worlds produce the same element in the image of $\sketchPolar$. \AR{I am not sure what the sentence above is saying.}
 We are interested in those particular cases whose expectation does not equal zero, since an expectation of zero will not add to the summation of \eqref{eq:var-sum-w}.
 In expectation we have that
 \begin{align}
@@ -280,6 +287,7 @@ Computing each term separately gives
  \norm{\kMap{t}}\prob \cdot \frac{\norm{\kMap{t}}\prob - \frac{\norm{\kMap{t}}}{\numWorlds}}{\sketchCols}\label{eq:spaceTwo}.
 \end{align}
 %In both equations, the sum of $\kMapParam{\wVec}$ over all $\wVec \in \pw$ is $\numWorldsP$ since as noted in equation \eqref{eq:mu} we are summing the number of worlds a tuple $t$ appears in, and for a TIPDB, that is exactly 2 to the power of the number of tuples in the TIPDB (due to the independence of tuples) times tuple $t$'s probability.
+\AR{The above two need more work. Let's discuss more in the Aug 7 meeting.}
 In equation \eqref{eq:spaceOne} we have the multiplicative factor which in expectation turns out to be the number of worlds $\numWorlds$ divided evenly across the number of buckets $\sketchCols$ minus the one tuple that $\wVecPrime$ cannot be.
 This factor is multiplied to sum of squares of each of the $\numWorldsP$ worlds that $t$ appears in.