\AR{This is a notational nitpick but I would prefer it if this section was written for a function $v: W\to K$ and not necessarily the special case of $v=v_t$. In particular, there is no notion of probability $p$. At some point, we'll have to revisit this but I think it would be good to have the analysis in this section be for an arbitrary function $v$ and not the specific one from the TIDB. Note that this means that you should not have the first two equations in this section.}
We begin the analysis by showing that with high probability the estimate is approximately $\numWorldsP$, where $p$ is a tuple's probability measure for a given TIDB. Note that
We start by claiming that the expectation of the estimate of a tuple $t$'s membership across all worlds is $\sum\limits_{\wVec\in\pw}\kMapParam{\wVec}$; formally,
To verify this claim, we argue that the expectation of the estimate of a tuple's appearance in a single world is its annotation,\AR{Again this claim should be for every $\mathbf{w}\in W$ and not related to whether $t$ appears in a world or not.} i.e.
\AR{The numbering of the equations above is a bit off: you go from (4) to (3a) and so on. Also, for the case when $\mathbf{w}=\mathbf{w'}$ there is no need to sum over $\mathbf{w},\mathbf{w'}\in W$-- it just makes things confusing-- just sum over $\mathbf{w}'\in W$.}
\item\eq{\eqref{eq:step-two}} uses the commutativity of addition to rearrange the sum. \AR{Technically this is using associativity but this is a nitpick.}
\item\eq{\eqref{eq:step-three}} uses linearity of expectation to reduce the large expectation into smaller expectations. \AR{I would push the expectation further in so that they only deal with the $s_i$ terms.}
\item\eq{\eqref{eq:step-four}} follows from the second term of \eq{\eqref{eq:step-three}} evaluating to zero. This assumes pairwise independence of $\sketchPolar$.
\item\eq{\eqref{eq:step-five}} follows from squaring the $\sketchPolarParam{\wVec}$ term, which always evaluates to $1$. Keep in mind that in the summation there is trivially only one $\wVecPrime$ that equals $\wVec$.
Since \eqref{eq:single-est} holds, \eqref{eq:allWorlds-est} must also hold by linearity of expectation.
\item\eq{\eqref{eq:var_step-one}} follows from substituting the definition of $\sketch$ and the commutativity of addition. Note that the constraint that $\sketchHash$ hashes to the same bucket follows from the definition of $\sketch$. The sum can then be rearranged, again by commutativity of addition, so that each item in a bucket's sum is paired with its products with each of the $\sketchPolar$ values mapped to that bucket.
\item\eq{\eqref{eq:var_step-two}} follows by substituting the definition of variance.
\item\eq{\eqref{eq:var-sum-w}} results from further evaluating \eqref{eq:var_step-two}.
\end{itemize}
\end{Justification}
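The unbiasedness argument above can also be checked numerically. The following is a minimal sketch, assuming a count-sketch style point estimator in which a fully independent random sign stands in for $\sketchPolar$ and a uniform random bucket assignment stands in for $\sketchHash$; the world set, annotations, and bucket count are hypothetical.

```python
import random

def build_sketch(values, num_buckets, rng):
    """Build one signed, bucketed sketch of the annotation vector."""
    sign = {w: rng.choice([-1, 1]) for w in values}              # stands in for the polar function
    bucket_of = {w: rng.randrange(num_buckets) for w in values}  # stands in for the bucket hash
    buckets = [0] * num_buckets
    for w, v in values.items():
        buckets[bucket_of[w]] += sign[w] * v
    return sign, bucket_of, buckets

def estimate(w, sign, bucket_of, buckets):
    """Point estimate of the annotation of world w from the sketch."""
    return sign[w] * buckets[bucket_of[w]]

rng = random.Random(0)
values = {w: (w % 3) + 1 for w in range(8)}  # hypothetical annotations v(w)
trials = 20000
total = 0.0
for _ in range(trials):
    sign, bucket_of, buckets = build_sketch(values, 4, rng)
    total += estimate(0, sign, bucket_of, buckets)
mean = total / trials  # converges to values[0]
```

Averaging the per-world estimate over many independent sketches converges to the true annotation of that world, and by linearity of expectation the sum of the per-world estimates converges to $\sum_{\wVec\in\pw}\kMapParam{\wVec}$.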
\begin{Assumption}
\hfill
\begin{itemize}
\item The subsequent evaluations of expectation assume 4-wise independence of $\sketchPolar$.
Note that four-wise independence is assumed across all four random variables of \eqref{eq:var-sum-w}. Zooming in on the products of the $\sketchPolar$ functions,
we see that there are five possible sets of $\wVec$ variable combinations, namely, for pairwise distinct $a, b, c, d \in\{1, 1', 2, 2'\}$:
\AR{This confused me a lot to start off with. I think it is better to use $a,b,c,d$ only in the definitions of $S_1$ to $S_5$ where it is needed. In particular, it is not the case in $S_1$ to $S_3$ that you look at all possible assignment of $a, b, c, d \in\{1, 1', 2, 2'\}$.}
\AR{I think the definitions above need more work and/or there needs to be a justification for why $S_1$ to $S_5$ partition all the possibilities.}
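One way to see that the five patterns are exhaustive: each combination of the four variables is classified by which of them are equal, i.e., by the integer partition of $4$ induced by the equality classes, and there are exactly five integer partitions of $4$. A quick exhaustive check over a small hypothetical domain:

```python
from itertools import product

def equality_pattern(tup):
    """Multiset of equality-class sizes of a 4-tuple, e.g. (2, 1, 1)."""
    counts = {}
    for x in tup:
        counts[x] = counts.get(x, 0) + 1
    return tuple(sorted(counts.values(), reverse=True))

domain = range(5)  # any domain with at least four elements suffices
patterns = {equality_pattern(t) for t in product(domain, repeat=4)}
# Exactly five patterns partition all 4-tuples:
# (4,) all equal, (3,1), (2,2), (2,1,1), (1,1,1,1) all distinct.
```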
Note that all four $\wVec$ variables are arguments of the same function $\sketchPolar$, so equal worlds produce the same element in the image of $\sketchPolar$. \AR{I am not sure what the sentence above is saying.}
We are interested in the particular cases whose expectation is nonzero, since an expectation of zero contributes nothing to the summation of \eqref{eq:var-sum-w}. In expectation we have that
because the same element of the image of $\sketchPolar$ is multiplied by itself for each equality, producing a factor of $1$ for each equality and hence a final product of $1$. For $\distPattern{3}, \distPattern{4}, \distPattern{5}$, we have a final product of two, three, or four independent variables in $\{-1, 1\}$, producing the following results:
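These expectations can be verified exactly by enumerating four independent uniform $\{-1,1\}$ signs; this is a sanity check of the claim under full independence, not the sketch's actual hash family:

```python
from itertools import product

def exact_expectation(f):
    """Exact expectation of f over four independent uniform +/-1 signs."""
    return sum(f(s) for s in product([-1, 1], repeat=4)) / 16

# products of two, three, or four distinct signs average to zero
assert exact_expectation(lambda s: s[0] * s[1]) == 0
assert exact_expectation(lambda s: s[0] * s[1] * s[2]) == 0
assert exact_expectation(lambda s: s[0] * s[1] * s[2] * s[3]) == 0
# paired-up signs square away to 1, so the expectation is 1
assert exact_expectation(lambda s: s[0] * s[0] * s[1] * s[1]) == 1
```

Four-wise independence is exactly the property that makes these enumerated expectations carry over to the sketch's sign function.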
For the distribution pattern $\cTwo$, we have three subsets $\distPattern{21}, \distPattern{22}, \distPattern{23}\subseteq\distPattern{2}$ to consider.
Note that for $\distPattern{22}$, the cardinality of a bucket appears as a multiplicative factor for each squared annotation. This is because the constraint $\wOne\neq\wOneP$ is coupled with the additional constraint that $\sketchHashParam{\wOne}=\sketchHashParam{\wOneP}$. Since $\wOneP$ must belong to the same bucket as $\wOne$ while not being equal to $\wOne$, each summand is the squared annotation, once for every such $\wOneP$ in the bucket.
Looking at $\distPattern{23}$, we have a case similar to $\distPattern{22}$, but this time there is no multiplicative factor, since $\wOneP$ and $\wTwoP$ are constrained to equal their opposite $\wVec$ counterparts, which are the arguments of both $\kMap{t}$ terms.
\item The LHS is the expectation squared. We obtain the RHS by first squaring the sum and then, using the commutativity of addition, rearranging the summands.
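Written out in the section's notation, the squaring step expands as

```latex
\[
  \left(\sum_{\wVec\in\pw}\kMapParam{\wVec}\right)^2
  \;=\; \sum_{\wVec\in\pw}\kMapParam{\wVec}^2
  \;+\; \sum_{\substack{\wVec,\wVecPrime\in\pw\\ \wVec\neq\wVecPrime}}
        \kMapParam{\wVec}\,\kMapParam{\wVecPrime},
\]
```

where the first term collects the diagonal summands and the second collects the rearranged cross terms.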
%Our current analysis is limited to TIPDBs, where the annotations are in the boolean $\mathbb{B}$ set. Because this is the case, the square of any element is itself.
%In both equations, the sum of $\kMapParam{\wVec}$ over all $\wVec \in \pw$ is $\numWorldsP$ since as noted in equation \eqref{eq:mu} we are summing the number of worlds a tuple $t$ appears in, and for a TIPDB, that is exactly 2 to the power of the number of tuples in the TIPDB (due to the independence of tuples) times tuple $t$'s probability.
In equation \eqref{eq:spaceOne} we have the multiplicative factor, which in expectation is the number of worlds $\numWorlds$ divided evenly across the number of buckets $\sketchCols$, minus the one world that $\wVecPrime$ cannot be. This factor is multiplied by the sum of squares over each of the $\numWorldsP$ worlds in which $t$ appears.
Equation \eqref{eq:spaceTwo} has each of the $\numWorldsP$ worlds multiplied by all the other worlds in that bucket in which tuple $t$ appears. This factor is represented by $\frac{\numWorldsP-1}{\sketchCols}$, i.e., a world in a given bucket $j$ in which tuple $t$ appears is summed over its products with the other worlds in bucket $j$ in which $t$ appears.
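The $\frac{1}{\sketchCols}$ bucket-collision factor in these terms can be checked by exhaustively enumerating a tiny sketch. As before, this is a hedged sketch assuming a count-sketch style point estimate with fully independent signs and uniform bucket hashes over a hypothetical four-world domain:

```python
from itertools import product

values = [1, 2, 3, 4]  # hypothetical annotations v(w) for worlds 0..3
B = 2                  # number of buckets

# enumerate every equally likely sign/hash assignment to obtain the
# exact distribution of the point estimate of v(0)
estimates = []
for signs in product([-1, 1], repeat=4):
    for hashes in product(range(B), repeat=4):
        buckets = [0] * B
        for w, v in enumerate(values):
            buckets[hashes[w]] += signs[w] * v
        estimates.append(signs[0] * buckets[hashes[0]])

mean = sum(estimates) / len(estimates)
var = sum((e - mean) ** 2 for e in estimates) / len(estimates)
# mean == v(0) == 1; var == (2^2 + 3^2 + 4^2) / B == 14.5,
# i.e., the cross terms are damped by exactly 1/B
```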
Recall that $\sdRel=\frac{\sd}{\mu}$, where $\mu$ is defined as $\numWorldsP$ in \eqref{eq:mu} for a TIDB and as $\norm{\kMap{t}}\prob$ for general $\kMap{t}$ in \eqref{eq:gen-mu}.
Since the sketch uses multiple trials, a probability of exceeding the error bound $\errB$ that is smaller than one half guarantees that the median over all trials is within the error bound. Expressing the error relative to $\mu$ in Chebyshev's Inequality yields
%\AR{It would be better to state the deviation as say $\Delta$ instead of $\epsilon\mu$. Then derive the expression for $B$ in terms of $N,p,\Delta$. Then you can state as consequences what values of $B$ you get for the special cases of $\Delta=\epsilon\cdot 2^N$ and $\Delta=\epsilon\mu$.}
For the case when $\Delta=\mu\epsilon$, taking both Chebyshev bounds, setting them equal to each other, simplifying, and solving for $\sketchCols$ results in