Chernoff Bound for M_sketch(delta)
parent
1c5b930b9c
commit
9a0049cd0a
11
analysis.tex
|
@ -482,6 +482,17 @@ Setting $\Delta = \epsilon\numWorlds$ gives
|
|||
\end{align*}
|
||||
|
||||
Other cases for $\Delta$ can be solved similarly.
|
||||
|
||||
|
||||
|
||||
Spacing...\newline
|
||||
you\newline
|
||||
can\newline
|
||||
get
|
||||
rid
|
||||
of
|
||||
this
|
||||
later.
|
||||
\finOld
|
||||
|
||||
|
||||
|
|
|
@ -119,7 +119,7 @@ The case for an odd number of sketches can likewise be reduced to the even case
|
|||
\end{align*}
|
||||
We desire an expectation which yields the ground truth. Thus we seek to find sketch products whose expectation computes to the extraneous terms above in order to cancel them out.
|
||||
|
||||
One potential work around would be to store additional sketches with independent $\pol$ functions. Assuming independent $\pol$ functions between the $\mathcal{S}_1, \mathcal{S}_2$ and $\mathcal{S}_3, \mathcal{S}_4$ pairs allows us to use linearity of expectations resulting in
|
||||
One potential workaround would be to store additional sketches with independent $\pol$ functions. For the first unwanted term, assuming independent $\pol$ functions between the $\mathcal{S}_1, \mathcal{S}_2$ and $\mathcal{S}_3, \mathcal{S}_4$ pairs allows us to use linearity of expectation, resulting in
|
||||
|
||||
\begin{align*}
|
||||
&\expect{\sum_{j \in \sketchCols}\sCom{1}{j}\sCom{2}{j}\sCom{3}{j}\sCom{4}{j}}\\
|
||||
|
@ -132,9 +132,9 @@ which reduces by \eqref{eq:two-sk-prod} to
|
|||
\begin{equation*}
|
||||
\sum_{\wOne, \wTwo \in \pw}\gVP{1}{\wOne}\gVP{2}{\wOne}\cdot \sum_{\wThree, \wFour \in \pw}\gVP{3}{\wThree}\gVP{4}{\wFour}.
|
||||
\end{equation*}
|
||||
The remaining additional terms can be analogously found.
|
||||
\newline
|
||||
To compute variance, the independence of $\pol$ functions can be exploited as follows:
|
||||
|
||||
|
||||
To compute the variance of the above product, the independence of the $\pol$ functions can be exploited as follows:
|
||||
\begin{align}
|
||||
&\varParam{\sum_{j \in \sketchCols}\sCom{1}{j}\sCom{2}{j}\sCom{3}{j}\sCom{4}{j}}\nonumber\\
|
||||
&= \varParam{\sum_{j \in \sketchCols}\sum_{\substack{\wOne, \wTwo,\\ \wThree, \wFour \in \pw \st\\\hashP{\wOne} =\hashP{\wTwo}\\=\hashP{\wThree} = \hashP{\wFour}}}\gVP{1}{\wOne}\polI{1}{\wOne}\gVP{2}{\wTwo}\polI{1}{\wTwo}\gVP{3}{\wThree}\polI{2}{\wThree}\gVP{4}{\wFour}\polI{2}{\wFour}}\nonumber \\
|
||||
|
@ -149,8 +149,11 @@ Expanding the first term, we have
|
|||
\begin{align}
|
||||
&\sum_{j \in \sketchCols}\expect{\sum_{\substack{\wOne, \wTwo \in \pw \st \\ \hashP{\wOne} = \hashP{\wTwo}}}\gVP{1}{\wOne}\polI{1}{\wOne}\gVP{1}{\wOne'}\polI{1}{\wOne'}\gVP{2}{\wTwo}\polI{1}{\wTwo}\gVP{2}{\wTwo'}\polI{1}{\wTwo'}}\nonumber \\
|
||||
&= \sum_{\wOne \in \pw}\gVP{1}{\wOne}^2\gVP{2}{\wOne}^2 +\sum_{\substack{\wOne, \wTwo \in \pw \st\\ \wOne \neq \wTwo}}\gVP{1}{\wOne}^2\gVP{2}{\wTwo}^2 + \nonumber \\
|
||||
&\qquad 2 \cdot \left(\sum_{\substack{\wOne, \wTwo \in \pw \st\\ \wOne \neq \wTwo}}\gVP{1}{\wOne}\gVP{2}{\wOne}\gVP{1}{\wTwo}\gVP{2}{\wTwo}\right) \nonumber
|
||||
&\qquad 2 \cdot \left(\sum_{\substack{\wOne, \wTwo \in \pw \st\\ \wOne \neq \wTwo}}\gVP{1}{\wOne}\gVP{2}{\wOne}\gVP{1}{\wTwo}\gVP{2}{\wTwo}\right). \nonumber
|
||||
\end{align}
|
||||
The second term expands analogously, leaving the product of the two expansions minus the squared expectation.
|
||||
|
||||
The expectation and variance of the remaining additional terms can be found analogously.
|
||||
\startOld{Evaluating Estimate 2}
|
||||
\newline For $\est{2}$, this would result in
|
||||
\begin{align*}
|
||||
|
|
|
@ -0,0 +1,34 @@
|
|||
% -*- root: main.tex -*-
|
||||
|
||||
\section{Bounding the Estimates}
|
||||
|
||||
\newcommand{\bMu}{\epsilon\mu_{\sketchCols_{sum}}}
|
||||
|
||||
Consider a $\sketchCols$ estimate, denoted $\sketchCols_{est}$, together with the following definitions:
|
||||
|
||||
\begin{align*}
|
||||
&\bMu \text{ is the expectation for the sum of estimates.}\\
|
||||
&X = \sum_{i = 1}^{\sketchRows}X_i \\
|
||||
&X_i \in \{0, 1\} \text{ are i.i.d.\ random variables}, \quad i \in \sketchRows \\
|
||||
&X_i = \begin{cases}
|
||||
0 &\sketchCols_{est} > \bMu\\
|
||||
1 &\sketchCols_{est} \leq \bMu
|
||||
\end{cases}\\
|
||||
&\Pr[X_i = 1] \geq \frac{2}{3}\\
&\Pr[X_i = 0] \leq \frac{1}{3}\\
|
||||
&\mu = \frac{2}{3}\sketchRows\\
|
||||
&\epsilon = 0.5
|
||||
\end{align*}
|
||||
|
||||
Because Chebyshev's inequality bounds the probability of a bad row estimate by $\frac{1}{3}$, we set $\epsilon$ to the value that, when multiplied by $\mu$, yields $\frac{1}{3}$. We then derive a lower bound on $\sketchRows$.
|
||||
Because we are concerned only with the lower tail, we can use the generic Chernoff bound for the left tail,
|
||||
\begin{equation*}
|
||||
\Pr[X \leq (1 - \epsilon)\mu] \leq e^{-\frac{\epsilon^2}{2 + \epsilon}\mu}.
|
||||
\end{equation*}
|
||||
Bounding this probability by $\delta$ and solving for $\sketchRows$,
|
||||
\begin{align*}
|
||||
\delta \geq e^{-\frac{(\frac{1}{3})^2}{2 + \frac{1}{3}}\frac{2}{3}\sketchRows}\\
|
||||
\delta \geq e^{-\frac{2}{63}\sketchRows}\\
e^{\frac{2}{63}\sketchRows} \geq \frac{1}{\delta}\\
\sketchRows \geq \frac{63}{2}\ln\left(\frac{1}{\delta}\right)
|
||||
\end{align*}
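
The arithmetic in the derivation above can be checked step by step. The following is a sketch that assumes $\epsilon = \frac{1}{3}$ and $\mu = \frac{2}{3}\sketchRows$, which are the values substituted into the exponent above. The target $\delta = 0.05$ is a hypothetical value chosen only for illustration.
\begin{align*}
\frac{\epsilon^2}{2 + \epsilon}\mu &= \frac{1/9}{7/3}\cdot\frac{2}{3}\sketchRows = \frac{1}{21}\cdot\frac{2}{3}\sketchRows = \frac{2}{63}\sketchRows,\\
\sketchRows &\geq \frac{63}{2}\ln\left(\frac{1}{0.05}\right) = 31.5\ln 20 \approx 94.4.
\end{align*}
Under these assumptions, $\sketchRows = 95$ rows would suffice for a failure probability of at most $0.05$.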
|