Chernoff Bound for M_sketch(delta)

master
Aaron Huber 2019-10-18 10:08:50 -04:00
parent 1c5b930b9c
commit 9a0049cd0a
4 changed files with 54 additions and 5 deletions


@ -482,6 +482,17 @@ Setting $\Delta = \epsilon\numWorlds$ gives
\end{align*}
Other cases for $\Delta$ can be solved similarly.
Spacing...\newline
you\newline
can\newline
get
rid
of
this
later.
\finOld


@ -119,7 +119,7 @@ The case for an odd number of sketches can likewise be reduced to the even case
\end{align*}
We desire an expectation which yields the ground truth. Thus we seek to find sketch products whose expectation computes to the extraneous terms above in order to cancel them out.
-One potential work around would be to store additional sketches with independent $\pol$ functions. Assuming independent $\pol$ functions between the $\mathcal{S}_1, \mathcal{S}_2$ and $\mathcal{S}_3, \mathcal{S}_4$ pairs allows us to use linearity of expectations resulting in
+One potential work around would be to store additional sketches with independent $\pol$ functions. For the first unwanted term, assuming independent $\pol$ functions between the $\mathcal{S}_1, \mathcal{S}_2$ and $\mathcal{S}_3, \mathcal{S}_4$ pairs allows us to use linearity of expectations resulting in
\begin{align*}
&\expect{\sum_{j \in \sketchCols}\sCom{1}{j}\sCom{2}{j}\sCom{3}{j}\sCom{4}{j}}\\
@ -132,9 +132,9 @@ which reduces by \eqref{eq:two-sk-prod} to
\begin{equation*}
\sum_{\wOne, \wTwo \in \pw}\gVP{1}{\wOne}\gVP{2}{\wOne}\cdot \sum_{\wThree, \wFour \in \pw}\gVP{3}{\wThree}\gVP{4}{\wFour}.
\end{equation*}
The remaining additional terms can be analogously found.
\newline
-To compute variance, the independence of $\pol$ functions can be exploited as follows:
+To compute variance of the above product, the independence of $\pol$ functions can be exploited as follows:
\begin{align}
&\varParam{\sum_{j \in \sketchCols}\sCom{1}{j}\sCom{2}{j}\sCom{3}{j}\sCom{4}{j}}\nonumber\\
&= \varParam{\sum_{j \in \sketchCols}\sum_{\substack{\wOne, \wTwo,\\ \wThree, \wFour \in \pw \st\\\hashP{\wOne} =\hashP{\wTwo}\\=\hashP{\wThree} = \hashP{\wFour}}}\gVP{1}{\wOne}\polI{1}{\wOne}\gVP{2}{\wTwo}\polI{1}{\wTwo}\gVP{3}{\wThree}\polI{2}{\wThree}\gVP{4}{\wFour}\polI{2}{\wFour}}\nonumber \\
@ -149,8 +149,11 @@ Expanding the first term, we have
\begin{align}
&\sum_{j \in \sketchCols}\expect{\sum_{\substack{\wOne, \wTwo \in \pw \st \\ \hashP{\wOne} = \hashP{\wTwo}}}\gVP{1}{\wOne}\polI{1}{\wOne}\gVP{1}{\wOne'}\polI{1}{\wOne'}\gVP{2}{\wTwo}\polI{1}{\wTwo}\gVP{2}{\wTwo'}\polI{1}{\wTwo'}}\nonumber \\
&= \sum_{\wOne \in \pw}\gVP{1}{\wOne}^2\gVP{2}{\wOne}^2 +\sum_{\substack{\wOne, \wTwo \in \pw \st\\ \wOne \neq \wTwo}}\gVP{1}{\wOne}^2\gVP{2}{\wTwo}^2 + \nonumber \\
-&\qquad 2 \cdot \left(\sum_{\substack{\wOne, \wTwo \in \pw \st\\ \wOne \neq \wTwo}}\gVP{1}{\wOne}\gVP{2}{\wOne}\gVP{1}{\wTwo}\gVP{2}{\wTwo}\right) \nonumber
+&\qquad 2 \cdot \left(\sum_{\substack{\wOne, \wTwo \in \pw \st\\ \wOne \neq \wTwo}}\gVP{1}{\wOne}\gVP{2}{\wOne}\gVP{1}{\wTwo}\gVP{2}{\wTwo}\right). \nonumber
\end{align}
The second term expands analogously, leaving the product of the two expansions minus the expectation squared term.
The expectation and variance calculations for the remaining additional terms can be analogously found.
\startOld{Evaluating Estimate 2}
\newline For $\est{2}$, this would result in
\begin{align*}

est_bounds.tex Normal file

@ -0,0 +1,34 @@
% -*- root: main.tex -*-
\section{Bounding the Estimates}
\newcommand{\bMu}{\epsilon\mu_{\sketchCols_{sum}}}
Consider a $\sketchCols$ estimate, denoted $\sketchCols_{est}$, under the following definitions and assumptions:
\begin{align*}
&\bMu \text{ is the expectation for the sum of estimates.}\\
&X = \sum_{i = 1}^{\sketchRows}X_i \\
&X_i \in \{0, 1\} \text{ are i.i.d. random variables},\ 1 \leq i \leq \sketchRows \\
&X_i = \begin{cases}
0 &\sketchCols_{est} > \bMu\\
1 &\sketchCols_{est} \leq \bMu
\end{cases}\\
&Pr[X_i = 1] \geq \frac{2}{3}\\
&Pr[X_i = 0] \leq \frac{1}{3}\\
&\mu = \frac{2}{3}\sketchRows\\
&\epsilon = 0.5
\end{align*}
Because Chebyshev's inequality tells us that the probability of a bad row estimate is $\leq \frac{1}{3}$, we set $\epsilon$ to the value that, when multiplied by $\mu$, yields $\frac{1}{3}$. We then derive a bound on $\sketchRows$.
Because we are concerned only with the left tail, we can use the generic Chernoff bound for the left tail,
\begin{equation*}
Pr[X \leq (1 - \epsilon)\mu] \leq e^{-\frac{\epsilon^2}{2 + \epsilon}\mu}.
\end{equation*}
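As a brief justification sketch (this step is an added assumption about which standard form is being relaxed, not something stated in the analysis itself): for independent $X_i \in [0, 1]$ and $0 < \epsilon < 1$, the usual multiplicative lower-tail Chernoff bound has exponent $\frac{\epsilon^2}{2}\mu$, and since $2 + \epsilon \geq 2$ it implies the weaker form with denominator $2 + \epsilon$:
\begin{equation*}
Pr[X \leq (1 - \epsilon)\mu] \leq e^{-\frac{\epsilon^2}{2}\mu} \leq e^{-\frac{\epsilon^2}{2 + \epsilon}\mu}.
\end{equation*}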
Requiring this probability to be at most $\delta$ and solving for $\sketchRows$,
\begin{align*}
\delta \geq e^{-\frac{(\frac{1}{3})^2}{2 + \frac{1}{3}}\frac{2}{3}\sketchRows}\\
\delta \geq e^{-\frac{2}{63}\sketchRows}\\
e^{\frac{2}{63}\sketchRows} \geq \frac{1}{\delta}\\
\sketchRows \geq \frac{63}{2}\ln\left(\frac{1}{\delta}\right)
\end{align*}
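As a quick numerical illustration of the final bound (the failure probability $\delta = 0.05$ is an assumed example value, not one fixed by this analysis):
\begin{equation*}
\sketchRows \geq \frac{63}{2}\ln\left(\frac{1}{0.05}\right) = \frac{63}{2}\ln(20) \approx 31.5 \cdot 2.996 \approx 94.4,
\end{equation*}
so under this assumption $\sketchRows = 95$ rows would suffice.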


@ -126,6 +126,7 @@
\input{notation}
\input{analysis}
\input{est_bounds}
\input{combining}
\input{instantiation}