Chernoff Bound for M_sketch(delta)

2019-10-18 10:08:50 -04:00 · 2019-10-18 10:08:50 -04:00 · 9a0049cd0a
parent 1c5b930b9c
commit 9a0049cd0a
4 changed files with 54 additions and 5 deletions
--- a/analysis.tex
+++ b/analysis.tex
@@ -482,6 +482,17 @@ Setting $\Delta = \epsilon\numWorlds$ gives
 \end{align*}

 Other cases for $\Delta$ can be solved similarly.
+
+
+
+Spacing...\newline
+you\newline
+can\newline
+get
+rid 
+of 
+this 
+later.
 \finOld


--- a/combining.tex
+++ b/combining.tex
@@ -119,7 +119,7 @@ The case for an odd number of sketches can likewise be reduced to the even case
 \end{align*}
 We desire an expectation which yields the ground truth.  Thus we seek to find sketch products whose expectation computes to the extraneous terms above in order to cancel them out.

-One potential work around would be to store additional sketches with independent $\pol$ functions.  Assuming independent $\pol$ functions between the $\mathcal{S}_1, \mathcal{S}_2$ and $\mathcal{S}_3, \mathcal{S}_4$ pairs allows us to use linearity of expectations resulting in
+One potential workaround would be to store additional sketches with independent $\pol$ functions.  For the first unwanted term, assuming independent $\pol$ functions between the $\mathcal{S}_1, \mathcal{S}_2$ and $\mathcal{S}_3, \mathcal{S}_4$ pairs allows us to use linearity of expectation, resulting in

 \begin{align*}
 &\expect{\sum_{j \in \sketchCols}\sCom{1}{j}\sCom{2}{j}\sCom{3}{j}\sCom{4}{j}}\\
@@ -132,9 +132,9 @@ which reduces by \eqref{eq:two-sk-prod} to
 \begin{equation*}
 \sum_{\wOne, \wTwo \in \pw}\gVP{1}{\wOne}\gVP{2}{\wOne}\cdot \sum_{\wThree, \wFour \in \pw}\gVP{3}{\wThree}\gVP{4}{\wFour}.
 \end{equation*}
-The remaining additional terms can be analogously found.  
-\newline
-To compute variance, the independence of $\pol$ functions can be exploited as follows:
+
+
+To compute the variance of the above product, the independence of $\pol$ functions can be exploited as follows:
 \begin{align}
 &\varParam{\sum_{j \in \sketchCols}\sCom{1}{j}\sCom{2}{j}\sCom{3}{j}\sCom{4}{j}}\nonumber\\
 &= \varParam{\sum_{j \in \sketchCols}\sum_{\substack{\wOne, \wTwo,\\ \wThree, \wFour \in \pw \st\\\hashP{\wOne} =\hashP{\wTwo}\\=\hashP{\wThree} = \hashP{\wFour}}}\gVP{1}{\wOne}\polI{1}{\wOne}\gVP{2}{\wTwo}\polI{1}{\wTwo}\gVP{3}{\wThree}\polI{2}{\wThree}\gVP{4}{\wFour}\polI{2}{\wFour}}\nonumber \\ 
@@ -149,8 +149,11 @@ Expanding the first term, we have
 \begin{align}
 &\sum_{j \in \sketchCols}\expect{\sum_{\substack{\wOne, \wTwo \in \pw \st \\ \hashP{\wOne} = \hashP{\wTwo}}}\gVP{1}{\wOne}\polI{1}{\wOne}\gVP{1}{\wOne'}\polI{1}{\wOne'}\gVP{2}{\wTwo}\polI{1}{\wTwo}\gVP{2}{\wTwo'}\polI{1}{\wTwo'}}\nonumber \\
 &= \sum_{\wOne \in \pw}\gVP{1}{\wOne}^2\gVP{2}{\wOne}^2 +\sum_{\substack{\wOne, \wTwo \in \pw \st\\ \wOne \neq \wTwo}}\gVP{1}{\wOne}^2\gVP{2}{\wTwo}^2 + \nonumber \\
-&\qquad  2 \cdot \left(\sum_{\substack{\wOne, \wTwo \in \pw \st\\ \wOne \neq \wTwo}}\gVP{1}{\wOne}\gVP{2}{\wOne}\gVP{1}{\wTwo}\gVP{2}{\wTwo}\right) \nonumber
+&\qquad  2 \cdot \left(\sum_{\substack{\wOne, \wTwo \in \pw \st\\ \wOne \neq \wTwo}}\gVP{1}{\wOne}\gVP{2}{\wOne}\gVP{1}{\wTwo}\gVP{2}{\wTwo}\right). \nonumber
 \end{align}
+The second term expands analogously, leaving the product of the two expansions minus the squared-expectation term.
+
+The expectation and variance of the remaining additional terms can be found analogously.
 \startOld{Evaluating Estimate 2}
 \newline For $\est{2}$, this would result in
 \begin{align*}
--- a/est_bounds.tex
+++ b/est_bounds.tex
@@ -0,0 +1,34 @@
+% -*- root: main.tex -*-
+
+\section{Bounding the Estimates}
+
+\newcommand{\bMu}{\epsilon\mu_{\sketchCols_{sum}}}
+
+For a $\sketchCols$ estimate, denoted $\sketchCols_{est}$, and given the following:
+
+\begin{align*}
+&\bMu \text{ is the expectation for the sum of estimates.}\\
+&X = \sum_{i = 1}^{\sketchRows}X_i \\
+&X_i \in \{0, 1\} \text{ are i.i.d. random variables},\ i \in \{1, \ldots, \sketchRows\} \\
+&X_i = \begin{cases}
+	0	&\sketchCols_{est} > \bMu\\
+	1	&\sketchCols_{est} \leq \bMu
+	\end{cases}\\
+&p[X_i = 1] \geq \frac{2}{3}\\
+&p[X_i = 0] \leq \frac{1}{3}\\
+&\mu = \frac{2}{3}\sketchRows\\
+&\epsilon = 0.5 
+\end{align*}
+
+Because Chebyshev's inequality bounds the probability of a bad row estimate by $\frac{1}{3}$, we set $\epsilon$ to the value that, when multiplied by $\mu$, yields $\frac{1}{3}$.  We then derive bounds for $\sketchRows$.
+Because we are only concerned with the left tail, we can use the generic Chernoff bound for the left tail,
+\begin{equation*}
+\Pr[X \leq (1 - \epsilon)\mu] \leq e^{-\frac{\epsilon^2}{2 + \epsilon}\mu}.
+\end{equation*}
+Setting the right-hand side to be at most $\delta$ and solving for $\sketchRows$,
+\begin{align*}
+\delta \geq e^{-\frac{(\frac{1}{3})^2}{2 + \frac{1}{3}}\frac{2}{3}\sketchRows}\\
+\delta \geq e^{-\frac{2}{63}\sketchRows}\\
+e^{\frac{2}{63}\sketchRows} \geq \frac{1}{\delta}\\
+\sketchRows \geq \frac{63}{2}\ln\left(\frac{1}{\delta}\right)
+\end{align*}
--- a/main.tex
+++ b/main.tex
@@ -126,6 +126,7 @@

 \input{notation}
 \input{analysis}
+\input{est_bounds}
 \input{combining}

 \input{instantiation}
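
A worked check of the row bound from est_bounds.tex, offered as a sketch rather than the author's derivation. It assumes the deviation plugged into the Chernoff exponent is $\frac{1}{3}$ (the value that appears in the substitution, although the listed givens state $\epsilon = 0.5$) and that $\mu = \frac{2}{3}\sketchRows$. The choice $\delta = 0.05$ in the last line is purely illustrative and does not come from the source.

```latex
% Assumptions (not confirmed by the source): epsilon = 1/3 in the exponent,
% mu = (2/3)\sketchRows, and delta = 0.05 as an illustrative failure probability.
\begin{align*}
\frac{\epsilon^2}{2 + \epsilon}\mu
  &= \frac{(1/3)^2}{2 + 1/3}\cdot\frac{2}{3}\sketchRows
   = \frac{1}{9}\cdot\frac{3}{7}\cdot\frac{2}{3}\sketchRows
   = \frac{2}{63}\sketchRows, \\
e^{-\frac{2}{63}\sketchRows} \leq \delta
  &\iff \sketchRows \geq \frac{63}{2}\ln\left(\frac{1}{\delta}\right), \\
\delta = 0.05
  &\implies \sketchRows \geq 31.5 \cdot \ln 20 \approx 94.4 .
\end{align*}
```

Under these assumptions, $\sketchRows = 95$ rows would suffice for a failure probability of at most $0.05$; the row count grows only logarithmically in $\frac{1}{\delta}$.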