A few mathematical corrections.

2019-08-21 10:05:40 -04:00 · 2019-08-21 10:05:40 -04:00 · 41fd3f62a5
parent 689ced732f
commit 41fd3f62a5
2 changed files with 52 additions and 31 deletions
--- a/analysis.tex
+++ b/analysis.tex
@@ -333,7 +333,7 @@ Note that when $\genV$ is positive, the bound is tight.
 In equation \eqref{eq:spaceOne} we have the multiplicative factor which in expectation turns out to be the number of worlds $|\pw|$ divided evenly across the number of buckets $\sketchCols$ minus the one tuple that $\wVecPrime$ cannot be.  This factor is multiplied to the sum of squares of each of the  world values.


-Equation \eqref{eq:spaceTwo} has each of the $|\pw|$ worlds times all the rest of the worlds appearing in the corresponding bucket.  The equation is first rearranged, by allowing the duplicating the $\wVec$ in the second summation and subsequently subtracting the product afterwards.  The product in the expectation yiellds two factors.  The first factor is simply the sum of vector values.  The latter is the same sum divided by bucket size.  Finally, we subtract the quantity that shouldn't be there, specifically when $\wVecPrime = \wVec$, which is the sum of squares within a bucket.
+Equation \eqref{eq:spaceTwo} has each of the $|\pw|$ worlds times all the rest of the worlds appearing in the corresponding bucket.  The equation is first rearranged by allowing the duplication of $\wVec$ in the second summation and subtracting the duplicated product afterwards.  The product in the expectation yields two factors.  The first factor is simply the sum of vector values.  The second is the same sum divided by bucket size.  Finally, we subtract the quantity that should not be included, specifically the case $\wVecPrime = \wVec$, which is the sum of squares within a bucket.


 \eqref{eq:spaceOne} and \eqref{eq:spaceTwo} together form
@@ -343,7 +343,7 @@ Equation \eqref{eq:spaceTwo} has each of the $|\pw|$ worlds times all the rest o
 & < \frac{\norm{\genV}_2^2\left(|\pw|\right) + \norm{\genV}_1^2}{\sketchCols} \label{eq:variance}
 \end{align}

-By \eqref{eq:variance} we have then
+By \eqref{eq:variance} we have
 \begin{align*}
 %\varSym &< 2^{2N}\big(\frac{2\prob}{\sketchCols}\big) \\
 \varSym &< \frac{\norm{\genV}_2^2\left(|\pw|\right) + \norm{\genV}_1^2}{\sketchCols} \\
@@ -368,16 +368,25 @@ For the case when $\Delta = \mu\epsilon$, taking both Chebyshev bounds, setting

 \begin{align}
 \frac{\sigma^2}{\Delta^2} &= \frac{1}{3}\\
-\frac{\norm{\genV}_2^2 \cdot \left(|\pw|\right) + \norm{\genV}_1^2}{\sketchCols \norm{\genV}_1^2 \cdot \epsilon^2} &= \frac{1}{3}\\
-\frac{3\norm{\genV}_2^2\left(|\pw|\right) + \norm{\genV}_1^2}{\norm{\genV}_1^2 \cdot \epsilon^2} &= \sketchCols\label{eq:bucket-bounds-no-sub}
+\frac{\norm{\genV}_2^2 \cdot \left(|\pw|\right) + \norm{\genV}_1^2}{\sketchCols \norm{\genV}_1^2 \cdot \epsilon^2} &= \frac{1}{3}\label{eq:b-bnd-no-sub1}\\
+\frac{3\left(\norm{\genV}_2^2\left(|\pw|\right) + \norm{\genV}_1^2\right)}{\norm{\genV}_1^2 \cdot \epsilon^2} &= \sketchCols\label{eq:bucket-bounds-no-sub}
 \end{align}

+\begin{Justification}
+\hfill
+	\begin{itemize}
+		\item \eqref{eq:b-bnd-no-sub1} substitutes the variance bound \eqref{eq:variance} for $\sigma^2$ and $\mu^2\epsilon^2 = \norm{\genV}_1^2 \cdot \epsilon^2$ for $\Delta^2$.
+		\item \eqref{eq:bucket-bounds-no-sub} is derived by rearranging terms through multiplying each side by $3\sketchCols$.
+	\end{itemize}
+\end{Justification}
 A brief digression is desirable for the purpose of simplifying the above bounds.  Recall the Cauchy-Schwarz inequality, which states:
 \[\sum_i a_i \cdot b_i \leq \norm{a}_2 \cdot \norm{b}_2.\]
 The L1 norm can be expanded to the following expression, 
-\[\norm{\genV}_1 = \sum_{\wVec \in \pw} 1 \cdot \genVParam{\wVec}.\]
+\begin{equation}
+\norm{\genV}_1 = \sum_{\wVec \in \pw} 1 \cdot \genVParam{\wVec}\label{eq:expandL1}.
+\end{equation}
 Notice that the constant term can be viewed as a vector of $1$'s with size $n$ (the size of $\genV$).  Calling this vector $x$ and taking the L2 norm gives
-\SR{Simplify further with L0 'norm', although that makes the simplification more difficult.}
+\SR{Tighten the bounds further with L0 'norm', although that makes the simplification more difficult.}
 \begin{align}
 \norm{x} &= \sqrt{1_1^2 + 1_2^2 + \cdots + 1_n^2}\nonumber\\
 &= \sqrt{n * 1} \nonumber\\
@@ -392,47 +401,54 @@ which squared yields
 \begin{equation}
 \norm{\genV}_1^2 \leq |\pw| \cdot \norm{\genV}_2^2\label{eq:norm1-sq-cauchy}.
 \end{equation}
-Note that \eqref{eq:norm1-sq-cauchy} can be further tightened to
-\begin{equation}
-\norm{\genV}_1^2 \leq \norm{\genV}_0 \cdot \norm{\genV}_2^2
-\end{equation}
+Note that the bound can be further tightened by replacing the all-ones vector in \eqref{eq:expandL1} with a vector whose ones appear only in positions where $\genV_i \neq 0$.  This tightens \eqref{eq:norm1-cauchy} and \eqref{eq:norm1-sq-cauchy} by replacing the $|\pw|$ factor with $\norm{\genV}_0$.
+%\begin{equation}
+%\norm{\genV}_1^2 \leq \norm{\genV}_0 \cdot \norm{\genV}_2^2
+%\end{equation}
+\AH{Did not use L0 here because it was easier to reduce terms with the $|\pw|$ factor.}
 Substituting the Cauchy-Schwarz bounds into the Chebyshev calculations gives 
 \begin{align}
-&\sketchCols \leq \frac{3\norm{\genV}_2^2\left(|\pw|\right) + \norm{\genV}_2^2\left(|\pw|\right)}{\norm{\genV}_2\sqrt{|\pw|}\cdot \epsilon^2}\nonumber\\
-&\sketchCols \leq \frac{4\norm{\genV}_2^2\left(|\pw|\right)}{\norm{\genV}_2\sqrt{|\pw|}\cdot \epsilon^2}\nonumber\\
-&\sketchCols \leq \frac{4\norm{\genV}_2\sqrt{|\pw|}}{\epsilon^2}\label{eq:b-cauchy}
+&\sketchCols \leq \frac{3\left(\norm{\genV}_2^2\left(|\pw|\right) + \norm{\genV}_2^2\left(|\pw|\right)\right)}{\norm{\genV}_2\sqrt{|\pw|}\cdot \epsilon^2}\label{eq:cheb-cauch1}\\
+&\sketchCols \leq \frac{3\left(2\norm{\genV}_2^2\left(|\pw|\right)\right)}{\norm{\genV}_2\sqrt{|\pw|}\cdot \epsilon^2}\label{eq:cheb-cauch2}\\
+&\sketchCols \leq \frac{6\norm{\genV}_2\sqrt{|\pw|}}{\epsilon^2}\label{eq:b-cauchy}
 \end{align}
-\AH{Justify this.}
+
 \begin{Justification}
 \hfill
 	\begin{itemize}
-		\item stuff goes here.
+		\item \eqref{eq:cheb-cauch1} substitutes \eqref{eq:norm1-sq-cauchy} and \eqref{eq:norm1-cauchy} for the numerator and denominator terms respectively.
+		\item \eqref{eq:cheb-cauch2} combines common terms in the numerator.
+		\item \eqref{eq:b-cauchy} multiplies the constant terms and cancels the factors common to the numerator and denominator.
 	\end{itemize}
 \end{Justification}
 To further tighten the bounds calculations above, we can bound the square of the L2 norm.
 \begin{align}
-\norm{\genV}_2^2 &= \sum_{i = 1}^{n}|\genV|^2 \\
-&\leq \sum_{i = 1}^{n}\left(max_{i}|\genV_i|\right)\left(\genV_i\right)\\
-&\leq \sum_{i = 1}^{n}\norm{\genV}_\infty |\genV_i|\\
-&\leq \norm{\genV}_\infty \sum_{i = 1}^{n}|\genV_i|\\
+\norm{\genV}_2^2 &= \sum_{i = 1}^{n}|\genV_i|^2 \label{eq:l2-bnd1} \\
+&\leq \sum_{i = 1}^{n}\left(\max_{i}|\genV_i|\right)\left|\genV_i\right|\label{eq:l2-bnd2}\\
+&= \sum_{i = 1}^{n}\norm{\genV}_\infty |\genV_i|\label{eq:l2-bnd3}\\
+&= \norm{\genV}_\infty \sum_{i = 1}^{n}|\genV_i|\label{eq:l2-bnd4}\\
 &= \norm{\genV}_\infty \cdot \norm{\genV}_1 \label{eq:l2-bounds}
 \end{align}
-\AH{Justify this.}
+
 \begin{Justification}
 \hfill
 	\begin{itemize}
-		\item stuff goes here.
+		\item \eqref{eq:l2-bnd1} is the definition of the L2 norm squared.
+		\item \eqref{eq:l2-bnd2} is an upper bound on the L2 norm squared.  It holds because $\max_{i}|\genV_i| \geq |\genV_i|$ for every $i$; the bound is strict unless every nonzero element of $\genV$ attains the maximum magnitude.
+		\item \eqref{eq:l2-bnd3} is given by a simple substitution of notation.
+		\item \eqref{eq:l2-bnd4} is obtained by pulling the constant factor $\norm{\genV}_\infty$ out of the summation.
+		\item \eqref{eq:l2-bounds} is the result of substituting the definition of L1 norm.
 	\end{itemize}
 \end{Justification}
 Going back to equation \eqref{eq:bucket-bounds-no-sub} and substituting in the above bounds obtains the following.
 \begin{align}
-\sketchCols &= \frac{3\norm{\genV}_2^2\left(|\pw|\right) + \norm{\genV}_1^2}{\norm{\genV}_1^2 \cdot \epsilon^2} \\
-&\leq \frac{3\norm{\genV}_\infty\norm{\genV}_1 \left(|\pw|\right) + \norm{\genV}_1^2}{\norm{\genV}_1^2\cdot \epsilon^2}\label{eq:sub-bounds1}\\
-&\leq \frac{3\norm{\genV}_\infty\sqrt{\norm{\genV}_0}\norm{\genV}_2\left(|\pw|\right) + \norm{\genV}_0\norm{\genV}_2^2}{\norm{\genV}_0\norm{\genV}_2^2 \cdot \epsilon^2}\label{eq:sub-bounds2}\\
-&\leq \frac{3\norm{\genV}_\infty \sqrt{\norm{\genV}_0} \sqrt{\norm{\genV}_\infty\norm{\genV}_1}\left(|\pw|\right) + \norm{\genV}_0\norm{\genV}_\infty\norm{\genV}_1}{\norm{\genV}_0\norm{\genV}_\infty\norm{\genV}_1\epsilon^2}\label{eq:sub-bounds3}\\
-&\leq \frac{\norm{\genV}_\infty \sqrt{\norm{\genV}_0\norm{\genV}_1}\left(3\sqrt{\norm{\genV}_\infty}\left(|\pw|\right) + \sqrt{\norm{\genV}_0\norm{\genV}_1}\right)}{\norm{\genV}_0\norm{\genV}_\infty\norm{\genV}_1\epsilon^2}\label{eq:sub-bounds4}\\
-&\leq \frac{3\sqrt{\norm{\genV}_\infty}\left(|\pw|\right) + \sqrt{\norm{\genV}_0\norm{\genV}_1}}{\sqrt{\norm{\genV}_0\norm{\genV}_1} \epsilon^2} \label{eq:sub-bounds5}\\
-&\leq \frac{3\sqrt{\norm{\genV}_\infty}\left(|\pw|\right)}{\sqrt{\norm{\genV}_0\norm{\genV}_1} \epsilon^2} + \frac{1}{\epsilon^2}\label{eq:sub-bounds-final}
+\sketchCols &= \frac{3\left(\norm{\genV}_2^2\left(|\pw|\right) + \norm{\genV}_1^2\right)}{\norm{\genV}_1^2 \cdot \epsilon^2} \\
+&\leq \frac{3\left(\norm{\genV}_\infty\norm{\genV}_1 \left(|\pw|\right) + \norm{\genV}_1^2\right)}{\norm{\genV}_1^2\cdot \epsilon^2}\label{eq:sub-bounds1}\\
+&\leq \frac{3\left(\norm{\genV}_\infty\sqrt{\norm{\genV}_0}\norm{\genV}_2\left(|\pw|\right) + \norm{\genV}_0\norm{\genV}_2^2\right)}{\norm{\genV}_0\norm{\genV}_2^2 \cdot \epsilon^2}\label{eq:sub-bounds2}\\
+&\leq \frac{3\left(\norm{\genV}_\infty \sqrt{\norm{\genV}_0} \sqrt{\norm{\genV}_\infty\norm{\genV}_1}\left(|\pw|\right) + \norm{\genV}_0\norm{\genV}_\infty\norm{\genV}_1\right)}{\norm{\genV}_0\norm{\genV}_\infty\norm{\genV}_1\epsilon^2}\label{eq:sub-bounds3}\\
+&\leq \frac{3\norm{\genV}_\infty \sqrt{\norm{\genV}_0\norm{\genV}_1}\left(\sqrt{\norm{\genV}_\infty}\left(|\pw|\right) + \sqrt{\norm{\genV}_0\norm{\genV}_1}\right)}{\norm{\genV}_0\norm{\genV}_\infty\norm{\genV}_1\epsilon^2}\label{eq:sub-bounds4}\\
+&\leq \frac{3\left(\sqrt{\norm{\genV}_\infty}\left(|\pw|\right) + \sqrt{\norm{\genV}_0\norm{\genV}_1}\right)}{\sqrt{\norm{\genV}_0\norm{\genV}_1} \epsilon^2} \label{eq:sub-bounds5}\\
+&\leq \frac{3\sqrt{\norm{\genV}_\infty}\left(|\pw|\right)}{\sqrt{\norm{\genV}_0\norm{\genV}_1} \epsilon^2} + \frac{3}{\epsilon^2}\label{eq:sub-bounds-final}
 \end{align}

 \begin{Justification}
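
As a sanity check on the norm inequalities used in the analysis.tex changes above, here is a small worked instance. The vector $\genV = (3, 0, 4)$, the world count $|\pw| = 3$, and $\epsilon = 1/2$ are assumed values chosen purely for illustration; they do not come from the source. The sketch only evaluates \eqref{eq:norm1-sq-cauchy}, its $\norm{\genV}_0$ variant, \eqref{eq:l2-bounds}, and \eqref{eq:bucket-bounds-no-sub} on that instance.

```latex
% Illustrative values only: \genV = (3, 0, 4), |\pw| = 3, \epsilon = 1/2 are assumptions, not from the source.
\begin{align*}
\norm{\genV}_1 &= 3 + 0 + 4 = 7 \\
\norm{\genV}_2^2 &= 9 + 0 + 16 = 25 \\
\norm{\genV}_0 &= 2, \qquad \norm{\genV}_\infty = 4 \\
\norm{\genV}_1^2 = 49 &\leq \norm{\genV}_0 \cdot \norm{\genV}_2^2 = 50 \leq |\pw| \cdot \norm{\genV}_2^2 = 75 \\
\norm{\genV}_2^2 = 25 &\leq \norm{\genV}_\infty \cdot \norm{\genV}_1 = 28 \\
\sketchCols &= \frac{3\left(25 \cdot 3 + 49\right)}{49 \cdot (1/2)^2} = \frac{372}{12.25} \approx 30.4
\end{align*}
```

On this instance the $\norm{\genV}_0$ form of the Cauchy-Schwarz bound (50) is visibly tighter than the $|\pw|$ form (75), which is the trade-off the \SR{} and \AH{} notes discuss.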
--- a/combining.tex
+++ b/combining.tex
@@ -6,9 +6,9 @@ When assuming that the variables are independent, as in the TIDB model, it is a
 \[
 \varParam{X + Y} = \varParam{X} + \varParam{Y}.
 \]
-It then immediately follows that adding $n$ base sketches results in the following variance:
+By \eqref{eq:sub-bounds-final}, it immediately follows that adding $n$ base sketches (a base sketch being one that has not previously been added to another sketch) results in the following variance:
 \[
-n \cdot 4\norm{\genV}_2 |\pw|^{1/2}.
+3n\left(\frac{\sqrt{\norm{\genV}_\infty}\left(|\pw|\right)}{\sqrt{\norm{\genV}_0\norm{\genV}_1} \epsilon^2} + \frac{1}{\epsilon^2}\right).
 \]

 \subsection{Multiplying Sketches}
@@ -32,18 +32,23 @@ It is necessary then to calculate the expectation of the square of the sum of es
 \hfill
 	\begin{itemize}
 		\item Starting out with \eqref{eq:rand-sq} since we need to know the expectation of the square of the sum of estimates.
-		\item \eqref{eq:rand-sq-ex-push} pushes the expectation inside the summation by linearity of expectation.
+		\item \eqref{eq:rand-sq-ex-push} is the sum of weighted squares, or alternatively, pushes the expectation inside the summation by linearity of expectation.
 		\item \eqref{eq:rand-sq-equiv} substitutes the definition of a sketch bucket.
 		\item \eqref{eq:rand-sq-assoc} uses associativity to rearrange the operands of the sum.
 		\item \eqref{eq:rand-sq-reduce} reduces the second term of \eqref{eq:rand-sq-assoc} to $0$ by the property of uniform distribution of $\sketchPolar$.
 		\item \eqref{eq:rand-sq-final} is obtained by the fact that the expectation of $\genVParam{\wVec}$ is simply itself.
 	\end{itemize}
 \end{Justification}
+\begin{Assumption}
+\hfill
+	\begin{itemize}\item Uniform distribution of both $\sketchHash$ and $\sketchPolar$.\end{itemize}
+\end{Assumption}
 It then follows that the variance corresponding to the multiplication of two base sketches is
 \begin{align}
 &\sum_{\wVec \in \pw}\genV_1\paramBox{\wVec}^2\sum_{\wVec \in \pw}\genV_2\paramBox{\wVec}^2 - \left(\sum_{\wVec \in \pw} \genV_1\paramBox{\wVec}\right)^2\left(\sum_{\wVec \in \pw} \genV_2\paramBox{\wVec}\right)^2\\
 =&\norm{\genV_1}_2^2\cdot\norm{\genV_2}_2^2 - \norm{\genV_1}_1^2\cdot\norm{\genV_2}_1^2.
 \end{align}
+\AH{I don't think this equation makes sense.  Where am I missing it?}
 The subscript notation for $\genV$ is used to denote sketch identity.  Substituting upper bounds obtained for the L1 norm squared from \eqref{eq:norm1-sq-cauchy} results in 
 \[
 \norm{\genV_1}_2^2\cdot\norm{\genV_2}_2^2 - \left(|\pw|\right)\norm{\genV_1}_2^2 \cdot \left(|\pw|\right)\norm{\genV_2}_2^2.