% -*- root: main.tex -*-
\section{Combining Sketches}
\label{sec:combining}
\subsection{Adding Sketches}
For independent random variables, as in the TIDB model, it is a standard result that the variance of a sum is the sum of the variances:
\[
\varParam{X + Y} = \varParam{X} + \varParam{Y}.
\]
Consequently, adding $n$ independent base sketches, each with variance $4\norm{\genV}_2|\pw|^{1/2}$, yields a total variance of
\[
n \cdot 4\norm{\genV}_2|\pw|^{1/2}.
\]
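The two-summand identity extends to $n$ summands by induction. The following short derivation assumes that the base sketches, written $X_1, \dots, X_n$ here purely for illustration, are mutually independent, so that $\sum_{i=1}^{n-1} X_i$ is independent of $X_n$ at each step:
\[
\varParam{\sum_{i=1}^{n} X_i} = \varParam{\sum_{i=1}^{n-1} X_i} + \varParam{X_n} = \cdots = \sum_{i=1}^{n} \varParam{X_i}.
\]
When every $\varParam{X_i}$ equals the single-sketch value $4\norm{\genV}_2|\pw|^{1/2}$, the right-hand side is exactly $n$ copies of that value.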
\subsection{Multiplying Sketches}
For the product of independent random variables, it is likewise a standard result that
\[
\varParam{X \cdot Y} = \expect{X^2}\expect{Y^2} - (\expect{X})^2(\expect{Y})^2.
\]
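For completeness, here is a short derivation of this identity. It assumes only that $X$ and $Y$ are independent, which implies that $X^2$ and $Y^2$ are independent as well:
\begin{align*}
\varParam{X \cdot Y} &= \expect{(X \cdot Y)^2} - \left(\expect{X \cdot Y}\right)^2\\
&= \expect{X^2}\expect{Y^2} - (\expect{X})^2(\expect{Y})^2.
\end{align*}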
It is therefore necessary to compute the expectation of the square of the sum of estimates. For a discrete random variable, the expectation of its square is the probability-weighted sum of its squared values. This yields
\begin{align}
& \expect{\left(\sum_{\wVec \in \pw} \sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}\right)^2} \label{eq:rand-sq}\\
=& \sum_{\wVec \in \pw} \expect{\left(\sketchJParam{\sketchHashParam{\wVec}} \cdot \sketchPolarParam{\wVec}\right)^2} \label{eq:rand-sq-ex-push}\\
=& \sum_{\wVec \in \pw} \expect{\left(\sum_{\substack{\wVecPrime \in \pw \st\\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}} \genVParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}\right)^2} \label{eq:rand-sq-equiv}\\
=& \sum_{\wVec \in \pw} \expect{\left(\genVParam{\wVec}\sketchPolarParam{\wVec}^2 + \sum_{\substack{\wVecPrime \in \pw \st\\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}, \wVecPrime \neq \wVec}} \genVParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}\right)^2} \label{eq:rand-sq-assoc}\\
=& \sum_{\wVec \in \pw} \expect{\genVParam{\wVec}^2} \label{eq:rand-sq-reduce}\\
=& \sum_{\wVec \in \pw} \genVParam{\wVec}^2. \label{eq:rand-sq-final}
\end{align}
\begin{Justification}
\hfill
\begin{itemize}
\item \eqref{eq:rand-sq} is the starting point: the expectation of the square of the sum of estimates.
\item \eqref{eq:rand-sq-ex-push} moves the expectation inside the summation over $\wVec$. Linearity of expectation alone does not give this step, because the square of the sum also contains cross terms for distinct $\wVec, \wVecPrime \in \pw$; the step assumes that these cross terms have expectation $0$.
\item \eqref{eq:rand-sq-equiv} substitutes the definition of a sketch bucket: $\sketchJParam{\sketchHashParam{\wVec}}$ is the sum of $\genVParam{\wVecPrime}\sketchPolarParam{\wVecPrime}$ over all $\wVecPrime$ hashed to the same bucket as $\wVec$.
\item \eqref{eq:rand-sq-assoc} separates the term $\wVecPrime = \wVec$ from the remaining collision terms by rearranging the operands of the sum.
\item \eqref{eq:rand-sq-reduce} reduces the second summand of \eqref{eq:rand-sq-assoc}, the collision sum, to $0$ by the uniform distribution of $\sketchPolar$, and uses $\sketchPolarParam{\wVec}^2 = 1$.
\item \eqref{eq:rand-sq-final} holds because $\genVParam{\wVec}$ is a constant, so its expectation is itself.
\end{itemize}
\end{Justification}
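The reduction step can be made explicit. Write $C_{\wVec}$ (a symbol introduced here only for illustration) for the sum over $\wVecPrime \neq \wVec$ with $\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}$ in the fourth line of the derivation above, and use $\sketchPolarParam{\wVec}^2 = 1$. Expanding the square gives
\[
\expect{\left(\genVParam{\wVec} + C_{\wVec}\right)^2} = \genVParam{\wVec}^2 + 2\genVParam{\wVec}\expect{C_{\wVec}} + \expect{C_{\wVec}^2}.
\]
Assuming the signs are pairwise independent, have mean $0$, and are independent of the hash function, $\expect{C_{\wVec}} = 0$, so the middle term vanishes. The last term is nonnegative and equals $0$ only if $C_{\wVec} = 0$ almost surely, for instance when no $\wVecPrime \neq \wVec$ with $\genVParam{\wVecPrime} \neq 0$ can hash to the bucket of $\wVec$; setting it to $0$ is what leaves $\expect{\genVParam{\wVec}^2}$.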
It then follows that the variance of the product of two base sketches is
\begin{align}
& \sum_{\wVec \in \pw} \genV_1\paramBox{\wVec}^2 \sum_{\wVec \in \pw} \genV_2\paramBox{\wVec}^2 - \left(\sum_{\wVec \in \pw} \genV_1\paramBox{\wVec}\right)^2 \left(\sum_{\wVec \in \pw} \genV_2\paramBox{\wVec}\right)^2\\
=& \norm{\genV_1}_2^2 \cdot \norm{\genV_2}_2^2 - \norm{\genV_1}_1^2 \cdot \norm{\genV_2}_1^2.
\end{align}
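The subscripts on $\genV$ identify the sketch. The second equality uses $\left(\sum_{\wVec \in \pw} \genVParam{\wVec}\right)^2 = \norm{\genV}_1^2$, which holds when the entries of $\genV$ are nonnegative. Because the squared $L_1$ norms are subtracted, substituting the upper bound $\norm{\genV}_1^2 \leq |\pw|\norm{\genV}_2^2$ from \eqref{eq:norm1-sq-cauchy} bounds the variance from below:
\[
\norm{\genV_1}_2^2 \cdot \norm{\genV_2}_2^2 - \norm{\genV_1}_1^2 \cdot \norm{\genV_2}_1^2 \geq \norm{\genV_1}_2^2 \cdot \norm{\genV_2}_2^2 - |\pw|\norm{\genV_1}_2^2 \cdot |\pw|\norm{\genV_2}_2^2.
\]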
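Factoring the common term $\norm{\genV_1}_2^2 \cdot \norm{\genV_2}_2^2$ out of the expression obtained by this substitution gives
\[
\norm{\genV_1}_2^2 \cdot \norm{\genV_2}_2^2 - |\pw|\norm{\genV_1}_2^2 \cdot |\pw|\norm{\genV_2}_2^2 = \left(1 - |\pw|^2\right)\norm{\genV_1}_2^2 \cdot \norm{\genV_2}_2^2.
\]
For nonempty $\pw$ the factor $1 - |\pw|^2$ is at most $0$, so this expression is never positive.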