Combining Sketches

Aaron Huber 2019-08-19 11:01:36 -04:00
parent 10413cade1
commit bafa80d6c4
4 changed files with 55 additions and 2 deletions


@ -394,7 +394,7 @@ Substituting the Cauchy-Schwarz bounds into the Chebyshev calculations gives
\begin{align}
&\sketchCols \leq \frac{3\norm{\genV}_2^2\left(|\pw|\right) + \norm{\genV}_2^2\left(|\pw|\right)}{\norm{\genV}_2\sqrt{|\pw|}}\nonumber\\
&\sketchCols \leq \frac{4\norm{\genV}_2^2\left(|\pw|\right)}{\norm{\genV}_2\sqrt{|\pw|}}\nonumber\\
&\sketchCols \leq 4\norm{\genV}_2\sqrt{|\pw|}
&\sketchCols \leq 4\norm{\genV}_2\sqrt{|\pw|}\label{eq:b-cauchy}
\end{align}
\AH{\textbf{BEGIN}: Old Bound calculations}
\begin{align*}

41
combining.tex Normal file

@ -0,0 +1,41 @@
% -*- root: main.tex -*-
\section{Combining Sketches}
\label{sec:combining}
\subsection{Adding Sketches}
When assuming that the variables are independent, as in the TIDB model, it is a known result that
\[
\varParam{X + Y} = \varParam{X} + \varParam{Y}.
\]
It then immediately follows that adding $n$ base sketches results in the following variance:
\[
n \cdot 4\norm{\genV}_2 |\pw|^{1/2}.
\]
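As a short sketch of the intermediate step, write $X_1, \ldots, X_n$ for the $n$ base sketches (names introduced here only for illustration) and assume, as the expression above presupposes, that they are mutually independent and that each has variance $4\norm{\genV}_2 |\pw|^{1/2}$. Then
\[
\varParam{\sum_{i=1}^{n} X_i} = \sum_{i=1}^{n} \varParam{X_i} = n \cdot 4\norm{\genV}_2 |\pw|^{1/2}.
\]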
\subsection{Multiplying Sketches}
For the multiplication of independent variables $X$ and $Y$, it is a known result that
\[
\varParam{X \cdot Y} = \expect{X^2}\expect{Y^2} - (\expect{X})^2 (\expect{Y})^2.
\]
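This identity can be checked directly from the definition of variance, assuming $X$ and $Y$ are independent (so that $X^2$ and $Y^2$ are independent as well):
\begin{align*}
\varParam{X \cdot Y} &= \expect{(X \cdot Y)^2} - \left(\expect{X \cdot Y}\right)^2\\
&= \expect{X^2 \cdot Y^2} - \left(\expect{X}\expect{Y}\right)^2\\
&= \expect{X^2}\expect{Y^2} - (\expect{X})^2 (\expect{Y})^2.
\end{align*}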
Assuming discrete variables, the expectation of the square of a random variable is simply the probability-weighted sum of its squared values. This yields
\begin{align}
&\expect{\left(\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}}\cdot \sketchPolarParam{\wVec}\right)^2}\\
=& \sum_{\wVec \in \pw}\expect{\left(\sketchJParam{\sketchHashParam{\wVec}}\cdot\sketchPolarParam{\wVec}\right)^2}\\
=& \sum_{\wVec \in \pw}\expect{\left(\sum_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}} \genVParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}\right)^2}\\
=& \sum_{\wVec \in \pw}\expect{\left(\genVParam{\wVec}^2\sketchPolarParam{\wVec}^2 + \sum_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},
\wVecPrime \neq \wVec}} \genVParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}\right)^2}\\
=& \sum_{\wVec \in \pw}\expect{\genVParam{\wVec}^2}\\
=& \sum_{\wVec \in \pw}\genVParam{\wVec}^2.
\end{align}
\begin{Justification}
\hfill
\begin{itemize}
\item stuff goes here
\end{itemize}
\end{Justification}
It then follows that the multiplication of two base sketches results in
\begin{align}
&\sum_{\wVec \in \pw}\genV_1\paramBox{\wVec}^2\sum_{\wVec \in \pw}\genV_2\paramBox{\wVec}^2 - \left(\sum_{\wVec \in \pw} \genV_1\paramBox{\wVec}\right)^2\left(\sum_{\wVec \in \pw} \genV_2\paramBox{\wVec}\right)^2\\
=&\norm{\genV_1}_2^2\norm{\genV_2}_2^2 - \norm{\genV_1}_1^2\norm{\genV_2}_1^2.
\end{align}


@ -4,11 +4,22 @@
\AH{This section has been started, but needs to be completed.}
\subsection{TIDB}
Consider the case of a TIDB with $\numTup$ tuples, with $\prob = \frac{1}{2}$ for given tuple $t$. The vector $\genV$ can then be defined as
Consider the case of a TIDB with $\numTup$ tuples, with $\prob = \frac{1}{2}$ for a given tuple $t$. Because a TIDB has set semantics, the vector $\genV$ can then be defined as a binary vector in $\{0, 1\}^\numTup$ whose value represents a possible world and in which each index corresponds to the id of a specific tuple $t$. Under these semantics, with $w_t$ denoting the index mapped to tuple $t$'s identity, $\genV$ can alternatively be viewed as a function
\begin{equation*}
\genV = \begin{cases}
1, &w_t = 1\\
		0, &\text{otherwise}
\end{cases}
\end{equation*}
where a value of $1$ indicates that the tuple is present in a given world, and $0$ denotes that the tuple is absent in the world represented by the binary bit string.
In this representation, a few properties of $\genV$ immediately stand out. First, the length of $\genV$ is the same as the number of tuples in the TIDB, $|\genV| = \numTup$. Combined with the assumption that $\prob = \frac{1}{2}$, this implies that the L1 norm of $\genV$ is $\frac{\numTup}{2}$ in expectation and, because every entry is $0$ or $1$, that the squared L2 norm of $\genV$ takes the same value,
\begin{equation*}
|\genV| = \numTup \wedge \prob = \frac{1}{2} \Rightarrow \norm{\genV}_1 = \norm{\genV}_2^2.
\end{equation*}
By \eqref{eq:b-cauchy} this yields a bucket size of
\begin{equation*}
\sketchCols \leq 4\sqrt{\frac{\numTup}{2}} \cdot 2^{(\numTup/2)}.
\end{equation*}
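As a sketch of the substitution, assume that every subset of the $\numTup$ tuples is a possible world, so that $|\pw| = 2^{\numTup}$, and take $\norm{\genV}_2^2 = \frac{\numTup}{2}$ as above. Then \eqref{eq:b-cauchy} gives
\begin{equation*}
4\norm{\genV}_2\sqrt{|\pw|} = 4\sqrt{\frac{\numTup}{2}} \cdot \sqrt{2^{\numTup}} = 4\sqrt{\frac{\numTup}{2}} \cdot 2^{(\numTup/2)}.
\end{equation*}
For example, a hypothetical TIDB with $\numTup = 8$ would give a bound of $4 \cdot \sqrt{4} \cdot 2^{4} = 128$.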


@ -126,6 +126,7 @@
\input{notation}
\input{analysis}
\input{combining}
\input{instantiation}
\input{hash_const}
\input{exact}