Combining Sketches
parent
10413cade1
commit
bafa80d6c4
|
@ -394,7 +394,7 @@ Substituting the Cauchy-Schwarz bounds into the Chebyshev calculations gives
|
|||
\begin{align}
|
||||
&\sketchCols \leq \frac{3\norm{\genV}_2^2\left(|\pw|\right) + \norm{\genV}_2^2\left(|\pw|\right)}{\norm{\genV}_2\sqrt{|\pw|}}\nonumber\\
|
||||
&\sketchCols \leq \frac{4\norm{\genV}_2^2\left(|\pw|\right)}{\norm{\genV}_2\sqrt{|\pw|}}\nonumber\\
|
||||
&\sketchCols \leq 4\norm{\genV}_2\sqrt{|\pw|}
|
||||
&\sketchCols \leq 4\norm{\genV}_2\sqrt{|\pw|}\label{eq:b-cauchy}
|
||||
\end{align}
|
||||
\AH{\textbf{BEGIN}: Old Bound calculations}
|
||||
\begin{align*}
|
||||
|
|
41
combining.tex
Normal file
|
@ -0,0 +1,41 @@
|
|||
% -*- root: main.tex -*-
|
||||
\section{Combining Sketches}
|
||||
\label{sec:combining}
|
||||
\subsection{Adding Sketches}
|
||||
When the variables are independent, as in the TIDB model, the variance of a sum is known to equal the sum of the variances:
|
||||
\[
|
||||
\varParam{X + Y} = \varParam{X} + \varParam{Y}.
|
||||
\]
|
||||
It then immediately follows that adding $n$ base sketches results in the following variance:
|
||||
\[
|
||||
n \cdot 4\norm{\genV}_2 |\pw|^{1/2}.
|
||||
\]
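As a hedged sketch of why the variance scales linearly, write $S_1, \ldots, S_n$ for the $n$ base sketches (the names $S_i$ are introduced here only for illustration) and assume they are pairwise independent. Applying the identity above repeatedly gives
\[
\varParam{\sum_{i=1}^{n} S_i} = \sum_{i=1}^{n} \varParam{S_i},
\]
so if every $\varParam{S_i}$ takes the same per-sketch value, the total is $n$ times that value.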
|
||||
|
||||
\subsection{Multiplying Sketches}
|
||||
For the case of multiplication, it is a known result that, for independent $X$ and $Y$,
|
||||
\[
|
||||
\varParam{X \cdot Y} = \expect{X^2}\expect{Y^2} - (\expect{X})^2 (\expect{Y})^2.
|
||||
\]
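For completeness, the identity can be derived in two steps, assuming $X$ and $Y$ are independent (so that $X^2$ and $Y^2$ are independent as well):
\begin{align*}
\varParam{X \cdot Y} &= \expect{X^2 Y^2} - \left(\expect{X Y}\right)^2\\
&= \expect{X^2}\expect{Y^2} - (\expect{X})^2 (\expect{Y})^2.
\end{align*}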
|
||||
Assuming discrete variables, the expectation of the square of a random variable is simply the probability-weighted sum of its squared values. This yields
|
||||
\begin{align}
&\expect{\left(\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}}\cdot \sketchPolarParam{\wVec}\right)^2}\\
=& \sum_{\wVec \in \pw}\expect{\left(\sketchJParam{\sketchHashParam{\wVec}}\cdot\sketchPolarParam{\wVec}\right)^2}\\
=& \sum_{\wVec \in \pw}\expect{\left(\sum_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}} \genVParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}\right)^2}\\
=& \sum_{\wVec \in \pw}\expect{\left(\genVParam{\wVec}\sketchPolarParam{\wVec}^2 + \sum_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},
\wVecPrime \neq \wVec}} \genVParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}\right)^2}\\
=& \sum_{\wVec \in \pw}\expect{\genVParam{\wVec}^2}\\
=& \sum_{\wVec \in \pw}\genVParam{\wVec}^2.
\end{align}
|
||||
\begin{Justification}
|
||||
\hfill
|
||||
\begin{itemize}
|
||||
\item stuff goes here
|
||||
\end{itemize}
|
||||
\end{Justification}
|
||||
It then follows that the multiplication of two base sketches results in
|
||||
\begin{align}
|
||||
&\sum_{\wVec \in \pw}\genV_1\paramBox{\wVec}^2\sum_{\wVec \in \pw}\genV_2\paramBox{\wVec}^2 - \left(\sum_{\wVec \in \pw} \genV_1\paramBox{\wVec}\right)^2\left(\sum_{\wVec \in \pw} \genV_2\paramBox{\wVec}\right)^2\\
|
||||
=&\norm{\genV_1}_2^2\norm{\genV_2}_2^2 - \norm{\genV_1}_1^2\norm{\genV_2}_1^2.
|
||||
\end{align}
|
|
@ -4,11 +4,22 @@
|
|||
|
||||
\AH{This section has been started, but needs to be completed.}
|
||||
\subsection{TIDB}
|
||||
Consider the case of a TIDB with $\numTup$ tuples, with $\prob = \frac{1}{2}$ for given tuple $t$. The vector $\genV$ can then be defined as
|
||||
Consider the case of a TIDB with $\numTup$ tuples, with $\prob = \frac{1}{2}$ for any given tuple $t$. Because a TIDB has set semantics, the vector $\genV$ can then be defined as a binary bit vector in $\{0, 1\}^\numTup$ whose value represents a possible world and whose indices each correspond to a specific tuple identifier. Under these semantics, with $w_t$ denoting the bit at the index mapped to tuple $t$'s identity, $\genV$ can alternatively be viewed as a function
|
||||
|
||||
\begin{equation*}
|
||||
\genV = \begin{cases}
1, &w_t = 1\\
0, &\text{otherwise}
\end{cases}
|
||||
\end{equation*}
|
||||
where a value of $1$ indicates that the tuple is present in a given world, and $0$ denotes that the tuple is absent in the world represented by the binary bit string.
|
||||
|
||||
In this representation, a few properties of $\genV$ immediately stand out. First, the length of $\genV$ is the same as the number of tuples in the TIDB, $|\genV| = \numTup$. Combined with the assumption of $\prob = \frac{1}{2}$, this implies that the L1 norm of $\genV$ is $\frac{\numTup}{2}$ in expectation. Second, because every entry of $\genV$ is either $0$ or $1$, the squared L2 norm of $\genV$ takes the same value,
|
||||
\begin{equation*}
|
||||
|\genV| = \numTup \wedge \prob = \frac{1}{2} \Rightarrow \norm{\genV}_1 = \norm{\genV}_2^2.
|
||||
\end{equation*}
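As a small illustrative instance (the concrete vector is an assumption chosen only for this example), take $\numTup = 4$ and $\genV = (1, 0, 1, 1)$. Because every entry is $0$ or $1$, squaring leaves each entry unchanged, so
\[
\norm{\genV}_1 = 1 + 0 + 1 + 1 = 3 = 1^2 + 0^2 + 1^2 + 1^2 = \norm{\genV}_2^2.
\]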
|
||||
|
||||
By \eqref{eq:b-cauchy} this yields a bucket size of
|
||||
\begin{equation*}
|
||||
\sketchCols \leq 4\sqrt{\frac{\numTup}{2}} \cdot 2^{(\numTup/2)}.
|
||||
\end{equation*}
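The two factors can be read off \eqref{eq:b-cauchy} under the assumption, not stated explicitly above, that $\pw$ ranges over all $2^{\numTup}$ possible worlds of the TIDB: $\norm{\genV}_2 = \sqrt{\numTup/2}$ and $\sqrt{|\pw|} = 2^{\numTup/2}$. As a worked instance with the illustrative value $\numTup = 4$,
\[
\sketchCols \leq 4\sqrt{\frac{4}{2}} \cdot 2^{4/2} = 16\sqrt{2} \approx 22.6.
\]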
|