Expectation of multiplying sketches started

Aaron Huber 2019-08-23 11:50:19 -04:00
parent 8cefb009aa
commit be6764acc0
3 changed files with 45 additions and 3 deletions

@ -15,7 +15,45 @@ By \eqref{eq:sub-bounds-final} it immediately follows that adding $n$ base (base
\subsection{Multiplying Sketches}
There are several ways to define the product of sketches. First, the individual estimates can be multiplied. Second, the sketches can be multiplied pointwise and the estimate taken from the resulting sketch. Third, the estimate can be taken as the sum of the products of corresponding buckets. Stated formally,
\begin{align*}
&\est{1} = \sum_{\wVec \in \pw}\sCom{1}{\sketchHashParam{\wVec}}\sketchPolarParam{\wVec} \cdot \sCom{2}{\sketchHashParam{\wVec}}\sketchPolarParam{\wVec}\\
&\est{2} = \sum_{\wVec \in \pw }\left(\sCom{1}{\sketchHashParam{\wVec}} \cdot \sCom{2}{\sketchHashParam{\wVec}}\right)\sketchPolarParam{\wVec}\\
&\est{3} = \sum_{j \in \sketchCols}\sCom{1}{j} \cdot \sCom{2}{j}.
\end{align*}
The expectation of $\est{1}$ evaluates to
\begin{align*}
&\expect{\sum_{\wVec \in \pw}\sCom{1}{\sketchHashParam{\wVec}}\sketchPolarParam{\wVec} \cdot \sCom{2}{\sketchHashParam{\wVec}}\sketchPolarParam{\wVec}}\\
=& \expect{\sum_{\wVec \in \pw}\sketchPolarParam{\wVec}\sketchPolarParam{\wVec}\sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}} \genV_1\paramBox{\wVecPrime}\sketchPolarParam{\wVecPrime} \sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\genV_2\paramBox{\wVecPrime}\sketchPolarParam{\wVecPrime}}\\
=& \mathbb{E}\big[\sum_{\wVec \in \pw}\sketchPolarParam{\wVec}\sketchPolarParam{\wVec}\left(\sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}\\\wVecPrime \neq \wVec}} \genV_1\paramBox{\wVecPrime}\sketchPolarParam{\wVecPrime} + \genV_1\paramBox{\wVec}\sketchPolarParam{\wVec}\right)\\
& \qquad \left(\sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}\\\wVecPrime \neq \wVec}} \genV_2\paramBox{\wVecPrime}\sketchPolarParam{\wVecPrime} + \genV_2\paramBox{\wVec}\sketchPolarParam{\wVec}\right)\big]\\
=& \expect{\sum_{\wVec \in \pw}\sketchPolarParam{\wVec}\sketchPolarParam{\wVec}\genV_1\paramBox{\wVec}\sketchPolarParam{\wVec}\genV_2\paramBox{\wVec}\sketchPolarParam{\wVec}}\\
=& \sum_{\wVec \in \pw}\genV_1\paramBox{\wVec}\genV_2\paramBox{\wVec}.
\end{align*}
The same result holds for a product of any number of sketches.
In expectation, $\est{2}$ evaluates to
\begin{align*}
&\expect{\sum_{\wVec \in \pw }\left(\sCom{1}{\sketchHashParam{\wVec}} \cdot \sCom{2}{\sketchHashParam{\wVec}}\right)\sketchPolarParam{\wVec}}\\
= &\expect{\sum_{\wVec \in \pw}\left(\sum_{\substack{\wVecPrime \in \pw \st\\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}} \genV_1\paramBox{\wVecPrime}\sketchPolarParam{\wVecPrime}\sum_{\substack{\wVecPrime \in \pw \st\\
\sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}}\genV_2\paramBox{\wVecPrime}\sketchPolarParam{\wVecPrime}\right)\sketchPolarParam{\wVec}}\\
= &\mathbb{E}\big[\sum_{\wVec \in \pw}\sketchPolarParam{\wVec}\left(\sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}\\\wVecPrime \neq \wVec}}\genV_1\paramBox{\wVecPrime}\sketchPolarParam{\wVecPrime} + \genV_1\paramBox{\wVec}\sketchPolarParam{\wVec}\right)\\
&\qquad\left(\sum_{\substack{\wVecPrime \in \pw \st\\ \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}\\\wVecPrime \neq \wVec}}\genV_2\paramBox{\wVecPrime}\sketchPolarParam{\wVecPrime} + \genV_2\paramBox{\wVec}\sketchPolarParam{\wVec}\right)\big]\\
= &\expect{\sum_{\wVec \in \pw}\sketchPolarParam{\wVec}\genV_1\paramBox{\wVec}\sketchPolarParam{\wVec}\genV_2\paramBox{\wVec}\sketchPolarParam{\wVec}}\\
= & 0.
\end{align*}
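The final step above can be spelled out as follows. This is only a sketch, and it assumes (the assumption is not stated explicitly here) that each polarity value $\sketchPolarParam{\wVec}$ is uniform on $\{-1, 1\}$:
\begin{align*}
\sketchPolarParam{\wVec}\sketchPolarParam{\wVec}\sketchPolarParam{\wVec} &= \sketchPolarParam{\wVec}, &
\expect{\sketchPolarParam{\wVec}} &= \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot(-1) = 0.
\end{align*}
Under that assumption each summand contributes $\genV_1\paramBox{\wVec}\genV_2\paramBox{\wVec}\expect{\sketchPolarParam{\wVec}} = 0$.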
Note that when an odd number of sketches is multiplied, for example three, the expectation equals the ground truth:
\begin{align*}
&\expect{\sum_{\wVec \in \pw}\sketchPolarParam{\wVec}\genV_1\paramBox{\wVec}\sketchPolarParam{\wVec}\genV_2\paramBox{\wVec}\sketchPolarParam{\wVec}\genV_3\paramBox{\wVec}\sketchPolarParam{\wVec}}\\
= &\sum_{\wVec \in \pw}\genV_1\paramBox{\wVec}\genV_2\paramBox{\wVec}\genV_3\paramBox{\wVec}.
\end{align*}
For $\est{3}$, multiplying an even number of sketches yields
\begin{align*}
&\expect{\sum_{j \in \sketchCols}\sCom{1}{j} \cdot \sCom{2}{j}}\\
=&\expect{\sum_{j \in \sketchCols}\left(\sum_{\substack{\wVec \in \pw \st\\\sketchHashParam{\wVec} = j}}\gVP{1}{\wVec}\sketchPolarParam{\wVec}\cdot \sum_{\substack{\wVecPrime \in \pw \st\\\sketchHashParam{\wVecPrime} = j}}\gVP{2}{\wVecPrime}\sketchPolarParam{\wVecPrime}\right)}\\
=&\expect{\sum_{j \in \sketchCols}\left(\sum_{\substack{\wVec, \wVecPrime \in \pw \st\\\sketchHashParam{\wVec} = j\\\wVec = \wVecPrime}}\gVP{1}{\wVec}\gVP{2}{\wVec}\sketchPolarParam{\wVec}\sketchPolarParam{\wVec} + \sum_{\substack{\wVec, \wVecPrime \in \pw \st\\\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime} = j\\\wVec \neq \wVecPrime}}\gVP{1}{\wVec}\gVP{2}{\wVecPrime}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}\right)}\\
=&\expect{\sum_{\wVec \in \pw}\gVP{1}{\wVec}\gVP{2}{\wVec}}\\
=&\sum_{\wVec \in \pw}\gVP{1}{\wVec}\gVP{2}{\wVec}.
\end{align*}
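The reason only the terms with $\wVec = \wVecPrime$ survive in the derivation above can be sketched as follows. The sketch assumes that each polarity value is uniform on $\{-1, 1\}$ and that the polarity function is independent of the bucket hash. For $\wVec \neq \wVecPrime$, pairwise independence gives the first line; the second line covers the diagonal terms:
\begin{align*}
\expect{\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}} &= \expect{\sketchPolarParam{\wVec}}\expect{\sketchPolarParam{\wVecPrime}} = 0,\\
\expect{\sketchPolarParam{\wVec}\sketchPolarParam{\wVec}} &= \expect{1} = 1.
\end{align*}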
For the case of multiplication, assuming independent variables, it is a known result that
\[
@ -33,6 +71,7 @@ It is necessary then to calculate the expectation of the square of the sum of es
=& \sum_{\wVec \in \pw}\expect{\genVParam{\wVec}^2}\label{eq:rand-sq-reduce}\\
=& \sum_{\wVec \in \pw}\genVParam{\wVec}^2\label{eq:rand-sq-final}.
\end{align}
\begin{Justification}
\hfill
\begin{itemize}

@ -5,9 +5,11 @@
%
%SKETCH
%
\newcommand{\est}[1]{est\left({#1}\right)}
\newcommand{\sketch}{\mathcal{S}_t}
\newcommand{\sketchIj}{\sketch[i][j]}
\newcommand{\sketchJParam}[1]{\sketch\paramBox{i}\paramBox{#1}}
\newcommand{\sCom}[2]{\mathcal{S}_{#1}\paramBox{i}\paramBox{#2}}
\newcommand{\sketchCols}{B}
\newcommand{\sketchRows}{M}
\newcommand{\sketchHash}[1][i]{h_{#1}}
@ -48,6 +50,7 @@
\newcommand{\kMap}[1]{v_{#1}}
\newcommand{\kMapParam}[1]{\kMap{t}\paramBox{#1}}
\newcommand{\genV}{v}
\newcommand{\gVP}[2]{\genV_{#1}\paramBox{#2}}
\newcommand{\genVParam}[1]{\genV\paramBox{#1}}
\newcommand{\genKMap}[1]{v\paramBox{#1}}
\newcommand{\gVt}[1]{\textcolor{blue}{#1}}

@ -7,10 +7,10 @@ The following notation is used to reason about the sketching of world membership
To facilitate binning the $\kDom$ values for a given world $\wVec$, each of the $\sketchRows$ rows has two pairwise independent hash functions $\sketchHash[i]:\pw \to [B]$ and $\sketchPolar[i]:\pw \to \{-1,1\}$, where all functions are independent of one another. Finally, $\genV \in \pwK$ is simply a vector whose values are from the set $\kDom$, each of which denotes the annotation of the tuple $t$ in its corresponding world.%defined as $\kMap{t} : \{0, 1\}^\numTup \rightarrow \kDom$ is used to determine the tuple's $\kDom$ annotation for a given world.
When a world $\wVec$'s $\kDom$ value is updated, its $\kDom$ value is first retrieved via $\kMap{t}$ and then multiplied by the output of the $i^{th}$ row's polarity function $\sketchPolar$. The resulting product is then added to the current value of the bin that $\wVec$ hashes to. Formally:
$$\sketch[i][\sketchHashParam{\wVec}] ~+=~ \sketchPolarParam{\wVec} \times \kMapParam{\wVec}$$
$$\sketchJParam{\sketchHashParam{\wVec}} ~+=~ \sketchPolarParam{\wVec} \times \kMapParam{\wVec}$$
After initialization is complete we have that
$$\sketch[i][j] = \sum_{\{\wVec \st \sketchHashParam{\wVec} = j\}}\genVParam{\wVec} \sketchPolarParam{\wVec}.$$
$$\sketchIj = \sum_{\{\wVec \st \sketchHashParam{\wVec} = j\}}\genVParam{\wVec} \sketchPolarParam{\wVec}.$$
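As a small illustration with made-up values (the numbers are hypothetical and chosen only for this example), suppose exactly two worlds $\wVec$ and $\wVecPrime$ hash to bucket $j$ of row $i$, with $\genVParam{\wVec} = 3$, $\sketchPolarParam{\wVec} = 1$, $\genVParam{\wVecPrime} = 5$, and $\sketchPolarParam{\wVecPrime} = -1$. The bucket then holds
$$\sketchIj = 3 \cdot 1 + 5 \cdot (-1) = -2.$$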
When referring to Tuple Independent Databases (TIDB), a database $\relation$ contains $\numTup$ tuples, with $\numWorlds$ possible worlds $\pw$. $\pw$ is denoted as $\{0, 1\}^\numTup$, where a specific world $\wVec$ is defined as $\wVec \in \{0, 1\}^\numTup$.