Variance Computations for 4-way cases

master
Aaron Huber 2019-06-07 15:38:01 -04:00
parent cdea3f15dd
commit 06c5001235
4 changed files with 109 additions and 30 deletions

@ -6,38 +6,88 @@ We begin the analysis by showing that with high probability an estimate is appro
The first step is to show that the expectation of the estimate of a tuple $t$'s membership across all worlds is $\numWorldsSum$.
\begin{align}
&\expect \big[\estimate\big]\\
=&\expect \big[\estExpOne\big]\\
=&\expect \big[\sum_{\substack{j \in [B],\\
&\expect{\estimate}\\
=&\expect{\estExpOne}\\
=&\expect{\sum_{\substack{j \in [B],\\
\wVec \in \pw~|~ \sketchHash{i}[\wVec] = j,\\
\wVec[w']\in \pw~|~ \sketchHash{i}[\wVec[w']] = j} } v_t[\wVec] \cdot s_i[\wVec] \cdot s_i[\wVec[w']]\big]\\
=&\expect \big[ \sum_{\substack{j \in [B],\\
\wVec[w']\in \pw~|~ \sketchHash{i}[\wVec[w']] = j} } v_t[\wVec] \cdot s_i[\wVec] \cdot s_i[\wVec[w']]}\\
=&\multLineExpect\big[\sum_{\substack{j \in [B],\\
\wVec~|~\sketchHashParam{\wVec}= j,\\
\wVecPrime~|~\sketchHashParam{\wVecPrime} = j,\\
\wVec = \wVecPrime}} \wIndParam{\wVec} \cdot \polarFunc{\wVec} \cdot \polarFunc{\wVecPrime} + \nonumber \\
&\phantom{{}\wIndParam{\wVec}}\sum_{\substack{j \in [B], \\
\wVec = \wVecPrime}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime} + \nonumber \\
&\phantom{{}\kMapParam{\wVec}}\sum_{\substack{j \in [B], \\
\wVec~|~\sketchHashParam{\wVec} = j,\\
\wVecPrime ~|~ \sketchHashParam{\wVecPrime} = j,\\ \wVec \neq \wVecPrime}} \wIndParam{\wVec} \cdot \polarFunc{\wVec} \cdot\polarFunc{\wVecPrime}\big]\textit{(by linearity of expectation)}\\
=&\expect \big[ \sum_{\substack{j \in [B],\\
\wVecPrime ~|~ \sketchHashParam{\wVecPrime} = j,\\ \wVec \neq \wVecPrime}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot\sketchPolarParam{\wVecPrime}\big]\textit{(by linearity of expectation)}\\
=&\expect{\sum_{\substack{j \in [B],\\
\wVec~|~\sketchHashParam{\wVec}= j,\\
\wVecPrime~|~\sketchHashParam{\wVecPrime} = j,\\
\wVec = \wVecPrime}} \wIndParam{\wVec} \cdot \polarFunc{\wVec} \cdot \polarFunc{\wVecPrime}\big] \nonumber \\
\wVec = \wVecPrime}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}} \nonumber \\
&\phantom{{}\big[}\textit{(the second summation has expectation zero, since $\sketchPolar$ is uniform over $\{-1,1\}$ and pairwise independent)}\\
&= \sum_{\substack{j \in [B],\\
\wVec~|~\sketchHashParam{\wVec}= j,\\}} \wIndParam{\wVec}
=& \sum_{\substack{j \in [B],\\
\wVec~|~\sketchHashParam{\wVec}= j}} \kMapParam{\wVec}
\end{align}
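To make the cancellation of the cross terms explicit, the following sketch spells out the step. It assumes, as stated in the notation section, that $\sketchPolar$ is uniform over $\{-1,1\}$ and pairwise independent. For $\wVec \neq \wVecPrime$,
\begin{align*}
\expect{\sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}} = \expect{\sketchPolarParam{\wVec}} \cdot \expect{\sketchPolarParam{\wVecPrime}} = 0 \cdot 0 = 0,
\end{align*}
while for $\wVec = \wVecPrime$ we have $\sketchPolarParam{\wVec}^2 = 1$ deterministically, so only the diagonal terms $\kMapParam{\wVec}$ survive.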
For the next step, we show that the variance of an estimate is small.$$\var{\estimate}$$
For the next step, we show that the variance of an estimate is small.$$\varParam{\estimate}$$
\begin{align}
&=\var{\estExpOne}\\
&= \big(\estTwo\big)^2\\
&=\sum_{\substack{
&=\varParam{\estExpOne}\\
&= \expect{\big(\estTwo\big)^2}\\
&=\expect{\sum_{\substack{
\wVec_1, \wVec_2,\\
\wVecPrime_1, \wVecPrime_2 \in \pw,\\
\sketchHashParam{\wVec_1} = \sketchHashParam{\wVecPrime_1},\\
\sketchHashParam{\wVec_2} = \sketchHashParam{\wVecPrime_2}
}}\wIndParam{\wVec_1} \cdot \wIndParam{\wVec_2}\cdot\polarFunc{\wVec_1}\cdot\polarFunc{\wVec_2}\cdot\polarFunc{\wVecPrime_1}\cdot\polarFunc{\wVecPrime_2}
}}\kMapParam{\wVec_1} \cdot \kMapParam{\wVec_2}\cdot\sketchPolarParam{\wVec_1}\cdot\sketchPolarParam{\wVec_2}\cdot\sketchPolarParam{\wVecPrime_1}\cdot\sketchPolarParam{\wVecPrime_2} }\label{eq:var-sum-w}
\end{align}
Note that four-wise independence is assumed across all four random variables of \eqref{eq:var-sum-w}. Zooming in on the product of the four $\sketchPolar$ terms,
\begin{equation}
\polarProdEq \label{eq:polar-product}
\end{equation}
note that all four random variables in \eqref{eq:polar-product} take their values from the same set of possible worlds $\pw$. Thus, there are five possible patterns of distribution between the $\wVec$ variables, namely:
\begin{align*}
&\distPattern{1}:&\cOne\\
&\distPattern{2}:&\cTwo \textit{*} \\
&\distPattern{3}:&\cThree \textit{*} \\
&\distPattern{4}:&\cFour \textit{*}\\
&\distPattern{5}:&\cFive
\end{align*}
$$\text{ }^*\textit{(and all variants of the respective pattern)}$$
We are interested in those particular cases whose expectation does not equal zero, since a term with an expectation of zero does not contribute to the summation of \eqref{eq:var-sum-w}. In expectation we have that
\begin{align}
&\expect{\sum_{\substack{\elems \\
\st \cOne}} \polarProdEq} = 1 \label{eq:polar-prod-all}\\
&\expect{\sum_{\substack{\elems \\
\st \cTwo}} \polarProdEq} = 1 \label{eq:polar-prod-two-and-two}\\
&\expect{\sum_{\substack{\elems \\
\st \cThree}} \polarProdEq} = 0 \nonumber \\
&\expect{\sum_{\substack{\elems \\
\st \cFour}} \polarProdEq} = 0 \nonumber \\
&\expect{\sum_{\substack{\elems \\
\st \cFive}} \polarProdEq} = 0 \nonumber
\end{align}
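As a sketch of why each pattern behaves this way, consider a single term of each summation. The reasoning assumes four-wise independence of $\sketchPolar$, that $\expect{\sketchPolarParam{\wVec}} = 0$ and $\sketchPolarParam{\wVec}^2 = 1$ for every $\wVec$, and that the chained $\neq$ in $\distPattern{4}$ and $\distPattern{5}$ means the unequal variables are pairwise distinct. Under these assumptions, per term:
\begin{align*}
&\distPattern{1}: & \sketchPolarParam{\wOne}^4 &= 1\\
&\distPattern{2}: & \sketchPolarParam{\wOne}^2 \cdot \sketchPolarParam{\wTwo}^2 &= 1\\
&\distPattern{3}: & \expect{\sketchPolarParam{\wOne}^3 \cdot \sketchPolarParam{\wTwoP}} &= \expect{\sketchPolarParam{\wOne}} \cdot \expect{\sketchPolarParam{\wTwoP}} = 0\\
&\distPattern{4}: & \expect{\sketchPolarParam{\wOne}^2 \cdot \sketchPolarParam{\wTwo} \cdot \sketchPolarParam{\wTwoP}} &= \expect{\sketchPolarParam{\wTwo}} \cdot \expect{\sketchPolarParam{\wTwoP}} = 0\\
&\distPattern{5}: & \expect{\polarProdEq} &= \expect{\sketchPolarParam{\wOne}} \cdot \expect{\sketchPolarParam{\wTwo}} \cdot \expect{\sketchPolarParam{\wOneP}} \cdot \expect{\sketchPolarParam{\wTwoP}} = 0
\end{align*}
In words, a term survives exactly when every distinct world appears an even number of times in the product.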
Only equations \eqref{eq:polar-prod-all} (which corresponds to $\cOne$) and \eqref{eq:polar-prod-two-and-two} (which corresponds to $\cTwo$) contribute to the $\var$ computation.
Thus, for $\distPattern{1}$, the contribution to the variance is
\begin{equation}
\sum_{\wVec \in \pw} \kMapParam{\wVec}^2
\end{equation}
For the distribution pattern $\cTwo$, we have three variants to consider.
\begin{align*}
&\vCase{1}:&\cTwo \\
&\vCase{2}:&\cTwoV{\wOne}{\wTwo}{\wOneP}{\wTwoP}\\
&\vCase{3}:&\cTwoV{\wOne}{\wTwoP}{\wOneP}{\wTwo}
\end{align*}
When considered separately, the variants contribute the following terms to the $\var$ computation.
\begin{align}
\cTwo&=\sum_{\wOne \neq \wTwo}\kMapParam{\wOne} \cdot \kMapParam{\wTwo}\\
\cTwoV{\wOne}{\wTwo}{\wOneP}{\wTwoP}&=\sum_{\substack{\wOne \neq \wOneP,\\
\wOne = \wTwo,\\
\sketchHashParam{\wOne} = \sketchHashParam{\wOneP}}} \big| \sketchHashParam{\wOne}\neq \sketchHashParam{\wOneP} \big|\cdot \kMapParam{\wOne}\cdot \kMapParam{\wTwo}\\
\cTwoV{\wOne}{\wTwoP}{\wOneP}{\wTwo}&=\sum_{\wOne \neq \wTwo} \kMapParam{\wOne} \cdot \kMapParam{\wTwo}
\end{align}

@ -12,30 +12,58 @@
\newcommand{\sketchHash}[1][i]{h_{#1}}
\newcommand{\sketchHashParam}[1]{\sketchHash\paramBox{#1}}
\newcommand{\sketchPolar}[1][i]{s_{#1}}
\newcommand{\polarFunc}[1]{\sketchPolar\paramBox{#1}}
\newcommand{\sketchPolarParam}[1]{\sketchPolar\paramBox{#1}}
%
%TIDB
%
\newcommand{\paramBox}[1]{\left[{#1}\right]}
\newcommand{\bigParamBox}[1]{\big[{#1}\big]}
\newcommand{\st}{~|~}
\newcommand{\pw}{W}
\newcommand{\numWorlds}{2^N}
\newcommand{\numWorldsP}{\numWorlds \cdot p}
\newcommand{\numWorldsSum}{\sum_{\wVec \in \pw}\wIndicator{t}[\wVec]}
\newcommand{\numWorldsSum}{\sum_{\wVec \in \pw}\kMap{t}[\wVec]}
\newcommand{\numTup}{N}
%\newcommand{\wIndicator}{v_t}
\newcommand{\wIndicator}[1]{v_{#1}}
\newcommand{\wIndParam}[1]{\wIndicator{t}\paramBox{#1}}
%\newcommand{\kMap}{v_t}
\newcommand{\kMap}[1]{v_{#1}}
\newcommand{\kMapParam}[1]{\kMap{t}\paramBox{#1}}
\newcommand{\wVec}[1][w]{\textbf{#1}}
\newcommand{\wVecPrime}{\wVec[w']}
%%%%%%%%%%%%%%%%
%maybe easier this way:
%WVector Notation
%%%%%%%%%%%%%%%%
\newcommand{\w}{\wVec}
\newcommand{\wOneP}{\wVecPrime_1}
\newcommand{\wOne}{\wVec_1}
\newcommand{\wTwoP}{\wVecPrime_2}
\newcommand{\wTwo}{\wVec_2}
%%%%%%%%%%%%%%%%
%4-way cases
%%%%%%%%%%%%%%%%
\newcommand{\polarProdEq}{\sketchPolarParam{\wVec_1}\cdot\sketchPolarParam{\wVec_2}\cdot\sketchPolarParam{\wVecPrime_1}\cdot\sketchPolarParam{\wVecPrime_2}}
\newcommand{\elems}{\wOne, \wOneP, \wTwo, \wTwoP \in \pw}
\newcommand{\lab}[1]{\textit{#1}}
\newcommand{\distPattern}[1]{\lab{Pattern}{\textit{ {#1}}}}
\newcommand{\vCase}[1]{\lab{Variant }{#1}}
\newcommand{\cOne}{\wOne = \wOneP = \wTwo = \wTwoP}
\newcommand{\cTwo}{\wOne = \wOneP \neq \wTwo = \wTwoP}
\newcommand{\cThree}{\wOne = \wOneP = \wTwo \neq \wTwoP}
\newcommand{\cFour}{\wOne = \wOneP \neq \wTwo \neq \wTwoP}
\newcommand{\cFive}{\wOne \neq \wOneP \neq \wTwo \neq \wTwoP}
\newcommand{\cTwoV}[4]{{#1} = {#2} \neq {#3} = {#4}}
\newcommand{\relation}{R}
\newcommand{\expect}{\mathop{\mathbb{E}}}
\newcommand{\var}[1]{Var\big[{#1}\big]}
\newcommand{\expect}[1]{\mathop{\mathbb{E}}\bigParamBox{#1}}
\newcommand{\multLineExpect}{\mathop{\mathbb{E}}}
\newcommand{\var}{Var}
\newcommand{\varParam}[1]{Var\bigParamBox{#1}}
\newcommand{\polarFuncSum}[1][]{\sum_{\substack{\wVecPrime ~|~ \\
\sketchHash\left[\wVecPrime\right] = j\\
{#1}}}\polarFunc{\wVecPrime}}
{#1}}}\sketchPolarParam{\wVecPrime}}
\newcommand{\estimate}{\sum_{j \in \sketchCols} \sketchIj \cdot \polarFuncSum }
\newcommand{\estExpOne}{\sum_{\substack{j \in \sketchCols, \\
\wVec \in \pw ~|~\sketchHash\left[\wVec\right] = j}} \wIndicator{t} \cdot\polarFunc{\wVec} \cdot \polarFuncSum}
\wVec \in \pw ~|~\sketchHash\left[\wVec\right] = j}} \kMap{t} \cdot\sketchPolarParam{\wVec} \cdot \polarFuncSum}
\newcommand{\estTwo}{\sum_{\substack{j \in [B],\\
\wVec \in \pw~|~ \sketchHash{[\wVec]} = j,\\
\wVec[w']\in \pw~|~ \sketchHash{[\wVec[w']]} = j} } v_t[\wVec] \cdot s_i[\wVec] \cdot s_i[\wVec[w']]}

@ -9,6 +9,7 @@
\usepackage{amsthm}
\usepackage{mathtools}
\usepackage{etoolbox}
\usepackage{xstring} %for conditionals in \newcommand
\usepackage{stmaryrd}
\usepackage[normalem]{ulem}

@ -4,7 +4,7 @@
The following notation is used to reason about the sketching of world membership for a given tuple. We denote the set of all possible worlds as $\pw$. A given sketch $\sketch$ can be viewed as an $\sketchRows \times \sketchCols$ matrix, i.e. a matrix with $\sketchRows$ rows and $\sketchCols$ columns. Each row of $\sketch$ is an estimate of the $\kDom$ frequency of the tuple represented by $\sketch$ across all possible worlds.
To facilitate binning the $\kDom$ values for a given world $\wVec$, each row has two pairwise independent hash functions $\sketchHash{i}:\pw \to [B]$ and $\sketchPolar{i}:\pw \to \{-1,1\}$, where all functions are independent of one another. Finally, the function $\wIndicator{t}$ defined as $\wIndicator{t} : \{0, 1\}^\numTup \rightarrow \kDom$ is used to determine the tuple's $\kDom$ annotation for a given world.
To facilitate binning the $\kDom$ values for a given world $\wVec$, each row has two pairwise independent hash functions $\sketchHash{i}:\pw \to [B]$ and $\sketchPolar{i}:\pw \to \{-1,1\}$, where all functions are independent of one another. Finally, the function $\kMap{t}$ defined as $\kMap{t} : \{0, 1\}^\numTup \rightarrow \kDom$ is used to determine the tuple's $\kDom$ annotation for a given world.
\AR{I do not like this notation. I prefer vectors being typeset in bold, i.e. $\mathbf{w}$. $\wVec$ is good for writing on the board but it is more standard to bold vectors in linear algebra. Also the $\kDom$ values are not binned by $\sketchHash{i}$ but the actual $\wVec$s are.}
\AH{Done.}
@ -14,11 +14,11 @@ To facilitate binning the $\kDom$ values for a given world $\wVec$, each row has
\AR{While in general I'm a fan of using English to define things, one of the exceptions is when you are defining a function. It would be better to explicitly state that $\sketchHash{i}:W\to [B]$ and $\sketchPolar{i}:W\to \{-1,1\}$. Of course for these definitions you need to define $W$ upfront.}
\AH{Done}
When a world $\wVec$'s $\kDom$ value is updated, its $\kDom$ value is first retrieved via $\wIndicator{t}$ and then multiplied by the output of the $i^{th}$ row's polarity function $\sketchPolar{i}$. The resulting product is then added to the current value of the bin that $\wVec$ maps to. Formally:
$$\sketch[\sketchHash{i}(\wVec)] ~+=~ \sketchPolar{i}(\wVec) \times \wIndicator{t}(\wVec)$$
When a world $\wVec$'s $\kDom$ value is updated, its $\kDom$ value is first retrieved via $\kMap{t}$ and then multiplied by the output of the $i^{th}$ row's polarity function $\sketchPolar{i}$. The resulting product is then added to the current value of the bin that $\wVec$ maps to. Formally:
$$\sketch[\sketchHash{i}(\wVec)] ~+=~ \sketchPolar{i}(\wVec) \times \kMap{t}(\wVec)$$
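As a purely illustrative instance of this update, with hypothetical values that are not taken from the analysis, suppose that $\sketchHashParam{\wVec} = 2$, $\sketchPolarParam{\wVec} = -1$, and $\kMapParam{\wVec} = 3$ for some world $\wVec$. The update then reads
$$\sketch[2] ~+=~ (-1) \times 3$$
so the value stored in bin $2$ of row $i$ decreases by $3$. A second world that hashes to the same bin with polarity $+1$ would add its own $\kDom$ value to that same bin.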
In a Tuple Independent Database (TIDB), a database $\relation$ contains $\numTup$ tuples and has $\numWorlds$ possible worlds. The set of possible worlds $\pw$ is $\{0, 1\}^\numTup$, so a specific world $\wVec$ is a vector $\wVec \in \{0, 1\}^\numTup$.
\AR{I'm fine with $\wIndicator{t}$ being defined as a function instead of a vector in $\kDom^W$, but I'm not sure if one would be easier than the other for writing arguments. I guess we can reconsider this later as it is defined as a macro.}
\AR{I'm fine with $\kMap{t}$ being defined as a function instead of a vector in $\kDom^W$, but I'm not sure if one would be easier than the other for writing arguments. I guess we can reconsider this later as it is defined as a macro.}
\AH{I too am unsure of which way would be best to go on this. I think originally we had proposed to define $\wVec$ as a mapping to the tuple's $\kDom$ annotation.}