Variance Computations for 4-way cases

2019-06-07 15:38:01 -04:00 · 2019-06-07 15:38:01 -04:00 · 06c5001235
parent cdea3f15dd
commit 06c5001235
4 changed files with 109 additions and 30 deletions
--- a/analysis.tex
+++ b/analysis.tex
@@ -6,38 +6,88 @@ We begin the analysis by showing that with high probability an estimate is appro
 The first step is to show that the expectation of the estimate of a tuple t's membership across all worlds is $\numWorldsSum$.

 \begin{align}
-&\expect \big[\estimate\big]\\
-=&\expect \big[\estExpOne\big]\\
-=&\expect \big[\sum_{\substack{j \in [B],\\
+&\expect{\estimate}\\
+=&\expect{\estExpOne}\\
+=&\expect{\sum_{\substack{j \in [B],\\
 			 \wVec \in \pw~|~ \sketchHash{i}[\wVec] = j,\\
-			 \wVec[w']\in \pw~|~ \sketchHash{i}[\wVec[w']] = j} } v_t[\wVec] \cdot s_i[\wVec] \cdot s_i[\wVec[w']]\big]\\
-=&\expect \big[ \sum_{\substack{j \in [B],\\
+			 \wVec[w']\in \pw~|~ \sketchHash{i}[\wVec[w']] = j} } v_t[\wVec] \cdot s_i[\wVec] \cdot s_i[\wVec[w']]}\\
+=&\multLineExpect\big[\sum_{\substack{j \in [B],\\
 				\wVec~|~\sketchHashParam{\wVec}= j,\\
 				\wVecPrime~|~\sketchHashParam{\wVecPrime} = j,\\
-				\wVec = \wVecPrime}} \wIndParam{\wVec} \cdot \polarFunc{\wVec} \cdot \polarFunc{\wVecPrime} +  \nonumber \\
-&\phantom{{}\wIndParam{\wVec}}\sum_{\substack{j \in [B], \\
+				\wVec = \wVecPrime}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime} +  \nonumber \\
+&\phantom{{}\kMapParam{\wVec}}\sum_{\substack{j \in [B], \\
 				\wVec~|~\sketchHashParam{\wVec} = j,\\
-				\wVecPrime ~|~ \sketchHashParam{\wVecPrime} = j,\\ \wVec \neq \wVecPrime}} \wIndParam{\wVec} \cdot \polarFunc{\wVec} \cdot\polarFunc{\wVecPrime}\big]\textit{(by linearity of expectation)}\\
-=&\expect \big[ \sum_{\substack{j \in [B],\\
+				\wVecPrime ~|~ \sketchHashParam{\wVecPrime} = j,\\ \wVec \neq \wVecPrime}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot\sketchPolarParam{\wVecPrime}\big]\textit{(by linearity of expectation)}\\
+=&\expect{\sum_{\substack{j \in [B],\\
 				\wVec~|~\sketchHashParam{\wVec}= j,\\
 				\wVecPrime~|~\sketchHashParam{\wVecPrime} = j,\\
-				\wVec = \wVecPrime}} \wIndParam{\wVec} \cdot \polarFunc{\wVec} \cdot \polarFunc{\wVecPrime}\big] \nonumber \\
+				\wVec = \wVecPrime}} \kMapParam{\wVec} \cdot \sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}} \nonumber \\
 &\phantom{{}\big[}\textit{(by uniform distribution in the second summation)}\\
-&=  \sum_{\substack{j \in [B],\\
-				\wVec~|~\sketchHashParam{\wVec}= j,\\}} \wIndParam{\wVec}
+=&  \sum_{\substack{j \in [B],\\
+				\wVec~|~\sketchHashParam{\wVec}= j,\\}} \kMapParam{\wVec}
 \end{align}
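+
+The two simplifications above can be spelled out term by term; as a sketch, assuming $\sketchPolar$ is pairwise independent and uniform over $\{-1,1\}$ (so that $\expect{\sketchPolarParam{\wVec}} = 0$):
+\begin{align*}
+\wVec = \wVecPrime:\quad &\expect{\sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}} = \expect{\sketchPolarParam{\wVec}^2} = 1\\
+\wVec \neq \wVecPrime:\quad &\expect{\sketchPolarParam{\wVec} \cdot \sketchPolarParam{\wVecPrime}} = \expect{\sketchPolarParam{\wVec}} \cdot \expect{\sketchPolarParam{\wVecPrime}} = 0
+\end{align*}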

-For the next step, we show that the variance of an estimate is small.$$\var{\estimate}$$
+For the next step, we show that the variance of an estimate is small.$$\varParam{\estimate}$$

 \begin{align}
-&=\var{\estExpOne}\\
-&= \big(\estTwo\big)^2\\
-&=\sum_{\substack{
+&=\varParam{\estExpOne}\\
+&= \expect{\big(\estTwo\big)^2}\\
+&=\expect{\sum_{\substack{
 		\wVec_1, \wVec_2,\\
 		 \wVecPrime_1, \wVecPrime_2 \in \pw,\\
 		 \sketchHashParam{\wVec_1} = \sketchHashParam{\wVecPrime_1},\\
 		 \sketchHashParam{\wVec_2} = \sketchHashParam{\wVecPrime_2}
-		 }}\wIndParam{\wVec_1} \cdot \wIndParam{\wVec_2}\cdot\polarFunc{\wVec_1}\cdot\polarFunc{\wVec_2}\cdot\polarFunc{\wVecPrime_1}\cdot\polarFunc{\wVecPrime_2}
+		 }}\kMapParam{\wVec_1} \cdot \kMapParam{\wVec_2}\cdot\sketchPolarParam{\wVec_1}\cdot\sketchPolarParam{\wVec_2}\cdot\sketchPolarParam{\wVecPrime_1}\cdot\sketchPolarParam{\wVecPrime_2} }\label{eq:var-sum-w}
+\end{align}
+
+Note that four-wise independence is assumed across all four random variables of \eqref{eq:var-sum-w}.  Zooming in on the inner products of the $\sketchPolar$ functions,
+\begin{equation}
+\polarProdEq \label{eq:polar-product}
+\end{equation}
+note that all four random variables in \eqref{eq:polar-product} take their values from the same set of possible worlds $\pw$.  Thus, there are five possible patterns of distribution between the $\wVec$ variables, namely:
+\begin{align*}
+&\distPattern{1}:&\cOne\\
+&\distPattern{2}:&\cTwo \textit{*} \\
+&\distPattern{3}:&\cThree \textit{*} \\
+&\distPattern{4}:&\cFour \textit{*}\\
+&\distPattern{5}:&\cFive
+\end{align*}
+$$\text{ }^*\textit{(and all variants of the respective pattern)}$$
+
+We are interested in those particular cases whose expectation does not equal zero, since a term with zero expectation contributes nothing to the summation of \eqref{eq:var-sum-w}.  In expectation we have that
+\begin{align}
+&\expect{\sum_{\substack{\elems \\
+			\st \cOne}} \polarProdEq} = 1 \label{eq:polar-prod-all}\\
+&\expect{\sum_{\substack{\elems \\
+			\st \cTwo}} \polarProdEq} = 1 \label{eq:polar-prod-two-and-two}\\
+&\expect{\sum_{\substack{\elems \\
+			\st \cThree}} \polarProdEq} = 0 \nonumber \\
+&\expect{\sum_{\substack{\elems \\
+			\st \cFour}} \polarProdEq} = 0 \nonumber \\
+&\expect{\sum_{\substack{\elems \\
+			\st \cFive}} \polarProdEq} = 0 \nonumber 
+\end{align}
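+
+The per-term values behind these cases can be checked directly.  The following is a sketch, assuming $\sketchPolar$ is four-wise independent and uniform over $\{-1,1\}$, so that $\expect{\sketchPolarParam{\w}} = 0$ and $\sketchPolarParam{\w}^2 = 1$ for every $\w \in \pw$, and reading the chained $\neq$ in the last pattern as all four worlds being distinct:
+\begin{align*}
+\distPattern{1}:\quad &\expect{\sketchPolarParam{\wOne}^4} = 1\\
+\distPattern{2}:\quad &\expect{\sketchPolarParam{\wOne}^2 \cdot \sketchPolarParam{\wTwo}^2} = 1\\
+\distPattern{3}:\quad &\expect{\sketchPolarParam{\wOne}^3 \cdot \sketchPolarParam{\wTwoP}} = \expect{\sketchPolarParam{\wOne}} \cdot \expect{\sketchPolarParam{\wTwoP}} = 0\\
+\distPattern{4}:\quad &\expect{\sketchPolarParam{\wOne}^2 \cdot \sketchPolarParam{\wTwo} \cdot \sketchPolarParam{\wTwoP}} = \expect{\sketchPolarParam{\wTwo}} \cdot \expect{\sketchPolarParam{\wTwoP}} = 0\\
+\distPattern{5}:\quad &\expect{\sketchPolarParam{\wOne}} \cdot \expect{\sketchPolarParam{\wOneP}} \cdot \expect{\sketchPolarParam{\wTwo}} \cdot \expect{\sketchPolarParam{\wTwoP}} = 0
+\end{align*}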
+
+Only equations \eqref{eq:polar-prod-all} (which corresponds to $\cOne$) and \eqref{eq:polar-prod-two-and-two} (which corresponds to $\cTwo$) affect the $\var$ computation.
+
+Thus, $\distPattern{1}$ contributes the following term to the variance:
+\begin{equation}
+\sum_{\wVec \in \pw} \kMapParam{\wVec}^2
+\end{equation} 
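+
+As a quick check of this term (a sketch that uses only $\sketchPolarParam{\w}^2 = 1$), setting $\wOne = \wOneP = \wTwo = \wTwoP = \w$ in a single summand of \eqref{eq:var-sum-w} satisfies both hash constraints trivially and gives
+\begin{equation*}
+\kMapParam{\w} \cdot \kMapParam{\w} \cdot \sketchPolarParam{\w}^4 = \kMapParam{\w}^2
+\end{equation*}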
+
+For the distribution pattern $\cTwo$, we have three variants to consider.
+\begin{align*}
+&\vCase{1}:&\cTwo \\
+&\vCase{2}:&\cTwoV{\wOne}{\wTwo}{\wOneP}{\wTwoP}\\
+&\vCase{3}:&\cTwoV{\wOne}{\wTwoP}{\wOneP}{\wTwo}
+\end{align*}
+Considered separately, the variants contribute the following to $\var$.
+\begin{align}
+\cTwo&=\sum_{\wOne \neq \wTwo}\kMapParam{\wOne} \cdot \kMapParam{\wTwo}\\
+\cTwoV{\wOne}{\wTwo}{\wOneP}{\wTwoP}&=\sum_{\substack{\wOne \neq \wOneP,\\
+											\wOne = \wTwo,\\
+											\sketchHashParam{\wOne} = \sketchHashParam{\wOneP}}} \big| \sketchHashParam{\wOne} = \sketchHashParam{\wOneP} \big|\cdot \kMapParam{\wOne}\cdot \kMapParam{\wTwo}\\
+\cTwoV{\wOne}{\wTwoP}{\wOneP}{\wTwo}&=\sum_{\wOne \neq \wTwo} \kMapParam{\wOne} \cdot \kMapParam{\wTwo}
 \end{align}


--- a/macros.tex
+++ b/macros.tex
@@ -12,30 +12,58 @@
 \newcommand{\sketchHash}[1][i]{h_{#1}}
 \newcommand{\sketchHashParam}[1]{\sketchHash\paramBox{#1}}
 \newcommand{\sketchPolar}[1][i]{s_{#1}}
-\newcommand{\polarFunc}[1]{\sketchPolar\paramBox{#1}}
+\newcommand{\sketchPolarParam}[1]{\sketchPolar\paramBox{#1}}
 %
 %TIDB
 %
 \newcommand{\paramBox}[1]{\left[{#1}\right]}
+\newcommand{\bigParamBox}[1]{\big[{#1}\big]}
+\newcommand{\st}{~|~}
 \newcommand{\pw}{W}
 \newcommand{\numWorlds}{2^N}
 \newcommand{\numWorldsP}{\numWorlds \cdot p}
-\newcommand{\numWorldsSum}{\sum_{\wVec \in \pw}\wIndicator{t}[\wVec]}
+\newcommand{\numWorldsSum}{\sum_{\wVec \in \pw}\kMap{t}[\wVec]}
 \newcommand{\numTup}{N}
-%\newcommand{\wIndicator}{v_t}
-\newcommand{\wIndicator}[1]{v_{#1}}
-\newcommand{\wIndParam}[1]{\wIndicator{t}\paramBox{#1}}
+%\newcommand{\kMap}{v_t}
+\newcommand{\kMap}[1]{v_{#1}}
+\newcommand{\kMapParam}[1]{\kMap{t}\paramBox{#1}}
 \newcommand{\wVec}[1][w]{\textbf{#1}}
 \newcommand{\wVecPrime}{\wVec[w']}
+%%%%%%%%%%%%%%%%
+%maybe easier this way:
+%WVector Notation
+%%%%%%%%%%%%%%%%
+\newcommand{\w}{\wVec}
+\newcommand{\wOneP}{\wVecPrime_1}
+\newcommand{\wOne}{\wVec_1}
+\newcommand{\wTwoP}{\wVecPrime_2}
+\newcommand{\wTwo}{\wVec_2}
+%%%%%%%%%%%%%%%%
+%4-way cases
+%%%%%%%%%%%%%%%%
+\newcommand{\polarProdEq}{\sketchPolarParam{\wVec_1}\cdot\sketchPolarParam{\wVec_2}\cdot\sketchPolarParam{\wVecPrime_1}\cdot\sketchPolarParam{\wVecPrime_2}}
+\newcommand{\elems}{\wOne, \wOneP, \wTwo, \wTwoP \in \pw}
+\newcommand{\lab}[1]{\textit{#1}}
+\newcommand{\distPattern}[1]{\lab{Pattern}{\textit{ {#1}}}}
+\newcommand{\vCase}[1]{\lab{Variant }{#1}}
+\newcommand{\cOne}{\wOne = \wOneP = \wTwo = \wTwoP}
+\newcommand{\cTwo}{\wOne = \wOneP \neq \wTwo = \wTwoP}
+\newcommand{\cThree}{\wOne = \wOneP = \wTwo \neq \wTwoP}
+\newcommand{\cFour}{\wOne = \wOneP \neq \wTwo \neq \wTwoP}
+\newcommand{\cFive}{\wOne \neq \wOneP \neq \wTwo \neq \wTwoP}
+\newcommand{\cTwoV}[4]{{#1} =  {#2} \neq {#3} = {#4}}
+
 \newcommand{\relation}{R}
-\newcommand{\expect}{\mathop{\mathbb{E}}}
-\newcommand{\var}[1]{Var\big[{#1}\big]}
+\newcommand{\expect}[1]{\mathop{\mathbb{E}}\bigParamBox{#1}}
+\newcommand{\multLineExpect}{\mathop{\mathbb{E}}}
+\newcommand{\var}{Var}
+\newcommand{\varParam}[1]{Var\bigParamBox{#1}}
 \newcommand{\polarFuncSum}[1][]{\sum_{\substack{\wVecPrime ~|~ \\
 													\sketchHash\left[\wVecPrime\right] = j\\
-													{#1}}}\polarFunc{\wVecPrime}}
+													{#1}}}\sketchPolarParam{\wVecPrime}}
 \newcommand{\estimate}{\sum_{j \in \sketchCols} \sketchIj \cdot \polarFuncSum }
 \newcommand{\estExpOne}{\sum_{\substack{j \in \sketchCols, \\
-								\wVec \in \pw ~|~\sketchHash\left[\wVec\right] = j}} \wIndicator{t} \cdot\polarFunc{\wVec} \cdot \polarFuncSum}
+								\wVec \in \pw ~|~\sketchHash\left[\wVec\right] = j}} \kMap{t} \cdot\sketchPolarParam{\wVec} \cdot \polarFuncSum}
 \newcommand{\estTwo}{\sum_{\substack{j \in [B],\\
 			 \wVec \in \pw~|~ \sketchHash{[\wVec]} = j,\\
 			 \wVec[w']\in \pw~|~ \sketchHash{[\wVec[w']]} = j} } v_t[\wVec] \cdot s_i[\wVec] \cdot s_i[\wVec[w']]}
--- a/main.tex
+++ b/main.tex
@@ -9,6 +9,7 @@
 \usepackage{amsthm}
 \usepackage{mathtools}
 \usepackage{etoolbox}
+\usepackage{xstring} %for conditionals in \newcommand

 \usepackage{stmaryrd}
 \usepackage[normalem]{ulem}
--- a/notation.tex
+++ b/notation.tex
@@ -4,7 +4,7 @@

 The following notation is used to reason about the sketching of world membership for a given tuple.  We denote the set of all possible worlds as $\pw$.  A given sketch $\sketch$ can be viewed as an $\sketchRows \times \sketchCols$ matrix, i.e. a matrix with $\sketchRows$ rows and $\sketchCols$ columns.  Each row of $\sketch$ is an estimation of the of $\kDom$ frequency for the given tuple represented by $\sketch$ across all possible worlds.  

-To facilitate binning the $\kDom$ values for a given world $\wVec$, each row has two pairwise independent hash functions $\sketchHash{i}:\pw \to [B]$ and $\sketchPolar{i}:\pw \to \{-1,1\}$, where all functions are independent of one another.  Finally, the function $\wIndicator{t}$ defined as $\wIndicator{t} : \{0, 1\}^\numTup \rightarrow \kDom$ is used to determine the tuple's $\kDom$ annotation for a given world.
+To facilitate binning the $\kDom$ values for a given world $\wVec$, each row has two pairwise independent hash functions $\sketchHash{i}:\pw \to [B]$ and $\sketchPolar{i}:\pw \to \{-1,1\}$, where all functions are independent of one another.  Finally, the function $\kMap{t}$ defined as $\kMap{t} : \{0, 1\}^\numTup \rightarrow \kDom$ is used to determine the tuple's $\kDom$ annotation for a given world.

 \AR{I do not like this notation. I prefer vectors being typeset in bold, i.e. $\mathbf{w}$. $\wVec$ is good for writing on the board but it is more standard to bold vectors in linear algebra. Also the $\kDom$ values are not binned by $\sketchHash{i}$ but the actual $\wVec$s are.} 
 \AH{Done.}
@@ -14,11 +14,11 @@ To facilitate binning the $\kDom$ values for a given world $\wVec$, each row has
 \AR{While in general I'm a fan of using English to define things, one of the exceptions if when you are defining a function. It would be better to explicit state that $\sketchHash{i}:W\to [B]$ and $\sketchPolar{i}:W\to \{-1,1\}$. Of course for these definitions you need to define $W$ upfront.}
 \AH{Done}

-When a world value $\wVec$'s $\kDom$ value is updated, it's $\kDom$ value is first retrieved via $\wIndicator{t}$ and then multiplied by the output of the $i^{th}$ row's polarity function $\sketchPolar{i}$.  The resulting computation is then added to the current value contained in the bin mapping.  Formally:
-$$\sketch[\sketchHash{i}(\wVec)] ~+=~ \sketchPolar{i}(\wVec) \times \wIndicator{t}(\wVec)$$
+When a world value $\wVec$'s $\kDom$ value is updated, its $\kDom$ value is first retrieved via $\kMap{t}$ and then multiplied by the output of the $i^{th}$ row's polarity function $\sketchPolar{i}$.  The resulting product is then added to the current value contained in the bin mapping.  Formally:
+$$\sketch[\sketchHash{i}(\wVec)] ~+=~ \sketchPolar{i}(\wVec) \times \kMap{t}(\wVec)$$

 When referring to Tuple Independent Databases (TIDB), a database $\relation$ contains $\numTup$ tuples, with $\numWorlds$ possible worlds $\pw$.  $\pw$ is denoted as $\{0, 1\}^\numTup$, where a specific world $\wVec$ is defined as $\wVec \in \{0, 1\}^\numTup$. 

-\AR{I'm fine $\wIndicator{t}$ defined as a function instead of a vector in $\kDom^W$ but I'm not sure if one would be easier than the other to write arguments. I guess we can re-consider this later as it is defined as a macro.}
+\AR{I'm fine $\kMap{t}$ defined as a function instead of a vector in $\kDom^W$ but I'm not sure if one would be easier than the other to write arguments. I guess we can re-consider this later as it is defined as a macro.}
 \AH{I too am unsure of which way would be best to go on this.  I think originally we had proposed to define $\wVec$ as a mapping to the tuple's $\kDom$ annotation.}