Calculation and notational changes.

Aaron Huber 2019-08-15 08:55:08 -04:00
parent 98bbc7b487
commit 8a67cc4eb6
2 changed files with 36 additions and 32 deletions

@ -148,24 +148,26 @@ Note that four-wise independence is assumed across all four random variables of
\sketchPolarParam{\wOne}\cdot\sketchPolarParam{\wOneP}\cdot\sketchPolarParam{\wTwo}\cdot\sketchPolarParam{\wTwoP} \label{eq:polar-product}
\end{equation}
we see that %it can be seen that for $\wOne, \wOneP \in \pw$ and $\wTwo, \wTwoP \in \pw'$, all four random variables in \eqref{eq:polar-product} take their values from $\pw$, although we have iteration over two separate sets $\pw$.
there are five possible sets of $\wVec$ variable combinations, namely for $a, b, c, d \in \{1, 1', 2, 2'\} \st a \neq b \neq c \neq d$:
there are five possible sets of $\wVec$ variable combinations. The following sets all assume each $\wVec$ to be from the set $\pw$. For $a, b, c, d \in \{1, 1', 2, 2'\} \st a \neq b \neq c \neq d$:
\begin{align*}
&\distPattern{1}:&\forNElems{\cOne}\\
&\distPattern{2}:&\forElems{\cTwo}\\
&\distPattern{3}:&\forElems{\cThree}\\
&\distPattern{4}:&\forElems{\cFour}\\
&\distPattern{5}:&\forNElems{\cFive}
&\distPattern{1}:\forNElems{\cOne}\\
&\distPattern{2}:\forElems{\cTwo}\\
&\distPattern{3}:\forElems{\cThree}\\
&\distPattern{4}:\forElems{\cFour}\\
&\distPattern{5}:\forNElems{\cFive}
\end{align*}
\AR{I think the definitions above need more work and/or there needs to be a justification for why $S_1$ to $S_5$ partition all the possibilities.}
\AH{Maybe we could further discuss this today 8/7/19.}
With four random variables drawn from sets containing the same elements, there are five possible ways in which they can relate to one another. This holds because they come either from the same set or from separate duplicate sets with identical members, so any $\wVec$ variable can be equal or not equal to each of its counterparts. Enumerating the equalities (and non-equalities) therefore partitions the set of all possible combinations. The variables could all be equal, as in $\distPattern{1}$, or three of the variables could be equal with the fourth different, as in $\distPattern{3}$. When exactly two variables are required to be equal, there are two cases, because the two remaining variables may themselves be equal or not equal. In $\distPattern{2}$, one pair of variables is equal and the remaining two are equal to each other but not to the first pair. In $\distPattern{4}$, two variables are equal and the remaining two are equal neither to them nor to each other. Finally, all four could be different, as in $\distPattern{5}$.
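As a quick sanity check on this enumeration, the ordered $4$-tuples of each shape can be counted directly. This is only a sketch: it assumes each shape ranges over every way of choosing which of the four positions coincide, and it writes $n = |\pw|$ as shorthand that is not used elsewhere in the text.
\begin{align*}
\text{all four equal:}\quad & n\\
\text{two disjoint equal pairs:}\quad & 3\,n(n-1)\\
\text{three equal, one different:}\quad & 4\,n(n-1)\\
\text{one equal pair, the other two distinct:}\quad & 6\,n(n-1)(n-2)\\
\text{all four different:}\quad & n(n-1)(n-2)(n-3)
\end{align*}
The factors $3$, $4$ and $6$ count the ways of choosing which positions coincide. The five counts sum to $n^4$, the total number of ordered $4$-tuples over $\pw$, so no combination is missed or counted twice. For $n = 3$, for instance, $3 + 18 + 24 + 36 + 0 = 81 = 3^4$.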
The use of variable subscripts in the notation is necessary as different combinations of equal $\wVec$ variables produce different results in the variance computation, as we will see shortly.
Note that each $\wVec$ is the input of the same $\sketchPolar$ function, meaning that equal worlds produce the same output.
We are interested in those particular cases whose expectation does not equal zero, since an expectation of zero will not add to the summation of \eqref{eq:var-sum-w}. In expectation we have that
\begin{align}
\forAllW{\distPattern{1}}&\rightarrow\expect{%\sum_{\substack{\elems \\
\forAllNW{\distPattern{1}}&\rightarrow\expect{%\sum_{\substack{\elems \\
%\st \cOne}}
\polarProdEq} = 1 \label{eq:polar-prod-all}
\polarProdNEq} = 1 \label{eq:polar-prod-all}
\end{align}
since the same element of the image of $\sketchPolar$ is multiplied by itself an even number of times. Similarly,
\begin{align}
@ -185,9 +187,9 @@ because the same element of the image of $\sketchPolar$ is being multiplied to i
\polarProdEq} = 0 \nonumber
\end{align}
\begin{align}
\forAllW{\distPattern{5}}&\rightarrow\expect{%\sum_{\substack{\elems \\
\forAllNW{\distPattern{5}}&\rightarrow\expect{%\sum_{\substack{\elems \\
%\st \cFive}}
\polarProdEq} = 0. \nonumber
\polarProdNEq} = 0. \nonumber
\end{align}
@ -217,22 +219,22 @@ Considered separately, the subsets result in the following $\var$.
&\wOne = \wOneP \neq \wTwo =\wTwoP \rightarrow\nonumber\\
&\qquad = \sum_{\substack{\wOne, \wOneP, \wTwo, \wTwoP \in \pw \st \\
\wOne = \wOneP = \wVec \neq\\
\wTwo = \wTwoP = \wVecPrime}}\genVParam{\wVec}\genVParam{\wVecPrime}\sketchPolarParam{\wVec}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVecPrime} \label{eq:variantOne}\nonumber\\
&\qquad = \sum_{\wVec, \wVecPrime \in \pw \st \wVec \neq \wVecPrime}\genVParam{\wVec}\genVParam{\wVecPrime}\\
\wTwo = \wTwoP = \wVecPrime}}\expect{\genVParam{\wVec}\genVParam{\wVecPrime}\sketchPolarParam{\wVec}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVecPrime}} \label{eq:variantOne}\nonumber\\
&\qquad = \sum_{\wVec, \wVecPrime \in \pw \st \wVec \neq \wVecPrime}\expect{\genVParam{\wVec}\genVParam{\wVecPrime}}\\
&\wOne = \wTwo \neq \wOneP = \wTwoP \rightarrow\nonumber\\
&\qquad = \sum_{\substack{\wOne, \wOneP, \wTwo, \wTwoP \in \pw \st \\
\wOne = \wTwo = \wVec \neq\\
\wOneP = \wTwoP = \wVecPrime,\\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}} \genVParam{\wVec}\genVParam{\wVec}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}\nonumber \\
&\qquad = \sum_{\wVec \in \pw}| \{\wVecPrime \st \wVecPrime \neq \wVec, \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \genVParam{\wVec}^2\label{eq:variantTwo} \\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}} \expect{\genVParam{\wVec}\genVParam{\wVec}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}}\nonumber \\
&\qquad = \sum_{\wVec \in \pw}\expect{| \{\wVecPrime \st \wVecPrime \neq \wVec, \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} | \cdot \genVParam{\wVec}^2}\label{eq:variantTwo} \\
&\wOne = \wTwoP \neq \wOneP =\wTwo \rightarrow \nonumber \\
&\qquad = \sum_{\substack{\wOne, \wOneP, \wTwo, \wTwoP \in \pw \st \\
\wOne = \wTwoP = \wVec \neq \\
\wOneP = \wTwo = \wVecPrime,\\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}} \genVParam{\wVec} \genVParam{\wVecPrime}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec} \nonumber \\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\expect{ \genVParam{\wVec} \genVParam{\wVecPrime}\sketchPolarParam{\wVec}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}} \nonumber \\
&\qquad = \sum_{\substack{\wVec, \wVecPrime \in \pw \st \\
\wVec \neq \wVecPrime,\\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\genVParam{\wVec}\cdot\kMapParam{\wVecPrime}\label{eq:variantThree}
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\expect{\genVParam{\wVec}\cdot\kMapParam{\wVecPrime}}\label{eq:variantThree}
\end{align}
Note that for $\distPattern{22}$, the cardinality of a bucket appears as a multiplicative factor on each squared annotation. This follows from the constraint $\wOne \neq \wOneP$ together with the additional constraint $\sketchHashParam{\wOne} = \sketchHashParam{\wOneP}$: since $\wOneP$ must lie in the same bucket as $\wOne$ without being equal to it, the squared annotation of $\wOne$ is added once for every such $\wOneP$.
@ -268,7 +270,7 @@ Computing each term separately gives
\begin{align}
&\expect{\sum_{\wVec \in \pw}\big|~ \{\wVecPrime \st \wVecPrime \neq \wVec, \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} ~\big| \cdot \genVParam{\wVec}^2}\nonumber\\
&~=\sum_{\wVec \in \pw}\genVParam{\wVec}^2 \cdot \expect{\big|~ \{\wVecPrime \st \wVecPrime \neq \wVec, \sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}\} ~\big|}\nonumber\\%\numWorldsP
&~=\norm{\genV}^2_2\cdot \frac{|\pw|}{\sketchCols} - 1\label{eq:spaceOne}
&~=\norm{\genV}^2_2\cdot \left(\frac{|\pw|}{\sketchCols} - 1\right)\label{eq:spaceOne}
\end{align}
\begin{align}
&\expect{ \sum_{\substack{\wVec, \wVecPrime \in \pw \st \\
@ -280,27 +282,27 @@ Computing each term separately gives
%\numWorldsP \cdot \frac{\numWorldsP - 1}{\sketchCols}\label{eq:spaceTwo}.
&~= \expect{\sum_{\wVec \in \pw}\genVParam{\wVec} \cdot \big((\sum_{\substack{\wVecPrime \in\pw \st\\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}} }\genVParam{\wVecPrime } )- \genVParam{\wVec}\big)} \nonumber \\
&~=\expect{\left(\sum_{\wVec \in \pw}\genVParam{\wVec} \cdot \sum_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\genVParam{\wVecPrime}\right) - \sum_{\wVec \in \pw}\genVParam{\wVec}^2}\nonumber\\
&~=\expect{\sum_{\wVec \in \pw}\genVParam{\wVec} \cdot \sum_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\genVParam{\wVecPrime} - \sum_{\wVec \in \pw}\genVParam{\wVec}^2}\nonumber\\
&~=\expect{\sum_{\wVec \in \pw}\genVParam{\wVec} \cdot \sum_{\substack{\wVecPrime \in \pw \st \\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\genVParam{\wVecPrime}} - \expect{\sum{\wVec \in \pw}\genVParam{\wVec}^2}\nonumber \\
\sketchHashParam{\wVec} = \sketchHashParam{\wVecPrime}}}\genVParam{\wVecPrime}} - \expect{\sum_{\wVec \in \pw}\genVParam{\wVec}^2}\nonumber \\
&~\leq\norm{\genV}_1 \cdot \frac{\norm{\genV}_1}{\sketchCols} - \expect{\sum_{\wVec \in \pw}\genVParam{\wVec}^2}\nonumber \\
&~\leq\frac{\norm{\genV}_1^2 - \norm{\genV}_2^2}{\sketchCols} \label{eq:spaceTwo}.
%&\norm{\genV}\prob \cdot \frac{\norm{\genV}\prob - \frac{\norm{\genV}}{\numWorlds}}{\sketchCols}\label{eq:spaceTwo}.
\end{align}
\AH{Add some verbose justification and assumptions.}
%In both equations, the sum of $\genVParam{\wVec}$ over all $\wVec \in \pw$ is $\numWorldsP$ since as noted in equation \eqref{eq:mu} we are summing the number of worlds a tuple $t$ appears in, and for a TIPDB, that is exactly 2 to the power of the number of tuples in the TIPDB (due to the independence of tuples) times tuple $t$'s probability.
Note that when $\genV$ is positive, the bound is tight.
In equation \eqref{eq:spaceOne}, the multiplicative factor is, in expectation, the number of worlds $|\pw|$ divided evenly across the $\sketchCols$ buckets, minus the one world, $\wVec$ itself, that $\wVecPrime$ cannot be. This factor is multiplied by the sum of squares of the world values.
Equation \eqref{eq:spaceTwo} has each of the $|\pw|$ worlds multiplied by all the other worlds appearing in the same bucket. The equation is first rearranged by allowing $\wVec$ itself to appear in the second summation and subtracting the resulting product afterwards. The product in the expectation yields two factors: the first is simply the sum of vector values, and the second is the same sum divided by the number of buckets $\sketchCols$. Finally, we subtract the quantity that should not be there, namely the terms with $\wVecPrime = \wVec$, which is the sum of squares.
\AH{This next bit needs to be redone.}
Equation \eqref{eq:spaceTwo} has each of the $\numWorldsP$ worlds times all the rest of the worlds that tuple $t$ appears in within that bucket. This factor is represented by $\frac{\numWorldsP - 1}{\sketchCols}$, i.e. we have a world in a given bucket $j$ in which tuple $t$ appears, being summed over each of its products with other worlds in which it is present in bucket $j$.
%\AR{Again, argue why the above claims are true.}
%\AH{All my arguing is plain English. Is there a better way to go about this?}
\eqref{eq:spaceOne} and \eqref{eq:spaceTwo} further reduce to
\eqref{eq:spaceOne} and \eqref{eq:spaceTwo} together form
\begin{equation}
%\frac{2^{2N}(\prob + \prob^2)}{\sketchCols} - \numWorlds(\frac{\prob}{\sketchCols} + \prob)\label{eq:variance}
\norm{\genV}^2_2\prob\left(\numWorlds - 1\right) + \norm{\genV}\left(\norm{\genV} - \frac{\sketchCols}{\numWorlds}\right)\label{eq:variance}
\norm{\genV}^2_2\left(\frac{|\pw|}{\sketchCols}- 1\right) + \frac{\norm{\genV}_1^2 - \norm{\genV}_2^2}{\sketchCols} \label{eq:variance}
\end{equation}
By \eqref{eq:variance} we have then
\begin{align*}
@ -311,7 +313,7 @@ By \eqref{eq:variance} we have then
%\sdRel& < \sqrt{\frac{2}{\sketchCols\prob}}.
\sdRel &< \frac{\sqrt{\norm{\genV}^2_2\prob(\numWorlds - 1) + (\norm{\genV}\prob)^2} }{\norm{\genV}\prob}
\end{align*}
Recall that $\sdRel = \frac{\sd}{\mu}$ where $\mu$ is defined as $\numWorldsP$ in \eqref{eq:mu} for TIDB and $\norm{\genV}\prob$ for general $\genV$ in \eqref{eq:gen-mu}.
Recall that $\sdRel = \frac{\sd}{\mu}$.% where $\mu$ is defined as $\numWorldsP$ in \eqref{eq:mu} for TIDB and $\norm{\genV}\prob$ for general $\genV$ in \eqref{eq:gen-mu}.
Since the sketch has multiple trials, a probability of exceeding error bound $\errB$ smaller than one half guarantees an estimate that is less than or equal to the error bound when taking the median of all trials. Expressing the error relative to $\mu$ in Chebyshev's Inequality yields
\begin{equation*}

@ -70,15 +70,17 @@
%%%%%%%%%%%%%%%%
%4-way cases
%%%%%%%%%%%%%%%%
\newcommand{\polarProdNEq}{\sketchPolarParam{\wOne}\cdot\sketchPolarParam{\wOneP}\cdot\sketchPolarParam{\wTwo}\cdot\sketchPolarParam{\wTwoP}}%
\newcommand{\polarProdEq}{\sketchPolarParam{\wa}\cdot\sketchPolarParam{\wb}\cdot\sketchPolarParam{\wc}\cdot\sketchPolarParam{\wVecD}}%
\newcommand{\elems}{\wa, \wb, \wc, \wVecD}
\newcommand{\nElems}{\wOne, \wOneP, \wTwo, \wTwoP}
\newcommand{\forAllW}[1]{\forall \elems \in {#1}}
\newcommand{\forAllW}[1]{\forall (\elems) \in {#1}}
\newcommand{\forAllNW}[1]{\forall (\nElems) \in {#1}}
\newcommand{\lab}[1]{#1}
\newcommand{\distPattern}[1]{\lab{S_{#1}}}
\newcommand{\vCase}[1]{\lab{Variant }{#1}}
\newcommand{\forElems}[1]{\{\elems \in \pw \st {#1}\}}
\newcommand{\forNElems}[1]{\{\nElems \in \pw \st {#1}\}}
\newcommand{\forElems}[1]{\{~(\elems~)\st {#1}, \elems \in \pw\}}
\newcommand{\forNElems}[1]{\{~(\nElems~)\st {#1}, \nElems \in \pw\}}
\newcommand{\cOne}{\wOne= \wOneP = \wTwo = \wTwoP}
\newcommand{\cTwo}{\wa = \wb \neq \wc = \wVecD}
\newcommand{\cThree}{\wa = \wb = \wc \neq \wVecD}