diff --git a/analysis.tex b/analysis.tex
index 6f60a27..01c00d8 100644
--- a/analysis.tex
+++ b/analysis.tex
@@ -394,7 +394,7 @@ Substituting the Cauchy Schwarts bounds into the Chebyshev calculations gives
 \begin{align}
 &\sketchCols \leq \frac{3\norm{\genV}_2^2\left(|\pw|\right) + \norm{\genV}_2^2\left(|\pw|\right)}{\norm{\genV}_2\sqrt{|\pw|}}\nonumber\\
 &\sketchCols \leq \frac{4\norm{\genV}_2^2\left(|\pw|\right)}{\norm{\genV}_2\sqrt{|\pw|}}\nonumber\\
-&\sketchCols \leq 4\norm{\genV}_2\sqrt{|\pw|}
+&\sketchCols \leq 4\norm{\genV}_2\sqrt{|\pw|}\label{eq:b-cauchy}
 \end{align}
 \AH{\textbf{BEGIN}: Old Bound calculations}
 \begin{align*}
diff --git a/combining.tex b/combining.tex
new file mode 100644
index 0000000..827efdc
--- /dev/null
+++ b/combining.tex
@@ -0,0 +1,41 @@
+% -*- root: main.tex -*-
+\section{Combining Sketches}
+\label{sec:combining}
+\subsection{Adding Sketches}
+When the variables are independent, as in the TIDB model, it is a known result that
+\[
+\varParam{X + Y} = \varParam{X} + \varParam{Y}.
+\]
+It then immediately follows that adding $n$ independent base sketches results in the following variance:
+\[
+n \cdot 4\norm{\genV}_2 |\pw|^{1/2}.
+\]
+
+\subsection{Multiplying Sketches}
+For the product of independent random variables $X$ and $Y$, it is a known result that
+\[
+\varParam{X \cdot Y} = \expect{X^2}\expect{Y^2} - (\expect{X})^2 (\expect{Y})^2.
+\]
+For a discrete random variable, the expectation of its square is the probability-weighted sum of its squared values, $\expect{X^2} = \sum_{x} x^2 \Pr[X = x]$.
This yields
+\begin{align}
+&\expect{\left(\sum_{\wVec \in \pw}\sketchJParam{\sketchHashParam{\wVec}}\cdot \sketchPolarParam{\wVec}\right)^2}\\
+=& \sum_{\wVec \in \pw}\expect{\left(\sketchJParam{\sketchHashParam{\wVec}}\cdot\sketchPolarParam{\wVec}\right)^2}\\
+=& \sum_{\wVec \in \pw}\expect{\left(\sum_{\substack{\wVecPrime \in \pw \st \\
+    \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec}}} \genVParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}\right)^2}\\
+=& \sum_{\wVec \in \pw}\expect{\left(\genVParam{\wVec}\sketchPolarParam{\wVec}^2 + \sum_{\substack{\wVecPrime \in \pw \st \\
+    \sketchHashParam{\wVecPrime} = \sketchHashParam{\wVec},
+    \wVecPrime \neq \wVec}} \genVParam{\wVecPrime}\sketchPolarParam{\wVecPrime}\sketchPolarParam{\wVec}\right)^2}\\
+=& \sum_{\wVec \in \pw}\expect{\genVParam{\wVec}^2}\\
+=& \sum_{\wVec \in \pw}\genVParam{\wVec}^2.
+\end{align}
+\begin{Justification}
+\hfill
+  \begin{itemize}
+  \item stuff goes here
+  \end{itemize}
+\end{Justification}
+It then follows that, for two independent base sketches built from vectors $\genV_1$ and $\genV_2$ with nonnegative entries, the variance of their product is
+\begin{align}
+&\sum_{\wVec \in \pw}\genV_1\paramBox{\wVec}^2\sum_{\wVec \in \pw}\genV_2\paramBox{\wVec}^2 - \left(\sum_{\wVec \in \pw} \genV_1\paramBox{\wVec}\right)^2\left(\sum_{\wVec \in \pw} \genV_2\paramBox{\wVec}\right)^2\\
+=&\norm{\genV_1}_2^2\norm{\genV_2}_2^2 - \norm{\genV_1}_1^2\norm{\genV_2}_1^2.
+\end{align}
\ No newline at end of file
diff --git a/instantiation.tex b/instantiation.tex
index a6cca53..bba1bde 100644
--- a/instantiation.tex
+++ b/instantiation.tex
@@ -4,11 +4,22 @@
 \AH{This section has been started, but needs to be completed.}
 \subsection{TIDB}
-Consider the case of a TIDB with $\numTup$ tuples, with $\prob = \frac{1}{2}$ for given tuple $t$. The vector $\genV$ can then be defined as
+Consider the case of a TIDB with $\numTup$ tuples, with $\prob = \frac{1}{2}$ for each tuple $t$.
Because a TIDB has set semantics, the vector $\genV$ can be defined as a bit vector in $\{0, 1\}^\numTup$ whose value represents a possible world and whose indices each correspond to the id of a specific tuple $t$. Under these semantics, with $w_t$ denoting the bit at the index mapped to tuple $t$'s identity, $\genV$ can alternatively be viewed as a function
 \begin{equation*}
 \genV = \begin{cases}
 1, &w_t = 1\\
 0, &otherwise
 \end{cases}
+\end{equation*}
+where a value of $1$ indicates that tuple $t$ is present in the world represented by the bit string, and $0$ indicates that it is absent.
+
+In this representation, a few properties of $\genV$ immediately stand out. First, the length of $\genV$ equals the number of tuples in the TIDB, $|\genV| = \numTup$. Second, since every entry of $\genV$ is $0$ or $1$, the L1 norm of $\genV$ equals its squared L2 norm, and with $\prob = \frac{1}{2}$ both are $\frac{\numTup}{2}$ in expectation,
+\begin{equation*}
+|\genV| = \numTup \wedge \prob = \frac{1}{2} \Rightarrow \norm{\genV}_1 = \norm{\genV}_2^2.
+\end{equation*}
+
+Substituting $\norm{\genV}_2 = \sqrt{\numTup/2}$ and $|\pw| = 2^\numTup$ (the number of possible worlds over $\numTup$ tuples) into \eqref{eq:b-cauchy} yields a bucket size of
+\begin{equation*}
+\sketchCols \leq 4\sqrt{\frac{\numTup}{2}} \cdot 2^{\numTup/2}.
 \end{equation*}
\ No newline at end of file
diff --git a/main.tex b/main.tex
index f07d451..f9da1d5 100644
--- a/main.tex
+++ b/main.tex
@@ -126,6 +126,7 @@
 \input{notation}
 \input{analysis}
+\input{combining}
 \input{instantiation}
 \input{hash_const}
 \input{exact}
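The identities used in combining.tex and instantiation.tex can be sanity-checked numerically by exact enumeration. The Python sketch below is a minimal illustration under stated assumptions: the random variables are independent, the two toy distributions are arbitrary, and every helper name (`independent_combine`, `product_variance_formula`, `tidb_expected_norms`, `tidb_bucket_bound`) is hypothetical. The bucket bound simply plugs $\norm{\genV}_2 = \sqrt{\numTup/2}$ and $|\pw| = 2^\numTup$ into \eqref{eq:b-cauchy}; it does not model the sketch itself.

```python
from fractions import Fraction
from itertools import product
import math
import operator


def expectation(dist, f=lambda x: x):
    """E[f(X)] for a discrete distribution given as {value: probability}."""
    return sum(p * f(x) for x, p in dist.items())


def variance(dist):
    """Var(X) = E[X^2] - (E[X])^2."""
    return expectation(dist, lambda x: x * x) - expectation(dist) ** 2


def independent_combine(dx, dy, op):
    """Distribution of op(X, Y) for independent X ~ dx and Y ~ dy."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        z = op(x, y)
        out[z] = out.get(z, 0) + px * py  # independence: joint prob = px * py
    return out


def product_variance_formula(dx, dy):
    """E[X^2]E[Y^2] - (E[X])^2 (E[Y])^2, valid for independent X and Y."""
    square = lambda v: v * v
    return (expectation(dx, square) * expectation(dy, square)
            - expectation(dx) ** 2 * expectation(dy) ** 2)


def tidb_expected_norms(n):
    """Expected L1 norm and expected squared L2 norm of a bit vector in {0,1}^n
    when each of the n tuples is present independently with probability 1/2."""
    worlds = list(product((0, 1), repeat=n))  # all 2^n possible worlds
    weight = Fraction(1, len(worlds))          # each world has probability (1/2)^n
    l1 = sum(weight * sum(w) for w in worlds)
    l2_sq = sum(weight * sum(b * b for b in w) for w in worlds)
    return l1, l2_sq


def tidb_bucket_bound(n):
    """4 * ||v||_2 * sqrt(|pw|) with ||v||_2 = sqrt(n/2) and |pw| = 2^n."""
    return 4 * math.sqrt(n / 2) * math.sqrt(2 ** n)


if __name__ == "__main__":
    dx = {Fraction(-1): Fraction(1, 3), Fraction(2): Fraction(2, 3)}
    dy = {Fraction(0): Fraction(1, 2), Fraction(3): Fraction(1, 2)}
    # Direct variance of X*Y vs. the closed-form identity: both print 45/4
    print(variance(independent_combine(dx, dy, operator.mul)), product_variance_formula(dx, dy))
    # Direct variance of X+Y vs. Var(X) + Var(Y): both print 17/4
    print(variance(independent_combine(dx, dy, operator.add)), variance(dx) + variance(dy))
    print(tidb_expected_norms(4), tidb_bucket_bound(4))
```

Exact `Fraction` arithmetic is used so that the identities can be compared with `==` rather than a floating-point tolerance. The product check agrees only because the variables are independent; for dependent $X$ and $Y$ the stated formula for $\varParam{X \cdot Y}$ does not hold in general.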