
% -*- root: main.tex -*-
\section{Notation}
\label{sec:notation}
The following notation is used to reason about sketching the world membership of a given tuple. We denote the set of all possible worlds by $\pw$. A sketch $\sketch$ can be viewed as an $\sketchRows \times \sketchCols$ matrix, i.e., a matrix with $\sketchRows$ rows and $\sketchCols$ columns. Upon initialization, each row of $\sketch$ is an independent estimate of the $\kDom$ annotations that the tuple represented by $\sketch$ takes across all possible worlds. %\AR{Nitpick: the claim in the last sentence is only true at initialization. If you add/mult the vector (via aggregates) then the claim is no longer true.}
%\AH{I am not sure I know what you mean by the claim no longer being true: do you mean that it is no longer true until we prove bounds for multiplication? We can add sketches with the same epsilon-delta bounds, correct? Or do you mean that the tuple which $\sketch$ represents is a different tuple than the one we started with (after performing add/mult operations on it)?}
%\AR{In this section, the notations $\sketchHash{i}$ and $\sketchPolar{i}$ in this section are messed up.}
%\AH{Fixed.}
To bin the possible worlds $\wVec \in \pw$, each row $i$ is equipped with two hash functions, $\sketchHash[i]:\pw \to [\sketchCols]$ and $\sketchPolar[i]:\pw \to \{-1,1\}$, each drawn from a pairwise independent family, with all $2\sketchRows$ functions chosen independently of one another. Finally, the function $\kMap{t} : \{0, 1\}^\numTup \rightarrow \kDom$ gives the $\kDom$ annotation of tuple $t$ in a given world.
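As an illustration of the hash functions above, the following Python sketch draws one pair $(\sketchHash[i], \sketchPolar[i])$ per row. It is a minimal sketch under stated assumptions, not the construction used in this paper: worlds $\wVec \in \{0,1\}^\numTup$ are encoded as $\numTup$-bit integers, the number of columns is assumed to be a power of two ($\sketchCols = 2^k$), and both functions are taken to be affine maps over $\mathrm{GF}(2)$, namely $\wVec \mapsto A\wVec \oplus c$ for the bucket hash and $\wVec \mapsto (-1)^{\langle r, \wVec\rangle \oplus b}$ for the polarity, with $A$, $c$, $r$, $b$ uniformly random. For distinct worlds the pair of outputs is uniformly distributed, so both families are exactly pairwise independent. The helper names (\texttt{parity}, \texttt{make\_bucket\_hash}, \texttt{make\_polarity}) are hypothetical.

```python
import random
from typing import Callable


def parity(x: int) -> int:
    """Popcount of x modulo 2, i.e. a GF(2) inner product when x = mask & w."""
    return bin(x).count("1") & 1


def make_polarity(num_tup: int, rng: random.Random) -> Callable[[int], int]:
    """Draw g(w) = (-1)^(<r, w> xor b) for a uniform N-bit mask r and bit b.

    For worlds w != w', the pair (g(w), g(w')) is uniform on {-1, 1}^2.
    """
    r = rng.getrandbits(num_tup)
    b = rng.getrandbits(1)
    return lambda w: 1 - 2 * (parity(r & w) ^ b)


def make_bucket_hash(num_tup: int, k: int, rng: random.Random) -> Callable[[int], int]:
    """Draw h(w) = A w xor c over GF(2), mapping N-bit worlds to [2^k].

    A is a uniform k x N bit matrix (one N-bit mask per output bit) and c is a
    uniform k-bit offset; for w != w', (h(w), h(w')) is uniform on [2^k]^2.
    """
    a_rows = [rng.getrandbits(num_tup) for _ in range(k)]
    c = rng.getrandbits(k) if k > 0 else 0

    def h(w: int) -> int:
        out = 0
        for bit, a in enumerate(a_rows):
            out |= parity(a & w) << bit  # output bit = <row of A, w> mod 2
        return out ^ c

    return h


# Hypothetical parameters: N = 3 tuples, 4 rows, 2^2 = 4 columns.
NUM_TUP, ROWS, K = 3, 4, 2
rng = random.Random(0)
# Each function gets fresh randomness, so all 2 * ROWS functions are independent.
bucket_hashes = [make_bucket_hash(NUM_TUP, K, rng) for _ in range(ROWS)]
polarities = [make_polarity(NUM_TUP, rng) for _ in range(ROWS)]
```

The power-of-two restriction is what makes the $\mathrm{GF}(2)$ construction exact; for other column counts one would need a different family, such as a Carter-Wegman hash modulo a prime, which is only approximately uniform after reduction to $[\sketchCols]$.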
%\AR{I do not like this notation. I prefer vectors being typeset in bold, i.e. $\mathbf{w}$. $\wVec$ is good for writing on the board but it is more standard to bold vectors in linear algebra. Also the $\kDom$ values are not binned by $\sketchHash{i}$ but the actual $\wVec$s are.}
%\AH{Done.}
%for each $i, j \in \sketchRows \text{ s.t. } i \neq j, \sketchHash{i}$ is independent of $\sketchHash{j}$ and $\sketchPolar{i}$ is independent of $\sketchPolar{j}$. Thus each row can be viewed as an independent estimation.
%\AR{While in general I'm a fan of using English to define things, one of the exceptions is when you are defining a function. It would be better to explicitly state that $\sketchHash{i}:W\to [B]$ and $\sketchPolar{i}:W\to \{-1,1\}$. Of course for these definitions you need to define $W$ upfront.}
%\AH{Done}
The sketch is populated one world at a time. For each world $\wVec$ and each row $i$, the $\kDom$ value of the tuple in $\wVec$ is retrieved via $\kMap{t}$, multiplied by the output of row $i$'s polarity function $\sketchPolar[i]$, and added to the current value of the bin selected by $\sketchHash[i]$. Formally:
$$\sketch[i][\sketchHashParam{\wVec}] ~+=~ \sketchPolarParam{\wVec} \times \kMapParam{\wVec}$$
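The update rule can be illustrated with a small Python sketch. This is a minimal illustration, not the paper's implementation: worlds are encoded as integers $0, \dots, 2^\numTup - 1$, each row's $\sketchHash[i]$ and $\sketchPolar[i]$ are stand-in random lookup tables, and the annotation function \texttt{v\_t} (the number of tuples present in a world, i.e.\ $\kDom = \mathbb{N}$) is a hypothetical placeholder for $\kMap{t}$. The class name \texttt{CountSketch} is likewise hypothetical. Initialization amounts to one update per possible world.

```python
import random


class CountSketch:
    """An r x c matrix S; row i owns a bucket hash h_i and a polarity g_i."""

    def __init__(self, rows: int, cols: int, num_tup: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_worlds = 2 ** num_tup  # worlds are encoded as integers 0 .. 2^N - 1
        # Stand-in hash functions: uniformly random lookup tables, one per row.
        # Fully random tables are in particular pairwise independent.
        self.h = [[rng.randrange(cols) for _ in range(n_worlds)] for _ in range(rows)]
        self.g = [[rng.choice((-1, 1)) for _ in range(n_worlds)] for _ in range(rows)]
        self.S = [[0] * cols for _ in range(rows)]

    def update(self, w: int, value: int) -> None:
        """For every row i: S[i][h_i(w)] += g_i(w) * value."""
        for i, row in enumerate(self.S):
            row[self.h[i][w]] += self.g[i][w] * value


def v_t(w: int) -> int:
    """Hypothetical annotation function: number of tuples present in world w."""
    return bin(w).count("1")


NUM_TUP = 3
sketch = CountSketch(rows=4, cols=2, num_tup=NUM_TUP)
for w in range(2 ** NUM_TUP):  # initialization: insert every possible world once
    sketch.update(w, v_t(w))
```

Storing full lookup tables costs $\Theta(2^\numTup)$ space per row, so this stand-in is only practical for very small $\numTup$; a compact pairwise independent family avoids that cost.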
%\AR{It would also be good to state what the value in $\sketch[i][j]$ is after the initialization with the function $v_t$ is done.}
%\AH{Done.}
After initialization is complete, for every row $i$ and column $j$ we have
$$\sketch[i][j] = \sum_{\{\wVec \st \sketchHashParam{\wVec} = j\}}\kMapParam{\wVec} \sketchPolarParam{\wVec}.$$
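This closed form follows from unrolling the update rule, assuming every entry of $\sketch$ starts at $0$ and each world is inserted exactly once: world $\wVec$ contributes $\sketchPolarParam{\wVec}\kMapParam{\wVec}$ to exactly one bin of row $i$, namely $\sketchHashParam{\wVec}$. The Python sketch below checks this numerically for small, hypothetical parameters, using random lookup tables as stand-ins for the hash and polarity functions and a hypothetical annotation function \texttt{v\_t}.

```python
import random

NUM_TUP, ROWS, COLS = 3, 3, 4
N_WORLDS = 2 ** NUM_TUP
rng = random.Random(42)

# Stand-in per-row hash and polarity functions as random lookup tables
# (hypothetical; any pairwise independent family could be used instead).
h = [[rng.randrange(COLS) for _ in range(N_WORLDS)] for _ in range(ROWS)]
g = [[rng.choice((-1, 1)) for _ in range(N_WORLDS)] for _ in range(ROWS)]


def v_t(w: int) -> int:
    """Hypothetical annotation: number of tuples present in world w."""
    return bin(w).count("1")


# Incremental construction: start from zeros and apply the update once per world.
incremental = [[0] * COLS for _ in range(ROWS)]
for w in range(N_WORLDS):
    for i in range(ROWS):
        incremental[i][h[i][w]] += g[i][w] * v_t(w)

# Closed form: S[i][j] = sum of v_t(w) * g_i(w) over worlds w with h_i(w) = j.
closed_form = [
    [sum(v_t(w) * g[i][w] for w in range(N_WORLDS) if h[i][w] == j) for j in range(COLS)]
    for i in range(ROWS)
]
```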
For Tuple Independent Databases (TIDBs), a database $\relation$ contains $\numTup$ tuples and has $\numWorlds$ possible worlds. We identify the set of possible worlds with $\pw = \{0, 1\}^\numTup$: a world $\wVec \in \{0, 1\}^\numTup$ is a bit vector whose $k$-th entry is $1$ exactly when the $k$-th tuple of $\relation$ is present in that world.
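To make the bit-vector view of worlds concrete, the following Python sketch enumerates $\{0,1\}^\numTup$ for a small, hypothetical relation, lists the tuples present in a world, and encodes each world as an integer bitmask, a convenient representation when worlds serve as inputs to hash functions. The function names and tuple labels are illustrative assumptions, not part of the paper.

```python
from itertools import product
from typing import List, Sequence, Tuple

World = Tuple[int, ...]


def possible_worlds(num_tup: int) -> List[World]:
    """All worlds w in {0,1}^N; w[k] == 1 iff the k-th tuple is present."""
    return list(product((0, 1), repeat=num_tup))


def world_to_index(w: World) -> int:
    """Encode a world as an integer whose bit k is set iff tuple k is present."""
    return sum(bit << k for k, bit in enumerate(w))


def present_tuples(w: World, relation: Sequence[str]) -> List[str]:
    """The tuples of the relation that exist in world w."""
    return [t for t, bit in zip(relation, w) if bit]


relation = ["t1", "t2", "t3"]  # hypothetical TIDB with N = 3 tuples
worlds = possible_worlds(len(relation))
print(len(worlds))                          # 8, i.e. 2^N possible worlds
print(world_to_index((1, 0, 1)))            # 5
print(present_tuples((1, 0, 1), relation))  # ['t1', 't3']
```

The bitmask encoding is a bijection between $\{0,1\}^\numTup$ and $\{0, \dots, 2^\numTup - 1\}$, so no two worlds share an index.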
\AR{I'm fine with $\kMap{t}$ being defined as a function instead of a vector in $\kDom^W$, but I'm not sure whether one would be easier than the other for writing arguments. I guess we can reconsider this later since it is defined as a macro.}
\AH{I too am unsure of which way would be best to go on this. I think originally we had proposed to define $\wVec$ as a mapping to the tuple's $\kDom$ annotation.}