
%root: main.tex
\section{David's Scheme}
Here we define a sampling scheme against which to compare our scheme. Given two vectors $\sone = \vecform{v}_1$ and $\stwo = \vecform{v}_2 \had \cdots \had \vecform{v}_\numTup$, we want to estimate $\sone \cdot \stwo$ using sampling. Recall that $\had$ denotes the entrywise (Hadamard) product of vectors: $\wVec = \vecform{v}_1 \had \vecform{v}_2$ iff for all $i \in [\veclen]$, $\vecform{\wElem}[i] = \vecform{v}_1[i] \cdot \vecform{v}_2[i]$.
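To make the target quantity concrete, the following Python sketch computes $\sone \cdot \stwo$ exactly, with no sampling. It assumes vectors are plain lists of numbers; the helper names \texttt{hadamard}, \texttt{hadamard\_all}, \texttt{dot}, and \texttt{exact\_target} are hypothetical and used only for illustration.

```python
from functools import reduce
from typing import List, Sequence

Vector = List[float]


def hadamard(u: Sequence[float], w: Sequence[float]) -> Vector:
    """Entrywise (Hadamard) product: result[i] = u[i] * w[i]."""
    if len(u) != len(w):
        raise ValueError("vectors must have the same length")
    return [a * b for a, b in zip(u, w)]


def hadamard_all(vectors: Sequence[Sequence[float]]) -> Vector:
    """Entrywise product of all given equal-length vectors (at least one)."""
    return reduce(hadamard, vectors[1:], list(vectors[0]))


def dot(u: Sequence[float], w: Sequence[float]) -> float:
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, w))


def exact_target(v1: Sequence[float], rest: Sequence[Sequence[float]]) -> float:
    """Exact s1 . s2, where s1 = v1 and s2 is the Hadamard product of `rest`."""
    return dot(v1, hadamard_all(rest))
```

For example, with $\vecform{v}_1 = (1,2,3)$, $\vecform{v}_2 = (4,5,6)$, and $\vecform{v}_3 = (1,0,2)$, we get $\stwo = (4,0,12)$ and $\sone \cdot \stwo = 40$.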

It can be shown that, using $O(\frac{1}{\epsilon^2})$ words of space, one can estimate $\sone \cdot \stwo$ with additive error $O(\epsilon \norm{\sone}_2 \hp(\vecform{v}_2,\ldots, \vecform{v}_\numTup))$ with constant probability.
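The section does not spell out the scheme itself. One standard estimator that meets the $O(\frac{1}{\epsilon^2})$ space bound is length-squared (importance) sampling, sketched below in Python as a swapped-in illustration, not as David's exact scheme. Sample coordinate $i$ with probability $\sone[i]^2/\norm{\sone}_2^2$ and output $\norm{\sone}_2^2\,\stwo[i]/\sone[i]$. Each sample is unbiased for $\sone \cdot \stwo$ and has second moment at most $\norm{\sone}_2^2\norm{\stwo}_2^2$, so averaging $\lceil 3/\epsilon^2 \rceil$ samples and applying Chebyshev's inequality gives additive error at most $\epsilon\norm{\sone}_2\norm{\stwo}_2$ with probability at least $2/3$. Note that this guarantee is stated in terms of $\norm{\stwo}_2$; how it relates to the $\hp$ term above is an assumption we do not check here. The function name \texttt{length\_squared\_estimate} is hypothetical.

```python
import math
import random
from typing import Optional, Sequence


def length_squared_estimate(s1: Sequence[float], s2: Sequence[float],
                            eps: float,
                            rng: Optional[random.Random] = None) -> float:
    """Estimate dot(s1, s2) by sampling i with probability s1[i]^2 / ||s1||_2^2.

    Each sample X = ||s1||_2^2 * s2[i] / s1[i] is unbiased for dot(s1, s2) and
    satisfies E[X^2] <= ||s1||_2^2 * ||s2||_2^2. Averaging m = ceil(3 / eps^2)
    samples gives |error| <= eps * ||s1||_2 * ||s2||_2 with probability >= 2/3
    (Chebyshev). Only s2[i] at sampled coordinates is ever read.
    """
    if rng is None:
        rng = random.Random()
    weights = [x * x for x in s1]
    norm_sq = sum(weights)
    if norm_sq == 0:
        return 0.0  # s1 is the zero vector, so the inner product is 0
    m = math.ceil(3 / eps ** 2)  # O(1/eps^2) samples, i.e. the space bound
    # Zero-weight coordinates are never drawn, so s1[i] != 0 below.
    idx = rng.choices(range(len(s1)), weights=weights, k=m)
    return sum(norm_sq * s2[i] / s1[i] for i in idx) / m
```

With $\sone = (1,2,3)$ and $\stwo = (4,0,12)$, every individual sample equals either $0$ or $56$, and the average concentrates around the true value $40$. One design point worth noting: the estimator needs $\stwo[i]$ only at the sampled coordinates, which for $\stwo = \vecform{v}_2 \had \cdots \had \vecform{v}_\numTup$ is the product $\vecform{v}_2[i] \cdots \vecform{v}_\numTup[i]$.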

\subsection{POS queries}
Further, define $\sone = \vecform{v}_{1} + \cdots + \vecform{v}_{\numTup}$ and $\stwo = \stwo_1 \had \cdots \had \stwo_{\prodsize}$, where for each $i \in [\prodsize]$, $\stwo_i$ is a sum of at most $\numTup$ vectors. Since we are measuring the error of estimating a product of $\prodsize$ sums, set $\sone = \wVec[v]_1$ and $\stwo = \wVec[v]_2 \had \cdots \had \wVec[v]_\prodsize$. We want to estimate the Hadamard product $\sone \had \stwo$ through sampling.
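To illustrate the product-of-sums structure (not any particular estimator), the Python sketch below builds a Hadamard product of sums from its factors and checks that, by distributivity, it equals the sum over every way of choosing one summand from each factor. With up to $\numTup$ summands in each of $\prodsize$ factors, this explicit expansion can have up to $\numTup^{\prodsize}$ terms. The list-of-lists representation and the names \texttt{vec\_sum}, \texttt{pos\_vector}, and \texttt{expanded\_pos\_vector} are hypothetical.

```python
from functools import reduce
from itertools import product
from typing import List, Sequence

Vector = List[float]


def vec_sum(vectors: Sequence[Sequence[float]]) -> Vector:
    """Coordinate-wise sum of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]


def hadamard(u: Sequence[float], w: Sequence[float]) -> Vector:
    """Entrywise (Hadamard) product of two equal-length vectors."""
    return [a * b for a, b in zip(u, w)]


def pos_vector(factors: Sequence[Sequence[Sequence[float]]]) -> Vector:
    """Hadamard product of sums; factors[j] lists the vectors summed in factor j."""
    return reduce(hadamard, (vec_sum(f) for f in factors))


def expanded_pos_vector(factors: Sequence[Sequence[Sequence[float]]]) -> Vector:
    """Same vector via distributivity: sum over all choices of one vector per factor."""
    n = len(factors[0][0])
    total = [0.0] * n
    for choice in product(*factors):  # one term per choice; up to k**p terms
        term = reduce(hadamard, choice)
        total = [t + x for t, x in zip(total, term)]
    return total
```

For instance, with factors $\{(1,2),(3,0)\}$ and $\{(1,1),(0,2)\}$, both functions return $(4,6)$: the direct product of sums is $(4,2) \had (1,3)$, and the expansion adds the four terms $(1,2)$, $(0,4)$, $(3,0)$, and $(0,0)$.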

\AH{I am not sure about this part. David speaks of taking a random sample from a stream, but in our setting, where does the stream come from?}
\AH{Also, why is it a fair comparison to sample only one POS term rather than all of them? Sorry if this is a trivial question, but if we are comparing against sampling, why do we assume that we have all of $\stwo$? Shouldn't a fair comparison require sampling from each of the $\prodsize$ terms?}