diff --git a/approx_alg.tex b/approx_alg.tex
index 8740d20..cdb8566 100644
--- a/approx_alg.tex
+++ b/approx_alg.tex
@@ -5,7 +5,7 @@
 In~\Cref{sec:hard}, we showed that computing the expected multiplicity of a compressed representation of a bag polynomial for \ti (even just based on project-join queries) is unlikely to be possible in linear time (\Cref{thm:mult-p-hard-result}), even if all tuples have the same probability (\Cref{th:single-p-hard}). Given this, we now design an approximation algorithm for our problem that runs in {\em linear time}.
-Unlike the results in~\Cref{sec:hard} our approximation algorithm works for \bi, though our bounds are more meaningful for a non-trivial subclass of \bis that contains both \tis, as well as the PDBench benchmark.
+Unlike the results in~\Cref{sec:hard} our approximation algorithm works for \bi, though our bounds are more meaningful for a non-trivial subclass of \bis that contains both \tis, as well as the PDBench benchmark~\cite{pdbench}.
 %it is then desirable to have an algorithm to approximate the multiplicity in linear time, which is what we describe next.
 \subsection{Preliminaries and some more notation}
@@ -132,10 +132,10 @@ For any expression tree $\etree$, the corresponding {\em positive tree}, denoted
 $\abs{\etree}$ obtained from $\etree$ as follows. For each leaf node $\ell$ of $\etree$ where $\ell.\type$ is $\tnum$, update $\ell.\vari{value}$ to $|\ell.\vari{value}|$. %value $\coef$ of each coefficient leaf node in $\etree$ is set to
 %$\coef_i$ in $\etree$ is exchanged with its absolute value$|\coef|$.
 \end{Definition}
-Using the same factorization from ~\Cref{example:expr-tree-T}, $poly(\abs{\etree}) = (X + 2Y)(2X + Y) = 2X^2 +XY +4XY + 2Y^2 = 2X^2 + 5XY + 2Y^2$. Note that this \textit{is not} the same as the polynomial from~\Cref{eq:poly-eg}.
+Using the same factorization from ~\Cref{example:expr-tree-T}, $\polyf(\abs{\etree}) = (X + 2Y)(2X + Y) = 2X^2 +XY +4XY + 2Y^2 = 2X^2 + 5XY + 2Y^2$. Note that this \textit{is not} the same as the polynomial from~\Cref{eq:poly-eg}.
 \begin{Definition}[Evaluation]\label{def:exp-poly-eval}
-Given an expression tree $\etree$ and $\vct{v} \in \mathbb{R}^\numvar$, we define the evaluation of $\etree$ on $\vct{v}$ as $\etree(\vct{v}) = poly(\etree)(\vct{v})$.
+Given an expression tree $\etree$ and $\vct{v} \in \mathbb{R}^\numvar$, we define the evaluation of $\etree$ on $\vct{v}$ as $\etree(\vct{v}) = \polyf(\etree)(\vct{v})$.
 \end{Definition}
 \subsection{Our main result}
@@ -144,9 +144,9 @@ Given an expression tree $\etree$ and $\vct{v} \in \mathbb{R}^\numvar$, we defin
 In the subsequent subsections we will prove the following theorem.
 \begin{Theorem}\label{lem:approx-alg}
-Let $\etree$ be an expression tree for a UCQ over \bi and define $\poly(\vct{X})=\polyf(\etree)$ and let $k=\degree(\poly)$
+Let $\etree$ be an expression tree for a UCQ over \bi and define $\poly(\vct{X})=\polyf(\etree)$ and let $k=\degree(\poly)$.
 %Let $\poly(\vct{X})$ be a query polynomial corresponding to the output of a UCQ in a \bi.
-An estimate $\mathcal{E}$ %=\approxq(\etree, (p_1,\dots,p_\numvar), \conf, \error')$
+Then an estimate $\mathcal{E}$ %=\approxq(\etree, (p_1,\dots,p_\numvar), \conf, \error')$
 of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ can be computed in time
 \[O\left(\treesize(\etree) + \frac{\log{\frac{1}{\conf}}\cdot \abs{\etree}^2(1,\ldots, 1)\cdot k\cdot \log{k} \cdot depth(\etree))}{\inparen{\error'}^2\cdot\rpoly^2(\prob_1,\ldots, \prob_\numvar)}\right)\]
 such that
@@ -173,7 +173,7 @@ We next present couple of corollaries of~\Cref{lem:approx-alg}.
 \label{cor:approx-algo-const-p}
 Let $\poly(\vct{X})$ be as in~\Cref{lem:approx-alg} and let $\gamma=\gamma(\etree)$. Further let it be the case that $p_i\ge p_0$ for all $i\in[\numvar]$. Then an estimate $\mathcal{E}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ satisfying~\Cref{eq:approx-algo-bound} can be computed in time
 \[O\left(\treesize(\etree) + \frac{\log{\frac{1}{\conf}}\cdot k\cdot \log{k} \cdot depth(\etree))}{\inparen{\error'}^2\cdot(1-\gamma)^2\cdot p_0^{2k}}\right)\]
-In particular, if $p_0>0$ and $\gamma<1$ are absolute constants then the above runtime simplifies to $O_k\left(\frac 1{\eps^2}\cdot\treesize(\etree)\cdot \log{\frac{1}{\conf}}\right)$.
+In particular, if $p_0>0$ and $\gamma<1$ are absolute constants then the above runtime simplifies to $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\treesize(\etree)\cdot \log{\frac{1}{\conf}}\right)$.
 \end{Corollary}
 The proof for~\Cref{cor:approx-algo-const-p} can be seen in~\Cref{sec:proofs-approx-alg}.
@@ -188,7 +188,7 @@ Thus, we expect the corrolary to hold in general.
 \subsection{Approximating $\rpoly$}
-The algorithm to prove~\Cref{lem:approx-alg} follows from the following observation. Given a query polynomial $\poly(\vct{X})=poly(\etree)$ for expression tree $\etree$ over $\bi$, we can exactly represent $\rpoly(\vct{X})$ as follows:
+The algorithm to prove~\Cref{lem:approx-alg} follows from the following observation. Given a query polynomial $\poly(\vct{X})=\polyf(\etree)$ for expression tree $\etree$ over $\bi$, we can exactly represent $\rpoly(\vct{X})$ as follows:
 \begin{equation}
 \label{eq:tilde-Q-bi}
 \rpoly\inparen{X_1,\dots,X_\numvar}=\hspace*{-1mm}\sum_{(v,c)\in \expandtree{\etree}} \hspace*{-2mm} \indicator{\monom\mod{\mathcal{B}}\not\equiv 0}\cdot c\cdot\hspace*{-2mm}\prod_{X_i\in \var\inparen{v}}\hspace*{-2mm} X_i
@@ -335,7 +335,7 @@ The number of samples is computed by (see \Cref{app:subsec-th-mon-samp}):
 \subsubsection{Correctness}
 In order to prove~\Cref{lem:approx-alg}, we will need to argue the correctness of~\Cref{alg:mon-sam}.
 Before we formally do that,
-we first state the lemmas that summarize the relevant properties of $\onepass$ and $\sampmon$, the auxiliary algorithms on which ~\Cref{alg:mon-sam} relies. Their proofs are given in~\Cref{sec:onepass} and~\Cref{sec:samplemonomial} respectively.
+we first state the lemmas that summarize the relevant properties of $\onepass$ and $\sampmon$, the auxiliary algorithms on which ~\Cref{alg:mon-sam} relies. %Their proofs are given in~\Cref{sec:onepass} and~\Cref{sec:samplemonomial} respectively.
 \begin{Lemma}\label{lem:one-pass}
@@ -354,7 +354,7 @@ Armed with the above two lemmas, we are ready to argue the following result (pro
 \begin{Theorem}\label{lem:mon-samp}
 %If the contracts for $\onepass$ and $\sampmon$ hold, then
 For any $\etree$ with $\degree(poly(|\etree|)) = k$, algorithm \ref{alg:mon-sam} outputs an estimate $\vari{acc}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ such that %$\expct\pbox{\empmean} = \frac{\rpoly(\prob_1,\ldots, \prob_\numvar)\cdot(1 - \gamma)}{\abs{\etree}(1,\ldots, 1)}$. %within an additive $\error \cdot \abs{\etree}(1,\ldots, 1)$ error with
-$\empmean$ has bounds
+%$\empmean$ has bounds
 \[P\left(\left|\vari{acc} - \rpoly(\prob_1,\ldots, \prob_\numvar)\right|> \error \cdot \abs{\etree}(1,\ldots, 1)\right) \leq \conf,\]
 in $O\left(\treesize(\etree)\right.$ $+$ $\left.\left(\frac{\log{\frac{1}{\conf}}}{\error^2} \cdot k \cdot\log{k} \cdot depth(\etree)\right)\right)$ time.
 \end{Theorem}
@@ -399,7 +399,7 @@ It turns out that for proof of~\Cref{lem:sample}, we need to argue that when $\e
 %\begin{align*}
 %&\eval{\etree~|~\etree.\type = +}_{\wght} =&&\eval{\etree_\lchild}_{\abs{\etree}} + \eval{\etree_\rchild}_{\abs{\etree}}; \etree_\lchild.\wght = \frac{\eval{\etree_\lchild}_{\abs{\etree}}}{\eval{\etree_\lchild}_{\abs{\etree}} + \eval{\etree_\rchild}_{\abs{\etree}}}; \etree_\rchild.\wght = \frac{\eval{\etree_\rchild}_{\abs{\etree}}}{\eval{\etree_\lchild}_{\abs{\etree}} + \eval{\etree_\rchild}_{\abs{\etree}}}
 %\end{align*}
-\noindent \onepass\ (Algorithm ~\ref{alg:one-pass} in \Cref{sec:proofs-approx-alg}) essentially populates the \vari{weight} variable on each node with the above definitions.
+\noindent \onepass\ (Algorithm ~\ref{alg:one-pass} in \Cref{sec:proofs-approx-alg}) essentially populates the \vari{weight} variable on each node with the above definitions. Lemma~\ref{lem:one-pass} is also proved in~\Cref{sec:proofs-approx-alg}.
 %\subsubsection{Psuedo Code}