Done with pass on Sec 4

This commit is contained in:
Atri Rudra 2020-12-19 23:36:11 -05:00
parent fff29ce2f6
commit abdbd0da1a

View file

@ -5,7 +5,7 @@
In~\Cref{sec:hard}, we showed that computing the expected multiplicity of a compressed representation of a bag polynomial for \ti (even just based on project-join queries) is unlikely to be possible in linear time (\Cref{thm:mult-p-hard-result}), even if all tuples have the same probability (\Cref{th:single-p-hard}).
Given this, we now design an approximation algorithm for our problem that runs in {\em linear time}.
Unlike the results in~\Cref{sec:hard} our approximation algorithm works for \bi, though our bounds are more meaningful for a non-trivial subclass of \bis that contains both \tis, as well as the PDBench benchmark.
Unlike the results in~\Cref{sec:hard} our approximation algorithm works for \bi, though our bounds are more meaningful for a non-trivial subclass of \bis that contains both \tis, as well as the PDBench benchmark~\cite{pdbench}.
%it is then desirable to have an algorithm to approximate the multiplicity in linear time, which is what we describe next.
\subsection{Preliminaries and some more notation}
@ -132,10 +132,10 @@ For any expression tree $\etree$, the corresponding
{\em positive tree}, denoted $\abs{\etree}$ obtained from $\etree$ as follows. For each leaf node $\ell$ of $\etree$ where $\ell.\type$ is $\tnum$, update $\ell.\vari{value}$ to $|\ell.\vari{value}|$. %value $\coef$ of each coefficient leaf node in $\etree$ is set to %$\coef_i$ in $\etree$ is exchanged with its absolute value$|\coef|$.
\end{Definition}
Using the same factorization from ~\Cref{example:expr-tree-T}, $poly(\abs{\etree}) = (X + 2Y)(2X + Y) = 2X^2 +XY +4XY + 2Y^2 = 2X^2 + 5XY + 2Y^2$. Note that this \textit{is not} the same as the polynomial from~\Cref{eq:poly-eg}.
Using the same factorization from ~\Cref{example:expr-tree-T}, $\polyf(\abs{\etree}) = (X + 2Y)(2X + Y) = 2X^2 +XY +4XY + 2Y^2 = 2X^2 + 5XY + 2Y^2$. Note that this \textit{is not} the same as the polynomial from~\Cref{eq:poly-eg}.
\begin{Definition}[Evaluation]\label{def:exp-poly-eval}
Given an expression tree $\etree$ and $\vct{v} \in \mathbb{R}^\numvar$, we define the evaluation of $\etree$ on $\vct{v}$ as $\etree(\vct{v}) = poly(\etree)(\vct{v})$.
Given an expression tree $\etree$ and $\vct{v} \in \mathbb{R}^\numvar$, we define the evaluation of $\etree$ on $\vct{v}$ as $\etree(\vct{v}) = \polyf(\etree)(\vct{v})$.
\end{Definition}
\subsection{Our main result}
@ -144,9 +144,9 @@ Given an expression tree $\etree$ and $\vct{v} \in \mathbb{R}^\numvar$, we defin
In the subsequent subsections we will prove the following theorem.
\begin{Theorem}\label{lem:approx-alg}
Let $\etree$ be an expression tree for a UCQ over \bi and define $\poly(\vct{X})=\polyf(\etree)$ and let $k=\degree(\poly)$
Let $\etree$ be an expression tree for a UCQ over \bi and define $\poly(\vct{X})=\polyf(\etree)$ and let $k=\degree(\poly)$.
%Let $\poly(\vct{X})$ be a query polynomial corresponding to the output of a UCQ in a \bi.
An estimate $\mathcal{E}$ %=\approxq(\etree, (p_1,\dots,p_\numvar), \conf, \error')$
Then an estimate $\mathcal{E}$ %=\approxq(\etree, (p_1,\dots,p_\numvar), \conf, \error')$
of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ can be computed in time
\[O\left(\treesize(\etree) + \frac{\log{\frac{1}{\conf}}\cdot \abs{\etree}^2(1,\ldots, 1)\cdot k\cdot \log{k} \cdot depth(\etree))}{\inparen{\error'}^2\cdot\rpoly^2(\prob_1,\ldots, \prob_\numvar)}\right)\]
such that
@ -173,7 +173,7 @@ We next present couple of corollaries of~\Cref{lem:approx-alg}.
\label{cor:approx-algo-const-p}
Let $\poly(\vct{X})$ be as in~\Cref{lem:approx-alg} and let $\gamma=\gamma(\etree)$. Further let it be the case that $p_i\ge p_0$ for all $i\in[\numvar]$. Then an estimate $\mathcal{E}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ satisfying~\Cref{eq:approx-algo-bound} can be computed in time
\[O\left(\treesize(\etree) + \frac{\log{\frac{1}{\conf}}\cdot k\cdot \log{k} \cdot depth(\etree))}{\inparen{\error'}^2\cdot(1-\gamma)^2\cdot p_0^{2k}}\right)\]
In particular, if $p_0>0$ and $\gamma<1$ are absolute constants then the above runtime simplifies to $O_k\left(\frac 1{\eps^2}\cdot\treesize(\etree)\cdot \log{\frac{1}{\conf}}\right)$.
In particular, if $p_0>0$ and $\gamma<1$ are absolute constants then the above runtime simplifies to $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\treesize(\etree)\cdot \log{\frac{1}{\conf}}\right)$.
\end{Corollary}
The proof for~\Cref{cor:approx-algo-const-p} can be seen in~\Cref{sec:proofs-approx-alg}.
@ -188,7 +188,7 @@ Thus, we expect the corrolary to hold in general.
\subsection{Approximating $\rpoly$}
The algorithm to prove~\Cref{lem:approx-alg} follows from the following observation. Given a query polynomial $\poly(\vct{X})=poly(\etree)$ for expression tree $\etree$ over $\bi$, we can exactly represent $\rpoly(\vct{X})$ as follows:
The algorithm to prove~\Cref{lem:approx-alg} follows from the following observation. Given a query polynomial $\poly(\vct{X})=\polyf(\etree)$ for expression tree $\etree$ over $\bi$, we can exactly represent $\rpoly(\vct{X})$ as follows:
\begin{equation}
\label{eq:tilde-Q-bi}
\rpoly\inparen{X_1,\dots,X_\numvar}=\hspace*{-1mm}\sum_{(v,c)\in \expandtree{\etree}} \hspace*{-2mm} \indicator{\monom\mod{\mathcal{B}}\not\equiv 0}\cdot c\cdot\hspace*{-2mm}\prod_{X_i\in \var\inparen{v}}\hspace*{-2mm} X_i
@ -335,7 +335,7 @@ The number of samples is computed by (see \Cref{app:subsec-th-mon-samp}):
\subsubsection{Correctness}
In order to prove~\Cref{lem:approx-alg}, we will need to argue the correctness of~\Cref{alg:mon-sam}. Before we formally do that,
we first state the lemmas that summarize the relevant properties of $\onepass$ and $\sampmon$, the auxiliary algorithms on which ~\Cref{alg:mon-sam} relies. Their proofs are given in~\Cref{sec:onepass} and~\Cref{sec:samplemonomial} respectively.
we first state the lemmas that summarize the relevant properties of $\onepass$ and $\sampmon$, the auxiliary algorithms on which ~\Cref{alg:mon-sam} relies. %Their proofs are given in~\Cref{sec:onepass} and~\Cref{sec:samplemonomial} respectively.
\begin{Lemma}\label{lem:one-pass}
@ -354,7 +354,7 @@ Armed with the above two lemmas, we are ready to argue the following result (pro
\begin{Theorem}\label{lem:mon-samp}
%If the contracts for $\onepass$ and $\sampmon$ hold, then
For any $\etree$ with $\degree(poly(|\etree|)) = k$, algorithm \ref{alg:mon-sam} outputs an estimate $\vari{acc}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ such that %$\expct\pbox{\empmean} = \frac{\rpoly(\prob_1,\ldots, \prob_\numvar)\cdot(1 - \gamma)}{\abs{\etree}(1,\ldots, 1)}$. %within an additive $\error \cdot \abs{\etree}(1,\ldots, 1)$ error with
$\empmean$ has bounds
%$\empmean$ has bounds
\[P\left(\left|\vari{acc} - \rpoly(\prob_1,\ldots, \prob_\numvar)\right|> \error \cdot \abs{\etree}(1,\ldots, 1)\right) \leq \conf,\]
in $O\left(\treesize(\etree)\right.$ $+$ $\left.\left(\frac{\log{\frac{1}{\conf}}}{\error^2} \cdot k \cdot\log{k} \cdot depth(\etree)\right)\right)$ time.
\end{Theorem}
@ -399,7 +399,7 @@ It turns out that for proof of~\Cref{lem:sample}, we need to argue that when $\e
%\begin{align*}
%&\eval{\etree~|~\etree.\type = +}_{\wght} =&&\eval{\etree_\lchild}_{\abs{\etree}} + \eval{\etree_\rchild}_{\abs{\etree}}; \etree_\lchild.\wght = \frac{\eval{\etree_\lchild}_{\abs{\etree}}}{\eval{\etree_\lchild}_{\abs{\etree}} + \eval{\etree_\rchild}_{\abs{\etree}}}; \etree_\rchild.\wght = \frac{\eval{\etree_\rchild}_{\abs{\etree}}}{\eval{\etree_\lchild}_{\abs{\etree}} + \eval{\etree_\rchild}_{\abs{\etree}}}
%\end{align*}
\noindent \onepass\ (Algorithm ~\ref{alg:one-pass} in \Cref{sec:proofs-approx-alg}) essentially populates the \vari{weight} variable on each node with the above definitions.
\noindent \onepass\ (Algorithm ~\ref{alg:one-pass} in \Cref{sec:proofs-approx-alg}) essentially populates the \vari{weight} variable on each node with the above definitions. Lemma~\ref{lem:one-pass} is also proved in~\Cref{sec:proofs-approx-alg}.
%\subsubsection{Psuedo Code}