approx algo

2021-09-20 20:41:06 -05:00 · 2021-09-20 20:41:06 -05:00 · 91dac1d104
parent 67b3f750be
commit 91dac1d104
1 changed files with 16 additions and 15 deletions
--- a/approx_alg.tex
+++ b/approx_alg.tex
@ -10,10 +10,11 @@ The folowing approximation algorithm applies to \abbrBIDB lineage polynomials (o

 \subsection{Preliminaries and some more notation}

-We now introduce definitions and notation related to circuits and polynomials that we will need to state our upper bound results.  
+We now introduce definitions and notation related to circuits and polynomials that we will need to state our upper bound results. First, we define the expansion $\expansion{\circuit}$ of a circuit $\circuit$, which % encodes the reduced polynomial for $\polyf\inparen{\circuit}$ and is the basis
+is used in our algorithm for sampling monomials (part of our approximation algorithm).

 \begin{Definition}[$\expansion{\circuit}$]\label{def:expand-circuit}
-For a circuit $\circuit$, we define $\expansion{\circuit}$ as a list of tuples $(\monom, \coef)$, where $\monom$ is a set of variables and $\coef \in \domN$.  
+For a circuit $\circuit$, we define $\expansion{\circuit}$ as a list of tuples $(\monom, \coef)$, where $\monom$ is a set of variables and $\coef \in \domN$.
 $\expansion{\circuit}$ has the following recursive definition ($\circ$ is list concatenation).
 $\expansion{\circuit} =
 \begin{cases}
@ -54,8 +55,8 @@ Next, we use the following notation for the complexity of multiplying integers:
 In a RAM model of word size of $W$-bits, $\multc{M}{W}$ denotes the complexity of multiplying two integers represented with $M$-bits. (We will assume that for input of size $N$, $W=O(\log{N})$.)
 \end{Definition}

-Finally, to get linear runtime results, we will need to define another parameter modeling the (weighted) number of monomials in %$\poly\inparen{\vct{X}}$ 
-$\expansion{\circuit}$ 
+Finally, to get linear runtime results, we will need to define another parameter modeling the (weighted) number of monomials in %$\poly\inparen{\vct{X}}$
+$\expansion{\circuit}$
 that need to be `canceled' when monomials with dependent variables are removed (\Cref{def:reduced-bi-poly}).  %def:hen it is modded with $\mathcal{B}$ (\Cref{def:mod-set-polys}).
 Let $\isInd{\cdot}$ be a boolean function returning true if monomial $\encMon$ is composed of independent variables and false otherwise; further, let $\indicator{\theta}$ also be a boolean function returning true if $\theta$ evaluates to true.
 \begin{Definition}[Parameter $\gamma$]\label{def:param-gamma}
@ -69,9 +70,9 @@ Given a \abbrBIDB circuit $\circuit$ define

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \mypar{Algorithm Idea}
-%We prove \Cref{lem:approx-alg} by developing an 
-Our approximation algorithm (\approxq pseudo code in \Cref{sec:proof-lem-approx-alg}) 
-%with the desired runtime. This algorithm 
+%We prove \Cref{lem:approx-alg} by developing an
+Our approximation algorithm (\approxq pseudo code in \Cref{sec:proof-lem-approx-alg})
+%with the desired runtime. This algorithm
 is based on the following observation.
 % The algorithm (\approxq detailed in \Cref{alg:mon-sam}) to prove \Cref{lem:approx-alg} follows from the following observation.
 Given a lineage polynomial $\poly(\vct{X})=\polyf(\circuit)$ for circuit \circuit over $\bi$, we have: % can exactly represent $\rpoly(\vct{X})$ as follows:
@ -87,7 +88,7 @@ Given a lineage polynomial $\poly(\vct{X})=\polyf(\circuit)$ for circuit \circui

 Given the above, the algorithm is a sampling based algorithm for the above sum: we sample (via \sampmon) $(\monom,\coef)\in \expansion{\circuit}$ with probability proportional
 to $\abs{\coef}$ and compute $\vari{Y}=\indicator{\isInd{\encMon}}
- \cdot \prod_{X_i\in \monom} p_i$. %Taking $\ceil{\frac{2 \log{\frac{2}{\conf}}}{\error^2}}$ samples 
+ \cdot \prod_{X_i\in \monom} p_i$. %Taking $\ceil{\frac{2 \log{\frac{2}{\conf}}}{\error^2}}$ samples
 Repeating the sampling an appropriate number of times
 and computing the average of $\vari{Y}$ gives us our final estimate. \onepass is used to compute the sampling probabilities needed in \sampmon (details are in \Cref{sec:proofs-approx-alg}).
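 
 As a sanity check on this outline (a sketch only: it assumes that \sampmon returns $(\monom,\coef)$ with probability exactly $\abs{\coef}/\abs{\circuit}(1,\ldots,1)$, i.e., that the normalizing constant $\sum_{(\monom,\coef)\in\expansion{\circuit}}\abs{\coef}$ equals the value $\abs{\circuit}(1,\ldots,1)$ computed by \onepass), the expectation of a single sample is
 \[
 \mathrm{E}\left[\vari{Y}\right] = \frac{1}{\abs{\circuit}(1,\ldots,1)}\sum_{(\monom,\coef)\in\expansion{\circuit}} \abs{\coef}\cdot\indicator{\isInd{\encMon}}\cdot\prod_{X_i\in\monom} p_i .
 \]
 Under this assumption, the average of the samples concentrates around the weighted sum on the right-hand side divided by $\abs{\circuit}(1,\ldots,1)$, so rescaling the average by that constant recovers the weighted sum itself.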
 %%%%%%%%%%%%%%%%%%%%%%%
@ -95,7 +96,7 @@ and computing the average of $\vari{Y}$ gives us our final estimate. \onepass is
 %The following results assume input circuit \circuit computed from an arbitrary $\raPlus$ query $\query$ and arbitrary \abbrBIDB $\pdb$.  We refer to \circuit as a \abbrBIDB circuit.
 %\AH{Verify that the proof for \Cref{lem:approx-alg} doesn't rely on properties of $\raPlus$ or \abbrBIDB.}
 %\begin{Theorem}\label{lem:approx-alg}
-%Let \circuit be an arbitrary \abbrBIDB circuit %for a UCQ over \bi 
+%Let \circuit be an arbitrary \abbrBIDB circuit %for a UCQ over \bi
 %and define $\poly(\vct{X})=\polyf(\circuit)$ and let $k=\degree(\circuit)$.
 %Then an estimate $\mathcal{E}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ can be computed in time
 %{\small
@ -113,11 +114,11 @@ and computing the average of $\vari{Y}$ gives us our final estimate. \onepass is
 % We next present a few corollaries of \Cref{lem:approx-alg}.
 \begin{Theorem}
 \label{cor:approx-algo-const-p}
-Let \circuit be an arbitrary \abbrBIDB circuit %for a UCQ over \bi 
+Let \circuit be an arbitrary \abbrBIDB circuit %for a UCQ over \bi
 and define $\poly(\vct{X})=\polyf(\circuit)$ and let $k=\degree(\circuit)$.
-%Let $\poly(\vct{X})$ be as in \Cref{lem:approx-alg} and 
-Let $\gamma=\gamma(\circuit)$. Further let it be the case that $\prob_i\ge \prob_0$ for all $i\in[\numvar]$. Then an estimate $\mathcal{E}$  of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ 
-satisfying 
+%Let $\poly(\vct{X})$ be as in \Cref{lem:approx-alg} and
+Let $\gamma=\gamma(\circuit)$. Further let it be the case that $\prob_i\ge \prob_0$ for all $i\in[\numvar]$. Then an estimate $\mathcal{E}$  of $\rpoly(\prob_1,\ldots, \prob_\numvar)$
+satisfying
 \begin{equation}
 \label{eq:approx-algo-bound-main}
 \probOf\left(\left|\mathcal{E} - \rpoly(\prob_1,\dots,\prob_\numvar)\right|> \error' \cdot \rpoly(\prob_1,\dots,\prob_\numvar)\right) \leq \conf
@ -130,7 +131,7 @@ O\left(\left(\size(\circuit) + \frac{\log{\frac{1}{\conf}}\cdot k\cdot \log{k} \
 In particular, if $\prob_0>0$ and $\gamma<1$ are absolute constants then the above runtime simplifies to $O_k\left(\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)$.
 \end{Theorem}

-The restriction on $\gamma$ is satisfied by any \ti (where $\gamma=0$) as well as for all three queries of the PDBench \bi benchmark (see \Cref{app:subsec:experiment} for experimental results). 
+The restriction on $\gamma$ is satisfied by any \ti (where $\gamma=0$) as well as for all three queries of the PDBench \bi benchmark (see \Cref{app:subsec:experiment} for experimental results).

 We briefly connect the runtime in \Cref{eq:approx-algo-runtime} to the algorithm outline earlier (where we ignore the dependence on $\multc{\cdot}{\cdot}$, which is needed to handle the cost of arithmetic operations over integers). The $\size(\circuit)$ term comes from the time taken to run \onepass once (\onepass essentially computes $\abs{\circuit}(1,\ldots, 1)$ using the natural circuit evaluation algorithm on $\circuit$). We make $\frac{\log{\frac{1}{\conf}}}{\inparen{\error'}^2\cdot(1-\gamma)^2\cdot \prob_0^{2k}}$ many calls to \sampmon (each of which essentially traces $O(k)$ random sink-to-source paths in $\circuit$, all of which by definition have length at most $\depth(\circuit)$).

@ -154,7 +155,7 @@ Finally, note that by \Cref{prop:circuit-depth} and \Cref{lem:circ-model-runtime
 \label{cor:approx-algo-punchline}
 Let $\query$ be an $\raPlus$ query and $\pdb$ be an \abbrBIDB such that $p_0>0$ and $\gamma<1$ (with $p_0,\gamma$ as in \Cref{cor:approx-algo-const-p}) are absolute constants. Let $\poly(\vct{X})=\apolyqdt$ for any result tuple $\tup$ with $\deg(\poly)=k$. Then one can compute an approximation satisfying \Cref{eq:approx-algo-bound-main} in time $O_{k,|Q|,\error',\conf}\inparen{\qruntime{\query, \dbbase}}$ (given $\query,\dbbase$ and $p_i$ for each $i\in [n]$ that defines $\pd$).
 %Let $\poly(\vct{X})$ be a \abbrBIDB-lineage polynomial corresponding to an \abbrBIDB circuit $\circuit$ that satisfies the specific conditions in \Cref{lem:val-ub}. Then one can compute an approximation satisfying \Cref{eq:approx-algo-bound-main} in time
-% $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)$. % for the case when $\circuit$ satisfies the specific conditions in \Cref{lem:val-ub}. 
+% $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)$. % for the case when $\circuit$ satisfies the specific conditions in \Cref{lem:val-ub}.
 \end{Corollary}
 If we want to approximate the expected multiplicities of all $Z=O(n^k)$ result tuples $\tup$ simultaneously, we just need to apply the above result with $\conf$ replaced by $\frac \conf Z$. Note that this increases the runtime by only a logarithmic factor.
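 
 To spell out this last remark (a short calculation, assuming only that $Z=O(n^k)$ and that the per-tuple runtime depends on $\conf$ through the factor $\log{\frac{1}{\conf}}$, as in \Cref{cor:approx-algo-const-p}): by a union bound over the $Z$ result tuples, the probability that at least one estimate violates \Cref{eq:approx-algo-bound-main} is at most $Z\cdot\frac{\conf}{Z}=\conf$, while the confidence-dependent factor becomes
 \[
 \log{\frac{Z}{\conf}} = \log{Z}+\log{\frac{1}{\conf}} = O(k\log{n})+\log{\frac{1}{\conf}}.
 \]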