The following results assume an input circuit \circuit computed from an arbitrary $\raPlus$ query $\query$ and an arbitrary \abbrBIDB $\pdb$. We refer to \circuit as a \abbrBIDB circuit.
The slight abuse of notation seen in $\abs{\circuit}\inparen{1,\ldots,1}$ is explained after \Cref{def:positive-circuit} and an example is given in \Cref{ex:def-pos-circ}. The only difference in the use of this notation in \Cref{lem:approx-alg} is that we include an additional exponent to square the quantity.
We prove \Cref{lem:approx-alg} constructively by presenting an algorithm \approxq (\Cref{alg:mon-sam}) which has the desired runtime and computes an approximation with the desired approximation guarantee. Algorithm \approxq uses auxiliary algorithm \onepass to compute weights on the edges of a circuit. These weights are then used to sample a set of monomials of $\poly(\circuit)$ from the circuit $\circuit$ by traversing the circuit using the weights to ensure that monomials are sampled with an appropriate probability. The correctness of \approxq relies on the correctness (and runtime behavior) of auxiliary algorithms \onepass and \sampmon that we state in the following lemmas (and prove later in this part of the appendix).
$\onepass$ guarantees two post-conditions: First, for each subcircuit $\vari{S}$ of $\circuit$, we have that $\vari{S}.\vari{partial}$ is set to $\abs{\vari{S}}(1,\ldots, 1)$. Second, when $\vari{S}.\type=\circplus$, \subcircuit.\lwght$=\frac{\abs{\subcircuit_\linput}(1,\ldots, 1)}{\abs{\subcircuit}(1,\ldots, 1)}$ and likewise for \subcircuit.\rwght.
To prove correctness of \Cref{alg:mon-sam}, we only use the following fact that follows from the above lemma: for the modified circuit ($\circuit_{\vari{mod}}$) output by \onepass, $\circuit_{\vari{mod}}.\vari{partial}=\abs{\circuit}(1,\dots,1)$.
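As a small illustration of these post-conditions, consider a hypothetical circuit (chosen here for exposition, not taken from the preceding sections) in which $\circuit$ has a $\circplus$ sink whose left input is the variable $X_1$ and whose right input is a $\circmult$ gate multiplying the constant $-2$ with the variable $X_2$, so that $\circuit$ encodes $X_1 - 2X_2$. Since $\abs{\circuit}$ replaces every coefficient by its absolute value, \onepass would compute
\[
\circuit_\linput.\vari{partial} = 1, \qquad \circuit_\rinput.\vari{partial} = |-2|\cdot 1 = 2, \qquad \circuit.\vari{partial} = 1 + 2 = 3,
\]
and hence \circuit.\lwght$=\frac{1}{3}$ and \circuit.\rwght$=\frac{2}{3}$.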
$$O(\log{k}\cdot k \cdot\depth(\circuit)\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log{\size(\circuit)}})$$
where $k =\degree(\circuit)$. The function returns every $\left(\monom, sign(\coef)\right)$ for $(\monom, \coef)\in\expansion{\circuit}$ with probability $\frac{|\coef|}{\abs{\circuit}(1,\ldots, 1)}$.
in $O\left(\left(\size(\circuit)+\frac{\log{\frac{1}{\conf}}}{\error^2}\cdot k \cdot\log{k}\cdot\depth(\circuit)\right)\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log{\size(\circuit)}}\right)$ time.
which achieves the claimed error bound on $\mathcal{E}$ (\vari{acc}) trivially due to the assignment to $\error'$ and \cref{lem:mon-samp}, since $\error' \cdot\abs{\circuit}(1,\ldots, 1)=\error\cdot\frac{\rpoly(\prob_1,\ldots, \prob_\numvar)}{\abs{\circuit}(1,\ldots, 1)}\cdot\abs{\circuit}(1,\ldots, 1)=\error\cdot\rpoly(\prob_1,\ldots, \prob_\numvar)$.
Consider now the random variables $\randvar_1,\dots,\randvar_\numsamp$, where each $\randvar_\vari{i}$ is the value of $\vari{Y}_{\vari{i}}$ in \cref{alg:mon-sam} after \cref{alg:mon-sam-product} is executed. Overloading $\isInd{\cdot}$ to receive monomial input (recall $\encMon$ is the monomial composed of the variables in the set $\monom$), we have
where in the first equality we use the fact that $\vari{sgn}_{\vari{i}}\cdot\abs{\coef}=\coef$ and the second equality follows from \Cref{eq:tilde-Q-bi} with $X_i$ substituted by $\prob_i$.
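To illustrate on a hypothetical circuit (an assumption made only for this example), let $\circuit$ encode $X_1 - 2X_2$ with $X_1$ and $X_2$ in different blocks, so that $\abs{\circuit}(1,\ldots, 1) = 3$. By the sampling guarantee, the monomial $X_1$ is returned with sign $+1$ with probability $\frac{1}{3}$ and the monomial $X_2$ with sign $-1$ with probability $\frac{2}{3}$. The expectation of each $\randvar_\vari{i}$ is then
\[
\frac{1}{3}\cdot(+1)\cdot\prob_1 + \frac{2}{3}\cdot(-1)\cdot\prob_2 = \frac{\prob_1 - 2\prob_2}{3},
\]
which, multiplied by $\abs{\circuit}(1,\ldots, 1) = 3$, gives $\prob_1 - 2\prob_2$, i.e., the polynomial evaluated at the tuple probabilities.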
Hoeffding's inequality states that if each $\randvar_i$ (all of which are independent) always lies in the interval $[a_i, b_i]$, then it is true that
Line~\ref{alg:mon-sam-sample} shows that $\vari{sgn}_\vari{i}$ has a value in $\{-1, 1\}$ that is multiplied by $O(k)$ factors $\prob_i\in[0, 1]$, which implies that the range of each $\randvar_i$ is $[-1, 1]$.
For the claimed probability bound of $\probOf\left(\left|\vari{acc}-\rpoly(\prob_1,\ldots, \prob_\numvar)\right|> \error\cdot\abs{\circuit}(1,\ldots, 1)\right)\leq\conf$, note that in the algorithm, \vari{acc} is exactly $\empmean\cdot\abs{\circuit}(1,\ldots, 1)$. Multiplying the rest of the terms by the additional factor $\abs{\circuit}(1,\ldots, 1)$ yields the claimed bound.
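For concreteness, the bound can be instantiated as follows, writing the tolerance on $\empmean$ generically as $\error$ (the numerical values below are illustrative assumptions). Since every $\randvar_i\in[-1, 1]$, we have $b_i - a_i = 2$, so for $\samplesize$ samples Hoeffding's inequality bounds the probability that $\empmean$ deviates from its expectation by at least $\error$ by
\[
2\exp\left(-\frac{2\samplesize^2\error^2}{4\samplesize}\right) = 2\exp\left(-\frac{\samplesize\error^2}{2}\right).
\]
This is at most $\conf$ as soon as $\samplesize\geq\frac{2\ln\left(2/\conf\right)}{\error^2}$. For example, $\error = 0.1$ and $\conf = 0.05$ give $\samplesize\geq 2\ln(40)/0.01\approx 737.8$, so $738$ samples suffice.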
The runtime of the algorithm is dominated first by \Cref{alg:mon-sam-onepass} which has $O\left({\size(\circuit)}\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)$ runtime by \Cref{lem:one-pass}. There are then $\samplesize$ iterations of the loop in \Cref{alg:sampling-loop}. Each iteration's run time is dominated by the call to \sampmon in \Cref{alg:mon-sam-sample} (which by \Cref{lem:sample} takes $O\left(\log{k}\cdot k \cdot{\depth(\circuit)}\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)$
) and the check in \Cref{alg:check-duplicate-block}, which by the subsequent argument takes $O(k\log{k})$ time: we sort the $O(k)$ variables by their block IDs and then check whether any block ID is duplicated. Combining all the times discussed here gives us the desired overall runtime.
Applying this bound in the runtime bound in \Cref{lem:approx-alg} gives the first claimed runtime. The final runtime of $O_k\left(\frac1{\eps^2}\cdot\size(\circuit)\cdot\log{\frac{1}{\conf}}\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)$ follows by noting that $\depth({\circuit})\le\size({\circuit})$ and absorbing all factors that just depend on $k$.
The circuit \circuit' is built from \circuit in the following manner. For each input gate $\gate_i$ with $\gate_i.\val= X_\tup$, replace $\gate_i$ with the circuit \subcircuit encoding the sum $\sum_{j =1}^\bound j\cdot X_{\tup, j}$. We argue that \circuit' is a valid circuit by the following facts. Let $\pdb=\inparen{\worlds, \bpd}$ be the original \abbrCTIDB that \circuit was generated from. Then, by~\Cref{prop:ctidb-reduct} there exists a \abbrOneBIDB$\pdb' =\inparen{\onebidbworlds{\tupset'}, \bpd'}$, with $\tupset' =\inset{\intuple{\tup, j}~|~\tup\in\tupset, j\in\pbox{\bound}}$, from which the conversion from \circuit to \circuit' follows. Both $\polyf\inparen{\circuit}$ and $\polyf\inparen{\circuit'}$ have the same expected multiplicity since (by~\Cref{prop:ctidb-reduct}) the distributions $\bpd$ and $\bpd'$ are equivalent and $\sum_{j=1}^\bound j\cdot\worldvec'_{\tup, j}=\worldvec_\tup$ for $\worldvec'\in\inset{0, 1}^{\bound\numvar}$ and $\worldvec\in\worlds$ such that $\worldvec_\tup\equiv\worldvec'_\tup$. Finally, note that because there exists a (sub) circuit encoding $\sum_{j =1}^\bound j\cdot X_{\tup, j}$ that is a \emph{balanced} binary tree, the above conversion implies the claimed size and depth bounds of the lemma.
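As an illustration of this conversion (with the hypothetical choice $\bound = 3$), an input gate for $X_\tup$ is replaced by a subcircuit encoding
\[
X_{\tup, 1} + 2\cdot X_{\tup, 2} + 3\cdot X_{\tup, 3}.
\]
In a world where $\tup$ has multiplicity $2$, only $X_{\tup, 2}$ is set to $1$ and the sum evaluates to $2$, while a world in which $\tup$ is absent sets all three variables to $0$ and the sum evaluates to $0$.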
Next we argue the claim on $\gamma\inparen{\circuit'}$. Consider the list of expanded monomials $\expansion{\circuit}$ for \abbrCTIDB circuit \circuit. Let
$\encMon= X_{\tup_1}^{d_1}\cdots X_{\tup_\ell}^{d_\ell}$ be an arbitrary monomial with $\ell$ variables and let (abusing notation) $\encMon' =\inparen{\sum_{j =1}^{\bound}j\cdot X_{\tup_1, j}}^{d_1}\cdots\inparen{\sum_{j =1}^{\bound}j\cdot X_{\tup_\ell, j}}^{d_\ell}$. Then, for $f_\ell=\sum_{i =1}^\ell d_i$, $\encMon$ induces the set of monomials $\inset{\prod_{i =1}^{f_\ell} j_i\cdot X_{\tup_i, j_i}^{d_i}}_{j_i\in\pbox{\bound}}$ in the pure expansion of $\encMon'$.
%Denote the additional list elements (projecting out coefficient terms) \emph{induced} by $\monom$ as $\vari{E}_\monom\inparen{\circuit'}$. Then $\vari{E}_\monom\inparen{\circuit'}=\inset{\monom'^1~|~\encMon' \in \vari{S}}$%\inset{j_1^{d_1}\cdot X_{\tup, j_1}^{d_1}\times\cdots\times j_\ell^{d_\ell}\cdot X_{\tup, j_\ell}^{d_\ell}}_{j_1,\ldots, j_\ell \in \pbox{\bound}}$ in $\expansion{\circuit'}$.
Recall that a cancellation occurs in $\encMon'$ when there exist $\tup_{i, j}\neq\tup_{i, j'}$ in the same block $\block$ such that the variables $X_{\tup_i, j}, X_{\tup_i, j'}$ are both in the set of variables $\monom_i'$ of $\monom_{\vari{m}_\vari{i}}\in\encMon'$. Observe that cancellations can only occur for each $X_{\tup}^{d_\tup}\in\encMon$, where the expansion $\inparen{\sum_{j =1}^\bound j\cdot X_{\tup, j}}^{d_\tup}$ represents the monomial $X_\tup^{d_\tup}$ in $\tupset'$. Consider the number of cancellations for $\inparen{\sum_{j =1}^\bound j\cdot X_{\tup, j}}^{d_\tup}$. Then $\gamma\leq1-\bound^{-\inparen{d_\tup-1}}$, since
for each element in the set of cross products $\inset{\bigtimes_{i\in\pbox{d_\tup}, j_i\in\pbox{\bound}}X_{\tup, j_i}}$ there are \emph{exactly} $\bound$ surviving elements with $j_1=\cdots=j_{d_\tup}=j$, i.e. $X_{\tup,j}^{d_\tup}$ for each $j\in\pbox{\bound}$. The remaining $\bound^{d_\tup}-\bound$ cross terms cancel. Regarding all of $\encMon'$, the proportions of non-cancellations for the individual factors $\inparen{\sum_{j =1}^{\bound}j\cdot X_{\tup_i, j }}^{d_i}\in\encMon'$ multiply, because non-cancelling terms of $\inparen{\sum_{j =1}^{\bound}j\cdot X_{\tup_i, j}}^{d_i}$ can only be joined with non-cancelling terms of $\inparen{\sum_{j=1}^{\bound}j\cdot X_{\tup_{i'}, j}}^{d_{i'}}\in\encMon'$ for $\tup_i\neq\tup_{i'}$. This then yields the fraction of cancelled monomials $\gamma\le1-\prod_{i =1}^{\ell}\bound^{-\inparen{d_i -1}}\leq1-\bound^{-\inparen{k -1}}$ where the inequalities take into account the fact that $f_\ell\leq k$.
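The counting argument can be checked on a small hypothetical instance with $\bound = 2$ and a single variable of degree $d_\tup = 2$. The pure expansion of $\inparen{X_{\tup, 1} + 2\cdot X_{\tup, 2}}^{2}$ consists of the $\bound^{d_\tup} = 4$ terms
\[
X_{\tup, 1}X_{\tup, 1}, \qquad 2\cdot X_{\tup, 1}X_{\tup, 2}, \qquad 2\cdot X_{\tup, 2}X_{\tup, 1}, \qquad 4\cdot X_{\tup, 2}X_{\tup, 2}.
\]
The two mixed terms contain distinct variables of the same block and therefore cancel, while the $\bound = 2$ terms with $j_1 = j_2$ survive. The fraction of cancelled terms is thus $\frac{2}{4} = 1 - 2^{-\inparen{2 - 1}}$, matching the bound $1-\bound^{-\inparen{d_\tup-1}}$ with equality.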
For the base case, we have that \depth(\circuit) $=0$, and there can only be one node which must contain a coefficient or constant. In this case, $\abs{\circuit}(1,\ldots, 1)=1$, and \size(\circuit) $=1$, and by \Cref{def:degree} it is the case that $0\leq k =\degree\inparen{\circuit}\leq1$, and it is true that $\abs{\circuit}(1,\ldots, 1)=1\leq N^{k+1}=1^{k +1}=1$ for $k \in\inset{0, 1}$.
For the inductive step we consider a circuit \circuit such that $\depth(\circuit)=\ell+1$. The sink can only be either a $\circmult$ or $\circplus$ gate. Let $k_\linput, k_\rinput$ denote \degree($\circuit_\linput$) and \degree($\circuit_\rinput$) respectively. Consider the case when the sink node is $\circmult$.
In the above the first inequality follows from the inductive hypothesis (and the fact that the size of either subtree is at most $N-1$) and \Cref{eq:sumcoeff-times-upper} follows by \cref{def:degree} which states that for $k =\degree(\circuit)$ we have $k=k_\linput+k_\rinput+1$.
In the above, the first inequality follows from the inductive hypothesis and \cref{def:degree} (which implies the fact that $k_\linput,k_\rinput\le k$). Note that the RHS of this inequality is maximized when the base and exponent of one of the terms is maximized. The second inequality follows from this fact as well as the fact that since $\circuit$ is a tree we have $N_\linput+N_\rinput=N-1$ and, lastly, the fact that $k\ge0$. This completes the proof.
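As a sanity check of the bound $\abs{\circuit}(1,\ldots, 1)\leq N^{k+1}$, consider a hypothetical tree circuit encoding $\inparen{X_1 + X_2}\cdot\inparen{X_3 + X_4}$. It has four variable leaves, two $\circplus$ gates and one $\circmult$ sink, so $N = 7$. Assuming the degree conventions used in the proof above (a variable leaf has degree $1$, a $\circplus$ gate takes the maximum of its input degrees, and a $\circmult$ gate has degree $k_\linput+k_\rinput+1$), we get $k = 1 + 1 + 1 = 3$, and
\[
\abs{\circuit}(1,\ldots, 1) = (1 + 1)\cdot(1 + 1) = 4 \leq 7^{4} = 2401 = N^{k+1}.
\]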
%\AH{I don't think that it matters whether or not \circuit is a tree. For $N=\size\inparen{\circuit}$ it must follow that $N_L + N_R + 1 = N$ regardless of whether a gate a allowed to have more than one parent. Not true, consider when $\circuit_R = \circuit_L$.}
The upper bound in \Cref{lem:val-ub} for the general case is a simple variant of the above proof (but we present a proof sketch of the bound below for completeness):
We use the same notation as in the proof of \Cref{lem:C-ub-tree} and further define $d=\depth(\circuit)$. We will prove by induction on $\depth(\circuit)$ that $\abs{\circuit}(1,\ldots, 1)\leq2^{2^k\cdot d }$. The base case argument is similar to that in the proof of \Cref{lem:C-ub-tree}. In the inductive case we have that $d_\linput,d_\rinput\le d-1$.
In the above the first inequality follows from the inductive hypothesis while the second inequality follows from the fact that $k_\linput,k_\rinput\le k-1$ and $d_\linput, d_\rinput\le d-1$, where we substitute the upper bound into every respective term.
In the above the first inequality follows from the inductive hypothesis while the second inequality follows from the facts that $k_\linput,k_\rinput\le k$ and $d_\linput,d_\rinput\le d-1$. The final inequality follows from the fact that $k\ge0$.