Stuck in proof of Lemma 4.15

2021-04-06 16:35:11 -04:00 · f226af1dc3
parent acdc8396d8
commit f226af1dc3
3 changed files with 95 additions and 17 deletions
--- a/app_approx-alg-analysis.tex
+++ b/app_approx-alg-analysis.tex
@@ -4,14 +4,14 @@
 Before proving~\Cref{lem:mon-samp}, we use it to argue our main result,~\Cref{lem:approx-alg}:
 \subsection{Proof of Theorem \ref{lem:approx-alg}}\label{sec:proof-lem-approx-alg}

-Set $\mathcal{E}=\approxq(\revision{\circuit}, (\prob_1,\dots,\prob_\numvar),$ $\conf, \error')$, where
-\[\error' = \error \cdot \frac{\rpoly(\prob_1,\ldots, \prob_\numvar)\cdot (1 - \gamma)}{\abs{\revision{\circuit}}(1,\ldots, 1)},\]
+Set $\mathcal{E}=\approxq({\circuit}, (\prob_1,\dots,\prob_\numvar),$ $\conf, \error')$, where
+\[\error' = \error \cdot \frac{\rpoly(\prob_1,\ldots, \prob_\numvar)\cdot (1 - \gamma)}{\abs{{\circuit}}(1,\ldots, 1)},\]
 which achieves the claimed accuracy bound on $\mathcal{E}$ due to~\Cref{lem:mon-samp}.

 The claim on the runtime follows from~\Cref{lem:mon-samp} since
 \begin{align*}
-\frac 1{\inparen{\error'}^2}\cdot \log\inparen{\frac 1\conf}=&\frac{\log{\frac{1}{\conf}}}{\error^2 \left(\frac{\rpoly(\prob_1,\ldots, \prob_N)}{\abs{\revision{\circuit}}(1,\ldots, 1)}\right)^2}\\
-= &\frac{\log{\frac{1}{\conf}}\cdot \abs{\revision{\circuit}}^2(1,\ldots, 1)}{\error^2 \cdot \rpoly^2(\prob_1,\ldots, \prob_\numvar)},
+\frac 1{\inparen{\error'}^2}\cdot \log\inparen{\frac 1\conf}=&\frac{\log{\frac{1}{\conf}}}{\error^2 \left(\frac{\rpoly(\prob_1,\ldots, \prob_N)}{\abs{{\circuit}}(1,\ldots, 1)}\right)^2}\\
+= &\frac{\log{\frac{1}{\conf}}\cdot \abs{{\circuit}}^2(1,\ldots, 1)}{\error^2 \cdot \rpoly^2(\prob_1,\ldots, \prob_\numvar)},
 \end{align*}
 %and the runtime then follows, thus upholding ~\cref{lem:approx-alg}.
 which completes the proof.
@@ -23,8 +23,8 @@ Consider now the random variables $\randvar_1,\dots,\randvar_\numvar$, where eac
 where the indicator variable handles the check in~\Cref{alg:check-duplicate-block}.
 Then for random variable $\randvar_i$, it is the case that
 \begin{align*}
-\expct\pbox{\randvar_i} &= \sum\limits_{(\monom, \coef) \in \expansion{\revision{\circuit}} }\frac{\onesymbol\inparen{\monom\mod{\mathcal{B}}\not\equiv 0}\cdot c\cdot\prod_{X_i\in \var\inparen{v}} p_i }{\abs{\revision{\circuit}}(1,\dots,1)} \\
-&= \frac{\rpoly(\prob_1,\ldots, \prob_\numvar)}{\abs{\revision{\circuit}}(1,\ldots, 1)},
+\expct\pbox{\randvar_i} &= \sum\limits_{(\monom, \coef) \in \expansion{{\circuit}} }\frac{\onesymbol\inparen{\monom\mod{\mathcal{B}}\not\equiv 0}\cdot c\cdot\prod_{X_i\in \var\inparen{v}} p_i }{\abs{{\circuit}}(1,\dots,1)} \\
+&= \frac{\rpoly(\prob_1,\ldots, \prob_\numvar)}{\abs{{\circuit}}(1,\ldots, 1)},
 \end{align*}
 where in the first equality we use the fact that $\vari{sgn}_{\vari{i}}\cdot \abs{\coef}=\coef$ and the second equality follows from~\cref{eq:tilde-Q-bi} with $X_i$ substituted by $\prob_i$.

@@ -32,7 +32,7 @@ Let $\empmean = \frac{1}{\samplesize}\sum_{i = 1}^{\samplesize}\randvar_i$.  It

 \[\expct\pbox{\empmean}  
 = \frac{1}{\samplesize}\sum_{i = 1}^{\samplesize}\expct\pbox{\randvar_i}
-= \frac{\rpoly(\prob_1,\ldots, \prob_\numvar)}{\abs{\revision{\circuit}}(1,\ldots, 1)}.\]
+= \frac{\rpoly(\prob_1,\ldots, \prob_\numvar)}{\abs{{\circuit}}(1,\ldots, 1)}.\]

 Hoeffding's inequality states that if the $\randvar_i$ are independent and each $\randvar_i$ always lies in the interval $[a_i, b_i]$, then it is true that
 \begin{equation*}
@@ -54,8 +54,82 @@ The runtime of the algorithm is dominated by~\Cref{alg:mon-sam-onepass} (which b

 \subsection{Proof of~\Cref{cor:approx-algo-const-p}}
 The result follows by first noting that by definition of $\gamma$, we have
-\[\rpoly(1,\dots,1)= (1-\gamma)\cdot \abs{\revision{\circuit}}(1,\dots,1).\]
+\[\rpoly(1,\dots,1)= (1-\gamma)\cdot \abs{{\circuit}}(1,\dots,1).\]
 Further, since each $\prob_i\ge \prob_0$ and $\poly(\vct{X})$ (and hence $\rpoly(\vct{X})$) has degree at most $k$, we have that
 \[ \rpoly(\prob_1,\dots,\prob_\numvar) \ge \prob_0^k\cdot \rpoly(1,\dots,1).\]
-The above two inequalities implies $\rpoly(1,\dots,1) \ge \prob_0^k\cdot (1-\gamma)\cdot \abs{\revision{\circuit}}(1,\dots,1)$.
-Applying this bound in the runtime bound in~\Cref{lem:approx-alg} gives the first claimed runtime. The final runtime of $O_k\left(\frac 1{\eps^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)$ follows by noting that $depth(\revision{\circuit})\le \size(\revision{\circuit})$ and absorbing all factors that just depend on $k$.
+The above two relations imply $\rpoly(\prob_1,\dots,\prob_\numvar) \ge \prob_0^k\cdot (1-\gamma)\cdot \abs{{\circuit}}(1,\dots,1)$.
+Applying this bound in the runtime bound in~\Cref{lem:approx-alg} gives the first claimed runtime. The final runtime of $O_k\left(\frac 1{\eps^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\cdot \multc{\log\left(\abs{\circuit}^2(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)$ follows by noting that $\depth({\circuit})\le \size({\circuit})$ and absorbing all factors that just depend on $k$.
+
+\subsection{Proof of~\Cref{lem:val-ub}}
+
+
+%\paragraph{Sufficient condition for $\abs{\circuit}(1,\ldots, 1)$ to be size $O(N)$}
+%For our runtime results to be relevant, it must be the case that the sum of the coefficients computed by \onepass is indeed size $O(N)$ since there are $O(\log{N})$ bits in the RAM model where $N$ is the size of the input.  The size of the input here is \size(\circuit).  We show that when \size$(\circuit_\linput) = N_\linput$, \size$(\circuit_\rinput) = N_\rinput$, where $N_\linput + N_\rinput \leq N$, this is indeed the case.
+
+We will prove~\Cref{lem:val-ub} by considering the three cases separately. We first begin with the case when $\circuit$ is a tree:
+\begin{Lemma}
+\label{lem:C-ub-tree}
+Let $\circuit$ be a tree (i.e. the sub-circuits corresponding to two children of a node in $\circuit$ are completely disjoint). Then we have
+\[\abs{\circuit}(1,\dots,1)\le \left(\size(\circuit)\right)^{2^{\depth(\circuit)}}.\]
+\end{Lemma}
+\begin{proof}%[Proof of $\abs{\circuit}(1,\ldots, 1)$ is size $O(N)$]
+For notational simplicity define $N=\size(\circuit)$ and $k=\depth(\circuit)$.
+To prove this result, we prove by induction on $k$ that $\abs{\circuit}(1,\ldots, 1) \leq N^{2^k }$.
+For the base case, we have that \depth(\circuit) $= 0$, and there can only be one node which must contain a coefficient (or constant) of $1$.  In this case, $\abs{\circuit}(1,\ldots, 1) = 1$, and \size(\circuit) $= 1$, and it is true that $\abs{\circuit}(1,\ldots, 1) = 1 \leq N^{2^k} = 1^{2^0} = 1$.
+
+Assume that for some $\ell > 0$, every circuit \circuit with $\depth(\circuit) \leq \ell$ satisfies $\abs{\circuit}(1,\ldots, 1) \leq N^{2^\ell }$.% for $k \geq 1$ when \depth(C) $\geq 1$.
+
+For the inductive step we consider a circuit \circuit such that $\depth(\circuit) = \ell + 1$.  The sink can only be either a $\circmult$ or $\circplus$ gate.  Consider the case when the sink node is $\circmult$.  Let $k_\linput, k_\rinput$ denote \degree($\circuit_\linput$) and \degree($\circuit_\rinput$) respectively.  %Note that this case does not require the constraint on $N_\linput$ or $N_\rinput$.
+In this case we do not use the fact that $\circuit$ is a tree and just assume that $N_\linput,N_\rinput\le N-1$. Then note that
+\begin{align}
+\abs{\circuit}(1,\ldots, 1) &= \abs{\circuit_\linput}(1,\ldots, 1)\circmult \abs{\circuit_\rinput}(1,\ldots, 1) \nonumber\\
+&\leq (N-1)^{2^{k_\linput}} \circmult (N - 1)^{2^{k_\rinput}}\nonumber\\
+ &\leq (N-1)^{2^{k}}\label{eq:sumcoeff-times-upper}\\
+ &\leq N^{2^k}.\nonumber
+\end{align}
+%We derive the upperbound of \cref{eq:sumcoeff-times-upper} by noting that the maximum value of the LHS occurs when both the base and exponent are maximized.
+In the above, the first inequality follows from the inductive hypothesis and \cref{eq:sumcoeff-times-upper} follows by noting that for a $\times$ node we have $k=k_\linput+k_\rinput$.
+
+When the sink node is a $\circplus$ node, we have
+\begin{align}
+\abs{\circuit}(1,\ldots, 1) &= \abs{\circuit_\linput}(1,\ldots, 1) \circplus \abs{\circuit_\rinput}(1,\ldots, 1) \nonumber\\
+&\leq
+N_\linput^{2^{k_\linput}} + N_\rinput^{2^{k_\rinput}}\nonumber\\
+&\leq (N-1)^{2^k } \label{eq:sumcoeff-plus-upper}\\
+&\leq N^{2^k}.\nonumber
+\end{align}
+In the above, the first inequality follows from the inductive hypothesis while the second inequality follows from the fact that since $\circuit$ is a tree we have $N_\linput+N_\rinput=N-1$ and the fact that $0\le k_\linput,k_\rinput\le k$. This completes the proof.
+%Similar to the $\circmult$ case, \cref{eq:sumcoeff-plus-upper} upperbounds its LHS by the fact that the maximum base and exponent combination is always greater than or equal to the sum of lower base/exponent combinations.  The final equality is true given the constraint over the inputs.
+
+%Since $\abs{\circuit}(1,\ldots, 1) \leq N^{2^k}$ for all circuits such that all $\circplus$ gates share at most one gate with their sibling (across their respective subcircuits), then $\log{N^{2^k}} = 2^k \cdot \log{N}$ which for fixed $k$ yields the desired $O(\log{N})$ bits for $O(1)$ arithmetic operations.% for the given query class.
+\end{proof}
+
+\revision{\textbf{THE PART BELOW NEEDS WORK. --Atri}}
+The upper bound in~\Cref{lem:val-ub} for the general case is a simple variant of the above proof (but we present a proof sketch of the bound below for completeness):
+\begin{Lemma}
+\label{lem:C-ub-gen}
+Let $\circuit$ be a (general) circuit. % tree (i.e. the sub-circuits corresponding to two children of a node in $\circuit$ are completely disjoint). 
+Then we have
+\[\abs{\circuit}(1,\dots,1)\le 2^{\depth(\circuit)\cdot \size(\circuit)}.\]
+\end{Lemma}
+\begin{proof}[Proof Sketch]
+We use the same notation as in the proof of~\Cref{lem:C-ub-tree}. We will prove by induction on $k$ that $\abs{\circuit}(1,\ldots, 1) \leq 2^{(k+1)N }$. The base case argument is similar to that in the proof of~\Cref{lem:C-ub-tree}. In the inductive case we have that $N_\linput,N_\rinput\le N-1$.
+
+For the case when the sink node is $\times$, we get that
+\begin{align*}
+\abs{\circuit}(1,\ldots, 1) &= \abs{\circuit_\linput}(1,\ldots, 1)\circmult \abs{\circuit_\rinput}(1,\ldots, 1) \\
+&\leq {2^{(k_\linput+1)\cdot N_\linput}} \circmult {2^{(k_\rinput+1)\cdot N_\rinput}}\\
+ &\leq {2\cdot 2^{(\max(k_\linput,k_\rinput)+1)(N-1)}}\\
+ &\leq 2^{(k+1) N}.
+\end{align*}
+In the above, the first inequality follows from the inductive hypothesis while the third inequality follows from the fact that $k_\linput+k_\rinput=k$ (and hence $\max(k_\linput,k_\rinput)\le k$) as well as the fact that $k\ge 0$.
+
+Now consider the case when the sink node is $+$. We get that
+\begin{align*}
+\abs{\circuit}(1,\ldots, 1) &= \abs{\circuit_\linput}(1,\ldots, 1) \circplus \abs{\circuit_\rinput}(1,\ldots, 1) \\
+&\leq 2^{(k_\linput+1)\cdot N_\linput} + 2^{(k_\rinput+1)\cdot N_\rinput}\\
+&\leq 2\cdot {2^{(k+1)(N-1)} } \\
+&\leq 2^{(k+1)N}.
+\end{align*}
+In the above the first inequality follows from the inductive hypothesis while the second inequality follows from the fact that $k_\linput,k_\rinput\le k$. The final inequality follows from the fact that $k\ge 0$.
+\end{proof}
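A small numeric sanity check of the two bounds discussed in the proofs above can be sketched as follows. This is illustrative only and is not part of the changed files: the `Gate` encoding and the helper names are hypothetical, `size` counts nodes as if the circuit were a tree, and `k` is taken to be the depth, following the notation $N=\size(\circuit)$, $k=\depth(\circuit)$ used in the proof. It evaluates $\abs{\circuit}(1,\ldots,1)$ on one tree-shaped circuit and compares it with $N^{2^k}$ and $2^{(k+1)N}$; it checks a single instance and says nothing about the general case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gate:
    """One node of an arithmetic circuit (hypothetical encoding for this sketch)."""
    kind: str                        # "var", "const", "+", or "*"
    name: str = ""                   # label for leaves, e.g. "X1"
    coef: int = 1                    # coefficient stored at a "const" leaf
    left: Optional["Gate"] = None
    right: Optional["Gate"] = None


def abs_at_ones(g: Gate) -> int:
    """Evaluate |C|(1,...,1): variables become 1, constants become |coef|."""
    if g.kind == "var":
        return 1
    if g.kind == "const":
        return abs(g.coef)
    l, r = abs_at_ones(g.left), abs_at_ones(g.right)
    return l + r if g.kind == "+" else l * r


def size(g: Gate) -> int:
    """Number of nodes, counting the circuit as a tree."""
    if g.kind in ("var", "const"):
        return 1
    return 1 + size(g.left) + size(g.right)


def depth(g: Gate) -> int:
    """Length of the longest path from the sink to a leaf."""
    if g.kind in ("var", "const"):
        return 0
    return 1 + max(depth(g.left), depth(g.right))


def plus(a: Gate, b: Gate) -> Gate:
    return Gate("+", left=a, right=b)


def times(a: Gate, b: Gate) -> Gate:
    return Gate("*", left=a, right=b)


# Tree-shaped example: (X1 + X2) * (X3 + X4)
x1, x2, x3, x4 = (Gate("var", name=f"X{i}") for i in range(1, 5))
tree = times(plus(x1, x2), plus(x3, x4))

value, n, k = abs_at_ones(tree), size(tree), depth(tree)
tree_bound = n ** (2 ** k)           # N^{2^k}, the tree bound
general_bound = 2 ** ((k + 1) * n)   # 2^{(k+1)N}, the bound in the proof sketch
print(value, n, k, value <= tree_bound, value <= general_bound)
```

Swapping in other small circuits (for instance, ones that reuse a subcircuit) is a cheap way to probe where the inductive argument for the general case might be loose.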
--- a/app_one-pass-analysis.tex
+++ b/app_one-pass-analysis.tex
@@ -173,6 +173,9 @@ When $\gate_{k+1}.\type = \circmult$, then line ~\ref{alg:one-pass-mult} compute
 \paragraph{\onepass Runtime}
 It is known that $\topord(G)$ is computable in linear time.  Next, each of the $\numvar$ iterations of the loop in ~\Cref{alg:one-pass-loop} takes $O(1)$ time.  In general it is known that an arithmetic computation on values of magnitude at most $M$ (i.e., on $O(\log{M})$-bit operands) takes $O(\frac{\log{M}}{\log{N}})$ time for an input size $N$.  Since each of the arithmetic operations at a given gate has a bit size of $O(\log{\abs{\circuit}(1,\ldots, 1)})$, we obtain the general runtime of $O\left(\size(\circuit)\cdot \frac{\log{\abs{\circuit}(1,\ldots, 1)}}{\log{\size(\circuit)}}\right)$.

+
+%%%Moved the stuff below to earlier in the appendix
+\iffalse
 \paragraph{Sufficient condition for $\abs{\circuit}(1,\ldots, 1)$ to be size $O(N)$}
 For our runtime results to be relevant, it must be the case that the sum of the coefficients computed by \onepass is indeed size $O(N)$ since there are $O(\log{N})$ bits in the RAM model where $N$ is the size of the input.  The size of the input here is \size(\circuit).  We show that when \size$(\circuit_\linput) = N_\linput$, \size$(\circuit_\rinput) = N_\rinput$, where $N_\linput + N_\rinput \leq N$, this is indeed the case.

@@ -200,4 +203,5 @@ N_\linput^{2^{k_\linput}} + N_\rinput^{2^{k_\rinput}}\nonumber\\
 Similar to the $\circmult$ case, \cref{eq:sumcoeff-plus-upper} upperbounds its LHS by the fact that the maximum base and exponent combination is always greater than or equal to the sum of lower base/exponent combinations.  The final equality is true given the constraint over the inputs.  

 Since $\abs{\circuit}(1,\ldots, 1) \leq N^{2^k}$ for all circuits such that all $\circplus$ gates share at most one gate with their sibling (across their respective subcircuits), then $\log{N^{2^k}} = 2^k \cdot \log{N}$ which for fixed $k$ yields the desired $O(\log{N})$ bits for $O(1)$ arithmetic operations.% for the given query class.
-\end{proof} 
+\end{proof} 
+\fi
--- a/approx_alg.tex
+++ b/approx_alg.tex
@@ -100,7 +100,7 @@ In the subsequent subsections we will prove the following theorem.
 \begin{Theorem}\label{lem:approx-alg}
 Let \circuit be a circuit for a UCQ over \bi and define $\poly(\vct{X})=\polyf(\circuit)$ and let $k=\degree(\circuit)$.
 Then an estimate $\mathcal{E}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ can be computed in time
-\[O\left(\left(\size(\circuit) + \frac{\log{\frac{1}{\conf}}\cdot \abs{\circuit}^2(1,\ldots, 1)\cdot  k\cdot \log{k} \cdot \depth(\circuit))}{\inparen{\error'}^2\cdot\rpoly^2(\prob_1,\ldots, \prob_\numvar)}\right)\cdot\multc{\log\left(\abs{\circuit}^2(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)\]
+\[O\left(\left(\size(\circuit) + \frac{\log{\frac{1}{\conf}}\cdot \abs{\circuit}^2(1,\ldots, 1)\cdot  k\cdot \log{k} \cdot \depth(\circuit)}{\inparen{\error'}^2\cdot\rpoly^2(\prob_1,\ldots, \prob_\numvar)}\right)\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)\]
 such that
 \begin{equation}
 \label{eq:approx-algo-bound}
@@ -121,8 +121,8 @@ Given an expression tree $\circuit$, define
 \begin{Corollary}
 \label{cor:approx-algo-const-p}
 Let $\poly(\vct{X})$ be as in~\Cref{lem:approx-alg} and let $\gamma=\gamma(\circuit)$. Further let it be the case that $\prob_i\ge \prob_0$ for all $i\in[\numvar]$. Then an estimate $\mathcal{E}$  of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ satisfying~\Cref{eq:approx-algo-bound} can be computed in time
-\[O\left(\left(\size(\circuit) + \frac{\log{\frac{1}{\conf}}\cdot k\cdot \log{k} \cdot \depth(\circuit))}{\inparen{\error'}^2\cdot(1-\gamma)^2\cdot \prob_0^{2k}}\right)\cdot\multc{\log\left(\abs{\circuit}^2(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)\]
-In particular, if $\prob_0>0$ and $\gamma<1$ are absolute constants then the above runtime simplifies to $O_k\left(\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)\cdot\multc{\log\left(\abs{\circuit}^2(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)$.
+\[O\left(\left(\size(\circuit) + \frac{\log{\frac{1}{\conf}}\cdot k\cdot \log{k} \cdot \depth(\circuit)}{\inparen{\error'}^2\cdot(1-\gamma)^2\cdot \prob_0^{2k}}\right)\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)\]
+In particular, if $\prob_0>0$ and $\gamma<1$ are absolute constants then the above runtime simplifies to $O_k\left(\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)$.
 \end{Corollary}

 The proof for~\Cref{cor:approx-algo-const-p} can be seen in~\Cref{sec:proofs-approx-alg}.
@@ -130,18 +130,18 @@ The restriction on $\gamma$ is satisfied by any \ti (where $\gamma=0$) as well a
 We also observe that (i) tuple presence is independent across blocks, so the corresponding probabilities (and hence $\prob_0$) are independent of the number of blocks, and (ii) \bis model uncertain attributes, so block size (and hence $\gamma$) is a function of the ``messiness'' of a dataset, rather than its size.
 Thus, we expect the corollary to hold in general.

-Finally, we address the $\multc{\log\left(\abs{\circuit}^2(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}$ term in the runtime. In Appendix\revision{Fill in ref later on}, we show the following:
+Finally, we address the $\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}$ term in the runtime. In Appendix\revision{Fill in ref later on}, we show the following:
 \begin{Lemma}
 \label{lem:val-ub}
 For any circuit $\circuit$ with $\degree(\circuit)=k$, we have
-\[\abs{\circuit}^2(1,\ldots, 1)\le 2^{O(k\size(\circuit))}.\]
+\[\abs{\circuit}(1,\ldots, 1)\le 2^{O(k\size(\circuit))}.\]
 Further, under the following conditions:
 \begin{enumerate}
 \item $\circuit$ is a tree,
 \item $\circuit$ is the output of a FAQ query from algorithm in~\cite{DBLP:conf/pods/KhamisNR16},
 \end{enumerate}
 we have
-\[\abs{\circuit}^2(1,\ldots, 1)\le  \size(\circuit)^{O(k)}.\]
+\[\abs{\circuit}(1,\ldots, 1)\le  \size(\circuit)^{O(k)}.\]
 \end{Lemma}

 Note that the above implies that, under the assumption from~\Cref{cor:approx-algo-const-p} that $\prob_0>0$ and $\gamma<1$ are absolute constants, the runtime there simplifies to $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)^2\cdot \log{\frac{1}{\conf}}\right)$ for general circuits $\circuit$ and to $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)$ for the case when $\circuit$ satisfies the special conditions in~\Cref{lem:val-ub}. In Appendix\revision{Fill in ref later on} we argue that these conditions are very general and encompass many interesting scenarios.
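To make the $\frac{1}{\inparen{\error'}^2}\cdot\log\frac{1}{\conf}$ factor in these runtimes concrete, the sketch below computes the rescaled error $\error'$ from the proof of~\Cref{lem:approx-alg} and a sample count from the standard two-sided Hoeffding bound. It is illustrative only: the function names are hypothetical, and it assumes that each sample lies in $[-1,1]$, which this diff does not state explicitly. Under that assumption Hoeffding gives a failure probability of at most $2\exp(-n t^2/2)$ for deviation $t$, so $n \ge 2\ln(2/\conf)/t^2$ samples suffice.

```python
import math


def scaled_error(eps: float, rpoly_at_p: float, gamma: float, abs_c_at_ones: float) -> float:
    """eps' = eps * rpoly(p_1..p_N) * (1 - gamma) / |C|(1,...,1)."""
    if abs_c_at_ones <= 0:
        raise ValueError("|C|(1,...,1) must be positive")
    return eps * rpoly_at_p * (1.0 - gamma) / abs_c_at_ones


def hoeffding_samples(eps_prime: float, delta: float) -> int:
    """Samples sufficient for |mean - E[mean]| < eps_prime with prob. >= 1 - delta.

    Assumes independent samples, each bounded in [-1, 1] (an assumption of
    this sketch), so that 2 * exp(-n * eps_prime**2 / 2) <= delta.
    """
    if eps_prime <= 0 or not (0 < delta < 1):
        raise ValueError("need eps_prime > 0 and 0 < delta < 1")
    return math.ceil(2.0 * math.log(2.0 / delta) / eps_prime ** 2)


# Hypothetical inputs, chosen only to exercise the formulas.
eps_prime = scaled_error(eps=0.1, rpoly_at_p=2.0, gamma=0.5, abs_c_at_ones=4.0)
n_samples = hoeffding_samples(eps_prime=0.1, delta=0.05)
print(eps_prime, n_samples)
```

The quadratic dependence on $1/\error'$ is why a small ratio $\rpoly(\prob_1,\ldots,\prob_\numvar)/\abs{\circuit}(1,\ldots,1)$ drives up the sample count.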