Done with pass on App C

2021-09-18 15:55:52 -04:00 · 2021-09-18 15:55:52 -04:00 · 9433e27392
parent c6dade9953
commit 9433e27392
1 changed files with 16 additions and 14 deletions
--- a/app_approx-alg-analysis.tex
+++ b/app_approx-alg-analysis.tex
@ -118,7 +118,7 @@ Applying this bound in the runtime bound in \Cref{lem:approx-alg} gives the firs
 %\paragraph{Sufficient condition for $\abs{\circuit}(1,\ldots, 1)$ to be size $O(N)$}
 %For our runtime results to be relevant, it must be the case that the sum of the coefficients computed by \onepass is indeed size $O(N)$ since there are $O(\log{N})$ bits in the RAM model where $N$ is the size of the input.  The size of the input here is \size(\circuit).  We show that when \size$(\circuit_\linput) = N_\linput$, \size$(\circuit_\rinput) = N_\rinput$, where $N_\linput + N_\rinput \leq N$, this is indeed the case.

-We will prove \Cref{lem:val-ub} by considering the three cases separately. We start by considering the case when $\circuit$ is a tree:
+We will prove \Cref{lem:val-ub} by considering the two cases separately. We start by considering the case when $\circuit$ is a tree:
 \begin{Lemma}
 \label{lem:C-ub-tree}
 Let $\circuit$ be a tree (i.e. the sub-circuits corresponding to two children of a node in $\circuit$ are completely disjoint). Then we have
@ -159,33 +159,35 @@ The upper bound in \Cref{lem:val-ub} for the general case is a simple variant of
 \label{lem:C-ub-gen}
 Let $\circuit$ be a (general) circuit. % tree (i.e. the sub-circuits corresponding to two children of a node in $\circuit$ are completely disjoint). 
 Then we have
-\[\abs{\circuit}(1,\dots,1)\le 2^{2^{\degree(\circuit)}\cdot \size(\circuit)}.\]
+\[\abs{\circuit}(1,\dots,1)\le 2^{2^{\degree(\circuit)}\cdot \depth(\circuit)}.\]
 \end{Lemma}
 \begin{proof}[Proof Sketch of \Cref{lem:C-ub-gen}]
-We use the same notation as in the proof of \Cref{lem:C-ub-tree}. We will prove by induction on $\depth(\circuit)$ that $\abs{\circuit}(1,\ldots, 1) \leq 2^{2^k\cdot N }$. The base case argument is similar to that in the proof of \Cref{lem:C-ub-tree}. In the inductive case we have that $N_\linput,N_\rinput\le N-1$.
+We use the same notation as in the proof of \Cref{lem:C-ub-tree} and further define $d=\depth(\circuit)$. We will prove by induction on $\depth(\circuit)$ that $\abs{\circuit}(1,\ldots, 1) \leq 2^{2^k\cdot d }$. The base case argument is similar to that in the proof of \Cref{lem:C-ub-tree}. In the inductive case we have that $d_\linput,d_\rinput\le d-1$.

 For the case when the sink node is $\times$, we get that
 \begin{align*}
 \abs{\circuit}(1,\ldots, 1) &= \abs{\circuit_\linput}(1,\ldots, 1)\circmult \abs{\circuit_\rinput}(1,\ldots, 1) \\
-&\leq {2^{2^{k_\linput}\cdot N_\linput}} \circmult {2^{2^{k_\rinput}\cdot N_\rinput}}\\
- &\leq 2^{2\cdot 2^{k-1}\cdot (N-1)}\\
- &\leq 2^{2^k N}.
+&\leq {2^{2^{k_\linput}\cdot d_\linput}} \circmult {2^{2^{k_\rinput}\cdot d_\rinput}}\\
+ &\leq 2^{2\cdot 2^{k-1}\cdot (d-1)}\\
+ &\leq 2^{2^k d}.
 \end{align*}
-In the above the first inequality follows from inductive hypothesis while the second inequality follows from the fact that $k_\linput,k_\rinput\le k-1$ and $N_\linput, N_\rinput\le N-1$, where we substitute the upperbound into every respective term.
+In the above the first inequality follows from inductive hypothesis while the second inequality follows from the fact that $k_\linput,k_\rinput\le k-1$ and $d_\linput, d_\rinput\le d-1$, where we substitute the upperbound into every respective term.
 %$k_\linput+k_\rinput=k$ (and hence $\max(k_\linput,k_\rinput)\le k$) as well as the fact that $k\ge 0$.

 Now consider the case when the sink node is $+$, we get that
 \begin{align*}
 \abs{\circuit}(1,\ldots, 1) &= \abs{\circuit_\linput}(1,\ldots, 1) \circplus \abs{\circuit_\rinput}(1,\ldots, 1) \\
-&\leq 2^{2^{k_\linput}\cdot N_\linput} + 2^{2^{k_\rinput}\cdot N_\rinput}\\
-&\leq 2\cdot {2^{2^k(N-1)} } \\
-&\leq 2^{2^kN}.
+&\leq 2^{2^{k_\linput}\cdot d_\linput} + 2^{2^{k_\rinput}\cdot d_\rinput}\\
+&\leq 2\cdot {2^{2^k(d-1)} } \\
+&\leq 2^{2^kd}.
 \end{align*}
-In the above the first inequality follows from the inductive hypothesis while the second inequality follows from the facts that $k_\linput,k_\rinput\le k$ and $N_\linput,N_\rinput\le N-1$. The final inequality follows from the fact that $k\ge 0$.
+In the above the first inequality follows from the inductive hypothesis while the second inequality follows from the facts that $k_\linput,k_\rinput\le k$ and $d_\linput,d_\rinput\le d-1$. The final inequality follows from the fact that $k\ge 0$.
 \qed
 \end{proof}

-\AH{I really didn't follow the rest of this...}
-Finally, we consider the case when $\circuit$ encodes the run of the algorithm from~\cite{DBLP:conf/pods/KhamisNR16} on an FAQ query. We cannot handle the full generality of an FAQ query but we can handle an FAQ query that has a ``core'' join query on $k$ relations and then a subset of the $k$ attributes are ``summed'' out (e.g. the sum could be because of projecting out a subset of attributes from the join query). While the algorithm~\cite{DBLP:conf/pods/KhamisNR16} essentially figures out when to `push in' the sums, in our case since we only care about $\abs{\circuit}(1,\dots,1)$.  We will consider the obvious circuit that computes the ``inner join'' using a worst-case optimal join (WCOJ) algorithm like~\cite{NPRR} and then adding in the addition gates. The basic idea is very simple: we will argue that the there are at most $\size(\circuit)^k$ tuples in the join output (each with having a value of $1$ in $\abs{\circuit}(1,\dots,1)$). Then the largest value we can see in $\abs{\circuit}(1,\dots,1)$ is by summing up these at most $\size(\circuit)^k$ values of $1$. Note that this immediately implies the claimed bound in \Cref{lem:val-ub}.
+%\AH{I really didn't follow the rest of this...}
+%\AR{I don't think the rest of the stuff is needed anymore so am commenting the next two paras out.

-We now sketch the argument for the claim about the join query above. First, we note that the computation of a WCOJ algorithm like~\cite{NPRR} can be expressed as a circuit with {\em multiple} sinks (one for each output tuple). Note that annotation corresponding to $\mathbf{t}$ in $\circuit$ is the polynomial $\prod_{e\in E} R(\pi_e(\mathbf{t}))$ (where $E$ indexes the set of relations). It is easy to see that in this case the value of  $\mathbf{t}$ in $\abs{\circuit}(1,\dots,1)$ will be $1$ (by multiplying $1$ $k$ times). The claim on the number of output tuples follow from the trivial bound of multiplying the input size bound (each relation has at most $n\le \size(\circuit)$ tuples and hence we get an overall bound of $n^k\le\size(\circuit)^k$. Note that we did not really use anything about the WCOJ algorithm except for the fact that $\circuit$ for the join part only is built only of multiplication gates. In fact, we do not need the better WCOJ join size bounds either (since we used the trivial $n^k$ bound). As a final remark, we note that we can build the circuit for the join part by running say the algorithm from~\cite{DBLP:conf/pods/KhamisNR16} on an FAQ query that just has the join query but each tuple is annotated with the corresponding variable $X_i$ (i.e. the semi-ring for the FAQ query is $\mathbb{N}[\mathbf{X}]$). 
+%Finally, we consider the case when $\circuit$ encodes the run of the algorithm from~\cite{DBLP:conf/pods/KhamisNR16} on an FAQ query. We cannot handle the full generality of an FAQ query but we can handle an FAQ query that has a ``core'' join query on $k$ relations and then a subset of the $k$ attributes are ``summed'' out (e.g. the sum could be because of projecting out a subset of attributes from the join query). While the algorithm~\cite{DBLP:conf/pods/KhamisNR16} essentially figures out when to `push in' the sums, in our case since we only care about $\abs{\circuit}(1,\dots,1)$.  We will consider the obvious circuit that computes the ``inner join'' using a worst-case optimal join (WCOJ) algorithm like~\cite{NPRR} and then adding in the addition gates. The basic idea is very simple: we will argue that the there are at most $\size(\circuit)^k$ tuples in the join output (each with having a value of $1$ in $\abs{\circuit}(1,\dots,1)$). Then the largest value we can see in $\abs{\circuit}(1,\dots,1)$ is by summing up these at most $\size(\circuit)^k$ values of $1$. Note that this immediately implies the claimed bound in \Cref{lem:val-ub}.
+
+%We now sketch the argument for the claim about the join query above. First, we note that the computation of a WCOJ algorithm like~\cite{NPRR} can be expressed as a circuit with {\em multiple} sinks (one for each output tuple). Note that annotation corresponding to $\mathbf{t}$ in $\circuit$ is the polynomial $\prod_{e\in E} R(\pi_e(\mathbf{t}))$ (where $E$ indexes the set of relations). It is easy to see that in this case the value of  $\mathbf{t}$ in $\abs{\circuit}(1,\dots,1)$ will be $1$ (by multiplying $1$ $k$ times). The claim on the number of output tuples follow from the trivial bound of multiplying the input size bound (each relation has at most $n\le \size(\circuit)$ tuples and hence we get an overall bound of $n^k\le\size(\circuit)^k$. Note that we did not really use anything about the WCOJ algorithm except for the fact that $\circuit$ for the join part only is built only of multiplication gates. In fact, we do not need the better WCOJ join size bounds either (since we used the trivial $n^k$ bound). As a final remark, we note that we can build the circuit for the join part by running say the algorithm from~\cite{DBLP:conf/pods/KhamisNR16} on an FAQ query that just has the join query but each tuple is annotated with the corresponding variable $X_i$ (i.e. the semi-ring for the FAQ query is $\mathbb{N}[\mathbf{X}]$).