Done with my pass on Sec 5

2020-12-17 01:32:08 -05:00 · 2020-12-17 01:32:08 -05:00 · 19b6220ee6
parent f63cf9c2e5
commit 19b6220ee6
2 changed files with 92 additions and 70 deletions
--- a/approx_alg.tex
+++ b/approx_alg.tex
@ -183,7 +183,7 @@ We next present couple of corollaries of~\Cref{lem:approx-alg}.
 \label{cor:approx-algo-const-p}
 Let $\poly(\vct{X})$ be as in~\Cref{lem:approx-alg} and let $\gamma=\gamma(\etree)$. Further let it be the case that $p_i\ge p_0$ for all $i\in[\numvar]$. Then an estimate $\mathcal{E}$  of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ satisfying~\cref{eq:approx-algo-bound} can be computed in time
 \[O\left(\treesize(\etree) + \frac{\log{\frac{1}{\conf}}\cdot k\cdot \log{k} \cdot depth(\etree))}{\inparen{\error'}^2\cdot(1-\gamma)^2\cdot p_0^{2k}}\right)\]
-In particular, if $p_0>0$ and $\gamma<1$ are absolute constants then the above runtime simplifies to $O_k\left(\frac 1\eps\cdot\treesize(\etree)\cdot \log{\frac{1}{\conf}}\right)$. 
+In particular, if $p_0>0$ and $\gamma<1$ are absolute constants then the above runtime simplifies to $O_k\left(\frac 1{\eps^2}\cdot\treesize(\etree)\cdot \log{\frac{1}{\conf}}\right)$. 
 \end{Corollary}
 We note that the restriction on $\gamma$ is satisfied by TIDB (where $\gamma=0$) and for some BIDB benchmarks (see~\Cref{sec:experiments} for more on this claim).
 \AH{I am thinking that perhaps the terminology and presentation of~\Cref{sec:experiments} may need word-smithing to clearly illustrate the $\bi$ benchmarks satisfied--although the substance is already written there.}
--- a/circuits-model-runtime.tex
+++ b/circuits-model-runtime.tex
@ -5,78 +5,87 @@ In this section, we consider  couple of generalizations/corollaries of our resul
 \subsection{Lineage circuits}
 \label{sec:circuits}

-In~\Cref{sec:semnx-as-repr}, we switched to thinking of our query results as polynomials and indeed pretty much of the rest of the paper has focussed on thinking of our input as  a polynomial. In particular, starting with~\Cref{sec:expression-trees} with considered these polynomials to be represented an expression tree. However, these do not capture many of the compressed polynomial representations that we can get from query processing algorithms on bags including the recent work on worst-case optimal join algorithms~\cite{ngo-survey,skew}, factorized databases~\cite{factorized-db} and FAQ~\cite{DBLP:conf/pods/KhamisNR16}. Intuitively the main reason is that an expression tree does not allow for `storing' any intermediate results, which is crucial for these algorithms (and other query processing results as well).
+In~\Cref{sec:semnx-as-repr}, we switched to thinking of our query results as polynomials, and indeed much of the rest of the paper has focused on thinking of our input as a polynomial. In particular, starting with~\Cref{sec:expression-trees} we considered these polynomials to be represented as expression trees. However, expression trees do not capture many of the compressed polynomial representations that we can get from query processing algorithms on bags, including the recent work on worst-case optimal join algorithms~\cite{ngo-survey,skew}, factorized databases~\cite{factorized-db} and FAQ~\cite{DBLP:conf/pods/KhamisNR16}. Intuitively, the main reason is that an expression tree does not allow for `storing' any intermediate results, which is crucial for these algorithms (and other query processing results as well).

-In this section, we represent query polynomials via {\em arithmetic circits}~\cite{arith-complexity}, which are a standard way to represent polynomials over fields (and is standard in thhe field of algebraic complexity), though in our case we use them for polynomials over $\mathbb N$ in the obvious way. We present a formal treatment of {\em lineage circuit} but we present a quick overview here. A lineage circuit is represented by DAG, where each source node corresponds to either one of the input variables or a constant and the sinks correspond to the output. Every other node has at most two incoming edges (and is labeled as either an addition or a multiplication node) but there is no limit on the outdegree of such nodes. We note that is we restricted thhe outdegree to be one, then we get back expression trees.
+In this section, we represent query polynomials via {\em arithmetic circuits}~\cite{arith-complexity}, which are a standard way to represent polynomials over fields (particularly in the field of algebraic complexity), though in our case we use them for polynomials over $\mathbb N$ in the obvious way. We present a formal treatment of {\em lineage circuits} in~\Cref{sec:circuits-formal}, but give a quick overview here. A lineage circuit is represented by a DAG, where each source node corresponds to either one of the input variables or a constant, and the sinks correspond to the output. Every other node has at most two incoming edges (and is labeled as either an addition or a multiplication node), but there is no limit on the outdegree of such nodes. We note that if we restrict the outdegree to be one, then we get back expression trees.


-In~\Cref{sec:results-circuits} we argue why our results from earlier sections also hold of lineage circuits (which we formally define in~\Cref{sec:circuits-formal}) and then argue why lineage circuits so indeed capture the notion of runtime of some well-known query processing algorithms in~\Cref{sec:circuit-runtime} (and we formaly define our cost model to capture the runtime of algorithms in~\Cref{sec:cost-model}).
+In~\Cref{sec:results-circuits} we argue why our results from earlier sections also hold for lineage circuits (which we formally define in~\Cref{sec:circuits-formal}) and then argue in~\Cref{sec:circuit-runtime} why lineage circuits do indeed capture the notion of runtime of some well-known query processing algorithms (we formally define our cost model to capture the runtime of algorithms in~\Cref{sec:cost-model}).

 \subsubsection{Extending our results to lineage circuits}
 \label{sec:results-circuits}

 We first note that since expression trees are a special case of lineage circuits, all of our hardness results in~\Cref{sec:hard} are still valid for lineage circuits.

-For the approximation algorithm in~\Cref{sec:algo} we note that $\approxq$ (\Cref{alg:mon-sam}) works for lineage circuits as long as $\onepass$ and $\sampmon$ have the same guarantees (\Cref{lem:one-pass} and~\Cref{lem:onepass} respectively) hold for lineage circuits as well. It turns out that both $\onepass$ and $\sampmon$ work for lineage circuits as well simply because the only property these use for expression trees is that each node has two children and this is still valid of lineage trees (where for each non-source node the children correspond to the two nodes that have incoming edges to the given node). Put another way, our argument never used the fact that in an expression tree, each node has at most one parent.
+For the approximation algorithm in~\Cref{sec:algo} we note that $\approxq$ (\Cref{alg:mon-sam}) works for lineage circuits as long as the guarantees of $\onepass$ and $\sampmon$ (\Cref{lem:one-pass} and~\Cref{lem:sample} respectively) hold for lineage circuits as well. It turns out that both $\onepass$ and $\sampmon$ work for lineage circuits simply because the only property of expression trees that they use is that each node has two children, and this is still valid for lineage circuits (where for each non-source node the children correspond to the two nodes that have incoming edges to the given node). Put another way, our argument never used the fact that in an expression tree, each node has at most one parent.

-More specifically consider $\onepass$. The algorithm (as well as its analysis) basically uses the fact that one can compute the corresponding polynomial at all $1$s input with a simple recursive formula (\cref{eq:T-all-ones}) and that we can compute a probability distribution based on these weights (as in~\cref{eq:T-weights}). It can be verified that all the arguments go through if we replace $\etree_\lchild$ and $\etree_\lchild$ for expression tree $\etree$ withh the two incoming nodes of the sink for the given lineage circuit.
+More specifically, consider $\onepass$. The algorithm (as well as its analysis) basically uses the fact that one can compute the corresponding polynomial at the all-$1$s input with a simple recursive formula (\cref{eq:T-all-ones}) and that we can compute a probability distribution based on these weights (as in~\cref{eq:T-weights}). It can be verified that all the arguments go through if we replace $\etree_\lchild$ and $\etree_\rchild$ for expression tree $\etree$ with the two incoming nodes of the sink for the given lineage circuit. Another way to look at this is that we could `unroll' the recursion in $\onepass$ and think of the algorithm as doing the evaluation at each node bottom-up from the leaves to the root of the expression tree. For lineage circuits, we start from the source nodes and do the computation in topological order until we reach the sink(s).
+
+The argument for $\sampmon$ is similar. We have argued that $\onepass$ works as intended for lineage circuits, and since~\Cref{alg:mon-sam} only recurses on children of the current node in the expression tree, we can generalize it to lineage circuits by recursing on the two children of the current node in the lineage circuit. Alternatively, as we have already used in the proof of~\Cref{lem:sample}, we can think of the sampling algorithm as sampling a sub-graph of the expression tree. For lineage circuits, we can think of $\sampmon$ as sampling the same sub-graph. Equivalently, one can implicitly expand the lineage circuit into a (larger but) equivalent expression tree. Since $\sampmon$ only explores one sub-graph during its run, we can think of its run on a lineage circuit as being done on the implicit equivalent expression tree. Hence, all of the results on $\sampmon$ on expression trees carry over to lineage circuits.
+
+Thus, we have argued that~\Cref{lem:approx-alg} also holds if we use a lineage circuit instead of an expression tree as the input to our approximation algorithm.

 \subsubsection{The cost model}
 \label{sec:cost-model}
 Thus far, our analysis of the runtime of $\onepass$ has been in terms of the size of the compressed lineage polynomial. 
-We now show that this models the behavior of a deterministic database by proving that for any boolean conjunctive query, we can construct a compressed lineage polynomial with the same complexity as it would take to evaluate the query on a deterministic \emph{bag-relational} database.
+We now show that this models the behavior of a deterministic database by proving that for any union of conjunctive queries, we can construct a compressed lineage polynomial with the same complexity as it would take to evaluate the query on a deterministic \emph{bag-relational} database.
 We adopt a minimalistic model of query evaluation focusing on the size of intermediate materialized states.
-\newcommand{\qruntime}[1]{\textbf{eval}(#1)}
+\newcommand{\qruntime}[1]{\textbf{cost}(#1)}
 \begin{align*}
 \qruntime{Q} & = |Q|\\
 \qruntime{\sigma Q} & = \qruntime{Q}\\
-\qruntime{\pi Q} & = \qruntime{Q}\\
-\qruntime{Q \cup Q'} & = \qruntime{Q} + \qruntime{Q'}\\
+\qruntime{\pi Q} & = \qruntime{Q} + \abs{Q}\\
+\qruntime{Q \cup Q'} & = \qruntime{Q} + \qruntime{Q'} +\abs{Q}+\abs{Q'}\\
 \qruntime{Q_1 \bowtie \ldots \bowtie Q_n} & = \qruntime{Q_1} + \ldots + \qruntime{Q_n} + |Q_1 \bowtie \ldots \bowtie Q_n|\\
 \end{align*}
 Under this model the query plan $Q(D)$ has runtime $O(\qruntime{Q(D)})$.
 Base relations assume that a full table scan is required; We model index scans by treating an index scan query $\sigma_\theta(R)$ as a single base relation.
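+
+For illustration, consider a hypothetical plan $Q = \pi_{\vct A}(R \bowtie S)$ over base relations with $\abs{R} = 3$, $\abs{S} = 4$ and $\abs{R \bowtie S} = 5$ (these sizes are assumed purely for the example). Unrolling the definition above gives
+\begin{align*}
+\qruntime{R \bowtie S} & = \qruntime{R} + \qruntime{S} + \abs{R \bowtie S} = 3 + 4 + 5 = 12,\\
+\qruntime{\pi_{\vct A}(R \bowtie S)} & = \qruntime{R \bowtie S} + \abs{R \bowtie S} = 12 + 5 = 17.
+\end{align*}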

-\begin{proposition}
-\label{prop:queries-need-to-output-tuples}
-The runtime $\qruntime{Q}$ of any query $Q$ is at least $|Q|$
-\end{proposition}
+It can be verified that the worst-case optimal join algorithms~\cite{skew,ngo-survey} as well as query evaluation via factorized databases~\cite{factorized-db} (and work on FAQs~\cite{DBLP:conf/pods/KhamisNR16}) can be modeled as select-union-project-join (SUPJ) queries (though these queries can be data dependent).\footnote{This claim can be verified by, e.g., simply looking at the {\em Generic-Join} algorithm in~\cite{skew} and the {\em factorize} algorithm in~\cite{factorized-db}.} Further, it can be verified that the above cost model on the corresponding SUPJ queries correctly captures their runtime.
+\AR{Am not sure if we need to motivate the cost model more.} 
+%We now make a simple observation on the above cost model:
+%\begin{proposition}
+%\label{prop:queries-need-to-output-tuples}
+%The runtime $\qruntime{Q}$ of any query $Q$ is at least $|Q|$
+%\end{proposition}


 \subsubsection{Lineage circuit for query plans}
 \label{sec:circuits-formal}
-We represent lineage polynomials with arithmetic circuits over $\mathbb N$ with $+$, $\times$.  
-A circuit for relation $R$ is an acyclic graph $\tuple{V_R, E_R, \phi_R, \ell_R}$ with vertices $V_R$ and directed edges $E_R \subset V_R^2$.  
-A sink function $\phi_R : R \rightarrow V$ maps the tuples of the relation to vertices in the graph.  
-We require that $\phi_R$'s range be limited to sink vertices (i.e., vertices with out-degree 0).
-We call a sink vertex not in the range of $\phi_R$ a \emph{dead sink}.
-A function $\ell_R : V_R \rightarrow \{\;+,\times\;\}\cup \mathbb N \cup \vct X$ assigns a label to each node: Source nodes (i.e., vertices with in-degree 0) are labeled with constants or variables (i.e., $\mathbb N \cup \vct X$), while the remaining nodes are labeled with the symbol $+$ or $\times$.
+We now define a lineage circuit more formally and also show how to construct a lineage circuit given a SUPJ query $Q$.
+
+As mentioned earlier, we represent lineage polynomials with arithmetic circuits over $\mathbb N$ with $+$, $\times$.  
+A circuit for query $Q$ is a directed acyclic graph $\tuple{V_Q, E_Q, \phi_Q, \ell_Q}$ with vertices $V_Q$ and directed edges $E_Q \subset V_Q^2$.  
+A sink function $\phi_Q : Q \rightarrow V_Q$ maps the tuples of the query result to vertices in the graph.  
+We require that $\phi_Q$'s range be limited to sink vertices (i.e., vertices with out-degree 0).
+%We call a sink vertex not in the range of $\phi_R$ a \emph{dead sink}.
+A function $\ell_Q : V_Q \rightarrow \{\;+,\times\;\}\cup \mathbb N \cup \vct X$ assigns a label to each node: Source nodes (i.e., vertices with in-degree 0) are labeled with constants or variables (i.e., $\mathbb N \cup \vct X$), while the remaining nodes are labeled with the symbol $+$ or $\times$.
 We require that vertices have an in-degree of at most two.

-\newcommand{\getpoly}[1]{\textbf{poly}(#1)}
-Each vertex $v \in V_R$ in the arithmetic circuit for $\tuple{V_R, E_R, \phi_R, \ell_R}$ encodes a polynomial, realized as 
-$$\getpoly(v) = \begin{cases}
-\sum_{v' : (v',v) \in E_R} \getpoly(v') & \textbf{if } \ell(v) = +\\
-\prod_{v' : (v',v) \in E_R} \getpoly(v') & \textbf{if } \ell(v) = \times\\
+\newcommand{\getpoly}[1]{\textbf{poly}\inparen{#1}}
+Each vertex $v \in V_Q$ in the arithmetic circuit for $\tuple{V_Q, E_Q, \phi_Q, \ell_Q}$ encodes a polynomial, realized as 
+$$\getpoly{v} = \begin{cases}
+\sum_{v' : (v',v) \in E_Q} \getpoly{v'} & \textbf{if } \ell(v) = +\\
+\prod_{v' : (v',v) \in E_Q} \getpoly{v'} & \textbf{if } \ell(v) = \times\\
 \ell(v) & \textbf{otherwise}
 \end{cases}$$
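+
+As a small example (ours, chosen only for illustration), take source vertices $u_1, u_2, u_3$ labeled $X_1, X_2, X_3$, a vertex $v_+$ labeled $+$ with incoming edges from $u_1$ and $u_2$, and two vertices $v_\times, v'_\times$ labeled $\times$ with incoming edges from $\{v_+, u_3\}$ and $\{v_+, u_1\}$ respectively. Then
+\begin{align*}
+\getpoly{v_+} & = X_1 + X_2, & \getpoly{v_\times} & = (X_1 + X_2)\cdot X_3, & \getpoly{v'_\times} & = (X_1 + X_2)\cdot X_1.
+\end{align*}
+The vertex $v_+$ has out-degree two and is stored once; an expression tree would need a separate copy of $X_1 + X_2$ for each product.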

 \newcommand{\caseheading}[1]{\smallskip \noindent \textbf{#1}.~}
-We define the circuit for $R$ recursively by cases as follows.  In each case, let $\tuple{V_{Q_i}, E_{Q_i}, \phi_{Q_i}, \ell_{Q_i}}$ denote the circuit for subquery $Q_i$.
+We define the circuit for a select-union-project-join query $Q$ recursively by cases as follows.  In each case, let $\tuple{V_{Q_i}, E_{Q_i}, \phi_{Q_i}, \ell_{Q_i}}$ denote the circuit for subquery $Q_i$.

 \caseheading{Base Relation}
 Let $Q$ be a base relation $R$.  We define one node for each tuple.  Formally, let $V_Q = \comprehension{v_t}{t\in R}$, let $\phi_Q(t) = v_t$, let $\ell_Q(v_t) = R(t)$, and let $E_Q = \emptyset$.
 This circuit has $|R|$ vertices.

 \caseheading{Selection}
-Let $Q = \sigma_\theta Q_1$.
-We re-use the circuit for $Q_1$, but define a new distinguished node $v_0$ with label $0$ and make it the sink node for all tuples that fail the selection predicate.  
-Formally, let $V_Q = V_{Q_1} \cup {v_0}$, let $\ell_Q(v_0) = 0$, and let $\ell_Q(v) = \ell_{Q_1}(v)$ for any $v \in V_{Q_1}$.  Let $E_Q = E_{Q_1}$, and define
-$$\phi_Q = \begin{cases}
-\phi_{Q_1} & \textbf{if } \theta(t)\\
-v_0 & \textbf{otherwise}
-\end{cases}$$
-This circuit has $|V_{Q_1}|+1$ vertices.
+Let $Q = \sigma_\theta \inparen{Q_1}$.
+We re-use the circuit for $Q_1$. %, but define a new distinguished node $v_0$ with label $0$ and make it the sink node for all tuples that fail the selection predicate.  
+Formally, let $V_Q = V_{Q_1}$, let $\ell_Q(v) = \ell_{Q_1}(v)$ for any $v \in V_{Q_1}$, let $E_Q = E_{Q_1}$, and define
+$$\phi_Q(t) =
+\phi_{Q_1}(t) \quad \text{for } t \text{ s.t. } \theta(t).$$
+%v_0 & \textbf{otherwise}
+%\end{cases}$$
+This circuit has $|V_{Q_1}|$ vertices.

 \caseheading{Projection}
 Let $Q = \pi_{\vct A} {Q_1}$.
@ -84,7 +93,7 @@ We extend the circuit for ${Q_1}$ with a new set of sum vertices (i.e., vertices
 Naively, let $V_Q = V_{Q_1} \cup \comprehension{v_t}{t \in \pi_{\vct A} {Q_1}}$, let $\phi_Q(t) = v_t$, and let $\ell_Q(v_t) = +$.  Finally let 
 $$E_Q = E_{Q_1} \cup \comprehension{(\phi_{Q_1}(t'), v_t)}{t = \pi_{\vct A} t', t' \in {Q_1}, t \in \pi_{\vct A} {Q_1}}$$
 This formulation will produce vertices with an in-degree greater than two, a problem that we correct by replacing every vertex with an in-degree over two by an equivalent fan-in tree.  The resulting structure has at most $|{Q_1}|-1$ additional vertices.
-The corrected circuit thus has at most $|V_{Q_1}|+|\pi_{\vct A} {Q_1}| + |{Q_1}|-1$ vertices.
+The corrected circuit thus has at most $|V_{Q_1}|+ |{Q_1}|-|\pi_{\vct A} {Q_1}|$ vertices.
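+
+For instance, suppose (hypothetically) that ${Q_1}$ has five tuples that project onto two distinct tuples, with three and two preimages respectively. Summing three inputs with in-degree at most two takes two $+$ vertices and summing two inputs takes one, so the projection adds $2 + 1 = 3 = |{Q_1}| - |\pi_{\vct A} {Q_1}|$ vertices.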

 \caseheading{Union}
 Let $Q = {Q_1} \cup {Q_2}$.
@ -99,47 +108,49 @@ v_t & \textbf{if } t \in {Q_1} \cap {Q_1}\\
 \end{cases}$$
 This circuit has $|V_{Q_1}|+|V_{Q_2}|+|{Q_1} \cap {Q_2}|$ vertices.

-\caseheading{N-ary Join}
-Let $Q = {Q_1} \bowtie \ldots \bowtie {Q_n}$.
+\caseheading{$k$-ary Join}
+Let $Q = {Q_1} \bowtie \ldots \bowtie {Q_k}$.
 We merge graphs and produce a multiplication vertex for all tuples resulting from the join
-Naively, let $V_Q = V_{Q_1} \cup \ldots \cup V_{Q_n} \cup \comprehension{v_t}{t \in {Q_1} \bowtie \ldots \bowtie {Q_n}}$, let 
+Naively, let $V_Q = V_{Q_1} \cup \ldots \cup V_{Q_k} \cup \comprehension{v_t}{t \in {Q_1} \bowtie \ldots \bowtie {Q_k}}$, let 
 {\small
 \begin{multline*}
-E_Q = E_{Q_1} \cup \ldots \cup E_{Q_n} \cup \\
-\comprehension{(\phi_{Q_1}(\pi_{\sch({Q_1})}t), v_t), \ldots, (\phi_{Q_n}(\pi_{\sch({Q_n})}t), v_t)}{t \in {Q_1} \bowtie \ldots \bowtie {Q_n}}
+E_Q = E_{Q_1} \cup \ldots \cup E_{Q_k} \cup \\
+\comprehension{(\phi_{Q_1}(\pi_{\sch({Q_1})}t), v_t), \ldots, (\phi_{Q_k}(\pi_{\sch({Q_k})}t), v_t)}{t \in {Q_1} \bowtie \ldots \bowtie {Q_k}}
 \end{multline*}
 }
 Let $\ell_Q(v_t) = \times$, and let $\phi_Q(t) = v_t$
-As in projection, newly created vertices will have an in-degree of $n$, and a fan-in tree is required.  
-There are $|{Q_1} \bowtie \ldots \bowtie {Q_n}|$ such vertices, so the corrected circuit has $|V_{Q_1}|+\ldots+|V_{Q_n}|+(n-1)|{Q_1} \bowtie \ldots \bowtie {Q_n}|$ vertices.
+As in projection, newly created vertices will have an in-degree of $k$, and a fan-in tree is required.  
+There are $|{Q_1} \bowtie \ldots \bowtie {Q_k}|$ such vertices, so the corrected circuit has $|V_{Q_1}|+\ldots+|V_{Q_k}|+(k-1)|{Q_1} \bowtie \ldots \bowtie {Q_k}|$ vertices.
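+
+For example, take $k = 3$ with hypothetical base relations of sizes $|{Q_1}| = 2$, $|{Q_2}| = 2$, $|{Q_3}| = 3$ (so $|V_{Q_i}| = |{Q_i}|$) and a join result of $4$ tuples. Each output tuple multiplies three inputs, which takes $k - 1 = 2$ binary $\times$ vertices, so the corrected circuit has $2 + 2 + 3 + (3-1)\cdot 4 = 15$ vertices.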

 \subsubsection{Circuit size vs. runtime}
 \label{sec:circuit-runtime}

+We now connect the size of a lineage circuit (where the size of a lineage circuit is the number of vertices in the corresponding DAG\footnote{Since each node has indegree at most two, this is the same, up to constants, as counting the number of edges in the DAG.}) for a given SUPJ query $Q$ to its $\qruntime{Q}$, thus formally showing that the size of the lineage circuit is asymptotically no worse than the corresponding runtime of a large class of deterministic query processing algorithms.
+
 \begin{lemma}
 \label{lem:circuits-model-runtime}
-The runtime of any query plan $Q$ has the same or better complexity as the lineage of the corresponding query result for any specific database instance.  That is, for any query plan $Q$ there exists some constants $a$, $b$ such that $|V_Q| \leq a\qruntime{Q}+b$
+The lineage circuit for the result of any query plan $Q$ has the same or better complexity as the runtime of $Q$, for any specific database instance.  That is, for any query plan $Q$ we have $|V_Q| \leq (k-1)\qruntime{Q}$, where $k$ is the degree of the query polynomial corresponding to $Q$. 
 \end{lemma}
 \begin{proof}
-Proof by recursion.  The base case is a base relation: $Q = R$ and is trivially true since $|V_R| = |R|$.
+Proof by induction.  The base case is a base relation: $Q = R$ and is trivially true since $|V_R| = |R|$.
 For the inductive step, we assume that we have circuits for subplans $Q_1, \ldots, Q_n$ such that $|V_{Q_i}| \leq a_i\qruntime{Q_i} + b_i$.

 \caseheading{Selection}
 Assume that $Q = \sigma_\theta(Q_1)$.
-In the circuit for $Q$, $|V_Q| = |V_{Q_1}|+1$ vertices, so from the inductive assumption and $\qruntime{Q} = \qruntime{Q_1}$ by definition, we have $|V_Q| \leq a_i \qruntime{Q} + (b_i + 1)$.
+In the circuit for $Q$, we have $|V_Q| = |V_{Q_1}|$, so from the inductive assumption and $\qruntime{Q} = \qruntime{Q_1}$ by definition, we have $|V_Q| \leq (k-1) \qruntime{Q}$.

 \caseheading{Projection}
 Assume that $Q = \pi_{\vct A}(Q_1)$.
-The circuit for $Q$ has at most $|V_{Q_1}|+|\pi_{\vct A} {Q_1}| + |{Q_1}|-1$ vertices.
+The circuit for $Q$ has at most $|V_{Q_1}|+|{Q_1}|-|\pi_{\vct A} {Q_1}|$ vertices.
 \begin{align*}
-|V_{Q}| & \leq |V_{Q_1}|+|\pi_{\vct A} {Q_1}| + |{Q_1}|-1\\
-& \leq |V_{Q_1}| + 2|Q_1|\\
-\intertext{By \Cref{prop:queries-need-to-output-tuples} $\qruntime{Q_1} \geq |Q_1|$}
-& \leq |V_{Q_1}| + 2 \qruntime{Q_1}\\
-\intertext{From the inductive assumption}
-& \leq a_1\qruntime{Q_1} + b_1 + 2 \qruntime{Q_1}\\
-\intertext{By definition, and compacting}
-& = (a_1+2)\qruntime{Q} + b_1\\
+|V_{Q}| & \leq |V_{Q_1}|+|Q_1|-|\pi_{\vct A} {Q_1}| \\
+& \leq |V_{Q_1}| + |Q_1|\\
+%\intertext{By \Cref{prop:queries-need-to-output-tuples} $\qruntime{Q_1} \geq |Q_1|$}
+%& \leq |V_{Q_1}| + 2 \qruntime{Q_1}\\
+\intertext{(From the inductive assumption)}
+& \leq (k-1)\qruntime{Q_1} + \abs{Q_1}\\
+\intertext{(By definition  of $\qruntime{Q}$)}
+& \le (k-1)\qruntime{Q}.
 \end{align*}

 \caseheading{Union}
@ -147,31 +158,42 @@ Assume that $Q = Q_1 \cup Q_2$.
 The circuit for $Q$ has $|V_{Q_1}|+|V_{Q_2}|+|{Q_1} \cap {Q_2}|$ vertices.
 \begin{align*}
 |V_{Q}| & \leq |V_{Q_1}|+|V_{Q_2}|+|{Q_1}|+|{Q_2}|\\
-\intertext{By \Cref{prop:queries-need-to-output-tuples} $\qruntime{Q_1} \geq |Q_1|$}
-& \leq |V_{Q_1}|+|V_{Q_2}|+\qruntime{Q_1}+\qruntime{Q_2}|\\
-\intertext{From the inductive assumption and compacting}
-& \leq (a_1+a_2+2)(\qruntime{Q_1} + \qruntime{Q_2}) + (b_1 + b_2)
-\intertext{By definition}
-& \leq (a_1+a_2+2)(\qruntime{Q}) + (b_1 + b_2)
+%\intertext{By \Cref{prop:queries-need-to-output-tuples} $\qruntime{Q_1} \geq |Q_1|$}
+%& \leq |V_{Q_1}|+|V_{Q_2}|+\qruntime{Q_1}+\qruntime{Q_2}|\\
+\intertext{(From the inductive assumption)}
+& \leq (k-1)(\qruntime{Q_1} + \qruntime{Q_2}) + \abs{Q_1} + \abs{Q_2}\\
+\intertext{(By definition of $\qruntime{Q}$)}
+& \leq (k-1)(\qruntime{Q}).
 \end{align*}

-\caseheading{N-ary Join}
-Assume that $Q = Q_1 \bowtie \ldots \bowtie Q_n$.
-The circuit for $Q$ has $|V_{Q_1}|+\ldots+|V_{Q_n}|+(n-1)|{Q_1} \bowtie \ldots \bowtie {Q_n}|$ vertices.
+\caseheading{$k$-ary Join}
+Assume that $Q = Q_1 \bowtie \ldots \bowtie Q_k$.
+The circuit for $Q$ has $|V_{Q_1}|+\ldots+|V_{Q_k}|+(k-1)|{Q_1} \bowtie \ldots \bowtie {Q_k}|$ vertices.
 \begin{align*}
-|V_{Q}| & = |V_{Q_1}|+\ldots+|V_{Q_n}|+(n-1)|{Q_1} \bowtie \ldots \bowtie {Q_n}|\\
+|V_{Q}| & = |V_{Q_1}|+\ldots+|V_{Q_k}|+(k-1)|{Q_1} \bowtie \ldots \bowtie {Q_k}|\\
 \intertext{From the inductive assumption}
-& \leq a_1\qruntime{Q_1}+b_1+\ldots+a_n\qruntime{Q_n}+b_n+\\
-&\;\;\; (n-1)|{Q_1} \bowtie \ldots \bowtie {Q_n}|\\
-& \leq (a_1+\ldots+a_n+n-1)(\qruntime{Q_1}+\ldots+\qruntime{Q_n}+\\
-&\;\;\;|{Q_1} \bowtie \ldots \bowtie {Q_n}|)+b_1+\ldots+b_n\\
-\intertext{By definition}
-& = (a_1+\ldots+a_n+n-1)\qruntime{Q}+(b_1+\ldots+b_n)\\
+& \leq (k-1)\qruntime{Q_1}+\ldots+(k-1)\qruntime{Q_k}+\\
+&\;\;\; (k-1)|{Q_1} \bowtie \ldots \bowtie {Q_k}|\\
+& \leq (k-1)(\qruntime{Q_1}+\ldots+\qruntime{Q_k}+\\
+&\;\;\;|{Q_1} \bowtie \ldots \bowtie {Q_k}|)\\
+\intertext{(By definition of $\qruntime{Q}$)}
+& = (k-1)\qruntime{Q}.
 \end{align*}

 The property holds for all recursive queries, and the proof holds.

+\end{proof}
+\qed
+
+We now have all the pieces to argue the following, which formally states that our approximation algorithm implies that approximating the expected multiplicities of an SUPJ query can be done in essentially the same runtime as deterministic query processing of the same query:
+\begin{Corollary}
+Given an SUPJ query $Q$ for a TIDB, we can compute a $(1\pm\eps)$-approximation to the expected multiplicity of each output tuple with probability at least $1-\delta$ in time $O_k\left(\frac 1{\eps^2}\cdot\qruntime{Q}\cdot \log{\frac{1}{\conf}}\cdot \log(n)\right)$.
+\end{Corollary}
+\begin{proof}
+This follows from~\Cref{lem:circuits-model-runtime} and (the lineage circuit counterpart, see~\Cref{sec:results-circuits}, of)~\Cref{cor:approx-algo-const-p} (where the latter is used with $\delta$ being substituted\footnote{Recall that~\Cref{cor:approx-algo-const-p} is stated for a single output tuple, so to get the required guarantee for all (at most $n^k$) output tuples of $Q$ we allow a failure probability of at most $\frac \delta{n^k}$ for each output tuple and then apply a union bound over all output tuples.} with $\frac \delta{n^k}$).
 \end{proof}

 \subsection{Higher moments}
 \label{sec:momemts}
+
+We make a simple observation to conclude the presentation of our results. So far we have presented algorithms that, given $\poly$, approximate its expectation. In addition, we would like to, e.g., prove bounds on the probability of the multiplicity being at least $1$. While we do not have a good approximation algorithm for this problem, we can make some progress as follows. We first note that for any positive integer $m$ we can compute the expectation of $\poly^m$ (since this only changes the degree of the corresponding lineage polynomial by a factor of $m$). In other words, we can compute the $m$-th moment of the multiplicities as well. This allows us, e.g., to use Chebyshev's inequality or other high-moment-based probability bounds on the events we might be interested in. However, we leave the question of coming up with better approximation algorithms for proving probability bounds for future work.
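+
+As a sketch of how such moment bounds could be applied (the notation $\mathbb{E}$, $\mu$ and $\sigma^2$ is ours and assumed only for this illustration), write $\mu = \mathbb{E}[\poly]$ and $\sigma^2 = \mathbb{E}[\poly^2] - \mu^2$. Since the multiplicity is a non-negative integer, Markov's inequality gives $\Pr[\poly \ge 1] \le \mu$, and Chebyshev's inequality gives, for any $a > 0$,
+\[\Pr\left[\left|\poly - \mu\right| \ge a\right] \le \frac{\sigma^2}{a^2}.\]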