Done with pass on App D

Atri Rudra 2021-09-18 23:39:12 -04:00
parent 17a82ec57b
commit 5f12c56cf7


@@ -25,7 +25,8 @@
 \subsection{Representing Polynomials with Circuits}\label{app:subsec-rep-poly-lin-circ}
 \subsubsection{Circuits for query plans}
 \label{sec:circuits-formal}
-We now formalize circuits and the construction of circuits for SPJU queries.
+\AR{Since this comment is not showing up below, I do not follow why the last sentence of this para is true.}
+We now formalize circuits and the construction of circuits for $\raPlus$ queries.
 As mentioned earlier, we represent lineage polynomials as arithmetic circuits over $\mathbb N$-valued variables with $+$, $\times$.
 A circuit for query $Q$ and \abbrNXPDB $\pxdb$ is a directed acyclic graph $\tuple{V_{Q,\pxdb}, E_{Q,\pxdb}, \phi_{Q,\pxdb}, \ell_{Q,\pxdb}}$ with vertices $V_{Q,\pxdb}$ and directed edges $E_{Q,\pxdb} \subset {V_{Q,\pxdb}}^2$.
 The sink function $\phi_{Q,\pxdb} : \udom^n \rightarrow V_{Q,\pxdb}$ is a partial function that maps the tuples of the $n$-ary\AR{In the main paper we have used $n$ to denote the number of input tuples so we need to use some other notation $n$ but since I do not know where all this change will need to be propagated so am not changing it for now.} relation $Q(\pxdb)$ to vertices.
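The circuit definition in this hunk can be made concrete with a small sketch. It rests on assumptions that do not come from the paper. Vertices are integers. The labeling $\ell$ is a dict whose values are `"+"`, `"*"`, or a variable name. Edges are stored as a map from each vertex to its input vertices. The sink function $\phi$ is a dict from output tuples (the hypothetical names `"t1"`, `"t2"`) to vertices. Evaluating under an $\mathbb N$-valued assignment gives each tuple's lineage-polynomial value. Memoizing per vertex evaluates a shared subcircuit only once, so the cost stays linear in $|V| + |E|$.

```python
import math


def evaluate(ell, children, sink, assignment):
    """Value of the lineage polynomial rooted at vertex `sink` (hypothetical helper).

    ell: vertex -> "+", "*", or a variable name (a leaf).
    children: vertex -> list of input vertices (the edges into that vertex).
    assignment: variable name -> natural number.
    Each vertex is evaluated once (memoized), so the cost is O(|V| + |E|).
    """
    memo = {}

    def go(v):
        if v not in memo:
            label, kids = ell[v], children.get(v, [])
            if label == "+":
                memo[v] = sum(go(u) for u in kids)
            elif label == "*":
                memo[v] = math.prod(go(u) for u in kids)
            else:
                memo[v] = assignment[label]  # leaf: look up the variable
        return memo[v]

    return go(sink)


# Two output tuples whose lineages (X + Y) * X and (X + Y) * Y share one + gate.
ell = {1: "X", 2: "Y", 3: "+", 4: "*", 5: "*"}
children = {3: [1, 2], 4: [3, 1], 5: [3, 2]}
phi = {"t1": 4, "t2": 5}  # sink function: output tuple -> vertex
x = {"X": 2, "Y": 3}
print({t: evaluate(ell, children, v, x) for t, v in phi.items()})  # {'t1': 10, 't2': 15}
```

Sharing the $+$ gate is what keeps the circuit smaller than the expanded sums of monomials.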
@@ -76,17 +77,17 @@ We define the circuit for a $\raPlus$ query $\query$ recursively by cases as fol
 \If{$\query$ is $R$} \Comment{\textbf{Case 1}: $\query$ is a relation atom}
 \For{$t \in \dbbase.R$}
 \State $V \leftarrow V \cup \{v_t\}$; $\ell \leftarrow \ell \cup \{(v_t, R(t))\}$ \Comment{Allocate a fresh node $v_t$}
-\State $\phi(t) = v_t$
+\State $\phi(t) \gets v_t$
 \EndFor
 \ElsIf{$\query$ is $\sigma_\theta(\query')$} \Comment{\textbf{Case 2}: $\query$ is a Selection}
-\State $\tuple{V, E, \phi', \ell} = \abbrStepOne(\query', \dbbase, V, E, \ell)$
+\State $\tuple{V, E, \phi', \ell} \gets \abbrStepOne(\query', \dbbase, V, E, \ell)$
 \For{$t \in \domain(\phi')$}
 \State \textbf{if }$\theta(t)$
-\textbf{ then } $\phi(t) = \phi'(t)$
-\textbf{ else } $\phi(t) = v_0$
+\textbf{ then } $\phi(t) \gets \phi'(t)$
+\textbf{ else } $\phi(t) \gets v_0$
 \EndFor
 \ElsIf{$\query$ is $\pi_{\vec{A}}(\query')$} \Comment{\textbf{Case 3}: $\query$ is a Projection}
-\State $\tuple{V, E, \phi', \ell} = \abbrStepOne(\query', \dbbase, V, E, \ell)$
+\State $\tuple{V, E, \phi', \ell} \gets \abbrStepOne(\query', \dbbase, V, E, \ell)$
 \For{$t \in \pi_{\vec{A}}(\query'(\dbbase))$}
 \State $V \leftarrow V \cup \{v_t\}$; $\ell \leftarrow \ell \cup \{(v_t, +)\}$\Comment{Allocate a fresh node $v_t$}
 \State $\phi(t) \leftarrow v_t$
@@ -94,26 +95,26 @@ We define the circuit for a $\raPlus$ query $\query$ recursively by cases as fol
 \For{$t \in \query'(\dbbase)$}
 \State $E \leftarrow E \cup \{(\phi'(t), \phi(\pi_{\vec{A}}t))\}$
 \EndFor
-\State Correct nodes with in-degrees $>2$ by appending an equivalent fan-in tree instead
+\State Correct nodes with in-degrees $>2$ by appending an equivalent fan-in two tree instead
 \ElsIf{$\query$ is $\query_1 \cup \query_2$} \Comment{\textbf{Case 4}: $\query$ is a Bag Union}
-\State $\tuple{V, E, \phi_1, \ell} = \abbrStepOne(\query_1, \dbbase, V, E, \ell)$
-\State $\tuple{V, E, \phi_2, \ell} = \abbrStepOne(\query_2, \dbbase, V, E, \ell)$
-\State $\phi = \phi_1 \cup \phi_2$
+\State $\tuple{V, E, \phi_1, \ell} \gets \abbrStepOne(\query_1, \dbbase, V, E, \ell)$
+\State $\tuple{V, E, \phi_2, \ell} \gets \abbrStepOne(\query_2, \dbbase, V, E, \ell)$
+\State $\phi \gets \phi_1 \cup \phi_2$
 \For{$t \in \domain(\phi_1) \cap \domain(\phi_2)$}
 \State $V \leftarrow V \cup \{v_t\}$; $\ell \leftarrow \ell \cup \{(v_t, +)\}$ \Comment{Allocate a fresh node $v_t$}
-\State $\phi(t) = v_t$
+\State $\phi(t) \gets v_t$
 \State $E \leftarrow E \cup \{(\phi_1(t), v_t), (\phi_2(t), v_t)\}$
 \EndFor
-\ElsIf{$\query$ is $\query_1 \bowtie \ldots \bowtie \query_n$} \Comment{\textbf{Case 5}: $\query$ is a n-ary Join}
-\For{$i \in [n]$}
-\State $\tuple{V, E, \phi_i, \ell} = \abbrStepOne(\query_i, \dbbase, V, E, \ell)$
+\ElsIf{$\query$ is $\query_1 \bowtie \ldots \bowtie \query_m$} \Comment{\textbf{Case 5}: $\query$ is an $m$-ary Join}
+\For{$i \in [m]$}
+\State $\tuple{V, E, \phi_i, \ell} \gets \abbrStepOne(\query_i, \dbbase, V, E, \ell)$
 \EndFor
-\For{$t \in \domain(\phi_1) \bowtie \ldots \bowtie \domain(\phi_k)$}
+\For{$t \in \domain(\phi_1) \bowtie \ldots \bowtie \domain(\phi_m)$}
 \State $V \leftarrow V \cup \{v_t\}$; $\ell \leftarrow \ell \cup \{(v_t, \times)\}$ \Comment{Allocate a fresh node $v_t$}
-\State $\phi(t) = v_t$
+\State $\phi(t) \gets v_t$
 \State $E \leftarrow E \cup \comprehension{(\phi_i(\pi_{sch(\query_i(\dbbase))}(t)), v_t)}{i \in [n]}$
 \EndFor
-\State Correct nodes with in-degrees $>2$ by appending an equivalent fan-in tree instead
+\State Correct nodes with in-degrees $>2$ by appending an equivalent fan-in two tree instead
 \EndIf
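The five cases of the construction in this hunk can be traced with a minimal Python sketch. It is not the paper's implementation, and several choices are hypothetical:

- Queries are encoded as nested tuples such as `("join", q1, q2)`.
- A tuple is a frozenset of attribute/value pairs.
- Vertex `0` stands in for $v_0$ and is treated as the constant 0.
- All helper names are invented: `Builder`, `build`, `fan_in_two`, `value`, `depth`.

The sketch also simplifies two things. The join is a naive nested loop over the $\phi_i$ domains, not an algorithm meeting $\jointime{\cdot}$. The wide $+$ and $\times$ gates of Cases 3 and 5 are built directly as balanced fan-in-two trees rather than corrected afterwards, which gives depth $\lceil \log_2 d \rceil$ for $d$ inputs. Base relations are assumed duplicate-free.

```python
import itertools
import math


class Builder:
    """Accumulates V, E and ell; vertex 0 plays the role of v_0 (assumed constant 0)."""

    def __init__(self):
        self.ell = {0: 0}   # vertex -> "+", "*", 0, or a leaf label (relation, tuple)
        self.children = {}  # vertex -> input vertices, i.e. the edges (u, v) in E
        self.next_id = 1

    def fresh(self, label, kids=()):
        v = self.next_id
        self.next_id += 1
        self.ell[v], self.children[v] = label, list(kids)
        return v

    def fan_in_two(self, label, kids):
        """Balanced tree of binary `label` gates over kids (depth ceil(log2 d))."""
        layer = list(kids)
        while len(layer) > 2:
            nxt = [self.fresh(label, layer[i:i + 2]) for i in range(0, len(layer) - 1, 2)]
            if len(layer) % 2:
                nxt.append(layer[-1])  # odd one out moves up unchanged
            layer = nxt
        return self.fresh(label, layer)


def merge(ts):
    """Natural join of tuples; None if they disagree on a shared attribute."""
    out = {}
    for t in ts:
        for a, x in t:
            if out.setdefault(a, x) != x:
                return None
    return frozenset(out.items())


def build(b, q, db):
    """Return phi (output tuple -> sink vertex) for query q over db."""
    op = q[0]
    if op == "rel":      # Case 1: one fresh leaf per base tuple
        return {t: b.fresh((q[1], t)) for t in db[q[1]]}
    if op == "select":   # Case 2: ("select", theta, q'); failing tuples map to v_0
        return {t: (v if q[1](dict(t)) else 0) for t, v in build(b, q[2], db).items()}
    if op == "project":  # Case 3: ("project", attrs, q'); + over all preimages
        groups = {}
        for t, v in build(b, q[2], db).items():
            key = frozenset((a, x) for a, x in t if a in q[1])
            groups.setdefault(key, []).append(v)
        return {t: b.fan_in_two("+", vs) for t, vs in groups.items()}
    if op == "union":    # Case 4: ("union", q1, q2); + gate for tuples in both
        p1, p2 = build(b, q[1], db), build(b, q[2], db)
        phi = {**p1, **p2}
        for t in p1.keys() & p2.keys():
            phi[t] = b.fresh("+", [p1[t], p2[t]])
        return phi
    if op == "join":     # Case 5: ("join", q1, ..., qm); x over matching tuples
        phis = [build(b, sub, db) for sub in q[1:]]
        phi = {}
        for combo in itertools.product(*(p.items() for p in phis)):  # naive join
            t = merge(tup for tup, _ in combo)
            if t is not None:
                phi[t] = b.fan_in_two("*", [v for _, v in combo])
        return phi
    raise ValueError(f"unknown operator {op!r}")


def value(b, v, x=lambda leaf: 1):
    """Evaluate at vertex v (no memoization; fine for small examples)."""
    label, kids = b.ell[v], b.children.get(v, [])
    if label == "+":
        return sum(value(b, u, x) for u in kids)
    if label == "*":
        return math.prod(value(b, u, x) for u in kids)
    return x(label) if isinstance(label, tuple) else label  # leaf or v_0


def depth(b, v):
    kids = b.children.get(v, [])
    return 1 + max(depth(b, u) for u in kids) if kids else 0


db = {"R": [frozenset({("A", 1), ("B", 1)}), frozenset({("A", 1), ("B", 2)})],
      "S": [frozenset({("B", 1)}), frozenset({("B", 2)})]}
b = Builder()
phi = build(b, ("project", {"A"}, ("join", ("rel", "R"), ("rel", "S"))), db)
print({tuple(sorted(t)): value(b, v) for t, v in phi.items()})  # {(('A', 1),): 2}
```

Setting every variable to 1 makes a sink evaluate to its tuple's bag multiplicity. Here $(A{=}1)$ has two join partners, so its value is 2.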
@@ -179,12 +180,12 @@ There are $|{Q_1} \bowtie \ldots \bowtie {Q_k}|$ such vertices, so the corrected
 \subsubsection{Bounding circuit depth}
 \label{sec:circuit-depth}
-We first show that the depth of the circuit (\depth; \Cref{def:size-depth}) is bounded by the size of the query. Denote by $|\query|$ the number of relational operators in query $\query$, which recall we assume as a constant.
+We first show that the depth of the circuit (\depth; \Cref{def:size-depth}) is bounded by the size of the query. Denote by $|\query|$ the number of relational operators in query $\query$, which, recall, we assume is a constant.
 \begin{Proposition}[Circuit depth is bounded]
 \label{prop:circuit-depth}
 Let $\query$ be a relational query and $\dbbase$ be a \dbbaseName with $n$ tuples. There exists a (lineage) circuit $\circuit^*$ encoding the lineage of all tuples $\tup \in \query(\dbbase)$ for which
-$\depth(\circuit^*) \leq O(k|\query|\log(n))$
+$\depth(\circuit^*) \leq O(k|\query|\log(n))$.
 \end{Proposition}
 \begin{proof}
@@ -204,21 +205,21 @@ For the projection case, observe that the fan-in is bounded by $|\query'(\dbbase
 \begin{Lemma}\label{lem:circ-model-runtime}
 \label{lem:circuits-model-runtime}
-Given a \abbrNXPDB $\pxdb$ with \dbbaseName $\dbbase$, and query plan $Q$, the runtime of $Q$ over $\dbbase$ has the same or greater complexity as the size of the lineage of $Q(\pxdb)$. That is, we have $\abs{V_{Q,\pxdb}} \leq (k-1)\qruntime{Q, \dbbase}+1$, where $k$ is the maximal degree of any polynomial in $Q(\pxdb)$.
+Given a \abbrNXPDB $\pxdb$ with \dbbaseName $\dbbase$, and an $\raPlus$ query $Q$, the runtime of $Q$ over $\dbbase$ has the same or greater complexity as the size of the lineage of $Q(\pxdb)$. That is, we have $\abs{V_{Q,\pxdb}} \leq k\qruntime{Q, \dbbase}+1$, where $k\ge 1$ is the maximal degree of any polynomial in $Q(\pxdb)$.
 \end{Lemma}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %\noindent The proof is shown in \Cref{app:subsec-lem-lin-vs-qplan}.
 %\subsection{Proof for \Cref{lem:circuits-model-runtime}}\label{app:subsec-lem-lin-vs-qplan}
 \begin{proof}
-We prove by induction that $\abs{V_{Q,\pxdb} - \{v_0\}} \leq (k-1)\qruntime{Q, \dbbase}$. For clarity, we implicitly exclude $v_0$ in the proof below.
-The base case is a base relation: $Q = R$ and is trivially true since $|V_{R,\pxdb}| = |D_\Omega.R|$.
-For the inductive step, we assume that we have circuits for subplans $Q_1, \ldots, Q_n$ such that $|V_{Q_i,\pxdb}| \leq (k_i-1)\qruntime{Q_i,\dbbase}$ where $k_i$ is the degree of $Q_i$.
+We prove by induction that $\abs{V_{Q,\pxdb} \setminus \{v_0\}} \leq k\qruntime{Q, \dbbase}$. For clarity, we implicitly exclude $v_0$ in the proof below.
+The base case, a base relation $Q = R$, holds trivially since $|V_{R,\pxdb}| = |\dbbase.R|=\qruntime{R, \dbbase}$ (note that here the degree $k=1$).
+For the inductive step, we assume that we have circuits for subqueries $Q_1, \ldots, Q_m$ such that $|V_{Q_i,\pxdb}| \leq k_i\qruntime{Q_i,\dbbase}$ where $k_i$ is the degree of $Q_i$.
 \caseheading{Selection}
 Assume that $Q = \sigma_\theta(Q_1)$.
-In the circuit for $Q$, $|V_{Q,\pxdb}| = |V_{Q_1,\dbbase}|$ vertices, so from the inductive assumption and $\qruntime{Q,\dbbase} = \qruntime{Q_1,\dbbase}$ by definition, we have $|V_{Q,\pxdb}| \leq (k-1) \qruntime{Q,\dbbase} $.
+The circuit for $Q$ has $|V_{Q,\pxdb}| = |V_{Q_1,\pxdb}|$ vertices, so from the inductive assumption and $\qruntime{Q,\dbbase} = \qruntime{Q_1,\dbbase}$ by definition, we have $|V_{Q,\pxdb}| \leq k \qruntime{Q,\dbbase}$.
 % \AH{Technically, $\kElem$ is the degree of $\poly_1$, but I guess this is a moot point since one can argue that $\kElem$ is also the degree of $\poly$.}
 % OK: Correct
@@ -231,9 +232,9 @@ The circuit for $Q$ has at most $|V_{Q_1,\pxdb}|+|{Q_1}|$ vertices.
 %\intertext{By \Cref{prop:queries-need-to-output-tuples} $\qruntime{Q_1,\dbbase} \geq |Q_1|$}
 %& \leq |V_{Q_1,\pxdb}| + 2 \qruntime{Q_1,\pxdb}\\
 \intertext{(From the inductive assumption)}
-& \leq (k-1)\qruntime{Q_1,\dbbase} + \abs{Q_1}\\
+& \leq k\qruntime{Q_1,\dbbase} + \abs{Q_1}\\
 \intertext{(By definition of $\qruntime{Q,\dbbase}$)}
-& \le (k-1)\qruntime{Q,\dbbase}.
+& \le k\qruntime{Q,\dbbase}.
 \end{align*}
 \caseheading{Union}
 Assume that $Q = Q_1 \cup Q_2$.
@@ -243,23 +244,23 @@ The circuit for $Q$ has $|V_{Q_1,\pxdb}|+|V_{Q_2,\pxdb}|+|{Q_1} \cap {Q_2}|$ ver
 %\intertext{By \Cref{prop:queries-need-to-output-tuples} $\qruntime{Q_1,\dbbase} \geq |Q_1|$}
 %& \leq |V_{Q_1,\pxdb}|+|V_{Q_2,\pxdb}|+\qruntime{Q_1,\pxdb}+\qruntime{Q_2,\dbbase}|\\
 \intertext{(From the inductive assumption)}
-& \leq (k-1)(\qruntime{Q_1,\dbbase} + \qruntime{Q_2,\dbbase}) + (b_1 + b_2)
+& \leq k(\qruntime{Q_1,\dbbase} + \qruntime{Q_2,\dbbase}) + (|Q_1| + |Q_2|)\\
 \intertext{(By definition of $\qruntime{Q,\dbbase}$)}
-& \leq (k-1)(\qruntime{Q,\dbbase}).
+& \leq k\qruntime{Q,\dbbase}.
 \end{align*}
-\caseheading{$k$-ary Join}
-Assume that $Q = Q_1 \bowtie \ldots \bowtie Q_k$.
-The circuit for $Q$ has $|V_{Q_1,\pxdb}|+\ldots+|V_{Q_k,\pxdb}|+(k-1)|{Q_1} \bowtie \ldots \bowtie {Q_k}|$ vertices.
+\caseheading{$m$-ary Join}
+Assume that $Q = Q_1 \bowtie \ldots \bowtie Q_m$. Note that $k=\sum_{i=1}^m k_i\ge m$.
+The circuit for $Q$ has $|V_{Q_1,\pxdb}|+\ldots+|V_{Q_m,\pxdb}|+(m-1)|{Q_1} \bowtie \ldots \bowtie {Q_m}|$ vertices.
 \begin{align*}
-|V_{Q,\pxdb}| & = |V_{Q_1,\pxdb}|+\ldots+|V_{Q_k,\pxdb}|+(k-1)|{Q_1} \bowtie \ldots \bowtie {Q_k}|\\
-\intertext{From the inductive assumption and noting $\forall i: k_i \leq k-1$}
-& \leq (k-1)\qruntime{Q_1,\dbbase}+\ldots+(k-1)\qruntime{Q_k,\dbbase}+\\
-&\;\;\; (k-1)|{Q_1} \bowtie \ldots \bowtie {Q_k}|\\
-& \leq (k-1)(\qruntime{Q_1,\dbbase}+\ldots+\qruntime{Q_k,\dbbase}+\\
-&\;\;\;|{Q_1} \bowtie \ldots \bowtie {Q_k}|)\\
-\intertext{(By definition of $\qruntime{Q,\dbbase}$)}
-& = (k-1)\qruntime{Q,\dbbase}.
+|V_{Q,\pxdb}| & = |V_{Q_1,\pxdb}|+\ldots+|V_{Q_m,\pxdb}|+(m-1)|{Q_1} \bowtie \ldots \bowtie {Q_m}|\\
+\intertext{From the inductive assumption and noting $\forall i: k_i \leq k$ and $m\le k$}
+& \leq k\qruntime{Q_1,\dbbase}+\ldots+k\qruntime{Q_m,\dbbase}+\\
+&\;\;\; (m-1)|{Q_1} \bowtie \ldots \bowtie {Q_m}|\\
+& \leq k(\qruntime{Q_1,\dbbase}+\ldots+\qruntime{Q_m,\dbbase}+\\
+&\;\;\;|{Q_1} \bowtie \ldots \bowtie {Q_m}|)\\
+\intertext{(By definition of $\qruntime{Q,\dbbase}$ and the assumption on $\jointime{\cdot}$)}
+& \le k\qruntime{Q,\dbbase}.
 \end{align*}
 The property holds for all recursive queries, and the proof holds.
@@ -271,17 +272,17 @@ The property holds for all recursive queries, and the proof holds.
 We next need to show that we can construct the circuit in time linear in the deterministic runtime.
 \begin{Lemma}\label{lem:tlc-is-the-same-as-det}
-Given a query $\query$ over a \dbbaseName $\dbbase$, the runtime $\timeOf{\abbrStepOne}(\query,\dbbase,\circuit) \le O(\qruntime{\query, \dbbase})$
+Given a query $\query$ over a \dbbaseName $\dbbase$ and the $\circuit^*$ output by \Cref{alg:lc}, the runtime $\timeOf{\abbrStepOne}(\query,\dbbase,\circuit^*) \le O(\qruntime{\query, \dbbase})$.
 \end{Lemma}
 \begin{proof}
-By analysis of \Cref{alg:lc}, invoked as $\abbrStepOne(\query, \dbbase, \emptyset, \emptyset, \emptyset)$.
+By analysis of \Cref{alg:lc}, invoked as $\circuit^*\gets\abbrStepOne(\query, \dbbase, \emptyset, \emptyset, \emptyset)$.
 We assume that $V$, $E$, and $\ell$ are each stored in a mutable accumulator with $O(1)$ amortized append.
 We assume that $\phi$ is stored in a linked hashmap, with $O(1)$ insertions and retrievals, and $O(n)$ iteration over the domain of keys.
-We assume that the n-ary join $\domain(\phi_1) \bowtie \ldots \domain(\phi_n)$ can be computed in time $\jointime{\domain(\phi_1), \ldots, \domain(\phi_n)}$ and that an intersection $\domain(\phi_1) \cap \domain(\phi_2)$ can be computed in time $O(|\domain(\phi_1)| + |\domain(\phi_2)|)$ (i.e., with a hash table).
-Before proving our runtime bound, we first observe that $\qruntime{\query, \db} \geq O(|\query(\db)|)$.
+We assume that the $n$-ary join $\domain(\phi_1) \bowtie \ldots \bowtie\domain(\phi_n)$ can be computed in time $\jointime{\domain(\phi_1), \ldots, \domain(\phi_n)}$ and that an intersection $\domain(\phi_1) \cap \domain(\phi_2)$ can be computed in time $O(|\domain(\phi_1)| + |\domain(\phi_2)|)$ (e.g., with a hash table).
+Before proving our runtime bound, we first observe that $\qruntime{\query, \db} \geq \Omega(|\query(\db)|)$.
 This is true by construction for the relation, projection, and union cases, by \Cref{def:join-cost} for joins, and by the observation that $|\sigma(R)| \leq |R|$.
 We show that $\qruntime{\query, \dbbase}$ is an upper bound for the runtime of \Cref{alg:lc} by recursion.
@@ -291,30 +292,30 @@ For the remaining cases, we make the recursive assumption that for every subquer
 \caseheading{Selection}
 Selection requires a recursive call to \Cref{alg:lc}, which by the recursive assumption is bounded by $O(\qruntime{\query', \dbbase})$.
 \Cref{alg:lc} requires a loop over every element of $\query'(\dbbase)$.
-By the observation above that $\qruntime{\query, \db} \geq O(|\query(\db)|)$, this iteration is also bounded by $O(\qruntime{\query', \dbbase})$.
+By the observation above that $\qruntime{\query, \db} \geq \Omega(|\query(\db)|)$, this iteration is also bounded by $O(\qruntime{\query', \dbbase})$.
 \caseheading{Projection}
 Projection requires a recursive call to \Cref{alg:lc}, which by the recursive assumption is bounded by $O(\qruntime{\query', \dbbase})$, which in turn is a term in $\qruntime{\pi_{\vec{A}}\query', \dbbase}$.
 What remains is an iteration over $\pi_{\vec A}(\query(\dbbase))$ (lines 13--16), an iteration over $\query'(\dbbase)$ (lines 17--19), and the construction of a fan-in tree (line 20).
 The first iteration is $O(|\query(\dbbase)|) \leq O(\qruntime{\query, \dbbase})$.
-The second iteration and the construction of the bounded fan-in tree are both $O(|\query'(\dbbase)|) \leq O(\qruntime{\query', \dbbase}) \leq O(\qruntime{\query, \dbbase}) $, by the the observation above that $\qruntime{\query, \db} \geq O(|\query(\db)|)$.
+The second iteration and the construction of the bounded fan-in tree are both $O(|\query'(\dbbase)|) \leq O(\qruntime{\query', \dbbase}) \leq O(\qruntime{\query, \dbbase})$, by the observation above that $\qruntime{\query, \db} \geq \Omega(|\query(\db)|)$.
 \caseheading{Bag Union}
-As above, the recursive calls explicitly correspond to terms in the expansion of $O(\qruntime{\query_1 \cup \query_2, \dbbase})$.
+As above, the recursive calls explicitly correspond to terms in the expansion of $\qruntime{\query_1 \cup \query_2, \dbbase}$.
 Initializing $\phi$ (line 24) can be accomplished in $O(\domain(\phi_1) + \domain(\phi_2)) = O(|\query_1(\dbbase)| + |\query_2(\dbbase)|) \leq O(\qruntime{\query_1, \dbbase} + \qruntime{\query_2, \dbbase})$.
 The remainder requires computing $\query_1 \cup \query_2$ (line 25) and iterating over it (lines 25--29), which is $O(|\query_1| + |\query_2|)$ as noted above --- this directly corresponds to terms in $\qruntime{\query_1 \cup \query_2, \dbbase}$.
-\caseheading{n-ary Join}
+\caseheading{$m$-ary Join}
 As in the prior cases, recursive calls explicitly correspond to terms in our target runtime.
-The remaining logic consists of computing $\domain(\phi_1) \bowtie \ldots \bowtie \domain(\phi_n)$, iterating over the results, and combining nodes in a fan-in tree.
-Respectively, these are $\jointime{\domain(\phi_1), \ldots, \domain(\phi_n)}$, $O(|\query_1(\dbbase) \bowtie \ldots \bowtie \query_n(\dbbase)|) \leq \jointime{\domain(\phi_1), \ldots, \domain(\phi_n)}$ (\Cref{def:join-cost}), and $O(k|\query_1(\dbbase) \bowtie \ldots \bowtie \query_n(\dbbase)|)$.
+The remaining logic consists of computing $\domain(\phi_1) \bowtie \ldots \bowtie \domain(\phi_m)$, iterating over the results, and combining nodes in a fan-in tree.
+Respectively, these are $\jointime{\domain(\phi_1), \ldots, \domain(\phi_m)}$, $O(|\query_1(\dbbase) \bowtie \ldots \bowtie \query_m(\dbbase)|) \leq O(\jointime{\domain(\phi_1), \ldots, \domain(\phi_m)})$ (\Cref{def:join-cost}), and $O(m|\query_1(\dbbase) \bowtie \ldots \bowtie \query_m(\dbbase)|)$.
 \qed
 \end{proof}
-With \Cref{lem:circ-model-runtime,lem:tlc-is-the-same-as-det} and our upper bound results on \approxq, we now have all the pieces to argue that using our approximation algorithm, the expected multiplicities of an $\raPlus$ query can be computed in essentially the same runtime as deterministic query processing for the same query, proving claim (iv) of the Introduction.
+%With \Cref{lem:circ-model-runtime,lem:tlc-is-the-same-as-det} and our upper bound results on \approxq, we now have all the pieces to argue that using our approximation algorithm, the expected multiplicities of an $\raPlus$ query can be computed in essentially the same runtime as deterministic query processing for the same query, proving claim (iv) of the Introduction.
 \section{Proof of \Cref{cor:cost-model}}
 \begin{proof}
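The runtime proof in the last hunks assumes that $V$, $E$, and $\ell$ live in mutable accumulators with $O(1)$ amortized append. The paper does not fix a data structure, so the following is an assumption: a standard way to get this bound is a capacity-doubling array. When the array is full it is copied into one twice its size, so $n$ appends copy fewer than $2n$ elements in total. The sketch below counts those copies; `DoublingArray` is a hypothetical name. Python's built-in `list` already gives amortized $O(1)$ append. Its `dict` preserves insertion order with expected $O(1)$ insertion and lookup, much like the linked hashmap assumed for $\phi$.

```python
class DoublingArray:
    """Append-only array that doubles its capacity when full (hypothetical sketch)."""

    def __init__(self):
        self.data = [None]  # backing storage; capacity is len(self.data)
        self.size = 0       # number of stored items
        self.copies = 0     # total element copies caused by resizing

    def append(self, item):
        if self.size == len(self.data):
            # Full: allocate twice the capacity and copy the existing items over.
            new_data = [None] * (2 * len(self.data))
            new_data[: self.size] = self.data
            self.copies += self.size
            self.data = new_data
        self.data[self.size] = item
        self.size += 1


acc = DoublingArray()
for i in range(1000):
    acc.append(i)
print(acc.size, acc.copies)  # 1000 1023
```

Resizes happen at sizes $1, 2, 4, \ldots, 512$, so 1000 appends cost $1 + 2 + \cdots + 512 = 1023 < 2 \cdot 1000$ copies, i.e., $O(1)$ per append on average.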