%In this section, we consider generalizations/corollaries of our results.
%In particular, in~\Cref{sec:circuits} we first consider the case when the compressed polynomial is represented by a Directed Acyclic Graph (DAG) instead of an expression tree (\Cref{def:express-tree}) and observe that our results carry over.
%Then,
We formalize our claim from \Cref{sec:intro} that a linear algorithm for our problem implies that PDB queries can be answered in the same runtime as deterministic queries under reasonable assumptions.
%In~\Cref{sec:results-circuits} we argue why results from earlier sections also hold for circuits and then
We argue why circuits capture the runtime of well-known query processing algorithms in~\Cref{sec:circuit-runtime} (\Cref{sec:cost-model} formalizes the query cost model).
%\subsubsection{Extending our results to circuits}
%\label{sec:results-circuits}
%
%We first note that since expression trees are a special case of linear circuits, all of our hardness results for in~\Cref{sec:hard} are still valid for the latter.
%Observe that \textsc{Approx}\textsc{imate}$\rpoly$ (\Cref{alg:mon-sam} in \Cref{sec:algo}) works for circuits as long as the same guarantees on $\onepass$ and $\sampmon$ (\Cref{lem:one-pass} and \Cref{lem:sample} respectively) hold for circuits as well.
%It turns out that this is the case, simply because both algorithms rely on only one property of expression trees: that each node has two children;
%Analogously in a circuit, each node has a maximum in-degree of two.
%Put another way, our argument never used the fact that in an expression tree, each node has at most one parent.
%%
%For a more detailed discussion of why~\Cref{lem:approx-alg} holds for a circuit, see~\Cref{app:lineage-circuit-ext}.
So far our analysis of $\approxq$ has been in terms of the size of the compressed lineage polynomial.
We now show that this model corresponds to the behavior of a deterministic database by proving that for any union of conjunctive queries, we can construct a compressed lineage polynomial for a query $Q$ and \bi$\pxdb$ of size (and in runtime) linear in the runtime of a general class of query processing algorithms for the same query $Q$ on a deterministic database $\db$.
We assume a linear relationship between input sizes $|\pxdb|$ and $|\db|$ (i.e., $\exists c, \db\in\pxdb$ s.t. $\abs{\pxdb}\leq c \cdot\abs{\db})$).
In practice there is often a limited number of alternatives for each block (e.g., which of five conflicting data sources to trust). Note that all \tis trivially fulfill this condition (i.e., $c =1$).
%That is for \bis that fulfill this restriction approximating the expectation of results of SPJU queries is only has a constant factor overhead over deterministic query processing (using one of the algorithms for which we prove the claim).
Under this model a query $Q$ evaluated over database $D$ has runtime $O(\qruntime{Q,D})$.
We assume that full table scans are used for every base relation access. We can model index scans by treating an index scan query $\sigma_\theta(R)$ as a base relation.
It can be verified that worst-case optimal join algorithms~\cite{skew,ngo-survey}, as well as query evaluation via factorized databases~\cite{factorized-db} (and work on FAQs~\cite{DBLP:conf/pods/KhamisNR16}) can be modeled as select-union-project-join queries (though these queries can be data dependent).\footnote{This claim can be verified by e.g. simply looking at the {\em Generic-Join} algorithm in~\cite{skew} and {\em factorize} algorithm in~\cite{factorized-db}.} Further, it can be verified that the above cost model on the corresponding SPJU join queries correctly captures their runtime.
A circuit for query $Q$ and $\semNX$-PDB $\pxdb$ is a directed acyclic graph $\tuple{V_{Q,\pxdb}, E_{Q,\pxdb}, \phi_{Q,\pxdb}, \ell_{Q,\pxdb}}$ with vertices $V_{Q,\pxdb}$ and directed edges $E_{Q,\pxdb}\subset{V_{Q,\pxdb}}^2$.
The sink function $\phi_{Q,\pxdb} : \udom^n \rightarrow V_{Q,\pxdb}$ is a partial function that maps the tuples of the $n$-ary relation $Q(\pxdb)$ to vertices.
We require that $\phi_{Q,\pxdb}$'s range be limited to sink vertices (i.e., vertices with out-degree 0).
A function $\ell_{Q,\pxdb} : V_{Q,\pxdb}\rightarrow\{\;+,\times\;\}\cup\mathbb N \cup\vct X$ assigns a label to each node: Source nodes (i.e., vertices with in-degree 0) are labeled with constants or variables (i.e., $\mathbb N \cup\vct X$), while the remaining nodes are labeled with the symbol $+$ or $\times$.
For the specifics on how to construct a circuit to encode the polynomials of all result tuples for a query and $\semNX$-PDB see \Cref{app:subsec-rep-poly-lin-circ}. Note that we can construct circuits for \bis in time linear in the time required for deterministic query processing over a possible world of the \bi under the aforementioned assumption that $\abs{\pxdb}\leq c \cdot\abs{\db}$.
We now connect the size of a circuit (where the size of a circuit is the number of vertices in the corresponding DAG) %\footnote{since each node has indegree at most two, this also is the same up to constants to counting the number of edges in the DAG.})
for a given SPJU query $Q$ and $\semNX$-PDB $\pxdb$ to its $\qruntime{Q,\db}$ where $\db$ is one of the possible worlds of $\pxdb$. We do this formally by showing that the size of the circuit is asymptotically no worse than the corresponding runtime of a large class of deterministic query processing algorithms.
Given a $\semNX$-PDB $\pxdb$ and query plan $Q$, the runtime of $Q$ over $\bagdbof$ has the same or better complexity as the size of the lineage of $Q(\pxdb)$. That is, we have $\abs{V_{Q,\pxdb}}\leq(k-1)\qruntime{Q}$, where $k$ is the maximal degree of any polynomial in $Q(\pxdb)$.
\noindent The proof is shown in in~\Cref{app:subsec-lem-lin-vs-qplan}.
We now have all the pieces to argue that using our approximation algorithm, the expected multiplicities of a SPJU query can be computed in essentially the same runtime as deterministic query processing for the same query:
Given an SPJU query $Q$ over a \ti$\pxdb$ and let $\db_{max}$ denote the world containing all tuples of $\pxdb$, we can compute a $(1\pm\eps)$-approximation of the expectation for each output tuple in $\query(\pxdb)$ with probability at least $1-\delta$ in time
This follows from~\Cref{lem:circuits-model-runtime} and (the circuit counterpart-- see~\Cref{sec:results-circuits})~\Cref{cor:approx-algo-const-p} (where the latter is used with $\delta$ being substituted\footnote{Recall that~\Cref{cor:approx-algo-const-p} is stated for a single output tuple so to get the required guarantee for all (at most $n^k$) output tuples of $Q$ we get at most $\frac\delta{n^k}$ probability of failure for each output tuple and then just a union bound over all output tuples. } with $\frac\delta{n^k}$).
For any positive integer $m$ we can compute the expectation $\poly^m$ (which only changes the degree of the corresponding lineage polynomial by a factor of $m$).
In other words, we can compute the $m$-th moment of the multiplicities, allowing us to e.g. to use Chebyschev inequality or other high moment based probability bounds on the events we might be interested in.