In particular, in~\Cref{sec:circuits} we first consider the case when the compressed polynomial is represented by a Directed Acyclic Graph (DAG) instead of an expression tree (\Cref{def:express-tree}) and observe that our results carry over.
Then, in~\Cref{sec:intro}, we formalize our claim that a linear-time algorithm for our problem implies that PDB queries can be answered in the same runtime as deterministic queries.
In~\Cref{sec:semnx-as-repr}, we switched to viewing query results as polynomials, and until now we have treated the input to our problem as a polynomial. In particular, starting with~\Cref{sec:expression-trees}, we considered these polynomials to be represented as expression trees. However, expression trees do not capture many of the compressed polynomial representations produced by query processing algorithms on bags, including the recent work on worst-case optimal join algorithms~\cite{ngo-survey,skew}, factorized databases~\cite{factorized-db}, and FAQ~\cite{DBLP:conf/pods/KhamisNR16}. Intuitively, the main reason is that an expression tree does not allow `sharing' of intermediate results, which is crucial for these algorithms (and for other query processing methods as well).
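As a hypothetical illustration of this gap (the polynomials below are a toy example of ours and are not claimed to arise from any particular query), consider the family defined by
\[
P_0 = X_1 + X_2, \qquad P_i = P_{i-1}\times\left(P_{i-1} + X_3\right) \quad \mbox{for } 1\le i\le k.
\]
If the node computing $P_{i-1}$ may be shared, each level adds only one $+$ node and one $\times$ node, so $P_k$ can be represented with $2k+4$ nodes (three variable nodes, one node for $P_0$, and two nodes per level). An expression tree must instead copy the subtree for $P_{i-1}$ twice at every level, so its size $T(i)$ satisfies
\[
T(0) = 3, \qquad T(i) = 2\,T(i-1) + 3, \qquad \mbox{and hence} \qquad T(k) = 6\cdot 2^{k} - 3.
\]
In this example the representation without sharing is exponentially larger than the one with sharing.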
In this section, we represent query polynomials via {\em arithmetic circuits}~\cite{arith-complexity}, a standard way to represent polynomials over fields (particularly in the field of algebraic complexity) that we use for polynomials over $\mathbb N$ in the obvious way.
We present a formal treatment of {\em lineage circuits} in~\Cref{sec:circuits-formal}, with only a quick overview in this section.
A lineage circuit is represented by a DAG, where each source node corresponds to either one of the input variables or a constant and the sinks correspond to output tuples.
In~\Cref{sec:results-circuits} we argue why results from earlier sections also hold for lineage circuits and then argue why lineage circuits capture the runtime of well-known query processing algorithms in~\Cref{sec:circuit-runtime} (\Cref{sec:cost-model} formalizes the query cost model).
We first note that since expression trees are a special case of lineage circuits, all of our hardness results in~\Cref{sec:hard} are still valid for the latter.
Observe that \textsc{Approx}\textsc{imate}$\rpoly$ (\Cref{alg:mon-sam} in \Cref{sec:algo}) works for lineage circuits as long as the same guarantees on $\onepass$ and $\sampmon$ (\Cref{lem:one-pass} and \Cref{lem:sample} respectively) hold for lineage circuits as well.
It turns out that this is the case, simply because both algorithms rely on only one property of expression trees: that each internal node has two children.
So far, our analysis of the runtime of $\onepass$ has been in terms of the size of the compressed lineage polynomial.
We now show that this model corresponds to the behavior of a deterministic database by proving that for any union of conjunctive queries, we can construct a compressed lineage polynomial with the same complexity as it would take to evaluate the query on a deterministic \emph{bag} database of the same size as the input PDB. We adopt a minimalistic compute-bound model of query evaluation drawn from the worst-case optimal join literature~\cite{skew,ngo-survey}.
Under this model a query $Q$ evaluated over database $D$ has runtime $O(\qruntime{Q,D})$.
We assume that full table scans are used for every base relation access. We can model index scans by treating an index scan query $\sigma_\theta(R)$ as a base relation.
It can be verified that worst-case optimal join algorithms~\cite{skew,ngo-survey}, as well as query evaluation via factorized databases~\cite{factorized-db} (and work on FAQs~\cite{DBLP:conf/pods/KhamisNR16}), can be modeled as select-project-join-union (SPJU) queries (though these queries can be data dependent).\footnote{This claim can be verified by, e.g., simply looking at the {\em Generic-Join} algorithm in~\cite{skew} and the {\em factorize} algorithm in~\cite{factorized-db}.} Further, it can be verified that the above cost model on the corresponding SPJU queries correctly captures their runtime.
As mentioned earlier, we represent lineage polynomials as arithmetic circuits over $\mathbb N$ with $+$, $\times$.
A circuit for query $Q$ and $\semNX$-PDB $\pxdb$ is a directed acyclic graph $\tuple{V_{Q,\pxdb}, E_{Q,\pxdb}, \phi_{Q,\pxdb}, \ell_{Q,\pxdb}}$ with vertices $V_{Q,\pxdb}$ and directed edges $E_{Q,\pxdb}\subset{V_{Q,\pxdb}}^2$.
The sink function $\phi_{Q,\pxdb} : \udom^n \rightarrow V_{Q,\pxdb}$ is a partial function that maps the tuples of the $n$-ary relation $Q(\pxdb)$ to vertices.
We require that $\phi_{Q,\pxdb}$'s range be limited to sink vertices (i.e., vertices with out-degree 0).
A function $\ell_{Q,\pxdb} : V_{Q,\pxdb}\rightarrow\{\;+,\times\;\}\cup\mathbb N \cup\vct X$ assigns a label to each node: Source nodes (i.e., vertices with in-degree 0) are labeled with constants or variables (i.e., $\mathbb N \cup\vct X$), while the remaining nodes are labeled with the symbol $+$ or $\times$.
For the specifics on how to construct a lineage circuit to encode the polynomials of all result tuples for a query and $\semNX$-PDB see~\Cref{app:subsec-rep-poly-lin-circ}.
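As a small illustration of this definition (a hypothetical example of ours, with made-up vertex names $v_1,\ldots,v_6$, a single output tuple $t$, and the subscripts $Q,\pxdb$ dropped for readability), consider the lineage polynomial $(X_1+X_2)\times X_3 + (X_1+X_2)$. One circuit for it is
\[
V=\{v_1,\ldots,v_6\},\qquad
E=\{(v_1,v_4),\,(v_2,v_4),\,(v_3,v_5),\,(v_4,v_5),\,(v_4,v_6),\,(v_5,v_6)\},
\]
\[
\ell(v_1)=X_1,\quad \ell(v_2)=X_2,\quad \ell(v_3)=X_3,\quad \ell(v_4)=+,\quad \ell(v_5)=\times,\quad \ell(v_6)=+,\qquad \phi(t)=v_6.
\]
Here $v_1,v_2,v_3$ are the source nodes and $v_6$ is the only sink. The node $v_4$, which computes $X_1+X_2$, feeds both $v_5$ and $v_6$, so the circuit has $6$ vertices, whereas an expression tree for the same polynomial must duplicate the subtree rooted at $v_4$ and has $9$ nodes.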
We now connect the size of a lineage circuit (where the size of a lineage circuit is the number of vertices in the corresponding DAG) %\footnote{since each node has indegree at most two, this also is the same up to constants to counting the number of edges in the DAG.})
for a given SPJU query $Q$ and $\semNX$-PDB $\pxdb$ to the runtime $\qruntime{Q,\bagdbof}$, where $\bagdbof$ is the bag database consisting of all tuples of $\pxdb$ with multiplicity $1$. We do this formally by showing that the size of the lineage circuit is asymptotically no worse than the corresponding runtime of a large class of deterministic query processing algorithms.
Given a $\semNX$-PDB $\pxdb$ and query plan $Q$, the size of the lineage circuit of $Q(\pxdb)$ is asymptotically no larger than the runtime of $Q$ over $\bagdbof$. That is, we have $\abs{V_{Q,\pxdb}}\leq(k-1)\qruntime{Q,\bagdbof}$, where $k$ is the maximal degree of any polynomial in $Q(\pxdb)$.
We now have all the pieces to argue that, using our approximation algorithm, the expected multiplicities of an SPJU query can be computed in essentially the same runtime as deterministic query processing for the same query:
Given an SPJU query $Q$ over a TIDB $\pxdb$, we can compute a $(1\pm\eps)$-approximation of the expectation for each output tuple with probability at least $1-\delta$ in time
This follows from~\Cref{lem:circuits-model-runtime} and the lineage circuit counterpart of~\Cref{cor:approx-algo-const-p} (see~\Cref{sec:results-circuits}), where the latter is used with $\delta$ substituted\footnote{Recall that~\Cref{cor:approx-algo-const-p} is stated for a single output tuple. To get the required guarantee for all (at most $n^k$) output tuples of $Q$, we allow a failure probability of at most $\frac\delta{n^k}$ for each output tuple and then take a union bound over all output tuples.} by $\frac\delta{n^k}$.
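Spelled out, the union bound used in this argument is the following short calculation, where $F_t$ is notation we introduce only for this sketch to denote the event that the estimate for output tuple $t$ falls outside the $(1\pm\eps)$ range:
\[
\Pr\Big[\bigcup_{t} F_t\Big] \;\le\; \sum_{t}\Pr\left[F_t\right] \;\le\; n^k\cdot\frac{\delta}{n^k} \;=\; \delta,
\]
where $t$ ranges over the at most $n^k$ output tuples of $Q$. Hence all estimates are simultaneously within the required range with probability at least $1-\delta$.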
Note that for any positive integer $m$ we can also compute the expectation of $\poly^m$ (since this only changes the degree of the corresponding lineage polynomial by a factor of $m$).
In other words, we can compute the $m$-th moment of the multiplicities as well, allowing us, e.g., to use Chebyshev's inequality or other higher-moment-based probability bounds on the events we might be interested in.
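For instance, taking $m=2$ gives access to a variance-based tail bound. The following is the standard statement of Chebyshev's inequality, written for a random variable $Y$ that we introduce here (as an assumption of this sketch) to denote the multiplicity of a fixed output tuple in a random possible world:
\[
\mathrm{Var}[Y] = \mathbb{E}\left[Y^2\right] - \mathbb{E}[Y]^2,
\qquad
\Pr\big[\,\left|Y-\mathbb{E}[Y]\right|\ge a\,\big] \;\le\; \frac{\mathrm{Var}[Y]}{a^2}
\quad \mbox{for every } a>0.
\]
Both $\mathbb{E}[Y]$ and $\mathbb{E}\left[Y^2\right]$ are expectations of lineage polynomials ($\poly$ and $\poly^2$, respectively). How the approximation error in these two quantities propagates into the resulting bound is not analyzed here.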