save line

master
Boris Glavic 2020-12-20 17:57:03 -06:00
parent c15830d003
commit a2b8867edb
1 changed files with 3 additions and 4 deletions


@@ -10,8 +10,8 @@ Finally, in~\Cref{sec:momemts}, we generalize our result for expectation to othe
\subsection{Lineage Circuits}
\label{sec:circuits}
In~\Cref{sec:semnx-as-repr}, we switched to thinking of our query results as polynomials, and until now we have focused on thinking of inputs this way.
In particular, starting with~\Cref{sec:expression-trees}, we considered these polynomials to be represented as expression trees.
However, these do not capture many of the compressed polynomial representations that we can get from query processing algorithms on bags, including the recent work on worst-case optimal join algorithms~\cite{ngo-survey,skew}, factorized databases~\cite{factorized-db}, and FAQ~\cite{DBLP:conf/pods/KhamisNR16}. Intuitively, the main reason is that an expression tree does not allow for `sharing' of intermediate results, which is crucial for these algorithms (and other query processing methods as well).
In this section, we represent query polynomials via {\em arithmetic circuits}~\cite{arith-complexity}, a standard way to represent polynomials over fields (particularly in algebraic complexity theory), which we use for polynomials over $\mathbb N$ in the obvious way.
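The key difference from an expression tree is that a circuit is a DAG, so a gate's output can feed several downstream gates. The following minimal Python sketch (not the paper's formalism; all names are illustrative) evaluates such a circuit over $\mathbb N$, visiting each gate once via a cache, so a shared intermediate result like $x + y$ in $(x+y)^2$ is computed only once where a tree would duplicate it:

```python
# Sketch: an arithmetic circuit over N as a DAG of gates with shared subterms.

class Gate:
    def __init__(self, op, inputs=(), value=None):
        self.op = op          # 'var', 'const', '+', or '*'
        self.inputs = inputs  # child gates (shared references, not copies)
        self.value = value    # variable name (for 'var') or constant (for 'const')

def evaluate(gate, assignment, cache=None):
    """Evaluate bottom-up; the cache ensures each gate is visited once."""
    if cache is None:
        cache = {}
    if id(gate) in cache:
        return cache[id(gate)]
    if gate.op == 'var':
        result = assignment[gate.value]
    elif gate.op == 'const':
        result = gate.value
    elif gate.op == '+':
        result = sum(evaluate(g, assignment, cache) for g in gate.inputs)
    else:  # '*'
        result = 1
        for g in gate.inputs:
            result *= evaluate(g, assignment, cache)
    cache[id(gate)] = result
    return result

x, y = Gate('var', value='x'), Gate('var', value='y')
s = Gate('+', (x, y))       # intermediate gate computing x + y
sq = Gate('*', (s, s))      # (x + y)^2 reuses s; an expression tree would copy it

print(evaluate(sq, {'x': 2, 'y': 3}))  # (2 + 3)^2 = 25
```

The sharing is exactly what compressed representations such as factorized databases exploit: the circuit has one `+` gate where the expanded polynomial $x^2 + 2xy + y^2$ would need several.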
@@ -41,7 +41,7 @@ For a more detailed discussion of why~\Cref{lem:approx-alg} holds for a lineage
So far our analysis of $\approxq$ has been in terms of the size of the compressed lineage polynomial.
We now show that this model corresponds to the behavior of a deterministic database by proving that for any union of conjunctive queries, we can construct a compressed lineage polynomial for a query $Q$ and \bi $\pxdb$ of size (and in runtime) linear in the runtime of a general class of query processing algorithms for the same query $Q$ on a deterministic database $\db$.
We assume a linear relationship between input sizes $|\pxdb|$ and $|\db|$ (i.e., $\exists c, \db \in \pxdb$ s.t. $\abs{\pxdb} \leq c \cdot \abs{\db}$).
This is a reasonable assumption because each block of a \bi represents entities with uncertain attributes.
In practice there is often a limited number of alternatives for each block (e.g., which of five conflicting data sources to trust). Note that all \tis trivially fulfill this condition (i.e., $c = 1$).
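The size assumption can be sanity-checked on a toy instance. In this hedged sketch (the block contents and the constant $c$ are hypothetical, not from the paper), a \bi is a list of blocks of alternative tuples; taking $c$ as the largest block size and a world $\db$ that picks one tuple per block gives $\abs{\pxdb} \leq c \cdot \abs{\db}$, and a \ti (every block a singleton) gives $c = 1$:

```python
# Hypothetical BI: each inner list is a block of mutually exclusive alternatives.
blocks = [['t1a', 't1b'], ['t2a'], ['t3a', 't3b', 't3c']]

c = max(len(b) for b in blocks)            # largest number of alternatives (here 3)
pxdb_size = sum(len(b) for b in blocks)    # |pxdb| = total tuples across blocks (here 6)
db_size = len(blocks)                      # a world choosing one tuple per block (here 3)

assert pxdb_size <= c * db_size            # the assumed linear relationship: 6 <= 3 * 3

# A TI is the special case where every block is a singleton, so c = 1.
ti_blocks = [['t1'], ['t2'], ['t3']]
assert max(len(b) for b in ti_blocks) == 1
```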
%That is, for \bis that fulfill this restriction, approximating the expectation of results of SPJU queries has only a constant factor overhead over deterministic query processing (using one of the algorithms for which we prove the claim).
% with the same complexity as it would take to evaluate the query on a deterministic \emph{bag} database of the same size as the input PDB.
@@ -101,7 +101,6 @@ Given a $\semNX$-PDB $\pxdb$ and query plan $Q$, the runtime of $Q$ over $\bagdb
\end{lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\noindent The proof is shown in~\Cref{app:subsec-lem-lin-vs-qplan}.
We now have all the pieces to argue that using our approximation algorithm, the expected multiplicities of a SPJU query can be computed in essentially the same runtime as deterministic query processing for the same query:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Corollary}