parent c15830d003
commit a2b8867edb
@@ -10,8 +10,8 @@ Finally, in~\Cref{sec:momemts}, we generalize our result for expectation to othe
\subsection{Lineage Circuits}
\label{sec:circuits}

In~\Cref{sec:semnx-as-repr}, we switched to thinking of our query results as polynomials and, until now, have focused on thinking of inputs this way.
In particular, starting with~\Cref{sec:expression-trees}, we considered these polynomials to be represented as expression trees.
However, expression trees do not capture many of the compressed polynomial representations that we can get from query processing algorithms on bags, including the recent work on worst-case optimal join algorithms~\cite{ngo-survey,skew}, factorized databases~\cite{factorized-db}, and FAQ~\cite{DBLP:conf/pods/KhamisNR16}. Intuitively, the main reason is that an expression tree does not allow for `sharing' of intermediate results, which is crucial for these algorithms (and other query processing methods as well).

In this section, we represent query polynomials via {\em arithmetic circuits}~\cite{arith-complexity}, a standard way to represent polynomials over fields (particularly in algebraic complexity) that we use for polynomials over $\mathbb N$ in the obvious way.
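As an illustrative sketch of sharing (an assumed toy example, not one derived from a specific query), consider a circuit with input gates $X$ and $Y$, an addition gate $g_0 = X + Y$, and multiplication gates $g_i = g_{i-1} \times g_{i-1}$ for $1 \leq i \leq k$, each of which reuses the output of $g_{i-1}$ for both of its inputs:
\[
g_k = (X + Y)^{2^k}.
\]
This circuit has $k + 3$ gates. Unfolding it into an expression tree duplicates the subtree of $g_{i-1}$ at every multiplication, so the tree has $T(k)$ nodes with $T(0) = 3$ and $T(i) = 2\,T(i-1) + 1$, i.e., $T(k) = 2^{k+2} - 1$.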
@@ -41,7 +41,7 @@ For a more detailed discussion of why~\Cref{lem:approx-alg} holds for a lineage
So far our analysis of $\approxq$ has been in terms of the size of the compressed lineage polynomial.
We now show that this model corresponds to the behavior of a deterministic database by proving that, for any union of conjunctive queries $Q$ and \bi $\pxdb$, we can construct a compressed lineage polynomial of size (and in runtime) linear in the runtime of a general class of query processing algorithms for the same query $Q$ on a deterministic database $\db$.
We assume a linear relationship between the input sizes $\abs{\pxdb}$ and $\abs{\db}$ (i.e., $\exists c, \db \in \pxdb$ s.t. $\abs{\pxdb} \leq c \cdot \abs{\db}$).
This is a reasonable assumption because each block of a \bi represents entities with uncertain attributes.
In practice there is often a limited number of alternatives for each block (e.g., which of five conflicting data sources to trust). Note that all \tis trivially fulfill this condition (i.e., $c = 1$).
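For intuition, consider a hypothetical \bi $\pxdb$ with $n$ blocks, each holding at most five alternatives, and assume that the world $\db$ that picks exactly one alternative from every block is a possible world of $\pxdb$. Measuring size as the number of tuples gives
\[
\abs{\pxdb} \leq 5n = 5 \cdot \abs{\db},
\]
so the condition holds with $c = 5$. The values $n$ and $5$ are illustrative only.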
%That is, for \bis that fulfill this restriction, approximating the expectation of results of SPJU queries has only a constant factor overhead over deterministic query processing (using one of the algorithms for which we prove the claim).
% with the same complexity as it would take to evaluate the query on a deterministic \emph{bag} database of the same size as the input PDB.
@@ -101,7 +101,6 @@ Given a $\semNX$-PDB $\pxdb$ and query plan $Q$, the runtime of $Q$ over $\bagdb
\end{lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\noindent The proof is shown in~\Cref{app:subsec-lem-lin-vs-qplan}.

We now have all the pieces to argue that, using our approximation algorithm, the expected multiplicities of a SPJU query can be computed in essentially the same runtime as deterministic query processing for the same query:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Corollary}