Finished rewrite of SampMon; started Iterative solution of OnePass

master
Aaron Huber 2021-02-08 13:44:50 -05:00
parent 3b6dbf35d9
commit ba6010daa8
5 changed files with 59 additions and 27 deletions


@ -30,11 +30,6 @@ We now introduce useful definitions and notation related to polynomials. We use
%Note that $\circuit$ need not encode an expression in the standard monomial basis. For instance, $\circuit$ could represent a compressed form of the polynomial in~\Cref{eq:poly-eg}, such as $(x + 2y)(2x - y)$.
\revision{
\begin{Definition}[Reduced Polynomial]
We say that a polynomial $\poly$ is reduced when every exponent $e > 1$ occurring in the monomials of $\poly$ has been \textit{reduced} to $1$.
\end{Definition}
\begin{Definition}[Pure Expansion]
The pure expansion of a polynomial $\poly$ is formed by computing all products of sums occurring in $\poly$, without combining like monomials. The pure expansion of $\poly$ generalizes~\Cref{def:smb} by allowing monomials $m_i = m_j$ for $i \neq j$.
\end{Definition}
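As an illustration (a worked sketch only, not part of the formal development), take the factorized polynomial $(x + 2y)(2x - y)$. Its pure expansion keeps the two $xy$ monomials separate, whereas the standard monomial basis combines them:
\[
(x + 2y)(2x - y) \;=\; 2x^2 - xy + 4xy - 2y^2 \qquad \text{(pure expansion)}
\]
\[
(x + 2y)(2x - y) \;=\; 2x^2 + 3xy - 2y^2 \qquad \text{(standard monomial basis)}
\]
Reducing every exponent $e > 1$ to $1$ in the latter gives the reduced polynomial $2x + 3xy - 2y$.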
@ -59,8 +54,9 @@ The logical view of \revision{$\expansion{\circuit}$} is a list of tuples $(\mon
}
\end{Definition}
%where that the multiplication of two tuples %is the standard multiplication over monomials and the standard multiplication over coefficients to produce the product tuple, as in
%is their direct product $(\monom_1, \coef_1) \cdot (\monom_2, \coef_2) = (\monom_1 \cdot \monom_2, \coef_1 \times \coef_2)$ such that monomials $\monom_1$ and $\monom_2$ are concatenated in a product operation, while the standard product operation over reals applies to $\coef_1 \times \coef_2$. The product of $\expansion{\circuit_\lchild} \cdot \expansion{\circuit'_\rchild}$ is then the cross product of the multiplication of all such tuples returned to both $\expansion{\circuit_\lchild}$ and $\expansion{\circuit_\rchild}$. %The operator $\otimes$ is defined as the cross-product tuple multiplication of all such tuples returned by both $\expansion{\circuit_\lchild}$ and $\expansion{\circuit_\rchild}$.
\revision{
Note that $\expansion{\circuit}$ reduces all exponents $e > 1$ to $e = 1$, as seen in~\Cref{def:reduced-bi-poly}.
}
In the following, we abuse notation and write $\monom$ to denote the monomial obtained as the products of the variables in the set.


@ -15,25 +15,25 @@ In particular, starting with~\Cref{sec:expression-trees} we considered these pol
However, these do not capture many of the compressed polynomial representations that we can get from query processing algorithms on bags, including the recent work on worst-case optimal join algorithms~\cite{ngo-survey,skew}, factorized databases~\cite{factorized-db}, and FAQ~\cite{DBLP:conf/pods/KhamisNR16}. Intuitively, the main reason is that an expression tree does not allow for `sharing' of intermediate results, which is crucial for these algorithms (and other query processing methods as well).
In this section, we represent query polynomials via {\em arithmetic circuits}~\cite{arith-complexity}, a standard way to represent polynomials over fields (particularly in the field of algebraic complexity) that we use for polynomials over $\mathbb N$ in the obvious way.
We present a formal treatment of {\em lineage circuit}s in~\Cref{sec:circuits-formal}, with only a quick overview in this section.
A lineage circuit is represented by a DAG, where each source node corresponds to either one of the input variables or a constant, and the sinks to output tuples.
We present a formal treatment of {\em circuit}s in~\Cref{sec:circuits-formal}, with only a quick overview in this section.
A circuit is represented by a DAG, where each source node corresponds to either one of the input variables or a constant, and the sinks to output tuples.
Every other node has at most two in-edges, is labeled as an addition or a multiplication node, and has no limit on its outdegree.
Note that if we limit the outdegree to one, then we get back expression trees.
In~\Cref{sec:results-circuits} we argue why results from earlier sections also hold for lineage circuits and then argue why lineage circuits capture the runtime of well-known query processing algorithms in~\Cref{sec:circuit-runtime} (\Cref{sec:cost-model} formalizes the query cost model).
In~\Cref{sec:results-circuits} we argue why results from earlier sections also hold for circuits and then argue why circuits capture the runtime of well-known query processing algorithms in~\Cref{sec:circuit-runtime} (\Cref{sec:cost-model} formalizes the query cost model).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Extending our results to lineage circuits}
\subsubsection{Extending our results to circuits}
\label{sec:results-circuits}
We first note that since expression trees are a special case of circuits, all of our hardness results in~\Cref{sec:hard} are still valid for the latter.
Observe that \textsc{Approx}\textsc{imate}$\rpoly$ (\Cref{alg:mon-sam} in \Cref{sec:algo}) works for lineage circuits as long as the same guarantees on $\onepass$ and $\sampmon$ (\Cref{lem:one-pass} and \Cref{lem:sample} respectively) hold for lineage circuits as well.
Observe that \textsc{Approx}\textsc{imate}$\rpoly$ (\Cref{alg:mon-sam} in \Cref{sec:algo}) works for circuits as long as the same guarantees on $\onepass$ and $\sampmon$ (\Cref{lem:one-pass} and \Cref{lem:sample} respectively) hold for circuits as well.
It turns out that this is the case, simply because both algorithms rely on only one property of expression trees: that each node has at most two children.
Analogously, in a circuit each node has a maximum in-degree of two.
Put another way, our argument never used the fact that in an expression tree, each node has at most one parent.
%
For a more detailed discussion of why~\Cref{lem:approx-alg} holds for a lineage circuit, see~\Cref{app:lineage-circuit-ext}.
For a more detailed discussion of why~\Cref{lem:approx-alg} holds for a circuit, see~\Cref{app:lineage-circuit-ext}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{The cost model}
@ -74,7 +74,7 @@ It can be verified that worst-case optimal join algorithms~\cite{skew,ngo-survey
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Lineage circuits for query plans}
\label{sec:circuits-formal}
We now formalize lineage circuits and the construction of lineage circuits for SPJU queries.
We now formalize circuits and the construction of circuits for SPJU queries.
As mentioned earlier, we represent lineage polynomials as arithmetic circuits over $\mathbb N$-valued variables with $+$, $\times$.
A circuit for query $Q$ and $\semNX$-PDB $\pxdb$ is a directed acyclic graph $\tuple{V_{Q,\pxdb}, E_{Q,\pxdb}, \phi_{Q,\pxdb}, \ell_{Q,\pxdb}}$ with vertices $V_{Q,\pxdb}$ and directed edges $E_{Q,\pxdb} \subset {V_{Q,\pxdb}}^2$.
The sink function $\phi_{Q,\pxdb} : \udom^n \rightarrow V_{Q,\pxdb}$ is a partial function that maps the tuples of the $n$-ary relation $Q(\pxdb)$ to vertices.
@ -83,7 +83,7 @@ We require that $\phi_{Q,\pxdb}$'s range be limited to sink vertices (i.e., vert
A function $\ell_{Q,\pxdb} : V_{Q,\pxdb} \rightarrow \{\;+,\times\;\}\cup \mathbb N \cup \vct X$ assigns a label to each node: Source nodes (i.e., vertices with in-degree 0) are labeled with constants or variables (i.e., $\mathbb N \cup \vct X$), while the remaining nodes are labeled with the symbol $+$ or $\times$.
We require that vertices have an in-degree of at most two.
%
For the specifics on how to construct a lineage circuit to encode the polynomials of all result tuples for a query and $\semNX$-PDB see \Cref{app:subsec-rep-poly-lin-circ}. Note that we can construct lineage circuits for \bis in time linear in the time required for deterministic query processing over a possible world of the \bi under the aforementioned assumption that $\abs{\pxdb} \leq c \cdot \abs{\db}$.
For the specifics on how to construct a circuit to encode the polynomials of all result tuples for a query and $\semNX$-PDB see \Cref{app:subsec-rep-poly-lin-circ}. Note that we can construct circuits for \bis in time linear in the time required for deterministic query processing over a possible world of the \bi under the aforementioned assumption that $\abs{\pxdb} \leq c \cdot \abs{\db}$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Circuit size vs. runtime}
@ -91,8 +91,8 @@ For the specifics on how to construct a lineage circuit to encode the polynomia
\newcommand{\bagdbof}{\textsc{bag}(\pxdb)}
We now connect the size of a lineage circuit (where the size of a lineage circuit is the number of vertices in the corresponding DAG) %\footnote{since each node has indegree at most two, this also is the same up to constants to counting the number of edges in the DAG.})
for a given SPJU query $Q$ and $\semNX$-PDB $\pxdb$ to its $\qruntime{Q,\db}$ where $\db$ is one of the possible worlds of $\pxdb$. We do this formally by showing that the size of the lineage circuit is asymptotically no worse than the corresponding runtime of a large class of deterministic query processing algorithms.
We now connect the size of a circuit (where the size of a circuit is the number of vertices in the corresponding DAG) %\footnote{since each node has indegree at most two, this also is the same up to constants to counting the number of edges in the DAG.})
for a given SPJU query $Q$ and $\semNX$-PDB $\pxdb$ to its $\qruntime{Q,\db}$ where $\db$ is one of the possible worlds of $\pxdb$. We do this formally by showing that the size of the circuit is asymptotically no worse than the corresponding runtime of a large class of deterministic query processing algorithms.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{lemma}
@ -112,7 +112,7 @@ We now have all the pieces to argue that using our approximation algorithm, the
\end{Corollary}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}
This follows from~\Cref{lem:circuits-model-runtime} and (the lineage circuit counterpart-- see~\Cref{sec:results-circuits})~\Cref{cor:approx-algo-const-p} (where the latter is used with $\delta$ being substituted\footnote{Recall that~\Cref{cor:approx-algo-const-p} is stated for a single output tuple so to get the required guarantee for all (at most $n^k$) output tuples of $Q$ we get at most $\frac \delta{n^k}$ probability of failure for each output tuple and then just a union bound over all output tuples. } with $\frac \delta{n^k}$).
This follows from~\Cref{lem:circuits-model-runtime} and (the circuit counterpart-- see~\Cref{sec:results-circuits})~\Cref{cor:approx-algo-const-p} (where the latter is used with $\delta$ being substituted\footnote{Recall that~\Cref{cor:approx-algo-const-p} is stated for a single output tuple so to get the required guarantee for all (at most $n^k$) output tuples of $Q$ we get at most $\frac \delta{n^k}$ probability of failure for each output tuple and then just a union bound over all output tuples. } with $\frac \delta{n^k}$).
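Spelled out, and writing $\mathcal{E}_t$ (notation introduced only for this remark) for the event that the estimate for output tuple $t$ violates the guarantee, the union bound over the at most $n^k$ output tuples reads
\[
\Pr\Big[\bigcup_{t} \mathcal{E}_t\Big] \;\leq\; \sum_{t} \Pr\left[\mathcal{E}_t\right] \;\leq\; n^k \cdot \frac{\delta}{n^k} \;=\; \delta.
\]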
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


@ -446,6 +446,8 @@ Applying this bound in the runtime bound in~\Cref{lem:approx-alg} gives the firs
\revision{
Note that we \textit{assume} that the original call to \onepass is made on an input circuit \circuit whose members \vari{partial}, \vari{Lweight} and \vari{Rweight} have been initialized to Null across all gates.
}
\begin{algorithm}[h!]
\caption{\onepass$(\revision{\circuit})$}
\label{alg:one-pass}
@ -592,22 +594,52 @@ This leaves us with two possibilities for $\revision{\circuit}$. The first case
\paragraph{Run-time Analysis}
The runtime for \textsc{OnePass} is fairly straightforward. \revision{
Note first that each gate is visited at most two times. If an internal gate is revisited, then by lines~\ref{alg:one-pass-revisit1} and~\ref{alg:one-pass-revisit2} there are no recursive calls on its subcircuits.
Since each gate may have a number of outputs that is linear in the size of the circuit, a single gate may be visited up to $\numvar$ times in a circuit of size $\numvar$. However, we can obtain a tighter bound on the total number of visits from the property that each gate has at most $2$ inputs. This implies that there are at most $2\numvar$ edges, and $O(\numvar)$ edges imply a total of $< 2\numvar$ node visits (recall that source nodes have no inputs). We therefore still have $O(\numvar)$ visits across the entire circuit.
}
Next, for each type of node visited, it can be verified that only a constant number of operations is performed. This yields an $O\left(\size(\revision{\circuit})\right)$ runtime.
\subsection{\onepass Iterative}
\revision{
\begin{algorithm}[h!]
\caption{\onepass$(\circuit)$ Iterative Algorithm}
\label{alg:one-pass-iter}
\begin{algorithmic}[1]
\Require \circuit: Circuit
\Ensure \circuit: Annotated Circuit
\Ensure \vari{sum} $\in \reals$
\For{\circuit in \topord(\circuit)}\Comment{\topord($\cdot$) is the topological order of \circuit}
\If{\circuit.\vari{type} $=$ \var}
\State \circuit.\vari{partial} $\gets 1$
\ElsIf{\circuit.\vari{type} $=$ \tnum}
\State \circuit.\vari{partial} $\gets \abs{\circuit.\val}$
\ElsIf{\circuit.\vari{type} $= \circmult$}
\State \circuit.\vari{partial} $\gets \circuit_\linput.\vari{partial} \times \circuit_\rinput.\vari{partial}$
\Else \Comment{\circuit.\vari{type} $= \circplus$}
\State \circuit.\vari{partial} $\gets \circuit_\linput.\vari{partial} + \circuit_\rinput.\vari{partial}$
\State \circuit.\vari{Lweight} $\gets \frac{\circuit_\linput.\vari{partial}}{\circuit.\vari{partial}}$
\State \circuit.\vari{Rweight} $\gets \frac{\circuit_\rinput.\vari{partial}}{\circuit.\vari{partial}}$
\EndIf
\State \vari{sum} $\gets \circuit.\vari{partial}$
\EndFor
\State \Return \vari{sum}
\end{algorithmic}
\end{algorithm}
}
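To illustrate~\Cref{alg:one-pass-iter}, consider a circuit for $(x + 2y)(2x - y)$ in which the gates for $x$ and $y$ are shared. We assume, purely for this example, that $-y$ is encoded as a multiplication gate over a \tnum gate with value $-1$ and the gate for $y$, so that the \tnum case takes the absolute value $1$. Visiting the gates in one possible topological order gives:
\[
\begin{array}{lll}
\text{gate} & \text{partial} & (\text{Lweight},\ \text{Rweight}) \\ \hline
x,\ y & 1 & \\
2 & 2 & \\
-1 & \lvert -1 \rvert = 1 & \\
2 \times y & 2 \cdot 1 = 2 & \\
x + 2y & 1 + 2 = 3 & (1/3,\ 2/3) \\
2 \times x & 2 \cdot 1 = 2 & \\
(-1) \times y & 1 \cdot 1 = 1 & \\
2x + (-y) & 2 + 1 = 3 & (2/3,\ 1/3) \\
(x + 2y)(2x - y) & 3 \cdot 3 = 9 &
\end{array}
\]
The returned \vari{sum} is $9$, which agrees with the sum of the absolute values of the coefficients, $2 + 1 + 4 + 2$, in the pure expansion $2x^2 - xy + 4xy - 2y^2$.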
\subsection{Proof of~\Cref{lem:sample}}
\subsection{Proof of~\Cref{lem:one-pass} Iterative}
\subsection{\sampmon Notes}
\revision{
While we would like to take advantage of the space efficiency gained in using a circuit \circuit instead of an expression tree \etree, we do not know that such a method exists when computing a sample of the input polynomial representation.
The efficiency gains of circuits over trees are found in the multiplication of the same product-of-sums terms in the factorized polynomial that \circuit models. However, to avoid biased sampling, it is imperative to sample from both children of a multiplication gate independently. When we perform separate, independent sampling of both children, the result is the same as performing the sampling computation over the space-inefficient expression tree. However, this does not harm us since, as we show, the runtime bound does not depend on the size of the expression tree equivalent to the input circuit \circuit, but rather on that tree's depth, which is the same as the depth of \circuit.
}
\revision{
We use the equivalent expression tree representation discussed at the onset of~\cref{lem:one-pass} (1st iteration--\oldstuff{grayed out}), which \sampmon essentially implements.
}
First, we need to show that $\sampmon$ indeed returns a monomial $\monom$,\footnote{Technically it returns $\var(\monom)$ but for less cumbersome notation we will refer to $\var(\monom)$ simply by $\monom$ in this proof.} such that $(\monom, \coef)$ is in $\expansion{\circuit}$, which we do by induction on the depth of $\circuit$.
\subsection{Proof of~\Cref{lem:sample}}
We first need to show that $\sampmon$ indeed returns a monomial $\monom$,\footnote{Technically it returns $\var(\monom)$ but for less cumbersome notation we will refer to $\var(\monom)$ simply by $\monom$ in this proof.} such that $(\monom, \coef)$ is in $\expansion{\circuit}$, which we do by induction on the depth of $\circuit$.
For the base case, let the depth $d$ of $\circuit$ be $0$. We have that the root node is either a constant $\coef$ for which by line~\ref{alg:sample-num-return} we return $\{~\}$, or we have that $\circuit.\type = \var$ and $\circuit.\val = x$, and by line~\ref{alg:sample-var-return} we return $\{x\}$. Both cases sample a monomial%satisfy ~\cref{def:monomial}
, and the base case is proven.
@ -645,10 +677,12 @@ and we obtain the desired result.
\paragraph{Run-time Analysis}
We now bound the number of recursive calls in $\sampmon$ by $O\left(k\cdot depth(\circuit)\right)$. Note that a sampled monomial corresponds to a subgraph of $\circuit$. Take an arbitrary sample subgraph of circuit $\circuit$ and note that since every monomial has degree at most $k$, the subgraph has $O(k)$ leaves and the number of nodes in each layer as one goes from leaves to the root can only go down. Since the sub-graph has depth at most $depth(\circuit)$ and each level has $O(k)$ nodes, the sub-graph has $O(k\cdot depth(\circuit))$ nodes in it. Noting that each node in the sub-graph corresponds to a recursive call yields the desired bound.
We now bound the number of recursive calls in $\sampmon$ by $O\left(k\cdot \depth(\circuit)\right)$. Note that a sampled monomial corresponds to a subcircuit of $\circuit$. Take an arbitrary sample subcircuit \subcircuit of circuit $\circuit$ and note that since every monomial has degree at most $k$, the \subcircuit has $O(k)$ leaves and the number of recursive calls in each layer as one goes from leaves to the root can only go down. Since \subcircuit has depth at most $\depth(\circuit)$ and each level has $O(k)$ nodes, the subcircuit has $O(k\cdot \depth(\circuit))$ nodes in it. \revision{
It is important to note that since there are $O(k)$ recursive calls at any given level, the case of more than one recursive call to the same gate is accounted for in this bound: on any given level, if a particular gate receives $\numvar$ calls and all other gates together receive $m$ calls, then $\numvar + m \leq k$. This yields the desired bound.
}
It is easy to check that except for~\Cref{alg:sample-times-union}, all other lines take $O(1)$ time. Thus, overall all lines except for~\Cref{alg:sample-times-union} take $O(k\cdot depth(\etree))$ time. Now consider all executions of~\Cref{alg:sample-times-union} together. We note that at each level we will be adding a given set of variables to some set at most once: since the sum of the sizes of the sets at a given level is at most $k$, each level involves $O(k\log{k})$ time. Thus, overall all executions of~\Cref{alg:sample-times-union} take $O(k\log{k}\cdot depth(\etree))$ time, as desired.
It is easy to check that except for~\Cref{alg:sample-times-union}, all other lines take $O(1)$ time. Thus, overall all lines except for~\Cref{alg:sample-times-union} take $O(k\cdot \depth(\circuit))$ time. Now consider all executions of~\Cref{alg:sample-times-union} together. We note that at each level we will be adding a given set of variables to some set at most once: since the sum of the sizes of the sets at a given level is at most $k$, each level involves $O(k\log{k})$ time. Thus, overall all executions of~\Cref{alg:sample-times-union} take $O(k\log{k}\cdot \depth(\circuit))$ time, as desired.
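Combining the two contributions, with $d = \depth(\circuit)$ introduced here only as shorthand, the total cost can be summarized as
\[
O(k \cdot d) \;+\; \sum_{\ell = 1}^{d} O(k\log{k}) \;=\; O(k\log{k} \cdot d),
\]
where the first term accounts for all lines other than~\Cref{alg:sample-times-union} and the sum ranges over the levels of the sampled subcircuit.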
\subsection{Experimental Results}\label{app:subsec:experiment}


@ -114,6 +114,7 @@
\newcommand{\degree}{\func{deg}}
\newcommand{\size}{\func{size}}
\newcommand{\depth}{\func{depth}}
\newcommand{\topord}{\func{TopOrd}}
%saving \treesize for now to keep latex from breaking
\newcommand{\treesize}{\func{size}}
\newcommand{\sign}{\func{sgn}}


@ -91,7 +91,8 @@ A \emph{\ti} is a \bi where each block contains exactly one tuple.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Problem Definition}\label{sec:expression-trees}
We first formally define expression trees, an encoding of polynomials that we use throughout much of the paper before generalizing them to circuits in~\Cref{sec:gen}.
We first formally define circuits, an encoding of polynomials that we use throughout the paper. Since we exclusively use \emph{lineage} circuits, we drop the term lineage and refer to them simply as circuits.
For illustrative purposes consider the polynomial $\poly(\vct{X}) = 2X^2 + 3XY - 2Y^2$ over $\vct{X} = [X, Y]$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%