%root: main.tex
%!TEX root=./main.tex
\subsection{Problem Definition}\label{sec:expression-trees}
%We first formally define circuits, an encoding of polynomials that we use throughout the paper.
%
%For illustrative purposes consider the polynomial $\poly(\vct{X}) = 2X^2 + 3XY - 2Y^2$ over $\vct{X} = [X, Y]$.
We represent lineage polynomials via {\em arithmetic circuits}~\cite{arith-complexity}, a standard encoding of polynomials over fields in algebraic complexity, which we apply to polynomials over $\mathbb N$ in the obvious way. Since we use circuits specifically to model lineage polynomials, we refer to them as lineage circuits; when the meaning is clear from context, we drop the term lineage and simply call them circuits.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Circuit]\label{def:circuit}
A circuit $\circuit$ is a Directed Acyclic Graph (DAG) whose source gates (gates with in-degree $0$) are elements of either $\domN$ or $\vct{X}$. For each output tuple there exists one sink gate. Each internal gate has exactly two inputs and is either a sum ($\circplus$) or a product ($\circmult$) gate.
%
Each gate has the following members: \type, \vpartial, \vari{input}, \degval, \vari{Lweight}, and \vari{Rweight}, where \type is the gate's type, one of $\{\circplus, \circmult, \var, \tnum\}$, and \vari{input} is the list of its inputs. Source gates have an additional member \val storing the value. $\circuit_\linput$ ($\circuit_\rinput$) denotes the left (right) input of \circuit.
\end{Definition}
When the underlying DAG is a tree (with edges pointing towards the root), the structure is an expression tree \etree, and the root of \etree is analogous to the sink of \circuit. The members \vpartial, \degval, \vari{Lweight}, and \vari{Rweight} are used in the proofs of \Cref{sec:proofs-approx-alg}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%As stated in \Cref{def:circuit}, every internal node has at most two incoming edges, is labeled as an addition or a multiplication node, and has no limit on its outdegree.
%Note that if we limit the outdegree to one, then we get expression trees.
The circuits in \Cref{fig:two-step} encode their respective polynomials in column $\poly$.
%\circuit in \Cref{fig:circuit-express-tree} encodes the polynomial $XY + WZ$.
Note that each circuit \circuit encodes a tree, with edges pointing towards the root.
%\begin{figure}[t]
% \begin{subfigure}[b]{0.45\linewidth}
% \centering
% \begin{tikzpicture}[thick]
% \node[tree_node] (a1) at (0, 0){$\boldsymbol{X}$};
% \node[tree_node] (b1) at (1, 0){$\boldsymbol{Y}$};
% \node[tree_node] (c1) at (2, 0){$\boldsymbol{W}$};
% \node[tree_node] (d1) at (3, 0){$\boldsymbol{Z}$};
%
% \node[tree_node] (a2) at (0.5, 1){$\boldsymbol{\circmult}$};
% \node[tree_node] (b2) at (2.5, 1){$\boldsymbol{\circmult}$};
%
% \node[tree_node] (a3) at (1.5, 2){$\boldsymbol{\circplus}$};
%
% \draw[->] (a1) -- (a2);
% \draw[->] (b1) -- (a2);
% \draw[->] (c1) -- (b2);
% \draw[->] (d1) -- (b2);
% \draw[->] (a2) -- (a3);
% \draw[->] (b2) -- (a3);
% \draw[->] (a3) -- (1.5, 2.5);
% \end{tikzpicture}
% \caption{Circuit encoding $XY + WZ$, a special case of an expression tree}
% \label{fig:circuit-express-tree}
% \end{subfigure}
% \hspace{5mm}
\begin{wrapfigure}{l}{0.45\linewidth}
\centering
\begin{tikzpicture}[thick]
\node[tree_node] (a1) at (0, 0) {$\boldsymbol{X}$};
\node[tree_node] (b1) at (1.5, 0) {$\boldsymbol{2}$};
\node[tree_node] (c1) at (3, 0) {$\boldsymbol{Y}$};
\node[tree_node] (d1) at (4.5, 0) {$\boldsymbol{-1}$};
\node[tree_node] (a2) at (0.75, 0.75) {$\boldsymbol{\circmult}$};
\node[tree_node] (b2) at (2.25, 0.75) {$\boldsymbol{\circmult}$};
\node[tree_node] (c2) at (3.75, 0.75) {$\boldsymbol{\circmult}$};
\node[tree_node] (a3) at (0.55, 1.5) {$\boldsymbol{\circplus}$};
\node[tree_node] (b3) at (3.75, 1.5) {$\boldsymbol{\circplus}$};
\node[tree_node] (a4) at (2.25, 2.25) {$\boldsymbol{\circmult}$};
\draw[->] (a1) -- (a2);
\draw[->] (a1) -- (a3);
\draw[->] (b1) -- (a2);
\draw[->] (b1) -- (b2);
\draw[->] (c1) -- (c2);
\draw[->] (c1) -- (b2);
\draw[->] (d1) -- (c2);
\draw[->] (a2) -- (b3);
\draw[->] (b2) -- (a3);
\draw[->] (c2) -- (b3);
\draw[->] (a3) -- (a4);
\draw[->] (b3) -- (a4);
\draw[->] (a4) -- (2.25, 2.75);
\end{tikzpicture}
\caption{Circuit encoding of $(X + 2Y)(2X - Y)$}
\label{fig:circuit}
\end{wrapfigure}
% \caption{Example circuit encodings}
%\end{figure}
We next formally define the relationship between circuits and polynomials. While the definition assumes a single sink for notational convenience, it generalizes easily to the case of multiple sinks.
\begin{Definition}[$\polyf(\cdot)$]\label{def:poly-func}
Let $\polyf(\cdot)$ denote the function mapping a circuit $\circuit$ to its corresponding polynomial (in \abbrSMB).\footnote{Recall our assumption that unless otherwise mentioned, all polynomials are considered in $\abbrSMB$.} $\polyf(\cdot)$ is defined recursively on $\circuit$ as follows, with addition and multiplication following the standard interpretation for polynomials:
\begin{equation*}
\polyf(\circuit) = \begin{cases}
		\polyf(\circuit_\linput) + \polyf(\circuit_\rinput) &\text{ if \circuit.\type } = \circplus\\
		\polyf(\circuit_\linput) \cdot \polyf(\circuit_\rinput) &\text{ if \circuit.\type } = \circmult\\
		\circuit.\val &\text{ if \circuit.\type } \in \{\var, \tnum\}.
\end{cases}
\end{equation*}
\end{Definition}
A circuit $\circuit$ need not encode $\poly\inparen{\vct{X}}$ in its default \abbrSMB representation. For instance, $\circuit$ could encode the factorized representation $(X + 2Y)(2X - Y)$ of $\poly\inparen{\vct{X}} = 2X^2+3XY-2Y^2$, as shown in \Cref{fig:circuit}, while $\polyf(\circuit) = \poly\inparen{\vct{X}}$ is the equivalent \abbrSMB representation.
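
As a sanity check of \Cref{def:poly-func}, one can unfold the recursion on the circuit of \Cref{fig:circuit}. The derivation below is only an illustrative sketch; it assumes that the left input of the sink is the sum gate encoding $X + 2Y$ and the right input is the sum gate encoding $2X - Y$ (this left/right assignment is a reading of the figure and does not affect the result):
\begin{equation*}
	\begin{aligned}
		\polyf(\circuit) &= \polyf(\circuit_\linput) \cdot \polyf(\circuit_\rinput)\\
		&= \inparen{X + 2 \cdot Y} \cdot \inparen{X \cdot 2 + Y \cdot (-1)}\\
		&= 2X^2 - XY + 4XY - 2Y^2\\
		&= 2X^2 + 3XY - 2Y^2.
	\end{aligned}
\end{equation*}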
\begin{Definition}[Circuit Set]\label{def:circuit-set}
$\circuitset{\polyX}$ is the set of all possible circuits $\circuit$ such that $\polyf(\circuit) = \polyX$.
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The circuit of \Cref{fig:circuit} is an element of $\circuitset{2X^2+3XY-2Y^2}$. One can think of $\circuitset{\polyX}$ as the infinite set of all circuits \circuit that satisfy $\polyf\inparen{\circuit} = \polyX$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\medskip
\noindent We are now ready to formally state our \textbf{main problem}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[The Expected Result Multiplicity Problem]\label{def:the-expected-multipl}
Let $\pdb$ be an arbitrary \abbrBIDB-PDB and $\vct{X}$ be the set of variables annotating tuples in $\dbbase$. Fix a query $\query$ and an output tuple $\tup$.
The \expectProblem is defined as follows:\\[-7mm]
\begin{center}
\textbf{Input}: $\circuit \in \circuitset{\polyX}$ for $\polyX = \apolyqdt$
\hspace*{2mm}
\textbf{Output}: $\expct_{\vct{W} \sim \pdassign}[\apolyqdt(\vct{W})]$
\end{center}
\end{Definition}
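
To make the output of \Cref{def:the-expected-multipl} concrete, consider the polynomial $2X^2+3XY-2Y^2$ of \Cref{fig:circuit} under the following simplifying assumption, made here purely for illustration: the random assignment $\vct{W} = (W_X, W_Y)$ to $\vct{X} = (X, Y)$ has independent $\{0,1\}$-valued components with $\Pr[W_X = 1] = p_X$ and $\Pr[W_Y = 1] = p_Y$ (the symbols $W_X$, $W_Y$, $p_X$, and $p_Y$ are introduced only for this example). Since $W^2 = W$ for $W \in \{0, 1\}$, linearity of expectation and independence give
\begin{equation*}
	\expct\left[2W_X^2 + 3W_XW_Y - 2W_Y^2\right] = 2\expct\left[W_X\right] + 3\expct\left[W_X\right]\expct\left[W_Y\right] - 2\expct\left[W_Y\right] = 2p_X + 3p_Xp_Y - 2p_Y.
\end{equation*}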
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: