
%root: main.tex
%!TEX root=./main.tex

\subsection{Problem Definition}\label{sec:expression-trees}

%We first formally define circuits, an encoding of polynomials that we use throughout the paper.  
%
%For illustrative purposes consider the polynomial $\poly(\vct{X}) = 2X^2 + 3XY - 2Y^2$ over $\vct{X} = [X, Y]$.

We represent lineage polynomials via {\em arithmetic circuits}~\cite{arith-complexity}, a standard representation of polynomials over fields (particularly in algebraic complexity), which we use for polynomials over $\mathbb N$ in the obvious way.  Since we use circuits specifically to model lineage polynomials, we refer to them as lineage circuits; when the meaning is clear, we drop the term lineage and simply call them circuits.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Circuit]\label{def:circuit}
A circuit $\circuit$ is a Directed Acyclic Graph (DAG) whose source gates (in-degree $0$) are elements of either $\domN$ or $\vct{X}$.  For each output tuple there exists one sink gate (out-degree $0$).  Internal gates have exactly two inputs and are either sum ($\circplus$) or product ($\circmult$) gates.
%
Each gate has the following members: \type, \vpartial, \vari{input}, \degval, \vari{Lweight}, and \vari{Rweight}, where \type is the gate type, one of $\{\circplus, \circmult, \var, \tnum\}$, and \vari{input} is the list of its inputs.  Source gates have an additional member \val storing their value.  $\circuit_\linput$ ($\circuit_\rinput$) denotes the left (right) input of \circuit.
\end{Definition}
When the underlying DAG is a tree (with edges pointing towards the root), the structure is an expression tree \etree; in this case, the root of \etree plays the role of the sink of \circuit.  The members \vpartial, \degval, \vari{Lweight}, and \vari{Rweight} are used in the proofs of \Cref{sec:proofs-approx-alg}.
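As an illustration only, the following Python sketch models a gate with the members \type, \vari{input}, and \val (omitting the members used only in the proofs) and checks whether a circuit is an expression tree, i.e., whether no gate feeds more than one other gate. The class \texttt{Gate}, its string type tags, and the function \texttt{is\_expression\_tree} are hypothetical names chosen for this sketch and are not part of our formalism.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(eq=False)  # eq=False: gates compare by identity, as nodes of a DAG
class Gate:
    """Hypothetical gate record with the members type, input, and val."""
    type: str                                          # "+", "*", "var", or "num"
    input: List["Gate"] = field(default_factory=list)  # empty for source gates
    val: Optional[Union[int, str]] = None              # only set on source gates


def is_expression_tree(sink: Gate) -> bool:
    """Return True iff no gate reachable from `sink` feeds more than one gate
    (counting repeated inputs), i.e. the circuit is an expression tree."""
    seen = set()          # ids of gates already reached through some edge
    stack = [sink]
    while stack:
        gate = stack.pop()
        for child in gate.input:
            if id(child) in seen:
                return False  # second outgoing edge of `child`: shared subcircuit
            seen.add(id(child))
            stack.append(child)
    return True


# Circuit of the figure, (X + 2Y)(2X - Y); the sources X, 2, and Y are shared.
X, Y = Gate("var", val="X"), Gate("var", val="Y")
two, neg1 = Gate("num", val=2), Gate("num", val=-1)
fig = Gate("*", [Gate("+", [X, Gate("*", [two, Y])]),
                 Gate("+", [Gate("*", [X, two]), Gate("*", [Y, neg1])])])
print(is_expression_tree(fig))  # False
```

On the circuit of \Cref{fig:circuit} the check fails, since the sources $X$, $2$, and $Y$ each feed two gates: that circuit is a DAG but not an expression tree.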

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%As stated in \Cref{def:circuit}, every internal node has at most two incoming edges, is labeled as an addition or a multiplication node, and has no limit on its outdegree.
%Note that if we limit the outdegree to one, then we get expression trees.

The circuits in \Cref{fig:two-step} encode their respective polynomials in column $\poly$.
%\circuit in \Cref{fig:circuit-express-tree} encodes the polynomial $XY + WZ$.  
Note that each of these circuits is in fact an expression tree, with edges pointing towards the root.


%\begin{figure}[t]
%	\begin{subfigure}[b]{0.45\linewidth}
%		\centering
%		\begin{tikzpicture}[thick]
%			\node[tree_node] (a1) at (0, 0){$\boldsymbol{X}$};
%			\node[tree_node] (b1) at (1, 0){$\boldsymbol{Y}$};
%			\node[tree_node] (c1) at (2, 0){$\boldsymbol{W}$};
%			\node[tree_node] (d1) at (3, 0){$\boldsymbol{Z}$};
%
%			\node[tree_node] (a2) at (0.5, 1){$\boldsymbol{\circmult}$};
%			\node[tree_node] (b2) at (2.5, 1){$\boldsymbol{\circmult}$};
%
%			\node[tree_node] (a3) at (1.5, 2){$\boldsymbol{\circplus}$};
%
%			\draw[->] (a1) -- (a2);
%			\draw[->] (b1) -- (a2);
%			\draw[->] (c1) -- (b2);
%			\draw[->] (d1) -- (b2);
%			\draw[->] (a2) -- (a3);
%			\draw[->] (b2) -- (a3);
%			\draw[->] (a3) -- (1.5, 2.5);
%		\end{tikzpicture}
%		\caption{Circuit encoding $XY + WZ$, a special case of an expression tree}
%		\label{fig:circuit-express-tree}
%	\end{subfigure}
%	\hspace{5mm}
	\begin{wrapfigure}{l}{0.45\linewidth}
		\centering
		\begin{tikzpicture}[thick]
			\node[tree_node] (a1) at (0, 0) {$\boldsymbol{X}$};
			\node[tree_node] (b1) at (1.5, 0) {$\boldsymbol{2}$};
			\node[tree_node] (c1) at (3, 0) {$\boldsymbol{Y}$};
			\node[tree_node] (d1) at (4.5, 0) {$\boldsymbol{-1}$};

			\node[tree_node] (a2) at (0.75, 0.75) {$\boldsymbol{\circmult}$};
			\node[tree_node] (b2) at (2.25, 0.75) {$\boldsymbol{\circmult}$};
			\node[tree_node] (c2) at (3.75, 0.75) {$\boldsymbol{\circmult}$};

			\node[tree_node] (a3) at (0.55, 1.5) {$\boldsymbol{\circplus}$};
			\node[tree_node] (b3) at (3.75, 1.5) {$\boldsymbol{\circplus}$};

			\node[tree_node] (a4) at (2.25, 2.25) {$\boldsymbol{\circmult}$};

			\draw[->] (a1) -- (a2);
			\draw[->] (a1) -- (a3);
			\draw[->] (b1) -- (a2);
			\draw[->] (b1) -- (b2);
			\draw[->] (c1) -- (c2);
			\draw[->] (c1) -- (b2);
			\draw[->] (d1) -- (c2);
			\draw[->] (a2) -- (b3);
			\draw[->] (b2) -- (a3);
			\draw[->] (c2) -- (b3);
			\draw[->] (a3) -- (a4);
			\draw[->] (b3) -- (a4);
			\draw[->] (a4) -- (2.25, 2.75);
		\end{tikzpicture}
		\caption{Circuit encoding of $(X + 2Y)(2X - Y)$}
		\label{fig:circuit}
	\end{wrapfigure}
%	\caption{Example circuit encodings}
%\end{figure}
We next formally define the relationship between circuits and polynomials.  While the definition assumes a single sink for notational convenience, it generalizes easily to circuits with multiple sinks.
\begin{Definition}[$\polyf(\cdot)$]\label{def:poly-func}
Let $\polyf(\circuit)$ denote the function mapping circuit $\circuit$ to its corresponding polynomial (in \abbrSMB).\footnote{Recall our assumption that unless otherwise mentioned, all polynomials are considered in $\abbrSMB$.}  $\polyf(\cdot)$ is defined recursively on $\circuit$ as follows, with addition and multiplication following the standard interpretation for polynomials:
\begin{equation*}
	\polyf(\circuit) = \begin{cases}
					\polyf(\circuit_\linput) + \polyf(\circuit_\rinput)			&\text{ if \circuit.\type } = \circplus\\
					\polyf(\circuit_\linput) \cdot \polyf(\circuit_\rinput)		&\text{ if \circuit.\type } = \circmult\\
					\circuit.\val									&\text{ if \circuit.\type } \in \{\var, \tnum\}.
				\end{cases}
\end{equation*}
\end{Definition}
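The recursion of \Cref{def:poly-func} can be sketched in Python as follows. This is our own illustration under stated assumptions: gates are hypothetical tuples \texttt{("+", left, right)}, \texttt{("*", left, right)}, \texttt{("var", name)}, and \texttt{("num", c)}, and an \abbrSMB polynomial is a dictionary from monomials (sorted tuples of variable/exponent pairs) to nonzero coefficients.

```python
from collections import defaultdict
from typing import Dict, Tuple

Monomial = Tuple[Tuple[str, int], ...]  # sorted (variable, exponent) pairs; () is 1
Poly = Dict[Monomial, int]              # SMB form: monomial -> nonzero coefficient


def mono_mult(a: Monomial, b: Monomial) -> Monomial:
    exps: Dict[str, int] = defaultdict(int)
    for var, e in a + b:                # add exponents of matching variables
        exps[var] += e
    return tuple(sorted(exps.items()))


def poly_add(p: Poly, q: Poly) -> Poly:
    out: Dict[Monomial, int] = defaultdict(int, p)
    for m, c in q.items():
        out[m] += c
    return {m: c for m, c in out.items() if c != 0}  # drop cancelled terms


def poly_mult(p: Poly, q: Poly) -> Poly:
    out: Dict[Monomial, int] = defaultdict(int)
    for m1, c1 in p.items():            # distribute every term over every term
        for m2, c2 in q.items():
            out[mono_mult(m1, m2)] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def polyf(gate) -> Poly:
    """Recursive polyf: sum/product gates combine their inputs' polynomials."""
    kind = gate[0]
    if kind == "+":
        return poly_add(polyf(gate[1]), polyf(gate[2]))
    if kind == "*":
        return poly_mult(polyf(gate[1]), polyf(gate[2]))
    if kind == "var":
        return {((gate[1], 1),): 1}
    if kind == "num":
        return {(): gate[1]} if gate[1] != 0 else {}
    raise ValueError(f"unknown gate type {kind!r}")


# The circuit of the figure: (X + 2Y)(2X - Y).
X, Y, two, neg1 = ("var", "X"), ("var", "Y"), ("num", 2), ("num", -1)
fig = ("*", ("+", X, ("*", two, Y)), ("+", ("*", X, two), ("*", Y, neg1)))
assert polyf(fig) == {(("X", 2),): 2, (("X", 1), ("Y", 1)): 3, (("Y", 2),): -2}
```

Since this sketch does not memoize, a subcircuit shared in the DAG is expanded once per path to it; caching results per gate would avoid the repeated work, although the \abbrSMB output itself can still be exponentially larger than the circuit.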

Note that $\circuit$ need not encode $\poly\inparen{\vct{X}}$ in its default \abbrSMB representation.  For instance, $\circuit$ could encode the factorized representation $(X + 2Y)(2X - Y)$ of $\poly\inparen{\vct{X}} = 2X^2+3XY-2Y^2$, as shown in \Cref{fig:circuit}, while $\polyf(\circuit) = \poly\inparen{\vct{X}}$ is the equivalent \abbrSMB representation.
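Expanding the factorized encoding term by term confirms that it denotes the same \abbrSMB polynomial:
\begin{equation*}
(X + 2Y)(2X - Y) = 2X^2 - XY + 4XY - 2Y^2 = 2X^2 + 3XY - 2Y^2.
\end{equation*}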

\begin{Definition}[Circuit Set]\label{def:circuit-set}
$\circuitset{\polyX}$ is the set of all possible circuits $\circuit$ such that $\polyf(\circuit) = \polyX$.
\end{Definition}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The circuit of \Cref{fig:circuit} is an element of $\circuitset{2X^2+3XY-2Y^2}$.  Note that $\circuitset{\polyX}$ is infinite: for any $\circuit \in \circuitset{\polyX}$, adding a product gate whose inputs are the sink of $\circuit$ and a source gate with value $1$ yields another circuit in $\circuitset{\polyX}$.
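For this small example, membership of several circuits in the same circuit set can be illustrated by evaluating them on a grid of points. The Python sketch below is our own illustration, not part of our formalism: gates are hypothetical tuples such as \texttt{("+", l, r)} and \texttt{("num", c)}, and the check \texttt{same\_polynomial} is exact only under its stated assumption that every variable has degree at most \texttt{max\_deg} in both polynomials (a nonzero polynomial of degree at most $d$ in each variable cannot vanish on all of $\{0, \dots, d\}^n$).

```python
from itertools import product


def evaluate(gate, point):
    """Evaluate a tuple-encoded circuit at an assignment {variable: value}."""
    kind = gate[0]
    if kind == "+":
        return evaluate(gate[1], point) + evaluate(gate[2], point)
    if kind == "*":
        return evaluate(gate[1], point) * evaluate(gate[2], point)
    if kind == "var":
        return point[gate[1]]
    if kind == "num":
        return gate[1]
    raise ValueError(f"unknown gate type {kind!r}")


def same_polynomial(c1, c2, variables, max_deg):
    """Exact test ASSUMING each variable has degree <= max_deg in both circuits:
    two such polynomials are equal iff they agree on {0..max_deg}^n."""
    for values in product(range(max_deg + 1), repeat=len(variables)):
        point = dict(zip(variables, values))
        if evaluate(c1, point) != evaluate(c2, point):
            return False
    return True


X, Y = ("var", "X"), ("var", "Y")
factorized = ("*", ("+", X, ("*", ("num", 2), Y)),
                   ("+", ("*", X, ("num", 2)), ("*", Y, ("num", -1))))
smb = ("+", ("+", ("*", ("num", 2), ("*", X, X)), ("*", ("num", 3), ("*", X, Y))),
            ("*", ("num", -2), ("*", Y, Y)))
padded = ("*", factorized, ("num", 1))  # extra product gate with a constant-1 source
print(same_polynomial(factorized, smb, ["X", "Y"], max_deg=2))  # True
print(same_polynomial(padded, smb, ["X", "Y"], max_deg=2))      # True
```

All three circuits (factorized, \abbrSMB, and padded with a constant-$1$ product gate) thus lie in $\circuitset{2X^2+3XY-2Y^2}$.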

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\medskip

\noindent We are now ready to formally state our \textbf{main problem}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[The Expected Result Multiplicity Problem]\label{def:the-expected-multipl}
Let $\pdb$ be an arbitrary \abbrBIDB-PDB and $\vct{X}$ be the set of variables annotating tuples in $\dbbase$.  Fix a query $\query$ and an output tuple $\tup$.
  The \expectProblem is defined as follows:\\[-7mm]
\begin{center}
\textbf{Input}: $\circuit \in \circuitset{\polyX}$ for $\polyX = \apolyqdt$
\hspace*{2mm}
\textbf{Output}: $\expct_{\vct{W} \sim \pdassign}[\apolyqdt(\vct{W})]$
\end{center}
\end{Definition}
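For intuition only, the \expectProblem can be solved by brute force on tiny instances by enumerating all possible worlds. The Python sketch below is not our algorithm and takes time exponential in the number of blocks. It assumes a hypothetical encoding of a \abbrBIDB as a list of independent blocks, each a list of (variable, probability) pairs whose tuples are mutually exclusive, with probabilities summing to at most $1$ per block; $W_X = 1$ iff the tuple annotated by $X$ is present. The polynomial is passed as a Python function of the assignment $\vct{W}$ rather than as a circuit.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, List, Tuple

Block = List[Tuple[str, Fraction]]  # mutually exclusive tuples: (variable, probability)


def expected_multiplicity(poly: Callable[[Dict[str, int]], int],
                          blocks: List[Block]) -> Fraction:
    """E_{W ~ P}[poly(W)] by enumerating every possible world (exponential time)."""
    choices = []
    for block in blocks:
        absent = 1 - sum(p for _, p in block)  # probability no tuple of the block appears
        choices.append(list(block) + [(None, absent)])
    variables = [var for block in blocks for var, _ in block]
    total = Fraction(0)
    for world in product(*choices):            # one choice per independent block
        w = {var: 0 for var in variables}
        prob = Fraction(1)
        for var, p in world:
            prob *= p
            if var is not None:
                w[var] = 1
        total += prob * poly(w)
    return total


# Tuple-independent special case: every block holds a single tuple.
half, third = Fraction(1, 2), Fraction(1, 3)
print(expected_multiplicity(lambda w: w["X"] ** 2 + 2 * w["X"] * w["Y"],
                            [[("X", half)], [("Y", third)]]))  # 5/6
```

Mutual exclusion within a block is what separates a \abbrBIDB from a tuple-independent database here: for two variables $X_1, X_2$ in the same block, the monomial $X_1 X_2$ contributes $0$ to the expectation.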
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: