Done with pass on S2

2021-04-07 23:27:51 -04:00 · 2021-04-07 23:27:51 -04:00 · 0708beac57
parent fcfb335581
commit 0708beac57
2 changed files with 26 additions and 17 deletions
--- a/poly-form.tex
+++ b/poly-form.tex
@@ -4,24 +4,28 @@
 \subsection{Reduced Polynomials and Equivalences}

 We now introduce some terminology for polynomials and develop a reduced form for polynomials --- a closed form of the polynomial's expectation over probability distributions derived from a \bi or \ti.
-We will use $(X + Y)^2$ as a running example.
+%We will use $(X + Y)^2$ as a running example.
+Recall that a polynomial over $\vct{X}=(X_1,\dots,X_n)$ is formally defined as:
+\[Q(X_1,\dots,X_n)=\sum_{\vct{i}=(i_1,\dots,i_n)\in \semN^n} c_{\vct{i}}\cdot \prod_{j=1}^n X_j^{i_j}.\]

 \begin{Definition}[Standard Monomial Basis]\label{def:smb}
-A monomial is a product of variable terms, each raised to a non-negative integer power.
-  A polynomial in \termSMB (\abbrSMB) has the form: $\sum_{i=1}^n c_i \cdot m_i$ for each of its $n$ terms, where each $c_i \neq 0$ is an integer and each $m_i$ is a monomial and $m_i \neq m_j$ for $i \neq j$. We use $\smbOf{\poly}$ to denote the \abbrSMB of $\poly$.
+%A monomial is a product of variable terms, each raised to a non-negative integer power.
+%  A polynomial in \termSMB (\abbrSMB) has the form: $\sum_{i=1}^n c_i \cdot m_i$ for each of its $n$ terms, where each $c_i \neq 0$ is an integer and each $m_i$ is a monomial and $m_i \neq m_j$ for $i \neq j$. We use $\smbOf{\poly}$ to denote the \abbrSMB of $\poly$.
+The term $\prod_{j=1}^n X_j^{i_j}$ is a {\em monomial}. A polynomial $Q(\vct{X})$ is in standard monomial basis (or SMB) if in the above sum only terms with $c_{\vct{i}}\ne 0$ appear.
 \end{Definition}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-In this paper we consider the default representation of a polynomial to be in \abbrSMB.
-The \abbrSMB for the running example is $X^2 +2XY + Y^2$.  Note that the example's SOP expansion $X^2 + XY + XY + Y^2$ is is not $\smbOf{(X+Y)^2}$ since $XY$ appears twice.
+In this paper we consider the default representation of a polynomial to be in \abbrSMB. When we want to stress that we are using the \abbrSMB representation of a polynomial $\poly$, we write $\smbOf{\poly}$ explicitly.
+
+%The \abbrSMB for the running example is $X^2 +2XY + Y^2$.  Note that the example's SOP expansion $X^2 + XY + XY + Y^2$ is is not $\smbOf{(X+Y)^2}$ since $XY$ appears twice.

 % \BG{Maybe inline degree?}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{Definition}[Degree]\label{def:degree}
-The degree of polynomial $\poly(\vct{X})$ is the maximum sum of exponents, over all monomials in $\smbOf{\poly(\vct{X})}$.
+The degree of polynomial $\poly(\vct{X})$ is the largest $\sum_{j=1}^n i_j$ such that $c_{(i_1,\dots,i_n)}\ne 0$. % maximum sum of exponents, over all monomials in $\smbOf{\poly(\vct{X})}$.
 \end{Definition}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

-The degree of the running example polynomial is $2$. 
+The degree of the polynomial $X^2+2XY+Y^2$ is $2$. 
 Product terms in lineage arise only as a consequence of join operations, so intuitively, the degree of a lineage polynomial is analogous to the largest number of joins in any clause of the UCQ query that created it.
 In this paper we consider only finite degree polynomials.
 %
@@ -38,7 +42,7 @@ We call a polynomial $\query(\vct{X})$ a \emph{\bi-lineage polynomial} (resp., \
 % OK: agreed w/ AH, this can be treated as implicit
 there exists a $\raPlus$ query $\query$, \bi $\pxdb$ (\ti $\pxdb$, or $\semNX$-PDB $\pxdb$), and tuple $\tup$ such that $\query(\vct{X}) = \query(\pxdb)(\tup)$. % Before proceeding, note that the following is assume that polynomials are  \bis (which subsume \tis as a special case).
 As a special case of \bis, the following applies to \tis as well.
-Recall that in a \bi $\pxdb$, tuples are partitioned into $\ell$ blocks $\block_1, \ldots, \block_\ell$ where tuple $t_{i,j} \in \block_i$ is associated with a probability $\prob_{\tup_{i,j}} = \pd[X_{i,j} = 1]$, and is annotated with a unique variable $X_{i,j}$.\footnote{
+In a \bi $\pxdb$, tuples are partitioned into $\ell$ blocks $\block_1, \ldots, \block_\ell$ where tuple $t_{i,j} \in \block_i$ is associated with a probability $\prob_{\tup_{i,j}} = \pd[X_{i,j} = 1]$, and is annotated with a unique variable $X_{i,j}$.\footnote{
  Although only a single independent, $[\abs{\block_i}+1]$-valued variable is customarily used per block, we decompose it into $\abs{\block_i}$ correlated $\{0,1\}$-valued variables per block that can be used directly in polynomials (without an indicator function).  For $t_{i,j} \in \block_i$, the event $(X_{i,j} = 1)$ corresponds to the event $(X_i = j)$ in the customary annotation scheme.
 } 
 Because blocks are independent and tuples from the same block are disjoint, the probabilities $\prob_{\tup_{i,j}}$ and the blocks induce the probability distribution $\pd$ of $\pxdb$.
@@ -91,7 +95,7 @@ Given the set of BIDB variables $\inset{X_{i,j}}$, define
  Let $\poly(\vct{X})$ be a \bi-lineage polynomial.
  The reduced form $\rpoly(\vct{X})$ of $\poly(\vct{X})$ is:
 \begin{equation*}
-\rpoly(\vct{X}) = \smbOf{\poly(\vct{X})} \mod \inparen{\mathcal{T} \cup \mathcal{B}}%X_i^2 - X_i \mod X_{\block_s, t}X_{\block_s, u}
+\rpoly(\vct{X}) = \poly(\vct{X}) \mod \inparen{\mathcal{T} \cup \mathcal{B}}%X_i^2 - X_i \mod X_{\block_s, t}X_{\block_s, u}
 \end{equation*}
 %for all $i$ in $[\numvar]$ and for all $s$ in $\ell$, such that for all $t, u$ in $[\abs{\block_s}]$, $t \neq u$.
 \end{Definition}
@@ -99,7 +103,7 @@ Given the set of BIDB variables $\inset{X_{i,j}}$, define
 %

 All exponents $e > 1$ in $\smbOf{\poly(\vct{X})}$ are reduced to $e = 1$ via mod $\mathcal{T}$.  Performing the modulus of $\rpoly(\vct{X})$ with $\mathcal{B}$ ensures the disjoint condition of \bi, removing monomials with lineage variables from the same block.%, (recall the constraint on tuples from the same block being disjoint in a \bi).% any monomial containing more than one tuple from a block has $0$ probability and can be ignored). 
-
+%
 For the special case of \tis, the second step is not necessary since every block contains a single tuple.
 %Alternatively, one can think of $\rpoly$ as the \abbrSMB of $\poly(\vct{X})$ when the product operator is idempotent.
 %
@@ -113,6 +117,8 @@ For the special case of \tis, the second step is not necessary since every block
 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%Removing this example to save space
+\iffalse
 \begin{Example}\label{example:qtilde}
 Consider $\poly(X, Y) = (X + Y)(X + Y)$ where $X$ and $Y$ are from different blocks.  The expanded derivation for $\rpoly(X, Y)$ is
 \begin{align*}
@@ -121,6 +127,7 @@ Consider $\poly(X, Y) = (X + Y)(X + Y)$ where $X$ and $Y$ are from different blo
 = ~& X + 2XY + Y
 \end{align*}
 \end{Example}
+\fi
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %
 % Intuitively, $\rpoly(\textbf{X})$ is the \abbrSMB form of $\poly(\textbf{X})$ such that if any $X_j$ term  has an exponent $e > 1$, it is reduced to $1$, i.e. $X_j^e\mapsto X_j$ for any $e > 1$.
--- a/prob-def.tex
+++ b/prob-def.tex
@@ -4,15 +4,17 @@
 \subsection{Problem Definition}\label{sec:expression-trees}

 We first formally define circuits, an encoding of polynomials that we use throughout the paper.  Since the circuits we use are \emph{lineage} circuits, we drop the term lineage and refer to them simply as circuits.
-
+%
 For illustrative purposes consider the polynomial $\poly(\vct{X}) = 2X^2 + 3XY - 2Y^2$ over $\vct{X} = [X, Y]$.

 We represent query polynomials via {\em arithmetic circuits}~\cite{arith-complexity}, a standard way to represent polynomials over fields (particularly in the field of algebraic complexity) that we use for polynomials over $\mathbb N$ in the obvious way.

 \begin{Definition}[Circuit]\label{def:circuit}
-A circuit $\circuit$ is a Directed Acyclic Graph (DAG) whose source nodes (in degree of $0$) consist of elements in either $\reals$ or $\vct{X}$.  The internal nodes and sink node of $\circuit$ have binary input and are either sum ($\circplus$) or product ($\circmult$) gates.  
+A circuit $\circuit$ is a Directed Acyclic Graph (DAG) whose source nodes (in-degree $0$) are elements of either $\reals$ or $\vct{X}$.  The internal nodes and (the single) sink node of $\circuit$ (corresponding to the result tuple $t$) each have exactly two inputs and are either sum ($\circplus$) or product ($\circmult$) gates.

-$\circuit$ additionally has the following members: \type, \vari{val}, \vari{partial}, \vari{input}, \degval and \vari{Lweight}, \vari{Rweight}, where \type is the type of value stored in the node $\circuit$ (i.e. one of $\{\circplus, \circmult, \var, \tnum\}$, \val is the value stored (a constant or variable), and \vari{input} is the list of \circuit 's inputs where $\circuit_\linput$ is the left input and $\circuit_\rinput$ the right input.  The member \degval holds the degree of \circuit.  When the underlying DAG is a tree (with edges pointing towards the root), we will refer to the structure as an expression tree \etree.  Note that in such a case, the root of \etree is analogous to the sink of \circuit.
+$\circuit$ additionally has the following members: \type, \vari{val}, \vari{partial}, \vari{input}, \degval and \vari{Lweight}, \vari{Rweight}, where \type is the type of value stored in the node $\circuit$ (i.e., one of $\{\circplus, \circmult, \var, \tnum\}$), \val is the value stored (a constant or variable), and \vari{input} is the list of \circuit's inputs where $\circuit_\linput$ is the left input and $\circuit_\rinput$ the right input.
+%The member \degval holds the degree of \circuit.  
+When the underlying DAG is a tree (with edges pointing towards the root), we will refer to the structure as an expression tree \etree.  Note that in such a case, the root of \etree is analogous to the sink of \circuit.
 \end{Definition}

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -91,7 +93,7 @@ The circuit \circuit in \Cref{fig:circuit-express-tree} encodes the polynomial $
 \end{figure}


-The semantics of circuits follows the obvious interpretation.  We next define its realtionship with polynomials formally:
+The semantics of circuits follows the obvious interpretation.  We next define its relationship with polynomials formally:
 \begin{Definition}[$\polyf(\cdot)$]\label{def:poly-func}
 Denote $\polyf(\circuit)$ to be the function from circuit $\circuit$ to its corresponding polynomial.  $\polyf(\cdot)$ is recursively defined on $\circuit$ as follows, with addition and multiplication following the standard interpretation for polynomials:
 \begin{equation*}
@@ -111,7 +113,7 @@ $\circuitset{\smb}$ is the set of all possible circuits $\circuit$ such that $\p

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

-The circuit of \Cref{fig:circuit} is an element of $\circuitset{\smb}$.  One can think of $\circuitset{\smb}$ as the infinite set of circuits each of which model an encoding (factorization) equal to $\polyf(\circuit)$.   
+The circuit of \Cref{fig:circuit} is an element of $\circuitset{2X^2+3XY-2Y^2}$.  One can think of $\circuitset{\smb}$ as the infinite set of circuits each of which models an encoding (factorization) equal to $\polyf(\circuit)$.
 %\supset \{2X^2 + 3XY - 2Y^2, (X + 2Y)(2X - Y), X(2X - Y) + 2Y(2X - Y), 2X(X + 2Y) - Y(X + 2Y)\}$.  
 Note that \Cref{def:circuit-set} implies that $\circuit \in \circuitset{\polyf(\circuit)}$.

@@ -121,7 +123,7 @@ Note that \Cref{def:circuit-set} implies that $\circuit \in \circuitset{\polyf(\
 \noindent We are now ready to formally state our \textbf{main problem}.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{Definition}[The Expected Result Multiplicity Problem]\label{def:the-expected-multipl}
-Let $\vct{X} = (X_1, \ldots, X_n)$, and $\pdb$ be an $\semNX$-PDB over $\vct{X}$ with probability distribution $\pd$ over assignments $\vct{X}  \to [0,1]$, $\query$ an n-ary query, and $t$ an n-ary tuple.
+Let $\vct{X} = (X_1, \ldots, X_n)$, and $\pdb$ be an $\semNX$-PDB over $\vct{X}$ with probability distribution $\pd$ over assignments $\vct{X}  \to \{0,1\}$, $\query$ an n-ary query, and $t$ an n-ary tuple.
  The \expectProblem is defined as follows:\\[-7mm]
 \begin{center}
 \textbf{Input}: A circuit $\circuit \in \circuitset{\smb}$ for $\poly(\vct{X}) = \query(\pxdb)(t)$
@@ -136,4 +138,4 @@ Let $\vct{X} = (X_1, \ldots, X_n)$, and $\pdb$ be an $\semNX$-PDB over $\vct{X}$
 %%% Local Variables:
 %%% mode: latex
 %%% TeX-master: "main"
-%%% End:
+%%% End:
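
Note on the reduced form in the `poly-form.tex` hunks above (an illustration only, not part of this commit). The sketch below assumes that $\mathcal{T}$ contains the relations $X_{i,j}^2 - X_{i,j}$ and that $\mathcal{B}$ contains the products $X_{i,j}X_{i,k}$ for $j \neq k$, as the commented-out text beside the definition suggests; the definitions themselves lie outside the diff context. The polynomial is hypothetical: $X_{1,1}$ and $X_{1,2}$ are taken to share a block, while $X_{2,1}$ belongs to a different block.

```latex
% Hypothetical polynomial. Assumes T = { X_{i,j}^2 - X_{i,j} } and
% B = { X_{i,j} X_{i,k} : j \neq k }; neither set is defined inside this diff.
\begin{align*}
\poly(\vct{X}) &= (X_{1,1} + X_{1,2})(X_{1,1} + X_{2,1})\\
 &= X_{1,1}^2 + X_{1,1}X_{2,1} + X_{1,1}X_{1,2} + X_{1,2}X_{2,1} && \text{(expand)}\\
 &\mapsto X_{1,1} + X_{1,1}X_{2,1} + X_{1,1}X_{1,2} + X_{1,2}X_{2,1} && \text{(mod $\mathcal{T}$)}\\
 &\mapsto X_{1,1} + X_{1,1}X_{2,1} + X_{1,2}X_{2,1} = \rpoly(\vct{X}) && \text{(mod $\mathcal{B}$)}
\end{align*}
```

Under these assumptions the same-block product $X_{1,1}X_{1,2}$ vanishes because the two tuples are disjoint events, whereas for a \ti the last step would leave the polynomial unchanged.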