%root: main.tex
%!TEX root = ./main.tex
%\onecolumn
\subsection{Reduced Polynomials and Equivalences}
We now introduce some terminology % for polynomials
and develop a reduced form (a closed form of the polynomial's expectation) for polynomials over probability distributions derived from a \bi or \ti.
%We will use $(X + Y)^2$ as a running example.
Note that a polynomial over $\vct{X} = (X_1, \dots, X_n)$ is formally defined as follows (with $c_{\vct{d}} \in \domN$):
\begin{equation}
\label{eq:sop-form}
\poly\inparen{X_1,\dots,X_n} = \sum_{\vct{d} = (d_1,\dots,d_n) \in \semN^n} c_{\vct{d}} \cdot \prod_{i=1}^n X_i^{d_i}.
\end{equation}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Standard Monomial Basis]\label{def:smb}
Each term $\prod_{i=1}^n X_i^{d_i}$ in \Cref{eq:sop-form} is a {\em monomial}. A polynomial $\poly\inparen{\vct{X}}$ is in standard monomial basis (\abbrSMB) when we keep only the terms with $c_{\vct{d}} \ne 0$ from \Cref{eq:sop-form}.
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We consider \abbrSMB as the default representation of a polynomial.
We use $\smbOf{\poly}$ to denote the \abbrSMB form of a polynomial $\poly$.
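As a small illustration (not needed for the formal development, and using $(X + Y)^2$ purely as an assumed input), expanding the product and collecting like terms gives
\begin{equation*}
\smbOf{(X + Y)^2} = X^2 + 2XY + Y^2,
\end{equation*}
that is, $c_{(2,0)} = 1$, $c_{(1,1)} = 2$, and $c_{(0,2)} = 1$, while every other coefficient is $0$ and its term is dropped.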

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Degree]\label{def:degree}
The degree of a polynomial $\poly(\vct{X})$ is the largest $\sum_{i=1}^n d_i$ such that $c_{(d_1,\dots,d_n)} \ne 0$. % maximum sum of exponents, over all monomials in $\smbOf{\poly(\vct{X})}$.
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

For example, the degree of the polynomial $X^2 + 2XY + Y^2$ is $2$.
Product terms in lineage arise only from join operations (\Cref{fig:nxDBSemantics}), so intuitively, the degree of a lineage polynomial is analogous to the largest number of joins in any clause of the UCQ that created it.
In this paper we consider only finite-degree polynomials.
We call a polynomial $\poly\inparen{\vct{X}}$ a \emph{\bi-lineage polynomial} (resp., \emph{\ti-lineage polynomial}, or simply lineage polynomial) if there exists a \AH{Which formalism? UCQ?} $\raPlus$ query $\query$, \bi $\pxdb$ (\ti $\pxdb$, or $\semNX$-PDB $\pxdb$), and tuple $\tup$ such that $\poly\inparen{\vct{X}} = \query(\pxdb)(\tup)$.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Modding with a set]\label{def:mod-set}
Let $S$ be a {\em set} of polynomials over $\vct{X}$. Then $\poly(\vct{X}) \mod{S}$ is the polynomial obtained by taking the mod of $\poly(\vct{X})$ over {\em all} polynomials in $S$ (order does not matter).
\end{Definition}
For example, for the set of polynomials $S = \inset{X^2 - X, Y^2 - Y}$, taking the polynomial $2X^2 + 3XY - 2Y^2 \mod{S}$ yields $2X + 3XY - 2Y$.
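To see where this result comes from, one possible derivation (a sketch that reduces by one element of $S$ at a time, which is permitted since order does not matter) is:
\begin{align*}
2X^2 + 3XY - 2Y^2 &= 2\inparen{X^2 - X} + \inparen{2X + 3XY - 2Y^2}, \\
2X + 3XY - 2Y^2 &= -2\inparen{Y^2 - Y} + \inparen{2X + 3XY - 2Y}.
\end{align*}
The first line leaves the remainder $2X + 3XY - 2Y^2$ modulo $X^2 - X$, and the second line leaves $2X + 3XY - 2Y$ modulo $Y^2 - Y$.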
%
\begin{Definition}[$\mathcal{B}$, $\mathcal{T}$]\label{def:mod-set-polys}
Given the set of BIDB variables $\inset{X_{i,j}}$, define
\setlength\parindent{0pt}
\vspace*{-3mm}
{\small
\begin{tabular}{@{}l l}
	\begin{minipage}[b]{0.45\linewidth}
	\centering
	\begin{equation*}
		\mathcal{B} = \comprehension{X_{i,j} \cdot X_{i,j'}}{i \in [\ell], j \neq j' \in [~\abs{\block_i}~]}
	\end{equation*}
	\end{minipage}%
	\hspace{13mm}
	&
	\begin{minipage}[b]{0.45\linewidth}
	\centering
	\begin{equation*}
		\mathcal{T} = \comprehension{X_{i,j}^2 - X_{i,j}}{i \in [\ell], j \in [~\abs{\block_i}~]}
	\end{equation*}
	\end{minipage}
	\\
\end{tabular}
}
\end{Definition}
%
\begin{Definition}[Reduced \bi Polynomials]\label{def:reduced-bi-poly}
Let $\poly(\vct{X})$ be a \bi-lineage polynomial.
The reduced form $\rpoly(\vct{X})$ of $\poly(\vct{X})$ is: $\rpoly(\vct{X}) = \poly(\vct{X}) \mod \inparen{\mathcal{T} \cup \mathcal{B}}$.
% \begin{equation*}
% \rpoly(\vct{X}) = \poly(\vct{X}) \mod \inparen{\mathcal{T} \cup \mathcal{B}}%X_i^2 - X_i \mod X_{\block_s, t}X_{\block_s, u}
% \end{equation*}
%for all $i$ in $[\numvar]$ and for all $s$ in $\ell$, such that for all $t, u$ in $[\abs{\block_s}]$, $t \neq u$.
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%

All exponents $e > 1$ in $\smbOf{\poly(\vct{X})}$ are reduced to $e = 1$ via mod $\mathcal{T}$. Taking the result mod $\mathcal{B}$ then enforces the disjointness condition of a \bi by removing every monomial that contains two or more distinct lineage variables from the same block.
%, (recall the constraint on tuples from the same block being disjoint in a \bi).% any monomial containing more than one tuple from a block has $0$ probability and can be ignored).
%
For the special case of \tis, the second step is not necessary since every block contains a single tuple.
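
As an illustrative sketch, assume a \bi with two blocks, where $X_{1,1}$ and $X_{1,2}$ belong to the first block and $X_{2,1}$ belongs to the second, and assume the polynomial $\poly = \inparen{X_{1,1} + X_{1,2} + X_{2,1}}^2$ (chosen only for illustration). The two steps then proceed as follows:
\begin{align*}
\smbOf{\poly} &= X_{1,1}^2 + X_{1,2}^2 + X_{2,1}^2 + 2X_{1,1}X_{1,2} + 2X_{1,1}X_{2,1} + 2X_{1,2}X_{2,1}, \\
\smbOf{\poly} \mod{\mathcal{T}} &= X_{1,1} + X_{1,2} + X_{2,1} + 2X_{1,1}X_{1,2} + 2X_{1,1}X_{2,1} + 2X_{1,2}X_{2,1}, \\
\rpoly &= X_{1,1} + X_{1,2} + X_{2,1} + 2X_{1,1}X_{2,1} + 2X_{1,2}X_{2,1}.
\end{align*}
The monomial $2X_{1,1}X_{1,2}$ is a multiple of an element of $\mathcal{B}$ and therefore vanishes in the last step.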
%Alternatively, one can think of $\rpoly$ as the \abbrSMB of $\poly(\vct{X})$ when the product operator is idempotent.
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \begin{Definition}[$\rpoly(\vct{X})$] \label{def:qtilde}
% Define $\rpoly(X_1,\ldots, X_\numvar)$ as the reduced version of $\poly(X_1,\ldots, X_\numvar)$, of the form
% $\rpoly(X_1,\ldots, X_\numvar) = $
% \[\poly(X_1,\ldots, X_\numvar) \mod X_1^2-X_1\cdots\mod X_\numvar^2 - X_\numvar.\]
% \end{Definition}
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%Removing this example to save space
\iffalse
\begin{Example}\label{example:qtilde}
Consider $\poly(X, Y) = (X + Y)(X + Y)$ where $X$ and $Y$ are from different blocks. The expanded derivation for $\rpoly(X, Y)$ is
\begin{align*}
(&X^2 + 2XY + Y^2 \mod X^2 - X) \mod Y^2 - Y\\
= ~&X + 2XY + Y^2 \mod Y^2 - Y\\
= ~&X + 2XY + Y
\end{align*}
\end{Example}
\fi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Intuitively, $\rpoly(\textbf{X})$ is the \abbrSMB form of $\poly(\textbf{X})$ such that if any $X_j$ term has an exponent $e > 1$, it is reduced to $1$, i.e. $X_j^e\mapsto X_j$ for any $e > 1$.
%
%When considering $\bi$ input, it becomes necessary to redefine $\rpoly(\vct{X})$.
%
%\noindent The usefulness of this will reduction become clear in \Cref{lem:exp-poly-rpoly}.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Valid Worlds]
For probability distribution $\probDist$, % and its corresponding probability mass function $\probOf$,
the set of valid worlds $\eta$ consists of all the worlds with probability greater than $0$; i.e., for random world variable vector $\vct{W}$
\[
	\eta = \comprehension{\vct{w}}{\probOf[\vct{W} = \vct{w}] > 0}
\]
\end{Definition}
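For instance (an illustration that assumes the usual \bi semantics, in which at most one tuple per block is present in any world), consider a single block with variables $X_{1,1}$ and $X_{1,2}$ whose tuple probabilities satisfy $\prob_{1,1}, \prob_{1,2} > 0$ and $\prob_{1,1} + \prob_{1,2} < 1$. The worlds $(0,0)$, $(1,0)$, and $(0,1)$ have probabilities $1 - \prob_{1,1} - \prob_{1,2}$, $\prob_{1,1}$, and $\prob_{1,2}$, respectively, while $(1,1)$ has probability $0$, so
\[
	\eta = \inset{(0,0), (1,0), (0,1)}.
\]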

%We state additional equivalences between $\poly(\vct{X})$ and $\rpoly(\vct{X})$ in \Cref{app:subsec-pre-poly-rpoly} and \Cref{app:subsec-prop-q-qtilde}.
\noindent Next, we show why the reduced form is useful for our purposes:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Define all variables $X_i$ in $\poly$ to be independent.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:exp-poly-rpoly}
Let $\pxdb$ be a \bi over variables $\vct{X} = \{X_1, \ldots, X_\numvar\}$ with probability distribution $\probDist$ produced by the tuple probability vector $\probAllTup = (\prob_1, \ldots, \prob_\numvar)$ over all $\vct{w}$ in $\eta$. For any \bi-lineage polynomial $\poly(\vct{X})$ based on $\pxdb$ and query $\query$ we have:
% The expectation over possible worlds in $\poly(\vct{X})$ is equal to $\rpoly(\prob_1,\ldots, \prob_\numvar)$.
\begin{equation*}
\expct_{\vct{W} \sim \probDist}\pbox{\poly(\vct{W})} = \rpoly(\probAllTup).
\end{equation*}
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Note that in the preceding lemma, we have assigned $\probAllTup$
%(introduced in \Cref{subsec:def-data})
to the variables $\vct{X}$. Intuitively, \Cref{lem:exp-poly-rpoly} states that when we replace each variable $X_i$ with its probability $\prob_i$ in the reduced form of a \bi-lineage polynomial and evaluate the resulting expression in $\mathbb{R}$, the result is the expectation of the polynomial.
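
As a sanity check (a sketch for the special case of a \ti, with variable names and probabilities chosen only for illustration), let $\poly(X, Y) = (X + Y)^2$ with independent variables $X$ and $Y$ that are present with probabilities $\prob_X = \frac{1}{2}$ and $\prob_Y = \frac{1}{4}$. Here $\rpoly(X, Y) = X + 2XY + Y$, so
\begin{equation*}
\rpoly\inparen{\prob_X, \prob_Y} = \frac{1}{2} + 2 \cdot \frac{1}{2} \cdot \frac{1}{4} + \frac{1}{4} = 1.
\end{equation*}
Summing $\poly$ over the four possible worlds $(w_X, w_Y)$, each weighted by its probability, gives the same value:
\begin{equation*}
\underbrace{\tfrac{3}{8} \cdot 0}_{(0,0)} + \underbrace{\tfrac{3}{8} \cdot 1}_{(1,0)} + \underbrace{\tfrac{1}{8} \cdot 1}_{(0,1)} + \underbrace{\tfrac{1}{8} \cdot 4}_{(1,1)} = 1.
\end{equation*}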

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Corollary}\label{cor:expct-sop}
If $\poly$ is a \bi-lineage polynomial, then the expectation of $\poly$, i.e., $\expct\pbox{\poly} = \rpoly\left(\prob_1, \ldots, \prob_\numvar\right)$, can be computed in $O(\size\inparen{\smbOf{\poly}})$ time, where $\size\inparen{\poly}$ (\Cref{def:size}) is proportional to the total number of multiplication/addition operators in $\poly$.
\end{Corollary}
%\AH{What if $\poly$ is not in \abbrSMB form?}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: