paper-BagRelationalPDBsAreHard/poly-form.tex

173 lines
10 KiB
TeX
Raw Normal View History

%root: main.tex
2020-06-26 17:27:52 -04:00
%!TEX root = ./main.tex
2020-07-14 11:45:57 -04:00
%\onecolumn
2020-12-14 23:34:12 -05:00
\subsection{Reduced Polynomials and Equivalences}
2020-12-15 17:46:35 -05:00
Since we have shown that computing the expected multiplicity of a query result tuple is equivalent to computing the expectation of a polynomial (for that tuple) given a probability distribution over all possible assignments of variables in the polynomial to $\{0,1\}$, we focus on this problem exclusively from now on.
We now introduce some basic terminology for polynomials and then develop a reduced normal form for polynomials that preserves a polynomial expectation for probability distributions that stem from \bis or \tis.
2020-12-16 12:38:21 -05:00
Let us use the expression $(X + Y)^2$ as a running example in this section.
2020-12-14 23:34:12 -05:00
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \begin{Definition}[Monomial]\label{def:monomial}
% A monomial is a product of a set of variables, each raised to a non-negative integer power.
% \end{Definition}
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-12-14 23:34:12 -05:00
% For instance, the term $2xy$ contains a single monomial $xy$.
% \Cref{def:monomial} the monomial is $xy$.
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Standard Monomial Basis]\label{def:smb}
2020-12-14 23:34:12 -05:00
A monomial is a product of a set of variables, each raised to a non-negative integer power.
A polynomial is in \termSMB (\abbrSMB) when it is of the form:
2020-12-14 13:58:56 -05:00
\[
\sum_{i=1}^n c_i \cdot m_i
\]
2020-12-14 23:34:12 -05:00
where each $c_i$ is a positive integer and each $m_i$ is a monomial and $m_i \neq m_j$ for $i \neq j$. Given a polynomial $\poly$ we denote its \abbrSMB as $\smbOf{\poly}$.
2020-12-14 13:58:56 -05:00
% fully expanded out such that no product of sums exist and where each unique monomial appears exactly once.
\end{Definition}
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-12-16 12:38:21 -05:00
The \abbrSMB for the running example is $X^2 +2XY + Y^2$. While $X^2 + XY + XY + Y^2$ is an expanded form of the expression, it is not the standard monomial basis since $XY$ appears more than once.
2020-12-14 13:58:56 -05:00
2020-12-14 23:34:12 -05:00
\BG{Maybe inline degree?}
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-12-14 23:34:12 -05:00
\begin{Definition}[Degree]\label{def:degree}
The degree of polynomial $\poly(\vct{X})$ is the maximum sum of the exponents of a monomial, over all monomials in $\smbOf{\poly(\vct{X})}$.
\end{Definition}
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-12-14 23:34:12 -05:00
The degree of the running example polynomial is $2$. In this paper we consider only finite degree polynomials.
% Throughout this paper, we also make the following \textit{assumption}.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \begin{Assumption}\label{assump:poly-smb}
% All polynomials considered are in standard monomial basis, i.e., $\poly(\vct{X}) = \sum\limits_{\vct{d} \in \mathbb{N}^\numvar}q_d \cdot \prod\limits_{i = 1, d_i \geq 1}^{\numvar}X_i^{d_i}$, where $q_d$ is the coefficient for the monomial encoded in $\vct{d}$ and $d_i$ is the $i^{th}$ element of $\vct{d}$.
% \end{Assumption}
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-12-16 12:38:21 -05:00
We call a polynomial $\query(\vct{X})$ a \emph{\bi-lineage polynomial} (\emph{\ti-lineage polynomial}), if
\AH{Why is it required for the tuple to be n-ary? I think this slightly confuses me since we have n tuples.} there exists an n-ary $\raPlus$ query $\query$, \bi $\pxdb$ (\ti $\pxdb$), and n-ary tuple $\tup$ such that $\query(\vct{X}) = \query(\pxdb)(\tup)$. % Before proceeding, note that the following is assume that polynomials are \bis (which subsume \tis as a special case).
2020-12-14 23:34:12 -05:00
Note the \tis are a special case of \bis and, thus, the following applies to \tis as well.
2020-12-16 21:01:22 -05:00
Recall that in a \bi $\pxdb$ with tuples $t_1, \ldots, t_n$, each input tuple $t_i$ is annotated with a unique variable $X_i$. The tuples of $\pxdb$ are partitioned into $\ell$ blocks $\block_1, \ldots, \block_\ell$ and each tuple $t_i$ is associated with a probability $\prob(\tup_i) = \pd[X_i = 1]$. Together with the assumption that blocks are assumed to be independent and tuples from the same block are disjoint events, $\prob$ and the blocks induce the probability distribution $\pd$ of $\pxdb$.
2020-12-14 23:34:12 -05:00
We will write a \bi-lineage polynomial $\poly(\vct{X})$ for a \bi with $\ell$ blocks as
$\poly(\vct{X})$ = $\poly(X_{\block_1, 1},\ldots, X_{\block_1, \abs{\block_1}},$ $\ldots, X_{\block_\ell, \abs{\block_\ell}})$, where $\abs{\block_i}$ denotes the size of $\block_i$, and $\block_{i, j}$ denotes tuple $j$ residing in block $i$ for $j$ in $[\abs{\block_i}]$.
2020-12-16 21:01:22 -05:00
\SF{Where is $\block_{i, j}$ used? Is it $X_{\block_{1, 1}}$ or $X_{\block_1, 1}$ ?}
2020-12-15 12:02:22 -05:00
% and the probability distribution of $\pxdb$ is uniquely determined based on a probability vector $\vct{p}$ that associates each tuple a probability
2020-12-14 23:34:12 -05:00
% variables are independent of each other (or disjoint if they are from the same block) and each variable $X$ is associated with a probability $\vct{p}(X) = \pd[X = 1]$. Thus, we are dealing with polynomials $\poly(\vct{X})$ that are annotations of a tuple in the result of a query $\query$ over a BIDB $\pxdb$ where $\vct{X}$ is the set of variables that occur in annotations of tuples of $\pxdb$.
% While the definition of polynomial $\poly(\vct{X})$ over a $\bi$ input doesn't change, we introduce an alternative notation which will come in handy. Given $\ell$ blocks, we write $\poly(\vct{X})$ = $\poly(X_{\block_1, 1},\ldots, X_{\block_1, \abs{\block_1}},$ $\ldots, X_{\block_\ell, \abs{\block_\ell}})$, where $\abs{\block_i}$ denotes the size of $\block_i$, and $\block_{i, j}$ denotes tuple $j$ residing in block $i$ for $j$ in $[\abs{\block_i}]$.
% The number of tuples in the $\bi$ instance can be (trivially) computed as $\numvar = \sum\limits_{i = 1}^{\ell}\abs{\block_i}$ .
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-12-14 23:34:12 -05:00
\begin{Definition}[Reduced \bi Polynomials]\label{def:reduced-bi-poly}
Let $\poly(\vct{X})$ be a \bi-lineage polynomial.
The reduced form $\rpoly(\vct{X})$ of $\poly(\vct{X})$ is defined as
\begin{equation*}
\rpoly(\vct{X}) = \smbOf{\poly(\vct{X})} \mod X_i^2 - X_i \mod X_{\block_s, t}X_{\block_s, u}
\end{equation*}
2020-12-15 10:06:38 -05:00
for all $i$ in $[\numvar]$ and for all $s$ in $\ell$, such that for all $t, u$ in $[\abs{\block_s}]$, $t \neq u$.
\end{Definition}
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-12-15 10:06:38 -05:00
Intuitively, in the reduced form all exponents $e > 1$ are reduced to $e = 1$ and, all monomials containing more than one variable from the same block $\block$ are dropped (tuples from the same block are disjoint events in \bis and, thus, any world containing more than one tuple from a block has $0$ probability and can be ignored). Note that for the special case of \tis, the second step (dropping monomials with variables from the same block) is not necessary since every block contains a single tuple.
2020-12-14 23:34:12 -05:00
Alternatively, one can think of $\rpoly$ as the \abbrSMB of $\poly(\vct{X})$ when the product operator is idempotent.
2020-12-14 23:34:12 -05:00
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \begin{Definition}[$\rpoly(\vct{X})$] \label{def:qtilde}
% Define $\rpoly(X_1,\ldots, X_\numvar)$ as the reduced version of $\poly(X_1,\ldots, X_\numvar)$, of the form
% $\rpoly(X_1,\ldots, X_\numvar) = $
2020-07-08 16:48:37 -04:00
2020-12-14 23:34:12 -05:00
% \[\poly(X_1,\ldots, X_\numvar) \mod X_1^2-X_1\cdots\mod X_\numvar^2 - X_\numvar.\]
% \end{Definition}
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-07-08 16:48:37 -04:00
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Example}\label{example:qtilde}
2020-12-16 12:38:21 -05:00
Consider $\poly(X, Y) = (X + Y)(X + Y)$ where $X$ and $Y$ are from different blocks. Then the expanded derivation for $\rpoly(X, Y)$ is
\begin{align*}
2020-12-16 12:38:21 -05:00
(&X^2 + 2XY + Y^2 \mod X^2 - X) \mod Y^2 - Y\\
= ~&X + 2XY + Y^2 \mod Y^2 - Y\\
= ~& X + 2XY + Y
\end{align*}
\end{Example}
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-06-23 10:16:57 -04:00
2020-12-14 23:34:12 -05:00
% Intuitively, $\rpoly(\textbf{X})$ is the \abbrSMB form of $\poly(\textbf{X})$ such that if any $X_j$ term has an exponent $e > 1$, it is reduced to $1$, i.e. $X_j^e\mapsto X_j$ for any $e > 1$.
2020-12-14 23:34:12 -05:00
%When considering $\bi$ input, it becomes necessary to redefine $\rpoly(\vct{X})$.
2020-06-26 17:27:52 -04:00
2020-12-15 17:46:35 -05:00
The usefulness of this will reduction become clear in \Cref{lem:exp-poly-rpoly}.
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:pre-poly-rpoly}
When $\poly(X_1,\ldots, X_\numvar) = \sum\limits_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}} \cdot \prod\limits_{\substack{i = 1\\s.t. d_i\geq 1}}^{\numvar}X_i^{d_i}$, we have then that $\rpoly(X_1,\ldots, X_\numvar) = \sum\limits_{\vct{d} \in \{0,\ldots, B\}^\numvar} q_{\vct{d}}\cdot\prod\limits_{\substack{i = 1\\s.t. d_i\geq 1}}^{\numvar}X_i$.
\end{Lemma}
2020-12-17 17:08:48 -05:00
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-07-08 16:48:37 -04:00
\begin{proof}
2020-12-14 23:34:12 -05:00
Follows by the construction of $\rpoly$ in \cref{def:reduced-bi-poly}. \qed
2020-07-08 16:48:37 -04:00
\end{proof}
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-07-08 16:48:37 -04:00
Note the following fact:
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-12-14 23:34:12 -05:00
\begin{Proposition}\label{proposition:q-qtilde} For any \bi-lineage polynomial $\poly(X_1, \ldots, X_\numvar)$ and all $\vct{w} \in \{0,1\}^\numvar$,
\[
\poly(\vct{w}) = \rpoly(\vct{w}).
\]
\end{Proposition}
2020-12-17 17:08:48 -05:00
The proof is in~\Cref{sec:proofs-background}.
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-12-17 17:08:48 -05:00
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-07-08 16:48:37 -04:00
2020-12-14 23:34:12 -05:00
%Define all variables $X_i$ in $\poly$ to be independent.
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:exp-poly-rpoly}
2020-12-15 10:06:38 -05:00
Let $\pxdb$ be a \bi over variables $\vct{X} = \{X_1, \ldots, X_\numvar\}$ and with probability distribution $\vct{p} = (\prob_1, \ldots, \prob_\numvar)$. For any \bi-lineage polynomial $\poly(\vct{X})$ based on $\pxdb$ and some query $\query$ we have
2020-12-14 23:34:12 -05:00
% The expectation over possible worlds in $\poly(\vct{X})$ is equal to $\rpoly(\prob_1,\ldots, \prob_\numvar)$.
\begin{equation*}
2020-12-14 23:34:12 -05:00
\expct_{\vct{X}}\pbox{\poly(\vct{X})} = \rpoly(\vct{p}).
\end{equation*}
\end{Lemma}
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-12-16 12:38:21 -05:00
Note that in the preceding lemma, we have assigned $\vct{p}$
%(introduced in \Cref{subsec:def-data})
to the variables $\vct{X}$. Intuitively, \Cref{lem:exp-poly-rpoly} states that when we replace each variable $X_i$ with its probability $\prob_i$ in the reduced form of a \bi-lineage polynomial and evaluate the resulting expression in $\mathbb{R}$, then the result is the expectation of the polynomial.
2020-12-17 17:08:48 -05:00
The proof for~\Cref{lem:exp-poly-rpoly} can be seen in~\Cref{sec:proofs-background}.
2020-12-17 17:08:48 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-06-26 17:27:52 -04:00
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-06-15 18:38:10 -04:00
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Corollary}\label{cor:expct-sop}
2020-12-15 10:06:38 -05:00
If $\poly$ is a \bi-lineage polynomial, then the expectation of $\poly$, i.e., $\expct\pbox{\poly} = \rpoly\left(\prob_1,\ldots, \prob_\numvar\right)$ can be computed in $O(|\smbOf{\poly}|)$, where $|\poly|$ denotes the total number of multiplication/addition operators in $\poly$.
2020-06-17 10:58:02 -04:00
\end{Corollary}
2020-12-16 12:38:21 -05:00
\AH{What if $\poly$ is not in \abbrSMB form?}
2020-12-17 17:08:48 -05:00
We defer the proof for~\Cref{cor:expct-sop} in~\Cref{sec:proofs-background}.
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2020-12-17 17:08:48 -05:00
2020-12-14 13:58:56 -05:00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: