Since we have shown that computing the expected multiplicity of a query result tuple is equivalent to computing the expectation of a polynomial (for that tuple) given a probability distribution over all possible assignments of variables in the polynomial to $\{0,1\}$, from now on we focus exclusively on this problem.
We now introduce some basic terminology for polynomials and then develop a reduced normal form for polynomials that preserves the expectation of a polynomial under probability distributions that stem from \bis or \tis.
where each $c_i$ is a positive integer and each $m_i$ is a monomial and $m_i \neq m_j$ for $i \neq j$. Given a polynomial $\poly$ we denote its \abbrSMB as $\smbOf{\poly}$.
The \abbrSMB for the running example is $x^2+2xy + y^2$. While $x^2+ xy + xy + y^2$ is an expanded form of the expression, it is not the standard monomial basis since $xy$ appears more than once.
The degree of the running example polynomial is $2$. In this paper we consider only polynomials of finite degree.
% Throughout this paper, we also make the following \textit{assumption}.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \begin{Assumption}\label{assump:poly-smb}
% All polynomials considered are in standard monomial basis, i.e., $\poly(\vct{X}) = \sum\limits_{\vct{d} \in \mathbb{N}^\numvar}q_d \cdot \prod\limits_{i = 1, d_i \geq 1}^{\numvar}X_i^{d_i}$, where $q_d$ is the coefficient for the monomial encoded in $\vct{d}$ and $d_i$ is the $i^{th}$ element of $\vct{d}$.
% \end{Assumption}
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We call a polynomial $\poly(\vct{X})$ a \emph{\bi-lineage polynomial} (\emph{\ti-lineage polynomial}) if there exists an $n$-ary $\raPlus$ query $\query$, a \bi$\pxdb$ (\ti$\pxdb$), and an $n$-ary tuple $\tup$ such that $\poly(\vct{X})=\query(\pxdb)(\tup)$. % Before proceeding, note that the following is assume that polynomials are \bis (which subsume \tis as a special case).
Note that \tis are a special case of \bis and, thus, the following applies to \tis as well.
Recall that in a \bi$\pdbx$ with tuples $t_1, \ldots, t_n$, each input tuple $t_i$ is annotated with a unique variable $X_i$. The tuples of $\pdbx$ are partitioned into $\ell$ blocks $\block_1, \ldots, \block_\ell$, and each tuple $t_i$ is associated with a probability $\vct{p}(\tup_i)=\pd[X_i =1]$. Together with the assumptions that blocks are independent and that tuples from the same block are disjoint events, $\vct{p}$ and the blocks induce the probability distribution $\pd$ of $\pdbx$.
We will write a \bi-lineage polynomial $\poly(\vct{X})$ for a \bi with $\ell$ blocks as
$\poly(\vct{X})$ = $\poly(X_{\block_1, 1},\ldots, X_{\block_1, \abs{\block_1}},$$\ldots, X_{\block_\ell, \abs{\block_\ell}})$, where $\abs{\block_i}$ denotes the size of $\block_i$, and $\block_{i, j}$ denotes tuple $j$ residing in block $i$ for $j$ in $[\abs{\block_i}]$.
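As an illustration (the block sizes and probabilities here are hypothetical and chosen only for this example), consider a \bi with $\ell = 2$ blocks, where $\block_1$ contains two tuples and $\block_2$ contains one, with $\pd[X_{\block_1, 1} = 1] = 0.3$, $\pd[X_{\block_1, 2} = 1] = 0.5$, and $\pd[X_{\block_2, 1} = 1] = 0.4$. Since tuples within a block are disjoint and blocks are independent, the induced distribution would satisfy
\begin{align*}
\pd[X_{\block_1, 1} = 1 \wedge X_{\block_1, 2} = 1] &= 0,\\
\pd[X_{\block_1, 1} = 0 \wedge X_{\block_1, 2} = 0] &= 1 - 0.3 - 0.5 = 0.2,\\
\pd[X_{\block_1, 1} = 1 \wedge X_{\block_2, 1} = 1] &= 0.3 \cdot 0.4 = 0.12.
\end{align*}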
% and the probability distribution of $\pdbx$ is uniquely determined based on a probability vector $\vct{p}$ that associates each tuple a probability
% variables are independent of each other (or disjoint if they are from the same block) and each variable $X$ is associated with a probability $\vct{p}(X) = \pd[X = 1]$. Thus, we are dealing with polynomials $\poly(\vct{X})$ that are annotations of a tuple in the result of a query $\query$ over a BIDB $\pxdb$ where $\vct{X}$ is the set of variables that occur in annotations of tuples of $\pxdb$.
% While the definition of polynomial $\poly(\vct{X})$ over a $\bi$ input doesn't change, we introduce an alternative notation which will come in handy. Given $\ell$ blocks, we write $\poly(\vct{X})$ = $\poly(X_{\block_1, 1},\ldots, X_{\block_1, \abs{\block_1}},$ $\ldots, X_{\block_\ell, \abs{\block_\ell}})$, where $\abs{\block_i}$ denotes the size of $\block_i$, and $\block_{i, j}$ denotes tuple $j$ residing in block $i$ for $j$ in $[\abs{\block_i}]$.
% The number of tuples in the $\bi$ instance can be (trivially) computed as $\numvar = \sum\limits_{i = 1}^{\ell}\abs{\block_i}$ .
Intuitively, in the reduced form all exponents $e > 1$ are reduced to $e = 1$, and all monomials containing more than one variable from the same block $\block$ are dropped. Note that for the special case of \tis, no monomials are dropped since every block contains a single tuple.
Alternatively, one can think of $\rpoly$ as the \abbrSMB of $\poly(\vct{X})$ when the product operator is idempotent.
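To illustrate on the running example, whose \abbrSMB is $x^2 + 2xy + y^2$: under the assumption (made only for this illustration) that $x$ and $y$ annotate tuples in different blocks, reducing every exponent to $1$ yields
\[
\rpoly(x, y) = x + 2xy + y.
\]
If we instead assume that $x$ and $y$ annotate tuples of the same block, the monomial $2xy$ is dropped as well, leaving $\rpoly(x, y) = x + y$.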
% Intuitively, $\rpoly(\textbf{X})$ is the \abbrSMB form of $\poly(\textbf{X})$ such that if any $X_j$ term has an exponent $e > 1$, it is reduced to $1$, i.e. $X_j^e\mapsto X_j$ for any $e > 1$.
When $\poly(X_1,\ldots, X_\numvar)=\sum\limits_{\vct{d}\in\{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot\prod\limits_{\substack{i =1\\\text{s.t. } d_i\geq1}}^{\numvar}X_i^{d_i}$, we then have that $\rpoly(X_1,\ldots, X_\numvar)=\sum\limits_{\vct{d}\in\{0,\ldots, B\}^\numvar} q_{\vct{d}}\cdot\prod\limits_{\substack{i =1\\\text{s.t. } d_i\geq1}}^{\numvar}X_i$.
Note that any $\poly$ in factorized form is equivalent to its \abbrSMB expansion. For each term in the expanded form, further note that for all $b \in\{0, 1\}$ and all $e \geq1$, $b^e = b$. \qed
Let $\pxdb$ be a \bi over variables $\vct{X}=\{X_1, \ldots, X_\numvar\}$ and with probability distribution $\vct{p}=(\prob_1, \ldots, \prob_\numvar)$. For any \bi-lineage polynomial $\poly(\vct{X})$ we have
% The expectation over possible worlds in $\poly(\vct{X})$ is equal to $\rpoly(\prob_1,\ldots, \prob_\numvar)$.
Note that in the preceding lemma, we have assigned $\vct{p}$ (introduced in \Cref{subsec:def-data}) to the variables $\vct{X}$. Intuitively, \Cref{lem:exp-poly-rpoly} states that when we replace each variable $X_i$ with its probability $\prob_i$ in the reduced form of a \bi-lineage polynomial and evaluate the resulting expression in $\mathbb{R}$, the result is the expectation of the polynomial.
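As a sanity check on a small instance, take the \abbrSMB $x^2 + 2xy + y^2$ of the running example and assume, for illustration only, that $x$ and $y$ are independent with hypothetical probabilities $\prob_x = 0.5$ and $\prob_y = 0.2$. Summing over the four possible worlds $(x, y) \in \{(0,0), (1,0), (0,1), (1,1)\}$ gives
\[
\expct\pbox{\poly} = 0.4 \cdot 0 + 0.4 \cdot 1 + 0.1 \cdot 1 + 0.1 \cdot 4 = 0.9,
\]
which agrees with substituting the probabilities into the reduced form $x + 2xy + y$:
\[
\rpoly(\prob_x, \prob_y) = 0.5 + 2 \cdot 0.5 \cdot 0.2 + 0.2 = 0.9.
\]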
%Using the fact above, we need to compute \[\sum_{(\wbit_1,\ldots, \wbit_\numvar) \in \{0, 1\}}\rpoly(\wbit_1,\ldots, \wbit_\numvar)\]. We therefore argue that
Let $\poly$ be the generalized polynomial, i.e., the polynomial over $\numvar$ variables in which each variable has exponent at most $B$: %, in which every possible monomial permutation appears,
In steps \cref{p1-s1} and \cref{p1-s2}, by linearity of expectation and the independence of the variables, the expectation can be pushed all the way inside the product. In \cref{p1-s3}, note that $w_i \in\{0, 1\}$, which implies that for any exponent $e \geq1$, $w_i^e = w_i$. Next, in \cref{p1-s4}, the expectation of the variable of a tuple is exactly the probability of that tuple.
Finally, \cref{p1-s5} follows by construction from \cref{lem:pre-poly-rpoly}: $\rpoly(\prob_1,\ldots, \prob_\numvar)$ is exactly the sum, over all monomials, of the coefficient of the monomial times the product of the probabilities of the variables in that monomial.
If $\poly$ is given as a sum of monomials, the expectation of $\poly$, i.e., $\expct\pbox{\poly}=\rpoly\left(\prob_1,\ldots, \prob_\numvar\right)$, can be computed in $O(|\poly|)$ time, where $|\poly|$ denotes the total number of multiplication/addition operators in $\poly$.
Note that \cref{lem:exp-poly-rpoly} shows that $\expct\pbox{\poly}=\rpoly(\prob_1,\ldots, \prob_\numvar)$. Therefore, if $\poly$ is already in sum-of-products form, one only needs to compute $\poly(\prob_1,\ldots, \prob_\numvar)$ while ignoring exponents (note that the polynomial evaluated this way is $\rpoly(\prob_1,\ldots, \prob_\numvar)$), which indeed takes $O(|\poly|)$ computations.\qed
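The requirement that $\poly$ be given as a sum of monomials matters here. As a sketch, assume again that $x$ and $y$ are independent with probabilities $\prob_x$ and $\prob_y$ (names introduced only for this illustration). Dropping exponents monomial by monomial in the \abbrSMB $x^2 + 2xy + y^2$ uses a constant number of operations per operator of the input, whereas dropping the exponent directly in a factorization such as $(x + y)^2$ does not in general produce the reduced form:
\begin{align*}
x^2 + 2xy + y^2 &\;\mapsto\; \prob_x + 2\prob_x\prob_y + \prob_y = \rpoly(\prob_x, \prob_y),\\
(x + y)^2 &\;\mapsto\; \prob_x + \prob_y \neq \rpoly(\prob_x, \prob_y) \quad \text{whenever } \prob_x\prob_y > 0.
\end{align*}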