Since we have shown that computing the expected multiplicity of a query result tuple is equivalent to computing the expectation of a polynomial (for that tuple) given a probability distribution over all possible assignments of variables in the polynomial to $\{0,1\}$, we focus on this problem exclusively from now on.
We now introduce some basic terminology for polynomials and then develop a reduced normal form for polynomials that preserves a polynomial expectation for probability distributions that stem from \bis or \tis.
where each $c_i$ is a positive integer and each $m_i$ is a monomial and $m_i \neq m_j$ for $i \neq j$. Given a polynomial $\poly$ we denote its \abbrSMB as $\smbOf{\poly}$.
The \abbrSMB for the running example is $X^2+2XY + Y^2$. While $X^2+ XY + XY + Y^2$ is an expanded form of the expression, it is not the standard monomial basis since $XY$ appears more than once.
The degree of the running example polynomial is $2$. In this paper we consider only finite degree polynomials.
% Throughout this paper, we also make the following \textit{assumption}.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \begin{Assumption}\label{assump:poly-smb}
% All polynomials considered are in standard monomial basis, i.e., $\poly(\vct{X}) = \sum\limits_{\vct{d} \in \mathbb{N}^\numvar}q_d \cdot \prod\limits_{i = 1, d_i \geq 1}^{\numvar}X_i^{d_i}$, where $q_d$ is the coefficient for the monomial encoded in $\vct{d}$ and $d_i$ is the $i^{th}$ element of $\vct{d}$.
We call a polynomial $\query(\vct{X})$ a \emph{\bi-lineage polynomial} (\emph{\ti-lineage polynomial}), if
\AH{Why is it required for the tuple to be n-ary? I think this slightly confuses me since we have n tuples.} there exists an n-ary $\raPlus$ query $\query$, \bi$\pxdb$ (\ti$\pxdb$), and n-ary tuple $\tup$ such that $\query(\vct{X})=\query(\pxdb)(\tup)$. % Before proceeding, note that the following is assume that polynomials are \bis (which subsume \tis as a special case).
Recall that in a \bi$\pxdb$ with tuples $t_1, \ldots, t_n$, each input tuple $t_i$ is annotated with a unique variable $X_i$. The tuples of $\pxdb$ are partitioned into $\ell$ blocks $\block_1, \ldots, \block_\ell$ and each tuple $t_i$ is associated with a probability $\prob(\tup_i)=\pd[X_i =1]$. Together with the assumption that blocks are assumed to be independent and tuples from the same block are disjoint events, $\prob$ and the blocks induce the probability distribution $\pd$ of $\pxdb$.
We will write a \bi-lineage polynomial $\poly(\vct{X})$ for a \bi with $\ell$ blocks as
$\poly(\vct{X})$ = $\poly(X_{\block_1, 1},\ldots, X_{\block_1, \abs{\block_1}},$$\ldots, X_{\block_\ell, \abs{\block_\ell}})$, where $\abs{\block_i}$ denotes the size of $\block_i$, and $\block_{i, j}$ denotes tuple $j$ residing in block $i$ for $j$ in $[\abs{\block_i}]$.
% variables are independent of each other (or disjoint if they are from the same block) and each variable $X$ is associated with a probability $\vct{p}(X) = \pd[X = 1]$. Thus, we are dealing with polynomials $\poly(\vct{X})$ that are annotations of a tuple in the result of a query $\query$ over a BIDB $\pxdb$ where $\vct{X}$ is the set of variables that occur in annotations of tuples of $\pxdb$.
% While the definition of polynomial $\poly(\vct{X})$ over a $\bi$ input doesn't change, we introduce an alternative notation which will come in handy. Given $\ell$ blocks, we write $\poly(\vct{X})$ = $\poly(X_{\block_1, 1},\ldots, X_{\block_1, \abs{\block_1}},$ $\ldots, X_{\block_\ell, \abs{\block_\ell}})$, where $\abs{\block_i}$ denotes the size of $\block_i$, and $\block_{i, j}$ denotes tuple $j$ residing in block $i$ for $j$ in $[\abs{\block_i}]$.
% The number of tuples in the $\bi$ instance can be (trivially) computed as $\numvar = \sum\limits_{i = 1}^{\ell}\abs{\block_i}$ .
Intuitively, in the reduced form all exponents $e > 1$ are reduced to $e =1$ and, all monomials containing more than one variable from the same block $\block$ are dropped (tuples from the same block are disjoint events in \bis and, thus, any world containing more than one tuple from a block has $0$ probability and can be ignored). Note that for the special case of \tis, the second step (dropping monomials with variables from the same block) is not necessary since every block contains a single tuple.
% Intuitively, $\rpoly(\textbf{X})$ is the \abbrSMB form of $\poly(\textbf{X})$ such that if any $X_j$ term has an exponent $e > 1$, it is reduced to $1$, i.e. $X_j^e\mapsto X_j$ for any $e > 1$.
When $\poly(X_1,\ldots, X_\numvar)=\sum\limits_{\vct{d}\in\{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot\prod\limits_{\substack{i =1\\s.t. d_i\geq1}}^{\numvar}X_i^{d_i}$, we have then that $\rpoly(X_1,\ldots, X_\numvar)=\sum\limits_{\vct{d}\in\{0,\ldots, B\}^\numvar} q_{\vct{d}}\cdot\prod\limits_{\substack{i =1\\s.t. d_i\geq1}}^{\numvar}X_i$.
Note that any $\poly$ in factorized form is equivalent to its \abbrSMB expansion. For each term in the expanded form, further note that for all $b \in\{0, 1\}$ and all $e \geq1$, $b^e = b$. \qed
Let $\pxdb$ be a \bi over variables $\vct{X}=\{X_1, \ldots, X_\numvar\}$ and with probability distribution $\vct{p}=(\prob_1, \ldots, \prob_\numvar)$. For any \bi-lineage polynomial $\poly(\vct{X})$ based on $\pxdb$ and some query $\query$ we have
Note that in the preceding lemma, we have assigned $\vct{p}$
%(introduced in \Cref{subsec:def-data})
to the variables $\vct{X}$. Intuitively, \Cref{lem:exp-poly-rpoly} states that when we replace each variable $X_i$ with its probability $\prob_i$ in the reduced form of a \bi-lineage polynomial and evaluate the resulting expression in $\mathbb{R}$, then the result is the expectation of the polynomial.
%Using the fact above, we need to compute \[\sum_{(\wbit_1,\ldots, \wbit_\numvar) \in \{0, 1\}}\rpoly(\wbit_1,\ldots, \wbit_\numvar)\]. We therefore argue that
Let $\poly$ be the generalized polynomial, i.e., the polynomial of $\numvar$ variables with highest degree $= B$: %, in which every possible monomial permutation appears,
In steps \cref{p1-s1} and \cref{p1-s2}, by linearity of expectation (recall the variables are independent), the expecation can be pushed all the way inside of the product. In \cref{p1-s3}, note that $w_i \in\{0, 1\}$ which further implies that for any exponent $e \geq1$, $w_i^e = w_i$. Next, in \cref{p1-s4} the expectation of a tuple is indeed its probability.
Finally, observe \Cref{p1-s5} by construction in \Cref{lem:pre-poly-rpoly}, that $\rpoly(\prob_1,\ldots, \prob_\numvar)$ is exactly the product of probabilities of each variable in each monomial across the entire sum.
If $\poly$ is a \bi-lineage polynomial, then the expectation of $\poly$, i.e., $\expct\pbox{\poly}=\rpoly\left(\prob_1,\ldots, \prob_\numvar\right)$ can be computed in $O(|\smbOf{\poly}|)$, where $|\poly|$ denotes the total number of multiplication/addition operators in $\poly$.
Note that \cref{lem:exp-poly-rpoly} shows that $\expct\pbox{\poly}=$$\rpoly(\prob_1,\ldots, \prob_\numvar)$. Therefore, if $\poly$ is already in \abbrSMB form, one only needs to compute $\poly(\prob_1,\ldots, \prob_\numvar)$ ignoring exponent terms (note that such a polynomial is $\rpoly(\prob_1,\ldots, \prob_\numvar)$), which indeed has $O(\smbOf{|\poly|})$ computations.\qed