paper-BagRelationalPDBsAreHard/app_notation-background.tex

222 lines
22 KiB
TeX

%root: main.tex
%!TEX root=./main.tex
To justify the use of $\semNX$-databases, we need to show that we can encode any $\semN$-PDB in this way and that the query semantics over this representation coincides with query semantics over its respective $\semN$-PDB. For that it will be opportune to define representation systems for $\semN$-PDBs.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Representation System]\label{def:representation-syste}
A representation system for $\semN$-PDBs is a tuple $(\reprs, \rmod)$ where $\reprs$ is a set of representations and $\rmod$ associates with each $\repr \in \reprs$ an $\semN$-PDB $\pdb$. We say that a representation system is \emph{closed} under a class of queries $\qClass$ if for any query $\query \in \qClass$ and $\repr \in \reprs$ we have:
%
\[ \rmod(\query(\repr)) = \query(\rmod(\repr)) \]
A representation system is \emph{complete} if for every $\semN$-PDB $\pdb$ there exists $\repr \in \reprs$ such that:
%
\[ \rmod(\repr) = \pdb \]
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
As mentioned above we will use $\semNX$-databases paired with a probability distribution as a representation system, referring to such databases as \abbrNXPDB\xplural.
Given \abbrNXPDB $\pxdb$, one can think of the of $\pd$ as the probability distribution across all worlds $\inset{0, 1}^\numvar$. Denote a particular world to be $\vct{W}$. For convenience let $\assign_\vct{W}: \pxdb\rightarrow\pndb$ be a function that computes the corresponding $\semN$-\abbrPDB upon assigning all values $W_i \in \vct{W}$ to $X_i \in \vct{X}$ of $\db_{\semNX}$. Note the one-to-one correspondence between elements $\vct{W}\in\inset{0, 1}^\numvar$ to the worlds encoded by $\db_{\semNX}$ when $\vct{W}$ is assigned to $\vct{X}$ (assuming a domain of $\inset{0, 1}$ for each $X_i$).
We can think of $\assign_\vct{W}(\pxdb)\inparen{\tup}$ as the semiring homomorphism $\semNX \to \semN$ that applies the assignment $\vct{W}$ to all variables $\vct{X}$ of a polynomial and evaluates the resulting expression in $\semN$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[$\rmod\inparen{\pxdb}$]\label{def:semnx-pdbs}
Given an \abbrNXPDB$\pxdb$, we compute its equivalent $\semN$-\abbrPDB $\pndb = \rmod\inparen{\pxdb} = \inparen{\idb, \pd'}$ as:
\begin{align*}
\idb & = \{ \assign_{\vct{W}}(\pxdb) \mid \vct{W} \in \{0,1\}^n \} \\
\forall \db \in \idb: \probOf(\db) & = \sum_{\vct{W} \in \{0,1\}^n: \assign_{\vct{W}}(\pxdb) = \db} \probOf(\vct{W})
\end{align*}
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
For instance, consider a $\pxdb$ consisting of a single tuple $\tup_1 = (1)$ annotated with $X_1 + X_2$ with probability distribution $\probOf([0,0]) = 0$, $\probOf([0,1]) = 0$, $\probOf([1,0]) = 0.3$ and $\probOf([1,1]) = 0.7$. This \abbrNXPDB encodes two possible worlds (with non-zero probability) that we denote using their world vectors.
%
\[
D_{[0,1]}(\tup_1) = 1 \hspace{0.3cm} \mathbf{and} \hspace{0.3cm} D_{[1,1]}(\tup_1) = 2
\]
%
Importantly, as the following proposition shows, any finite $\semN$-PDB can be encoded as an \abbrNXPDB and \abbrNXPDB\xplural are closed under $\raPlus$\cite{DBLP:conf/pods/GreenKT07}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Proposition}\label{prop:semnx-pdbs-are-a-}
\abbrNXPDB\xplural are a complete representation system for $\semN$-PDBs that is closed under $\raPlus$ queries.
\end{Proposition}
\begin{proof}
To prove that \abbrNXPDB\xplural are complete consider the following construction that for any $\semN$-PDB $\pdb = (\idb, \pd)$ produces an \abbrNXPDB $\pxdb = (\db_{\semNX}, \pd')$ such that $\rmod(\pxdb) = \pdb$. Let $\idb = \{D_1, \ldots, D_{\abs{\idb}}\}.$
For each world $D_i$ we create a corresponding variable $X_i$.
In $\db_{\semNX}$ we assign each tuple $\tup$ the polynomial:
%
\[
\db_{\semNX}(\tup) = \sum_{i=1}^{\abs{\idb}} D_i(\tup)\cdot X_{i}
\]
The probability distribution $\pd'$ assigns all world vectors zero probability except for $\abs{\idb}$ world vectors (representing the possible worlds) $\vct{W}_i$. All elements of $\vct{W}_i$ are zero except for the position corresponding to variables $X_{i}$ which is set to $1$. Unfolding definitions it is trivial to show that $\rmod(\pxdb) = \pdb$. Thus, \abbrNXPDB\xplural are a complete representation system.
Since $\semNX$ is the free object in the variety of semirings, Birkhoff's HSP theorem implies that any assignment $\vct{X} \to \semN$, which includes as a special case the assignments $\assign_{\vct{W}}$ used here, uniquely extends to the semiring homomorphism alluded to above, $\assign_\vct{W}\inparen{\pxdb}\inparen{\tup}: \semNX \to \semN$. For a polynomial $\assign_\vct{W}\inparen{\pxdb}\inparen{\tup}$ substitutes variables based on $\vct{W}$ and then evaluates the resulting expression in $\semN$. For instance, consider the polynomial $\pxdb\inparen{\tup} = \poly = X + Y$ and assignment $\vct{W} := X = 0, Y=1$. We get $\assign_\vct{W}\inparen{\pxdb}\inparen{\tup} = 0 + 1 = 1$.
Closure under $\raPlus$ queries follows from this and from \cite{DBLP:conf/pods/GreenKT07}'s Proposition 3.5, which states that semiring homomorphisms commute with queries over $\semK$-relations.
\qed
\end{proof}
\subsubsection{\tis and \bis in the \abbrNXPDB model}\label{subsec:supp-mat-ti-bi-def}
Two important subclasses of \abbrNXPDB\xplural that are of interest to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\prob_\tup$. The set of possible worlds represented by a \ti $\db$ is all subsets of $\db$. The probability of each world is the product of the probabilities of all tuples that exist with one minus the probability of all tuples of $\db$ that are not part of this world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. Note then that the tuples sharing the same block are disjoint, and the sum of the probabilitites of all the tuples in the same block $\block$ is at most $1$.
The probability of such a world is the product of the probabilities of all tuples present in the world and the product of the probabilities that no tuple is present in each block $\block$ for which no tuple exists in that world.
For bag \tis and \bis, we define the probability of a tuple to be the probability that the tuple exists with multiplicity at least $1$.
In this work, we define \tis and \bis as subclasses of \abbrNXPDB\xplural defined over variables $\vct{X}$ (\Cref{def:semnx-pdbs}) where $\vct{X}$ can be partitioned into blocks that satisfy the conditions of a \ti or \bi (stated formally in \Cref{subsec:tidbs-and-bidbs}).
In this work, we consider one further deviation from the standard: We use bag semantics for queries.
Even though tuples cannot occur more than once in the input \ti or \bi, they can occur with a multiplicity larger than one in the result of a query.
Since in \tis and \bis, there is a one-to-one correspondence between tuples in the database and variables, we can interpret a vector $\vct{W} \in \{0,1\}^n$ as denoting which tuples exist in the possible world $\assign_{\vct{W}}(\pxdb)$ (the ones where $W_i = 1$).
For BIDBs specifically, note that at most one of the bits corresponding to tuples in each of the $\numblock$ blocks will be set (i.e., for any pair of bits $W_j$, $W_{j'}$ that are part of the same block $\block_i \supseteq \{t_{j}, t_{j'}\}$, at most one of them will be set).
Denote the vector $\vct{p}$ to be a vector whose elements are the individual probabilities $\prob_i$ of each tuple $\tup_i$. Given \abbrPDB $\pdb$ where $\pd$ is the distribution induced by $\vct{p}$, which we will denote $\pd^{\inparen{\vct{\prob}}}$.
%
\begin{align}\label{eq:tidb-expectation}
\expct_{\vct{W} \sim \pd^{(\vct{p})}}\pbox{\poly(\vct{W})}
= \sum\limits_{\substack{\vct{W} \in \{0, 1\}^\numvar\\ \suchthat W_j,W_{j'} = 1 \rightarrow \not \exists \block_i \supseteq \{t_{j}, t_{j'}\}}} \poly(\vct{W})\prod_{\substack{j \in [\numvar]\\ \suchthat W_j = 1}}\prob_j \prod_{\substack{i\in\pbox{\numblock}\suchthat\\ \forall \tup_j \in \block_i, W_j = 0}}%\not\exists j\in [\numvar]\\\suchthat \tup_{i, j}\subseteq\block_i, W_j = 1}}
\left(1 - \sum_{\tup_{j} \in \block_i}\prob_j\right)
\end{align}
%
Recall that tuple blocks in a TIDB always have size 1, so the outer summation of \cref{eq:tidb-expectation} is over the full set of vectors.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Proof of~\Cref{prop:ctidb-reduct}}
\begin{proof}[Proof of~\Cref{prop:ctidb-reduct}]
We first need to prove that any \abbrCTIDB $\pdb$ can be reduced to the \abbrOneBIDB created by~\Cref{prop:ctidb-reduct}. By definition, any $\tup\in\tupset$ can be present $c'\in\pbox{\bound}$ times in the possible worlds it appears in, with a disjoint probability distribution across the multiplicities $\pbox{\bound}$. Then the construction of~\Cref{prop:ctidb-reduct} of the block of tuples $\inset{\intuple{\tup, j}_{j\in\pbox{\bound}}}$ for each $\tup\in\tupset$ indeed encodes the disjoint behavior across multiplicities. $\pdb$ further requires that all $\tup\in\tupset$ are independent, a property which is enforced by independence constraint across all blocks of tuples in~\Cref{def:one-bidb}. Then the construction of $\pdb'$ in~\Cref{prop:ctidb-reduct} is an equivalent representation of $\pdb$.
Next we need to show that the distributions over $\pdb$ and $\pdb'$ are equivalent. The distribution $\bpd$ is a distribution disjoint across the set of multiplicities $\pbox{\bound}$ and independent across all $\tup\in\tupset$. By definition,~\Cref{prop:ctidb-reduct} creates $\pdb'$ by producing a block of disjoint tuples $\tup_j = \intuple{\tup, j}$ for each $\tup\in\tupset$ and $j\in\pbox{\bound}$, where $\vct{\prob} = \inparen{\prob_{\tup, j}}_{\tup\in\tupset, j\in\pbox{\bound}}$. Since the probability vector $\vct{\prob_{\pdb}} = \inparen{\prob_{\tup, j}}_{\tup\in\tupset, j\in\pbox{\bound}}$ \emph{and} each $\prob_{\tup, j}, \prob_{\tup, j'}$ for $j\neq j'\in\pbox{\bound}$ are disjoint, the distributions are hence the same.
\end{proof}
\subsection{Equating Expectations of Tuple Multiplicities and Polynomials}
\label{subsec:expectation-of-polynom-proof}
\begin{Proposition}[Expectation of polynomials]\label{prop:expection-of-polynom}
Given a \abbrBPDB $\pdb = (\Omega,\bpd)$, $\raPlus$ query $\query$, and lineage polynomial $\apolyqdt$ for arbitrary result tuple $\tup$,
we have (denoting $\randDB$ as the random variable over $\Omega$):
$ \expct_{\randDB \sim \bpd}[\query(\randDB)(t)] = \expct_{\vct{\randWorld}\sim \pdassign}\pbox{\apolyqdt\inparen{\vct{\randWorld}}}. $
\end{Proposition}
Although \Cref{prop:expection-of-polynom} follows, e.g., as an obvious consequence of~\cite{IL84a}'s Theorem 7.1, we are unaware of any prior formal proof for bag-probabilistic databases.
\begin{proof}
We need to prove for $\semN$-PDB $\pdb = (\idb,\pd)$ and \abbrNXPDB $\pxdb = (\db_{\semNX},\pd')$ where $\rmod(\pxdb) = \pdb$ that $\expct_{\randDB\sim \pd}[\query(\db)(t)] = \expct_{\vct{W} \sim \pd'}\pbox{\nxpolyqdt(\vct{W})}$
By expanding $\nxpolyqdt$ and the expectation we have:
\begin{align*}
\expct_{\vct{W} \sim \pd'}\pbox{\poly(\vct{W})}
& = \sum_{\vct{W} \in \{0,1\}^n}\probOf(\vct{W}) \cdot Q(\db_{\semNX})(t)(\vct{W})\\
\intertext{From $\rmod(\pxdb) = \pdb$, we have that the range of $\assign_{\vct{W}(\pxdb)}$ is $\idb$, so}
& = \sum_{\db \in \idb}\;\;\sum_{\vct{W} \in \{0,1\}^n : \assign_{\vct{W}}(\pxdb) = \db}\probOf(\vct{W}) \cdot Q(\db_{\semNX})(t)(\vct{W})\\
\intertext{The inner sum is only over $\vct{W}$ where $\assign_{\vct{W}}(\pxdb) = \db$ (i.e., $Q(\db_{\semNX})(t)(\vct{W}) = \query(\db)(t))$}
& = \sum_{\db \in \idb}\;\;\sum_{\vct{W} \in \{0,1\}^n : \assign_{\vct{W}}(\pxdb) = \db}\probOf(\vct{W}) \cdot \query(\db)(t)\\
\intertext{By distributivity of $+$ over $\times$}
& = \sum_{\db \in \idb}\query(\db)(t)\sum_{\vct{W} \in \{0,1\}^n : \assign_{\vct{W}}(\pxdb) = \db}\probOf(\vct{W})\\
\intertext{From the definition of $\pd$ in \cref{def:semnx-pdbs}, given $\rmod(\pxdb) = \pdb$, we get}
& = \sum_{\db \in \idb}\query(\db)(t) \cdot \probOf(D) \quad = \expct_{\randDB \sim \pd}[\query(\db)(t)]
\end{align*}
\qed
\end{proof}
\subsection{Construction of the lineage (polynomial) for an $\raPlus$ query $\query$ over $\gentupset'$}
\label{fig:lin-poly-bidb}
The following rules are analogous to \Cref{fig:nxDBSemantics}, differing only in the base case. They are presented here for completeness.
\begin{align*}
\poly'\pbox{\project_A\inparen{\query}, \gentupset', \tup_j} =
&~\sum_{\substack{\tup_{j'},\\\project_{A}\inparen{\tup_{j'}} = \tup_j}}\poly'\pbox{\query, \gentupset', \tup_{j'}}\\
%
\poly'\pbox{\query_1\union\query_2, \gentupset', \tup_j} =
&\qquad\poly'\pbox{\query_1, \gentupset', \tup_j}+\poly'\pbox{\query_2, \gentupset', \tup_j}\\
%
\poly'\pbox{\select_\theta\inparen{\query}, \gentupset', \tup_j} =
&~\begin{cases}\theta = 1 &\poly'\pbox{\query, \gentupset', \tup_j}\\\theta = 0& 0\\\end{cases} \\
%
\poly'\pbox{\query_1\join\query_2, \gentupset', \tup_j} =
&\qquad \poly'\pbox{\query_1, \gentupset', \project_{attr\inparen{\query_1}}\inparen{\tup_j}}\\ &\qquad\cdot\poly'\pbox{\query_2, \gentupset', \project_{attr\inparen{\query_2}} \inparen{\tup_j}}\\
%
\poly'\pbox{\rel,\gentupset', \tup_j} =& j\cdot X_{\tup, j}.
\end{align*}\\%[-10mm]
\subsection{Proposition~\ref{proposition:q-qtilde}}\label{app:subsec-prop-q-qtilde}
\noindent Note the following fact:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Proposition}\label{proposition:q-qtilde} For any \bi-lineage polynomial $\poly(X_1, \ldots, X_\numvar)$ and all $\vct{W}$ such that $\probOf\pbox{\vct{W}} > 0$,
it holds that
$% \[
\poly(\vct{W}) = \rpoly(\vct{W}).
$% \]
\end{Proposition}
\begin{proof}
Note that any $\poly$ in factorized form is equivalent to its \abbrSMB expansion. For each term in the expanded form, further note that for all $b \in \{0, 1\}$ and all $e \geq 1$, $b^e = b$. By definition (see~\Cref{def:reduced-poly-one-bidb}), $\rpoly\inparen{\vct{X}}$ is the \abbrSMB expansion of $\poly\inparen{\vct{X}}$ followed by reducing every exponent $e > 1$ to $1$ and eliminating all cross terms for the \abbrBIDB case. Note that it must be that no cross terms exist in $\poly\inparen{\vct{X}}$, since by the proposition statement, $\probOf\pbox{\vct{W}} > 0$. Thus, since all monomials are indeed the same, it follows that $\poly\inparen{\vct{W}} = \rpoly\inparen{\vct{W}}$.
%Finally, note that there are exactly three cases where the expectation of a monomial term $\expct\left[c_{\vct{d}}\prod_{i = 1\; s.t.\; \vct{d}_i \geq 1}^\numvar X_i\right]$ is zero:
%(i) when $c_{\vct{d}} = 0$,
%(ii) when $p_i = 0$ for some $i$ where $\vct{d}_i \geq 1$, and
%(iii) when $X_i$ and $X_j$ are in the same block for some $i,j$ where $\vct{d}_i, \vct{d}_j \geq 1$.
\qed
\end{proof}
\subsection{Proof for~\Cref{lem:bin-bidb-phi-eq-redphi}}\label{subsec:proof-exp-poly-rpoly}
\begin{proof}
Let $\poly$ be a polynomial of $\numvar$ variables with highest degree $= \hideg$, defined as follows:
\[\poly(X_1,\ldots, X_\numvar) = \sum_{\vct{d} \in \{0,\ldots, \hideg\}^\numvar}c_{\vct{d}}\cdot \prod_{i = 1}^\numvar X_i^{d_i},\]
where $\tupset'$ has $\numvar$ tuples, we can equivalently write $\prod\limits_{\tup\in\tupset'}X_\tup^{d_\tup}$ for the product term.
%If we can prove that $\poly\inparen{\vct{W}} = \rpoly\inparen{\vct{W}}$ for any $\poly\inparen{\vct{X}}$ and any $\vct{W}$, then the proof holds for any $\refpoly{}\inparen{\vct{W}}$ since $\refpoly{}\inparen{\vct{W}}$ is \emph{itself} a polynomial as defined above.\footnote{This can be seen in converting $\refpoly{}\inparen{\vct{X}}$ into \abbrSMB.}
Let the boolean function $\isInd{\cdot}$ take $\vct{d}$ as input and return true if there does not exist any dependent variables in the monomial encoded by $\vct{d}$, i.e., for any block $\block\in\tupset'$, $\not\exists \tup, \tup' \in\block~|~\vct{d}_\tup, \vct{d}_{\tup'}\geq 1$.% \and $\tup, \tup'\in\tupset\implies \inparen{\vct{d}_t=0\vee\vct{d}_{\tup'} = 0}\$, $\not\exists ~\block, i\neq j\suchthat \block\supseteq\inset{\tup_i, \tup_j} \wedge d_{i}, d_{j} \geq 1$.
%\footnote{For generality of the proof, we are using a slightly different notation than the main paper, which treats a specific form of \abbrBIDB} For clarity, a \abbrOneBIDB polynomial $\poly\inparen{\vct{X}}$ with any variable $X_i$ such that $X_i\in\inset{0, \bound_\tup}, \bound_\tup\neq 1$ can equivalently replace $X_i$ with $\bound_\tup X_i$ while coercing the domain of $X_i$ to be $\inset{0, 1}$. Note that this setup addresses the general \abbrBIDB. In what follows, we assume that $\vct{X}$ (and hence $\vct{W}$) has a domain of $\inset{0, 1}$.
Then, given \abbrOneBIDB $\pdb$, query $\query$, and polynomial $\poly\inparen{\vct{W}} = \poly\pbox{\query, \tupset, \tup}$, in expectation we have
\begin{align}
\expct_{\vct{\randWorld}}\pbox{\poly(\vct{\randWorld})} &= \expct_{\vct{\randWorld}}\pbox{\sum_{\substack{\vct{d} \in \{0,\ldots,\hideg\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\tup\in\tupset'} \randWorld_i^{d_i} + \sum_{\substack{\vct{d} \in \{0,\ldots, \hideg\}^\numvar\\\wedge ~\neg\isInd{\vct{d}}}} c_{\vct{d}}\cdot\prod_{\tup\in\tupset'}\randWorld_i^{d_i}}\label{p1-s1a}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,\hideg\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \expct_{\vct{\randWorld}}\pbox{\prod_{\tup\in\tupset'} \randWorld_i^{d_i}} + \sum_{\substack{\vct{d} \in \{0,\ldots, \hideg\}^\numvar\\\wedge ~\neg\isInd{\vct{d}}}} c_{\vct{d}}\cdot\expct_{\vct{\randWorld}}\pbox{\prod_{\tup\in\tupset'}\randWorld_i^{d_i}}\label{p1-s1b}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,\hideg\}^\numvar\\~\wedge\isInd{\vct{d}}}}c_{\vct{d}}\cdot \expct_{\vct{\randWorld}}\pbox{\prod_{\tup\in\tupset'} \randWorld_i^{d_i}}\label{p1-s1c}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,\hideg\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\tup\in\tupset'} \expct_{\vct{\randWorld}}\pbox{\randWorld_i^{d_i}}\label{p1-s2}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,\hideg\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\tup\in\tupset'} \expct_{\vct{\randWorld}}\pbox{\randWorld_i}\label{p1-s3}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,\hideg\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\tup\in\tupset'} \prob_i\label{p1-s4}\\
&= \rpoly(\prob_1,\ldots, \prob_\numvar).\label{p1-s5}
\end{align}
\Cref{p1-s1a} is the result of substituting in the definition of $\poly$ given above. Then we arrive at \cref{p1-s1b} by linearity of expectation. Next, \cref{p1-s1c} is the result of the independence constraint of \abbrBIDB\xplural, specifically that any monomial composed of dependent variables, i.e., variables from the same block $\block$, has a probability of $0$. \Cref{p1-s2} is obtained by the fact that all variables in each monomial are independent, which allows for the expectation to be pushed through the product. In \cref{p1-s3}, recall the lineage construction semantics of~\Cref{fig:lin-poly-bidb} for a \abbrOneBIDB, where the annotation of a tuple $\tup$ with multiplicity $j$ is written as $j^{d_\tup}\cdot X_{\tup}^{d_\tup}$ such that $\domain\inparen{\vct{W}_\tup} = \inset{0, 1}$. Then $c_{\vct{d}}$ absorbs all such $j^{d_\tup}$ factors. Since $\randWorld_\tup \in \{0, 1\}$ it is the case that for any exponent $e \geq 1$, $\vct{W}_\tup^e = \vct{W}_\tup$. Next, in \cref{p1-s4} the expectation of a tuple is indeed its probability.
Finally, it can be verified that \Cref{p1-s5} follows since \cref{p1-s4} satisfies the construction of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ in \Cref{def:reduced-poly}.
\qed
\end{proof}
\subsection{Proof For Corollary~\ref{cor:expct-sop}}
\begin{proof}
Note that~\Cref{lem:tidb-reduce-poly} shows that $\expct\pbox{\poly} =$ $\rpoly(\prob_1,\ldots, \prob_\numvar)$. Therefore, if $\poly$ is already in \abbrSMB form, one only needs to compute $\poly(\prob_1,\ldots, \prob_\numvar)$ ignoring exponent terms (note that such a polynomial is $\rpoly(\prob_1,\ldots, \prob_\numvar)$), which indeed has $\bigO{\abs{\poly}}$ computations.
\qed
\end{proof}
\subsection{Definition of $\polyf(\cdot)$}
\label{def:poly-func}
We formally define the relationship of circuits with polynomials. While the definition assumes one sink for notational convenience, it easily generalizes to the multiple sinks case.
$\polyf(\circuit)$ maps the sink of circuit $\circuit$ to its corresponding polynomial in \abbrSMB. $\polyf(\cdot)$ is recursively defined on $\circuit$ as follows, with addition and multiplication following the standard interpretation for polynomials:
\begin{equation*}
\polyf(\circuit) = \begin{cases}
\polyf(\circuit_\lchild) + \polyf(\circuit_\rchild) &\text{ if \circuit.\type } = \circplus\\
\polyf(\circuit_\lchild) \cdot \polyf(\circuit_\rchild) &\text{ if \circuit.\type } = \circmult\\
\circuit.\val &\text{ if \circuit.\type } = \var \text{ OR } \tnum.
\end{cases}
\end{equation*}
$\circuit$ need not encode $\poly\inparen{\vct{X}}$ in the same, default \abbrSMB representation. For instance, $\circuit$ could encode the factorized representation $(X + 2Y)(2X - Y)$ of $\poly\inparen{\vct{X}} = 2X^2+3XY-2Y^2$, as shown in \Cref{fig:circuit}, while $\polyf(\circuit) = \poly\inparen{\vct{X}}$ is always the equivalent \abbrSMB representation.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: