%root: main.tex
%!TEX root=./main.tex
\input{k-relations}
To justify the use of $\semNX$-databases, we need to show that we can encode any $\semN$-PDB in this way and that the query semantics over this representation coincides with query semantics over its respective $\semN$-PDB. For that it will be opportune to define representation systems for $\semN$-PDBs.\BG{cite}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Representation System]\label{def:representation-syste}
A representation system for $\semN$-PDBs is a tuple $(\reprs, \rmod)$ where $\reprs$ is a set of representations and $\rmod$ associates with each $\repr \in \reprs$ an $\semN$-PDB $\pdb$. We say that a representation system is \emph{closed} under a class of queries $\qClass$ if for any query $\query \in \qClass$ and $\repr \in \reprs$ we have:
%
\[ \rmod(\query(\repr)) = \query(\rmod(\repr)) \]
A representation system is \emph{complete} if for every $\semN$-PDB $\pdb$ there exists $\repr \in \reprs$ such that:
%
\[ \rmod(\repr) = \pdb \]
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
As mentioned above we will use $\semNX$-databases paired with a probability distribution as a representation system, referring to such databases as \abbrNXPDB\xplural.
Given \abbrNXPDB $\pxdb$, one can think of $\pd$ as the probability distribution across all worlds $\inset{0, 1}^\numvar$. Denote a particular world by $\vct{w}$. For convenience, let $\assign_{\vct{w}}: \pxdb\rightarrow\pndb$ be a function that computes the corresponding $\semN$-\abbrPDB upon assigning all values $w_i \in \vct{w}$ to $X_i \in \vct{X}$ of $\db_{\semNX}$. Note the one-to-one correspondence between elements $\vct{w}\in\inset{0, 1}^\numvar$ and the worlds encoded by $\db_{\semNX}$ when $\vct{w}$ is assigned to $\vct{X}$ (assuming a domain of $\inset{0, 1}$ for each $X_i$). %and a probability distribution $\pd$ over assignments $\assign$ of the variables $\vct{X} = \{X_1, \ldots, X_\numvar\}$ occurring in annotations of $\idb_{\semNX}$ to $\{0,1\}$.
\AH{There was a big ICDT reviewer complaint in this section, but I don't know that I think it confuses things to think of them both an assignment and/or a vector of variables.}
%Note that an assignment $\assign: \vct{X} \to \{0,1\}^\numvar$ can be represented as a vector $\vct{w} \in \{0,1\}^n$ where $\vct{w}[i]$ records the value assigned to variable $X_i$. Thus, from now on we will solely use such vectors which we refer to as \emph{world vectors} and implicitly understand them to represent assignments.
We can think of $\assign_\vct{w}(\pxdb)\inparen{\tup}$ as the semiring homomorphism $\semNX \to \semN$ that applies the assignment $\vct{w}$ to all variables $\vct{X}$ of a polynomial and evaluates the resulting expression in $\semN$.
\BG{explain connection to homomorphism lifting in K-relations}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[$\rmod\inparen{\pxdb}$]\label{def:semnx-pdbs}
Given an \abbrNXPDB$\pxdb$, we compute its equivalent $\semN$-\abbrPDB $\pndb = \rmod\inparen{\pxdb} = \inparen{\idb, \pd'}$ as:
% over variables $\vct{X} = \{X_1, \ldots, X_n\}$ is a tuple $(\idb_{\semNX},\pd)$ where $\db$ is an $\semNX$-database and $\pd$ is a probability distribution over $\vct{w} \in \{0,1\}^n$. We use $\assign_{\vct{w}}$ to denote the assignment corresponding to $\vct{w} \in \{0,1\}^n$. The $\semN$-PDB $\rmod(\pxdb) = (\idb, \pd')$ encoded by $\pxdb$ is defined as:
\begin{align*}
\idb & = \{ \assign_{\vct{w}}(\pxdb) \mid \vct{w} \in \{0,1\}^n \} \\
\forall \db \in \idb: \probOf(\db) & = \sum_{\vct{w} \in \{0,1\}^n: \assign_{\vct{w}}(\pxdb) = \db} \probOf(\vct{w})
\end{align*}
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
For instance, consider a $\pxdb$ consisting of a single tuple $\tup_1 = (1)$ annotated with $X_1 + X_2$ with probability distribution $\probOf([0,0]) = 0$, $\probOf([0,1]) = 0$, $\probOf([1,0]) = 0.3$ and $\probOf([1,1]) = 0.7$. This \abbrNXPDB encodes two possible worlds (with non-zero probability) that we denote using their world vectors.
%
\[
D_{[1,0]}(\tup_1) = 1 \hspace{0.3cm} \mathbf{and} \hspace{0.3cm} D_{[1,1]}(\tup_1) = 2
\]
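As a quick check of \Cref{def:semnx-pdbs} on this example (an illustrative calculation added for concreteness), substituting each world vector into the annotation $X_1 + X_2$ gives
\begin{align*}
\assign_{[0,0]}(\pxdb)(\tup_1) &= 0 + 0 = 0, & \assign_{[0,1]}(\pxdb)(\tup_1) &= 0 + 1 = 1,\\
\assign_{[1,0]}(\pxdb)(\tup_1) &= 1 + 0 = 1, & \assign_{[1,1]}(\pxdb)(\tup_1) &= 1 + 1 = 2.
\end{align*}
The world vectors $[0,1]$ and $[1,0]$ yield the same database, so its probability under $\pd'$ is $\probOf([0,1]) + \probOf([1,0]) = 0 + 0.3 = 0.3$, while the world in which $\tup_1$ has multiplicity $2$ has probability $\probOf([1,1]) = 0.7$.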
%
\AH{I get the notation above, but we never formally introduced it.}
Importantly, as the following proposition shows, any finite $\semN$-PDB can be encoded as an \abbrNXPDB and \abbrNXPDB\xplural are closed under $\raPlus$\cite{}.
\AH{Is it a known result that \abbrNXPDB\xplural are closed under $\raPlus$ queries?}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Proposition}\label{prop:semnx-pdbs-are-a-}
\abbrNXPDB\xplural are a complete representation system for $\semN$-PDBs that is closed under $\raPlus$ queries.
\end{Proposition}
%\subsection{Proof of~\Cref{prop:semnx-pdbs-are-a-}}
\begin{proof}
To prove that \abbrNXPDB\xplural are complete, consider the following construction that, for any $\semN$-PDB $\pdb = (\idb, \pd)$, produces an \abbrNXPDB $\pxdb = (\db_{\semNX}, \pd')$ such that $\rmod(\pxdb) = \pdb$. Let $\idb = \{D_1, \ldots, D_{\abs{\idb}}\}.$ %and let $max(D_i)$
\AH{What are we using $max(D_i)$ for?}
For each world $D_i$ we create a corresponding variable $X_i$.
%variables $X_{i1}$, \ldots, $X_{im}$ where $m = max(D_i)$.
In $\db_{\semNX}$ we assign each tuple $\tup$ the polynomial:
%
\[
\db_{\semNX}(\tup) = \sum_{i=1}^{\abs{\idb}} D_i(\tup)\cdot X_{i}
\]
The probability distribution $\pd'$ assigns all world vectors zero probability except for the $\abs{\idb}$ world vectors $\vct{w}_i$ that represent the possible worlds, where $\vct{w}_i$ receives the probability of $D_i$ under $\pd$. All elements of $\vct{w}_i$ are zero except for the position corresponding to variable $X_{i}$, which is set to $1$. Unfolding definitions, it is trivial to show that $\rmod(\pxdb) = \pdb$. Thus, \abbrNXPDB\xplural are a complete representation system.
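
To illustrate the construction on a small instance (the numbers below are hypothetical and chosen only for illustration), let $\idb = \{D_1, D_2\}$ over a single tuple $\tup$ with $D_1(\tup) = 2$, $D_2(\tup) = 3$, $\probOf(D_1) = 0.4$, and $\probOf(D_2) = 0.6$. The construction yields
\[
\db_{\semNX}(\tup) = 2\cdot X_1 + 3\cdot X_2, \qquad \vct{w}_1 = [1,0], \qquad \vct{w}_2 = [0,1],
\]
with $\pd'$ assigning probability $0.4$ to $\vct{w}_1$, $0.6$ to $\vct{w}_2$, and $0$ to all other world vectors. Evaluating the polynomial gives $\assign_{\vct{w}_1}(\pxdb)(\tup) = 2\cdot 1 + 3\cdot 0 = 2 = D_1(\tup)$ and $\assign_{\vct{w}_2}(\pxdb)(\tup) = 2\cdot 0 + 3\cdot 1 = 3 = D_2(\tup)$, as required.
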
Since $\semNX$ is the free object in the variety of commutative semirings, Birkhoff's HSP theorem implies that any assignment $\vct{X} \to \semN$, which includes as a special case the assignments $\assign_{\vct{w}}$ used here, uniquely extends to the semiring homomorphism alluded to above, $\assign_{\vct{w}}\inparen{\pxdb}\inparen{\tup}: \semNX \to \semN$. For a polynomial, $\assign_{\vct{w}}\inparen{\pxdb}\inparen{\tup}$ substitutes variables based on $\vct{w}$ and then evaluates the resulting expression in $\semN$. For instance, consider the polynomial $\pxdb\inparen{\tup} = \poly = X + Y$ and the assignment $\vct{w}$ given by $X = 0, Y=1$. We get $\assign_{\vct{w}}\inparen{\pxdb}\inparen{\tup} = 0 + 1 = 1$. % It is trivial to show that an assignment is a semiring homomorphism.
Closure under $\raPlus$ queries follows from this and from \cite{DBLP:conf/pods/GreenKT07}'s Proposition 3.5, which states that semiring homomorphisms commute with queries over $\semK$-relations.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\OK{Removed text here. c.f., Reviewer 1: "Proof of Proposition A.3. I seems the proof should end after l.687, since you already
proved everything from the statement of the proposition. I dont understand what it is
that you do after this line."}
% Now let us consider computing the expected multiplicity of a tuple $\tup$ in the result of a query $\query$ over an $\semN$-PDB $\pdb$ using the annotation of $\tup$ in the result of evaluating $\query$ over an \abbrNXPDB $\pxdb$ for which $\rmod(\pxdb) = \pdb$. The expectation of the polynomial $\poly = \query(\pxdb)(\tup)$ based on the probability distribution of $\pxdb$ over the variables in $\pxdb$ is:
% \AH{The wording ``...over the variables...'' I {\emph think} can be misleading since we also discuss a probability distribution $\pd$ being induced by a vector of probability assignments $\vct{p}$ to each variable $\pVar_i$.}
% \begin{equation}
% \expct_{\vct{W} \sim \pd}\pbox{\poly(\vct{W})} = \sum_{\vct{w} \in \{0,1\}^n} \assign_{\vct{w}}(\query(\pxdb)(\tup)) \cdot \probOf(\vct{w})\label{eq:expect-q-nx}
% \end{equation}
% Since \abbrNXPDB\xplural $\pxdb$ are a complete representation system for $\semN$-PDBs which are closed under $\raPlus$, computing the expectation of the multiplicity of a tuple $t$ in the result of an $\raPlus$ query over the $\semN$-PDB $\rmod(\pxdb)$, is the same as computing the expectation of the polynomial $\query(\pxdb)(t)$.
\qed
\end{proof}
\subsection{\tis and \bis in the \abbrNXPDB model}\label{subsec:supp-mat-ti-bi-def}
Two important subclasses of \abbrNXPDB\xplural that are of interest to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\prob_\tup$. The set of possible worlds represented by a \ti $\db$ is all subsets of $\db$. The probability of each world is the product of the probabilities of all tuples that exist in the world, multiplied by one minus the probability of each tuple of $\db$ that is not part of this world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. Note then that the tuples sharing the same block are disjoint, and the sum of the probabilities of all the tuples in the same block $\block$ is at most $1$. \AH{Reviewer complaint: This is not true by definition.}
The probability of such a world is the product of the probabilities of all tuples present in the world and, for each block from which no tuple is present, one minus the sum of the probabilities of all tuples in that block.
For bag \tis and \bis, we define the probability of a tuple to be the probability that the tuple exists with multiplicity at least $1$.
In this work, we define \tis and \bis as subclasses of \abbrNXPDB\xplural defined over variables $\vct{X}$ (\Cref{def:semnx-pdbs}) where $\vct{X}$ can be partitioned into blocks that satisfy the conditions of a \ti or \bi (stated formally in \Cref{subsec:tidbs-and-bidbs}).
In this work, we consider one further deviation from the standard: We use bag semantics for queries.
Even though tuples cannot occur more than once in the input \ti or \bi, they can occur with a multiplicity larger than one in the result of a query.
Since in \tis and \bis, there is a one-to-one correspondence between tuples in the database and variables, we can interpret a vector $\vct{w} \in \{0,1\}^n$ as denoting which tuples exist in the possible world $\assign_{\vct{w}}(\pxdb)$ (the ones where $\vct{w}[j] = 1$).
For BIDBs specifically, note that at most one of the bits corresponding to tuples in each block will be set (i.e., for any pair of bits $w_j$, $w_{j'}$ that are part of the same block $b_i \supseteq \{t_{i,j}, t_{i,j'}\}$, at most one of them will be set).
Let $\vct{p}$ denote the vector whose elements are the individual probabilities $\prob_i$ of each tuple $\tup_i$, and let $\pd^{(\vct{p})}$ denote the distribution induced by $\vct{p}$.
%
\begin{align}\label{eq:tidb-expectation}
\expct_{\vct{W} \sim \pd^{(\vct{p})}}\pbox{\poly(\vct{W})}
   = \sum\limits_{\substack{\vct{w} \in \{0, 1\}^\numvar\\ s.t. w_j,w_{j'} = 1 \rightarrow \not \exists b_i \supseteq \{t_{i,j}, t_{i,j'}\}}} \poly(\vct{w})\prod_{\substack{j \in [\numvar]\\ s.t. w_j = 1}}\prob_j \prod_{\substack{j \in [\numvar]\\s.t. w_j = 0}}\left(1 - \prob_j\right)
\end{align}
%
Recall that tuple blocks in a TIDB always have size 1, so the outer summation of \cref{eq:tidb-expectation} is over the full set of vectors.
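As a small worked instance of \cref{eq:tidb-expectation} (with hypothetical values chosen only for illustration), consider a \ti with $\numvar = 2$, $\vct{p} = (0.5, 0.4)$, and $\poly(X_1, X_2) = X_1X_2 + X_1$. Only the world vectors $[1,0]$ and $[1,1]$ give a non-zero value of $\poly$, so
\[
\expct_{\vct{W} \sim \pd^{(\vct{p})}}\pbox{\poly(\vct{W})}
   = \underbrace{1 \cdot 0.5 \cdot (1 - 0.4)}_{\vct{w} = [1,0]} + \underbrace{2 \cdot 0.5 \cdot 0.4}_{\vct{w} = [1,1]}
   = 0.3 + 0.4 = 0.7.
\]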
\AH{Have cut and pasted the subsequent text. Need to verify this is the appropriate place for it.}
Let $\semNX$ denote the set of polynomials over variables $\vct{X}=(X_1,\dots,X_\numvar)$ with natural number coefficients and exponents.
We model incomplete relations using Green et al.'s $\semNX$-databases~\cite{DBLP:conf/pods/GreenKT07}, discussed in detail in \Cref{subsec:supp-mat-krelations}.
$\semNX$-databases are functions from tuples to elements of $\semNX$, typically called annotations.
Given an $\semNX$-database $\db$, it is common to use $\db(\tup)$ to denote the polynomial annotating tuple $\tup$ in $\db$.
%Note that based on this definition of $\rel$, $\rel(\tup)$ is the lineage polynomial for $\tup$.
Let $\numvar$ be the number of tuples in $\pdb$. Then, each possible world is defined by an assignment of $\numvar$ binary values $\vct{\wElem} \in \{0, 1\}^{\numvar}$ to $\vct{X}$.
The multiplicity of $\tup \in \db$, denoted $\db(\tup)(\vct{\wElem})$, is obtained by evaluating the polynomial annotating $\tup$ on $\vct{\wElem}$.
$\semNX$-relations are closed under $\raPlus$ (\Cref{fig:nxDBSemantics}).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We will use \abbrNXPDB $\pxdb$, defined as the tuple $(\idb_{\semNX}, \pd)$, where $\semNX$-database $\idb_{\semNX}$ is paired with probability distribution $\pd$ over the assignments to $\vct{X}$.
We denote by $\polyForTuple$ the annotation of tuple $t$ in the result of $\query$ on an implicit \abbrNXPDB (i.e., $\polyForTuple = \query(\pxdb)(t)$ for some $\pxdb$) and as before, interpret it as a function $\polyForTuple: \{0,1\}^{\numvar} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
\abbrNXPDB\xplural and a function $\rmod$ (which transforms an \abbrNXPDB to a classical bag-\abbrPDB, or $\semN$-\abbrPDB~\cite{DBLP:conf/pods/GreenKT07,feng:2019:sigmod:uncertainty}) are both formalized in \Cref{subsec:supp-mat-background}.
\BG{Oliver's conjecture: Bag-\tis + Q can express any finite bag-PDB:
A well-known result for set semantics PDBs is that while not all finite PDBs can be encoded as \tis, any finite PDB can be encoded using a \ti and a query. An analog result holds in our case: any finite $\semN$-PDB can be encoded as a bag \ti and a query (WHAT CLASS? ADD PROOF)
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Proof of~\Cref{prop:expection-of-polynom}}
\label{subsec:expectation-of-polynom-proof}
\begin{proof}
We need to prove for $\semN$-PDB $\pdb = (\idb,\pd)$ and \abbrNXPDB $\pxdb = (\db',\pd')$ where $\rmod(\pxdb) = \pdb$ that $\expct_{\randDB\sim \pd}[\query(\randDB)(t)] = \expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})}$.
By expanding $\polyForTuple$ and the expectation we have:
\begin{align*}
\expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})}
& = \sum_{\vct{w} \in \{0,1\}^n}\probOf(\vct{w}) \cdot Q(\pxdb)(t)(\vct{w})\\
\intertext{From $\rmod(\pxdb) = \pdb$, we have that the range of $\assign_{\vct{w}}(\pxdb)$ is $\idb$, so}
& = \sum_{\db \in \idb}\;\;\sum_{\vct{w} \in \{0,1\}^n : \assign_{\vct{w}}(\pxdb) = \db}\probOf(\vct{w}) \cdot Q(\pxdb)(t)(\vct{w})\\
\intertext{In the inner sum, $\assign_{\vct{w}}(\pxdb) = \db$, so $Q(\pxdb)(t)(\vct{w}) = \query(\db)(t)$ because $\assign_{\vct{w}}$ commutes with $\raPlus$ queries (\Cref{prop:semnx-pdbs-are-a-}); by distributivity of $\times$ over $+$}
& = \sum_{\db \in \idb}\query(\db)(t)\sum_{\vct{w} \in \{0,1\}^n : \assign_{\vct{w}}(\pxdb) = \db}\probOf(\vct{w})\\
\intertext{From the definition of $\pd$ in \cref{def:semnx-pdbs}, given $\rmod(\pxdb) = \pdb$, we get}
  & = \sum_{\db \in \idb}\query(\db)(t) \cdot \probOf(\db) \quad = \expct_{\randDB \sim \pd}[\query(\randDB)(t)]
\end{align*}
\qed
\end{proof}
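For a concrete check of this identity (an illustrative instance only), reuse the single-tuple \abbrNXPDB from the example following \Cref{def:semnx-pdbs}, where $\tup_1$ is annotated with $X_1 + X_2$, $\probOf([1,0]) = 0.3$, and $\probOf([1,1]) = 0.7$, and let $\query$ be the identity query. Then
\[
\expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})} = 0.3 \cdot (1 + 0) + 0.7 \cdot (1 + 1) = 1.7.
\]
The encoded $\semN$-PDB has two worlds with non-zero probability, in which $\tup_1$ has multiplicity $1$ (probability $0.3$) and multiplicity $2$ (probability $0.7$), which gives $0.3 \cdot 1 + 0.7 \cdot 2 = 1.7$ as well.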
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{~\Cref{lem:pre-poly-rpoly}}\label{app:subsec-pre-poly-rpoly}
\begin{Lemma}\label{lem:pre-poly-rpoly}
If
$\poly(X_1,\ldots, X_\numvar) = \sum\limits_{\vct{d} = (d_1,\ldots, d_\numvar)\in \domN^\numvar}c_{\vct{d}} \cdot \prod\limits_{\substack{i = 1\\s.t. d_i\geq 1}}^{\numvar}X_i^{d_i}$
then
$\rpoly(X_1,\ldots, X_\numvar) = \sum\limits_{\vct{d} = (d_1,\ldots, d_\numvar)\in \domN^\numvar} c_{\vct{d}}\cdot\prod\limits_{\substack{i = 1\\s.t. d_i\geq 1}}^{\numvar}X_i$% \;\;\; for some $\eta \subseteq \{0,\ldots, B\}^\numvar$
\end{Lemma}
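For example (an illustrative instance, assuming a \ti so that no two variables share a block), the polynomial $\poly(X_1, X_2) = 3X_1^2X_2 + X_2^3$ has $c_{(2,1)} = 3$ and $c_{(0,3)} = 1$, and dropping all exponents gives
\[
\rpoly(X_1, X_2) = 3X_1X_2 + X_2.
\]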
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}%[Proof for~\Cref{lem:pre-poly-rpoly}]
Follows by the construction of $\rpoly$ in \cref{def:reduced-bi-poly}.
\qed
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Proposition~\ref{proposition:q-qtilde}}\label{app:subsec-prop-q-qtilde}
\noindent Note the following fact:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Proposition}\label{proposition:q-qtilde} For any \bi-lineage polynomial $\poly(X_1, \ldots, X_\numvar)$ and all $\vct{w}$ such that $\probOf[\vct{W} = \vct{w}] > 0$, it holds that
$% \[
\poly(\vct{w}) = \rpoly(\vct{w}).
$% \]
\end{Proposition}
\begin{proof}%[Proof for~\Cref{proposition:q-qtilde}]
Note that any $\poly$ in factorized form is equivalent to its \abbrSMB expansion. For each term in the expanded form, further note that for all $b \in \{0, 1\}$ and all $e \geq 1$, $b^e = b$.
Finally, note that there are exactly three cases where the expectation of a monomial term $\expct\left[c_{\vct{d}}\prod_{i \in [\numvar]\; s.t.\; \vct{d}_i \geq 1}X_i\right]$ is zero:
(i) when $c_{\vct{d}} = 0$,
(ii) when $p_i = 0$ for some $i$ where $\vct{d}_i \geq 1$, and
(iii) when $X_i$ and $X_j$ are in the same block for some $i,j$ where $\vct{d}_i, \vct{d}_j \geq 1$.
\qed
\end{proof}
\subsection{Proof for Lemma~\ref{lem:exp-poly-rpoly}}\label{subsec:proof-exp-poly-rpoly}
\begin{proof}
Let $\poly$ be the generalized polynomial, i.e., the polynomial of $\numvar$ variables with highest degree $= B$: %, in which every possible monomial permutation appears,
\[\poly(X_1,\ldots, X_\numvar) = \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar X_i^{d_i}.\]
Let the boolean function $\isInd{\cdot}$ take $\vct{d}$ as input and return true if $\vct{d}$ contains no dependent variables, i.e., $\not\exists ~\block, i\neq j\suchthat d_{\block, i}, d_{\block, j} \geq 1$.\footnote{This \abbrBIDB notation is used and discussed in \cref{subsec:tidbs-and-bidbs}.}
Then in expectation we have
\begin{align}
\expct_{\vct{\randWorld}}\pbox{\poly(\vct{\randWorld})} &= \expct_{\vct{\randWorld}}\pbox{\sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \randWorld_i^{d_i} + \sum_{\substack{\vct{d} \in \{0,\ldots, B\}^\numvar\\\wedge ~\neg\isInd{\vct{d}}}} c_{\vct{d}}\cdot\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar\randWorld_i^{d_i}}\label{p1-s1a}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \expct_{\vct{\randWorld}}\pbox{\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \randWorld_i^{d_i}} + \sum_{\substack{\vct{d} \in \{0,\ldots, B\}^\numvar\\\wedge ~\neg\isInd{\vct{d}}}} c_{\vct{d}}\cdot\expct_{\vct{\randWorld}}\pbox{\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar\randWorld_i^{d_i}}\label{p1-s1b}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\~\wedge\isInd{\vct{d}}}}c_{\vct{d}}\cdot \expct_{\vct{\randWorld}}\pbox{\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \randWorld_i^{d_i}}\label{p1-s1c}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \expct_{\vct{\randWorld}}\pbox{\randWorld_i^{d_i}}\label{p1-s2}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \expct_{\vct{\randWorld}}\pbox{\randWorld_i}\label{p1-s3}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \prob_i\label{p1-s4}\\
&= \rpoly(\prob_1,\ldots, \prob_\numvar).\label{p1-s5}
\end{align}
\Cref{p1-s1a} is the result of substituting in the definition of $\poly$ given above. Then we arrive at \cref{p1-s1b} by linearity of expectation. Next, \cref{p1-s1c} is the result of the independence constraint of \abbrBIDB\xplural, specifically that any monomial composed of dependent variables, i.e., variables from the same block $\block$, has a probability of $0$. \Cref{p1-s2} is obtained by the fact that all variables in each monomial are independent, which allows for the expectation to be pushed through the product. In \cref{p1-s3}, since $\randWorld_i \in \{0, 1\}$ it is the case that for any exponent $e \geq 1$, $\randWorld_i^e = \randWorld_i$. Next, in \cref{p1-s4} the expectation of a tuple is indeed its probability.

Finally, it can be verified that \Cref{p1-s5} follows since \cref{p1-s4} satisfies the construction of \Cref{lem:pre-poly-rpoly}, i.e., $\rpoly(\prob_1,\ldots, \prob_\numvar)$ is exactly the product of probabilities of each variable in each monomial and its corresponding coefficient, across the entire sum.
\qed
\end{proof}
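As an illustrative check of this result (assuming a \ti with two independent variables and hypothetical probabilities $\prob_1 = \prob_2 = 0.5$), let $\poly(X_1, X_2) = (X_1 + X_2)^2 = X_1^2 + 2X_1X_2 + X_2^2$, so that $\rpoly(X_1, X_2) = X_1 + 2X_1X_2 + X_2$. Each of the four worlds has probability $0.25$, and $\poly$ evaluates to $0$, $1$, $1$, and $4$ on $[0,0]$, $[1,0]$, $[0,1]$, and $[1,1]$ respectively, so
\[
\expct_{\vct{\randWorld}}\pbox{\poly(\vct{\randWorld})} = 0.25 \cdot (0 + 1 + 1 + 4) = 1.5 = 0.5 + 2 \cdot 0.5 \cdot 0.5 + 0.5 = \rpoly(0.5, 0.5).
\]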
\subsection{Proof For Corollary~\ref{cor:expct-sop}}
\begin{proof}
Note that \cref{lem:exp-poly-rpoly} shows that $\expct\pbox{\poly} =$ $\rpoly(\prob_1,\ldots, \prob_\numvar)$. Therefore, if $\poly$ is already in \abbrSMB form, one only needs to compute $\poly(\prob_1,\ldots, \prob_\numvar)$ ignoring exponent terms (note that such a polynomial is $\rpoly(\prob_1,\ldots, \prob_\numvar)$), which indeed has $\bigO{\size\inparen{\smbOf{\poly}}}$ computations.
\qed
\end{proof}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: