
%root: main.tex
%!TEX root=./main.tex

\input{k-relations}

To justify the use of $\semNX$-databases, we need to show that any $\semN$-PDB can be encoded in this way and that query semantics over this representation coincide with query semantics over the respective $\semN$-PDB. To this end, we first define representation systems for $\semN$-PDBs.\BG{cite}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Representation System]\label{def:representation-syste}
  A representation system for $\semN$-PDBs is a tuple $(\reprs, \rmod)$ where $\reprs$ is a set of representations and $\rmod$ associates with each $\repr \in \reprs$ an $\semN$-PDB $\pdb$. We say that a representation system is \emph{closed} under a class of queries $\qClass$ if for any query $\query \in \qClass$ and $\repr \in \reprs$ we have:
%
  \[ \rmod(\query(\repr)) = \query(\rmod(\repr)) \]

  A representation system is \emph{complete} if for every $\semN$-PDB $\pdb$ there exists $\repr \in \reprs$ such that:
%
  \[ \rmod(\repr) = \pdb \]

\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

As mentioned above we will use $\semNX$-databases paired with a probability distribution as a representation system, referring to such databases as \abbrNXPDB\xplural.
Given an \abbrNXPDB $\pxdb$, one can think of $\pd$ as the probability distribution over all worlds $\inset{0, 1}^\numvar$.  Denote a particular world by $\vct{w}$.  For convenience, let $\assign_\vct{w}: \pxdb\rightarrow\pndb$ be a function that computes the corresponding $\semN$-\abbrPDB upon assigning each value $w_i \in \vct{w}$ to the variable $X_i \in \vct{X}$ of $\db_{\semNX}$.  Note that each element $\vct{w}\in\inset{0, 1}^\numvar$ corresponds to a world encoded by $\db_{\semNX}$, obtained when $\vct{w}$ is assigned to $\vct{X}$ (assuming a domain of $\inset{0, 1}$ for each $X_i$).   %and a probability distribution $\pd$ over assignments $\assign$ of the variables $\vct{X} = \{X_1, \ldots, X_\numvar\}$  occurring in annotations of $\idb_{\semNX}$ to $\{0,1\}$.
\AH{There was a big ICDT reviewer complaint in this section, but I don't know that I think it confuses things to think of them both an assignment and/or a vector of variables.}
%Note that an assignment $\assign: \vct{X} \to \{0,1\}^\numvar$ can be represented as a vector $\vct{w} \in \{0,1\}^n$ where $\vct{w}[i]$ records the value assigned to variable $X_i$. Thus, from now on we will solely use such vectors which we refer to as \emph{world vectors} and implicitly understand them to represent assignments. 
We can think of $\assign_\vct{w}(\pxdb)\inparen{\tup}$ as the semiring homomorphism $\semNX \to \semN$ that applies the assignment $\vct{w}$ to all variables $\vct{X}$ of a polynomial and evaluates the resulting expression in $\semN$.
\BG{explain connection to homomorphism lifting in K-relations}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[$\rmod\inparen{\pxdb}$]\label{def:semnx-pdbs}
  Given an \abbrNXPDB $\pxdb$, we compute its equivalent $\semN$-\abbrPDB $\pndb = \rmod\inparen{\pxdb} = \inparen{\idb, \pd'}$ as:
  % over variables $\vct{X} = \{X_1, \ldots, X_n\}$ is a tuple $(\idb_{\semNX},\pd)$ where $\db$ is an $\semNX$-database and $\pd$ is a probability distribution over $\vct{w} \in \{0,1\}^n$. We use $\assign_{\vct{w}}$ to denote the assignment corresponding to $\vct{w} \in \{0,1\}^n$. The $\semN$-PDB $\rmod(\pxdb) = (\idb, \pd')$ encoded by $\pxdb$ is defined as:
  \begin{align*}
    \idb      & = \{ \assign_{\vct{w}}(\pxdb) \mid \vct{w} \in  \{0,1\}^n \} \\
    \forall \db \in \idb: \probOf(\db) & = \sum_{\vct{w} \in \{0,1\}^n: \assign_{\vct{w}}(\pxdb) = \db} \probOf(\vct{w})
  \end{align*}
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

For instance, consider a $\pxdb$ consisting of a single tuple $\tup_1 = (1)$ annotated with $X_1 + X_2$ with probability distribution $\probOf([0,0]) = 0$, $\probOf([0,1]) = 0$, $\probOf([1,0]) = 0.3$ and $\probOf([1,1]) = 0.7$. This \abbrNXPDB encodes two possible worlds (with non-zero probability) that we denote using their world vectors.
%
\[
  D_{[1,0]}(\tup_1) = 1 \hspace{0.3cm} \mathbf{and} \hspace{0.3cm} D_{[1,1]}(\tup_1) = 2
\]
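To spell out \Cref{def:semnx-pdbs} on this example (a worked illustration only, not part of the formal development): the annotation $X_1 + X_2$ evaluates to $0$, $1$, $1$, and $2$ under the world vectors $[0,0]$, $[0,1]$, $[1,0]$, and $[1,1]$, respectively. Because $[0,1]$ and $[1,0]$ yield the same database, their probabilities are summed, so the world in which $\tup_1$ has multiplicity $1$ has probability $0 + 0.3 = 0.3$, while the world in which $\tup_1$ has multiplicity $2$ has probability $0.7$. The expected multiplicity of $\tup_1$ is then
\[
  0 \cdot 0 + 1 \cdot 0 + 1 \cdot 0.3 + 2 \cdot 0.7 = 1.7,
\]
where each summand is the multiplicity under one world vector times that vector's probability.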
%
\AH{I get the notation above, but we never formally introduced it.}
Importantly, as the following proposition shows, any finite $\semN$-PDB can be encoded as an \abbrNXPDB and \abbrNXPDB\xplural are closed under $\raPlus$\cite{}.
\AH{Is it a known result that \abbrNXPDB\xplural are closed under $\raPlus$ queries?}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Proposition}\label{prop:semnx-pdbs-are-a-}
\abbrNXPDB\xplural are a complete representation system for $\semN$-PDBs that is closed under $\raPlus$ queries.
\end{Proposition}

%\subsection{Proof of~\Cref{prop:semnx-pdbs-are-a-}}
\begin{proof}
To prove that \abbrNXPDB\xplural are complete consider the following construction that for any $\semN$-PDB $\pdb = (\idb, \pd)$ produces an \abbrNXPDB $\pxdb = (\db_{\semNX}, \pd')$  such that $\rmod(\pxdb) = \pdb$. Let $\idb = \{D_1, \ldots, D_{\abs{\idb}}\}.$ %and let $max(D_i)$
\AH{What are we using $max(D_i)$ for?}
For each world $D_i$ we create a corresponding variable $X_i$.
%variables $X_{i1}$, \ldots, $X_{im}$ where $m = max(D_i)$.
In $\db_{\semNX}$ we assign each tuple $\tup$ the polynomial:
%
  \[
 \db_{\semNX}(\tup) = \sum_{i=1}^{\abs{\idb}} D_i(\tup)\cdot X_{i}
  \]
The probability distribution $\pd'$ assigns all world vectors zero probability except for $\abs{\idb}$ world vectors $\vct{w}_i$ (representing the possible worlds), where $\vct{w}_i$ receives the probability of $D_i$ under $\pd$. All elements of $\vct{w}_i$ are zero except for the position corresponding to variable $X_{i}$, which is set to $1$. Unfolding the definitions shows that $\rmod(\pxdb) = \pdb$. Thus, \abbrNXPDB\xplural are a complete representation system.
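
As an illustration of this construction (the concrete numbers are hypothetical and chosen only for this sketch), consider an $\semN$-PDB with two worlds $D_1$ and $D_2$ over a single tuple $\tup$, where $D_1(\tup) = 2$, $D_2(\tup) = 3$, $\probOf(D_1) = 0.4$, and $\probOf(D_2) = 0.6$. The construction yields
\[
  \db_{\semNX}(\tup) = 2 \cdot X_1 + 3 \cdot X_2, \qquad \vct{w}_1 = [1,0], \qquad \vct{w}_2 = [0,1],
\]
with $\pd'$ giving $\vct{w}_1$ and $\vct{w}_2$ the probabilities of $D_1$ and $D_2$, i.e., $0.4$ and $0.6$, and probability zero to $[0,0]$ and $[1,1]$. Assigning $\vct{w}_1$ evaluates the annotation to $2 \cdot 1 + 3 \cdot 0 = 2 = D_1(\tup)$, and assigning $\vct{w}_2$ evaluates it to $2 \cdot 0 + 3 \cdot 1 = 3 = D_2(\tup)$, so both original worlds are recovered with their original probabilities.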

Since $\semNX$ is the free object in the variety of commutative semirings, Birkhoff's HSP theorem implies that any assignment $\vct{X} \to \semN$, which includes as a special case the assignments $\assign_{\vct{w}}$ used here, uniquely extends to the semiring homomorphism alluded to above, $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup}: \semNX \to \semN$. Given a polynomial, $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup}$ substitutes variables based on $\vct{w}$ and then evaluates the resulting expression in $\semN$. For instance, consider the polynomial $\pxdb\inparen{\tup} = \poly = X + Y$ and the assignment $\vct{w}$ with $X = 0$ and $Y = 1$. We get $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup} = 0 + 1 = 1$. % It is trivial to show that an assignment  is a semiring homomorphism.
Closure under $\raPlus$ queries follows from this and from Proposition 3.5 of~\cite{DBLP:conf/pods/GreenKT07}, which states that semiring homomorphisms commute with queries over $\semK$-relations.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\OK{Removed text here. c.f., Reviewer 1: "Proof of Proposition A.3. I seems the proof should end after l.687, since you already
proved everything from the statement of the proposition. I don’t understand what it is
that you do after this line."}
% Now let us consider computing the expected multiplicity of a tuple $\tup$ in the result of a query $\query$ over an $\semN$-PDB $\pdb$ using the annotation of $\tup$ in the result of evaluating $\query$ over an \abbrNXPDB $\pxdb$ for which $\rmod(\pxdb) = \pdb$. The expectation of the polynomial $\poly = \query(\pxdb)(\tup)$ based on the probability distribution of $\pxdb$ over the variables in $\pxdb$ is:
% \AH{The wording ``...over the variables...'' I {\emph think} can be misleading since we also discuss a probability distribution $\pd$ being induced by a vector of probability assignments $\vct{p}$ to each variable $\pVar_i$.}

% \begin{equation}
%   \expct_{\vct{W} \sim \pd}\pbox{\poly(\vct{W})} = \sum_{\vct{w} \in \{0,1\}^n} \assign_{\vct{w}}(\query(\pxdb)(\tup)) \cdot \probOf(\vct{w})\label{eq:expect-q-nx}
% \end{equation}

% Since \abbrNXPDB\xplural $\pxdb$ are a complete representation system for $\semN$-PDBs which are closed under $\raPlus$, computing the expectation of the  multiplicity of a tuple $t$ in the result of an $\raPlus$ query over the $\semN$-PDB $\rmod(\pxdb)$, is the same as computing the expectation of the polynomial $\query(\pxdb)(t)$.
\qed
\end{proof}


\subsection{\tis and \bis in the \abbrNXPDB model}\label{subsec:supp-mat-ti-bi-def}
Two important subclasses of \abbrNXPDB\xplural that are of interest to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\prob_\tup$. The set of possible worlds represented by a \ti $\db$ is the set of all subsets of $\db$. The probability of each world is the product of the probabilities of all tuples in the world multiplied by the product of one minus the probability of each tuple of $\db$ that is not part of this world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. Note then that the tuples sharing the same block are disjoint, and the sum of the probabilities of all the tuples in the same block $\block$ is at most $1$.  \AH{Reviewer complaint:  This is not true by definition.}
The probability of such a world is the product of the probabilities of all tuples present in the world and, for each block from which no tuple is present, one minus the sum of the probabilities of that block's tuples.
For bag \tis and \bis, we define the probability of a tuple to  be the probability that the tuple exists with multiplicity at least $1$.

In this work, we define \tis and \bis as subclasses of \abbrNXPDB\xplural defined over variables $\vct{X}$ (\Cref{def:semnx-pdbs}) where $\vct{X}$ can be partitioned into blocks that satisfy the conditions of a \ti or \bi (stated formally in \Cref{subsec:tidbs-and-bidbs}).
We consider one further deviation from the standard: we use bag semantics for queries.
Even though tuples cannot occur more than once in the input \ti or \bi, they can occur with a multiplicity larger than one in the result of a query.
Since in \tis and \bis, there is a one-to-one correspondence between tuples in the database and variables, we can interpret a vector $\vct{w} \in \{0,1\}^n$ as denoting which tuples exist in the possible world $\assign_{\vct{w}}(\pxdb)$ (the ones where $\vct{w}[j] = 1$).
For BIDBs specifically, note that at most one of the bits corresponding to tuples in each block will be set (i.e., for any pair of bits $w_j$, $w_{j'}$ that are part of the same block $b_i \supseteq \{t_{i,j}, t_{i,j'}\}$, at most one of them will be set).
Let $\vct{p}$ denote the vector whose elements are the individual probabilities $\prob_i$ of each tuple $\tup_i$, and let $\pd^{(\vct{p})}$ denote the distribution induced by $\vct{p}$.
%
\begin{align}\label{eq:tidb-expectation}
\expct_{\vct{W} \sim \pd^{(\vct{p})}}\pbox{\poly(\vct{W})}
  = \sum\limits_{\substack{\vct{w} \in \{0, 1\}^\numvar\\ s.t. w_j,w_{j'} = 1 \rightarrow \not \exists b_i \supseteq \{t_{i,j}, t_{i,j'}\}}} \poly(\vct{w})\prod_{\substack{j \in [\numvar]\\ s.t. w_j = 1}}\prob_j \prod_{\substack{j \in [\numvar]\\s.t. w_j = 0}}\left(1 - \prob_j\right)
\end{align}
%
Recall that tuple blocks in a TIDB always have size 1, so the outer summation of \cref{eq:tidb-expectation} is over the full set of vectors.
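As a sanity check of \cref{eq:tidb-expectation} (a worked example; the polynomial is hypothetical and chosen only for illustration), take a TIDB with $\numvar = 2$ and $\poly(X_1, X_2) = X_1^2 + X_1 X_2$. The polynomial evaluates to $0$ on $[0,0]$ and $[0,1]$, to $1$ on $[1,0]$, and to $2$ on $[1,1]$, so only the last two world vectors contribute:
\[
  \expct_{\vct{W} \sim \pd^{(\vct{p})}}\pbox{\poly(\vct{W})}
  = 1 \cdot \prob_1 \left(1 - \prob_2\right) + 2 \cdot \prob_1 \prob_2
  = \prob_1 + \prob_1 \prob_2 .
\]
The result coincides with evaluating $\poly$ at $(\prob_1, \prob_2)$ after dropping the exponent of $X_1^2$.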
\AH{Have cut and pasted the subsequent text.  Need to verify this is the appropriate place for it.}
Let $\semNX$ denote the set of polynomials over variables $\vct{X}=(X_1,\dots,X_\numvar)$ with natural number coefficients and exponents.
We model incomplete relations using Green et al.'s $\semNX$-databases~\cite{DBLP:conf/pods/GreenKT07}, discussed in detail in \Cref{subsec:supp-mat-krelations}.
 $\semNX$-databases are functions from tuples to elements of $\semNX$, typically called annotations.
Given an $\semNX$-database $\db$,  it is common to use $\db(\tup)$ to denote the polynomial annotating tuple $\tup$ in $\db$.
%Note that based on this definition of $\rel$, $\rel(\tup)$ is the lineage polynomial for $\tup$.
Let $\numvar$ be the number of tuples in $\pdb$.  Then, each possible world is defined by an assignment of $\numvar$ binary values $\vct{\wElem} \in \{0, 1\}^{\numvar}$ to $\vct{X}$.
The multiplicity of $\tup \in \db$, denoted $\db(\tup)(\vct{\wElem})$, is obtained by evaluating the polynomial annotating $\tup$ on $\vct{\wElem}$.
$\semNX$-relations are closed under $\raPlus$ (\Cref{fig:nxDBSemantics}).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

We will use \abbrNXPDB $\pxdb$, defined as the tuple $(\idb_{\semNX}, \pd)$, where $\semNX$-database $\idb_{\semNX}$ is paired with probability distribution $\pd$ over the assignments to $\vct{X}$.
We denote by $\polyForTuple$ the annotation of tuple $t$ in the result of $\query$ on an implicit \abbrNXPDB (i.e., $\polyForTuple = \query(\pxdb)(t)$ for some $\pxdb$) and as before, interpret it as a function $\polyForTuple: \{0,1\}^{\numvar} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
\abbrNXPDB\xplural and a function $\rmod$ (which transforms an \abbrNXPDB to  a classical bag-\abbrPDB, or $\semN$-\abbrPDB~\cite{DBLP:conf/pods/GreenKT07,feng:2019:sigmod:uncertainty}) are both formalized in \Cref{subsec:supp-mat-background}.

\BG{Oliver's conjecture: Bag-\tis + Q can express any finite bag-PDB:
A well-known result for set semantics PDBs is that while not all finite PDBs can be encoded as \tis, any finite PDB can be encoded using a \ti and a query. An analog result holds in our case: any finite $\semN$-PDB can be encoded as a bag \ti and a query (WHAT CLASS? ADD PROOF)
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Proof of~\Cref{prop:expection-of-polynom}}
\label{subsec:expectation-of-polynom-proof}
\begin{proof}
We need to prove for $\semN$-PDB $\pdb = (\idb,\pd)$ and \abbrNXPDB $\pxdb = (\db',\pd')$ where $\rmod(\pxdb) = \pdb$ that $\expct_{\db \sim \pd}[\query(\db)(t)] = \expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})}$.
By expanding $\polyForTuple$ and the expectation we have:
\begin{align*}
\expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})}
& = \sum_{\vct{w} \in \{0,1\}^n}\probOf(\vct{w}) \cdot Q(\pxdb)(t)(\vct{w})\\
\intertext{From $\rmod(\pxdb) = \pdb$, we have that the range of $\assign_{\vct{w}}(\pxdb)$ is $\idb$, so}
& = \sum_{\db \in \idb}\;\;\sum_{\vct{w} \in \{0,1\}^n : \assign_{\vct{w}}(\pxdb) = \db}\probOf(\vct{w}) \cdot Q(\pxdb)(t)(\vct{w})\\
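\intertext{In the inner sum, $\assign_{\vct{w}}(\pxdb) = \db$, so $Q(\pxdb)(t)(\vct{w}) = \query(\db)(t)$ by closure under $\raPlus$ (\Cref{prop:semnx-pdbs-are-a-}); factoring this term out by distributivity of $\times$ over $+$ gives}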
& = \sum_{\db \in \idb}\query(\db)(t)\sum_{\vct{w} \in \{0,1\}^n : \assign_{\vct{w}}(\pxdb) = \db}\probOf(\vct{w})\\
\intertext{From the definition of $\pd$ in \cref{def:semnx-pdbs}, given $\rmod(\pxdb) = \pdb$, we get}
& = \sum_{\db \in \idb}\query(\db)(t) \cdot \probOf(\db) \quad = \expct_{\db \sim \pd}[\query(\db)(t)]
\end{align*}
\qed
\end{proof}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{~\Cref{lem:pre-poly-rpoly}}\label{app:subsec-pre-poly-rpoly}
\begin{Lemma}\label{lem:pre-poly-rpoly}
If
$\poly(X_1,\ldots, X_\numvar) = \sum\limits_{\vct{d} = \{d_1,\ldots, d_\numvar\}\in \domN^\numvar}c_{\vct{d}} \cdot \prod\limits_{\substack{i = 1\\s.t. d_i\geq 1}}^{\numvar}X_i^{d_i}$
then
$\rpoly(X_1,\ldots, X_\numvar) = \sum\limits_{\vct{d} = \{d_1,\ldots, d_\numvar\}\in \domN^\numvar} c_{\vct{d}}\cdot\prod\limits_{\substack{i = 1\\s.t. d_i\geq 1}}^{\numvar}X_i$% \;\;\;  for some $\eta \subseteq \{0,\ldots, B\}^\numvar$
\end{Lemma}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}%[Proof for~\Cref{lem:pre-poly-rpoly}]
Follows by the construction of $\rpoly$ in \cref{def:reduced-bi-poly}.
\qed
\end{proof}
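
For example (a hypothetical polynomial, assuming that $X_1$, $X_2$, and $X_3$ lie in pairwise distinct blocks so that no monomial combines variables of the same block), the lemma relates
\[
  \poly(X_1, X_2, X_3) = 2 X_1^2 X_2 + X_3^3
  \qquad \text{to} \qquad
  \rpoly(X_1, X_2, X_3) = 2 X_1 X_2 + X_3 ,
\]
i.e., the coefficients are kept and every exponent $d_i \geq 1$ is replaced by $1$.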
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\subsection{Proposition~\ref{proposition:q-qtilde}}\label{app:subsec-prop-q-qtilde}
\noindent Note the following fact:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Proposition}\label{proposition:q-qtilde} For any \bi-lineage polynomial $\poly(X_1, \ldots, X_\numvar)$ and all $\vct{w}$ such that $\probOf[\vct{W} = \vct{w}] > 0$, it holds that
$%  \[
    \poly(\vct{w}) = \rpoly(\vct{w}).
$%    \]
\end{Proposition}

\begin{proof}%[Proof for~\Cref{proposition:q-qtilde}]
Note that any $\poly$ in factorized form is equivalent to its \abbrSMB expansion.  For each term in the expanded form, further note that for all $b \in \{0, 1\}$ and all $e \geq 1$, $b^e = b$.
Finally, note that there are exactly three cases where the expectation of a monomial term $\expct\left[c_{\vct{d}}\prod_{i = 1\; s.t.\; \vct{d}_i \geq 1}^{\numvar}X_i\right]$ is zero:
(i) when $c_{\vct{d}} = 0$,
(ii) when $p_i = 0$ for some $i$ where $\vct{d}_i \geq 1$, and
(iii) when $X_i$ and $X_j$ are in the same block for some $i,j$ where $\vct{d}_i, \vct{d}_j \geq 1$.
\qed
\end{proof}


\subsection{Proof for Lemma~\ref{lem:exp-poly-rpoly}}\label{subsec:proof-exp-poly-rpoly}
\begin{proof}
Let $\poly$ be the generalized polynomial, i.e., the polynomial of $\numvar$ variables with highest degree $= B$: %, in which every possible monomial permutation appears,
\[\poly(X_1,\ldots, X_\numvar) = \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar X_i^{d_i}.\]
Let the boolean function $\isInd{\cdot}$ take $\vct{d}$ as input and return true if $\vct{d}$ contains no dependent variables, i.e., $\not\exists ~\block, i\neq j\suchthat d_{\block, i}, d_{\block, j} \geq 1$.\footnote{This \abbrBIDB notation is used and discussed in \cref{subsec:tidbs-and-bidbs}.}
Then in expectation we have
\begin{align}
\expct_{\vct{\randWorld}}\pbox{\poly(\vct{\randWorld})} &= \expct_{\vct{\randWorld}}\pbox{\sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \randWorld_i^{d_i} + \sum_{\substack{\vct{d} \in \{0,\ldots, B\}^\numvar\\\wedge ~\neg\isInd{\vct{d}}}} c_{\vct{d}}\cdot\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar\randWorld_i^{d_i}}\label{p1-s1a}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \expct_{\vct{\randWorld}}\pbox{\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \randWorld_i^{d_i}} + \sum_{\substack{\vct{d} \in \{0,\ldots, B\}^\numvar\\\wedge ~\neg\isInd{\vct{d}}}} c_{\vct{d}}\cdot\expct_{\vct{\randWorld}}\pbox{\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar\randWorld_i^{d_i}}\label{p1-s1b}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \expct_{\vct{\randWorld}}\pbox{\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \randWorld_i^{d_i}}\label{p1-s1c}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \expct_{\vct{\randWorld}}\pbox{\randWorld_i^{d_i}}\label{p1-s2}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \expct_{\vct{\randWorld}}\pbox{\randWorld_i}\label{p1-s3}\\
&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \prob_i\label{p1-s4}\\
&= \rpoly(\prob_1,\ldots, \prob_\numvar).\label{p1-s5}
\end{align}
\Cref{p1-s1a} is the result of substituting in the definition of $\poly$ given above.  We then arrive at \cref{p1-s1b} by linearity of expectation.  Next, \cref{p1-s1c} follows from the independence constraint of \abbrBIDB\xplural, specifically that any monomial composed of dependent variables, i.e., variables from the same block $\block$, is non-zero with probability $0$ and hence has expectation $0$.  \Cref{p1-s2} follows from the fact that all variables in each remaining monomial are independent, which allows the expectation to be pushed through the product.  In \cref{p1-s3}, since $\randWorld_i \in \{0, 1\}$, it is the case that $\randWorld_i^e = \randWorld_i$ for any exponent $e \geq 1$.  Next, in \cref{p1-s4}, the expectation of $\randWorld_i$ is the probability $\prob_i$ of its tuple.

Finally, it can be verified that \Cref{p1-s5} follows since \cref{p1-s4} satisfies the construction of \Cref{lem:pre-poly-rpoly}, i.e., $\rpoly(\prob_1,\ldots, \prob_\numvar)$ is exactly the product of probabilities of each variable in each monomial and its corresponding coefficient, across the entire sum.
\qed
\end{proof}
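
As a small instance of this derivation (the polynomial and block structure are hypothetical and serve only as a sketch), let $\numvar = 3$, let $X_1$ and $X_2$ belong to the same block, let $X_3$ form its own block, and let $\poly(X_1, X_2, X_3) = X_1 X_2 + X_1^2 X_3$. The monomial $X_1 X_2$ combines two variables of the same block, which are never both $1$, so it contributes $0$. For the remaining monomial, independence across blocks and $\randWorld_1^2 = \randWorld_1$ give
\[
  \expct_{\vct{\randWorld}}\pbox{\poly(\vct{\randWorld})}
  = 0 + \expct_{\vct{\randWorld}}\pbox{\randWorld_1^2}\cdot\expct_{\vct{\randWorld}}\pbox{\randWorld_3}
  = \prob_1 \prob_3 ,
\]
which is the value of the final expression in the derivation for this choice of $\poly$.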


\subsection{Proof for Corollary~\ref{cor:expct-sop}}
\begin{proof}
Note that \cref{lem:exp-poly-rpoly} shows that $\expct\pbox{\poly} = \rpoly(\prob_1,\ldots, \prob_\numvar)$.  Therefore, if $\poly$ is already in \abbrSMB form, one only needs to compute $\poly(\prob_1,\ldots, \prob_\numvar)$ while ignoring exponents (note that the resulting value is $\rpoly(\prob_1,\ldots, \prob_\numvar)$), which requires $\bigO{\size\inparen{\smbOf{\poly}}}$ computations.
\qed
\end{proof}


%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End:
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
+								%root: main.tex
-												Cleaning up appendix

											
										
										
											2021-04-09 22:00:34 -04:00
+								%!TEX root=./main.tex
 								\input{k-relations}
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
-												More changes to notation, etc.

											
										
										
											2021-06-11 11:22:58 -04:00
+								To justify the use of $\semNX$-databases, we need to show that we can encode any $\semN$-PDB in this way and that the query semantics over this representation coincides with query semantics over its respective $\semN$-PDB. For that it will be opportune to define representation systems for $\semN$-PDBs.\BG{cite}
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
 								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 								\begin{Definition}[Representation System]\label{def:representation-syste}
-												Rewrote S2.

											
										
										
											2021-09-18 12:16:56 -04:00
+								  A representation system for $\semN$-PDBs is a tuple $(\reprs, \rmod)$ where $\reprs$ is a set of representations and $\rmod$ associates with each $\repr \in \reprs$ an $\semN$-PDB $\pdb$. We say that a representation system is \emph{closed} under a class of queries $\qClass$ if for any query $\query \in \qClass$ and $\repr \in \reprs$ we have:
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
+								%
 								  \[ \rmod(\query(\repr)) = \query(\rmod(\repr)) \]
 								  A representation system is \emph{complete} if for every $\semN$-PDB $\pdb$ there exists $\repr \in \reprs$ such that:
 								%
 								  \[ \rmod(\repr) = \pdb \]
 								\end{Definition}
 								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-												a few fixes

											
										
										
											2021-09-17 22:17:30 -04:00
+								As mentioned above we will use $\semNX$-databases paired with a probability distribution as a representation system, referring to such databases as \abbrNXPDB\xplural.
-												Rewrote S2.

											
										
										
											2021-09-18 12:16:56 -04:00
+								Given \abbrNXPDB $\pxdb$, one can think of the of $\pd$ as the probability distribution across all worlds $\inset{0, 1}^\numvar$.  Denote a particular world to be $\vct{w}$.  For convenience let $\assign_\vct{w}: \pxdb\rightarrow\pndb$ be a function that computes the corresponding $\semN$-\abbrPDB upon assigning all values $w_i \in \vct{w}$ to $X_i \in \vct{X}$ of $\db_{\semNX}$.  Note the one-to-one correspondence between elements $\vct{w}\in\inset{0, 1}^\numvar$ to the worlds encoded by $\db_{\semNX}$ when $\vct{w}$ is assigned to $\vct{X}$ (assuming a domain of $\inset{0, 1}$ for each $X_i$).   %and a probability distribution $\pd$ over assignments $\assign$ of the variables $\vct{X} = \{X_1, \ldots, X_\numvar\}$  occurring in annotations of $\idb_{\semNX}$ to $\{0,1\}$.
-												More changes to notation, etc.

											
										
										
											2021-06-11 11:22:58 -04:00
+								\AH{There was a big ICDT reviewer complaint in this section, but I don't know that I think it confuses things to think of them both an assignment and/or a vector of variables.}
-												Rewrote S2.

											
										
										
											2021-09-18 12:16:56 -04:00
+								%Note that an assignment $\assign: \vct{X} \to \{0,1\}^\numvar$ can be represented as a vector $\vct{w} \in \{0,1\}^n$ where $\vct{w}[i]$ records the value assigned to variable $X_i$. Thus, from now on we will solely use such vectors which we refer to as \emph{world vectors} and implicitly understand them to represent assignments.
 								We can think of $\assign_\vct{w}(\pxdb)\inparen{\tup}$ as the semiring homomorphism $\semNX \to \semN$ that applies the assignment $\vct{w}$ to all variables $\vct{X}$ of a polynomial and evaluates the resulting expression in $\semN$.
 								\BG{explain connection to homomorphism lifting in K-relations}
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
 								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-												Rewrote S2.

											
										
										
											2021-09-18 12:16:56 -04:00
+								\begin{Definition}[$\rmod\inparen{\pxdb}$]\label{def:semnx-pdbs}
 								  Given an \abbrNXPDB$\pxdb$, we compute its equivalent $\semN$-\abbrPDB $\pndb = \rmod\inparen{\pxdb} = \inparen{\idb, \pd'}$ as:
 								  % over variables $\vct{X} = \{X_1, \ldots, X_n\}$ is a tuple $(\idb_{\semNX},\pd)$ where $\db$ is an $\semNX$-database and $\pd$ is a probability distribution over $\vct{w} \in \{0,1\}^n$. We use $\assign_{\vct{w}}$ to denote the assignment corresponding to $\vct{w} \in \{0,1\}^n$. The $\semN$-PDB $\rmod(\pxdb) = (\idb, \pd')$ encoded by $\pxdb$ is defined as:
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
+								  \begin{align*}
 								    \idb      & = \{ \assign_{\vct{w}}(\pxdb) \mid \vct{w} \in  \{0,1\}^n \} \\
-												More changes to notation, etc.

											
										
										
											2021-06-11 11:22:58 -04:00
+								    \forall \db \in \idb: \probOf(\db) & = \sum_{\vct{w} \in \{0,1\}^n: \assign_{\vct{w}}(\pxdb) = \db} \probOf(\vct{w})
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
+								  \end{align*}
 								\end{Definition}
 								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-												a few fixes

											
										
										
											2021-09-17 22:17:30 -04:00
+								For instance, consider a $\pxdb$ consisting of a single tuple $\tup_1 = (1)$ annotated with $X_1 + X_2$ with probability distribution $\probOf([0,0]) = 0$, $\probOf([0,1]) = 0$, $\probOf([1,0]) = 0.3$ and $\probOf([1,1]) = 0.7$. This \abbrNXPDB encodes two possible worlds (with non-zero probability) that we denote using their world vectors.
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
+								%
 								\[
 								  D_{[0,1]}(\tup_1) = 1 \hspace{0.3cm} \mathbf{and} \hspace{0.3cm} D_{[1,1]}(\tup_1) = 2
 								\]
 								%
-												More changes to notation, etc.

											
										
										
											2021-06-11 11:22:58 -04:00
+								\AH{I get the notation above, but we never formally introduced it.}
-												Rewrote S2.

											
										
										
											2021-09-18 12:16:56 -04:00
+								Importantly, as the following proposition shows, any finite $\semN$-PDB can be encoded as an \abbrNXPDB and \abbrNXPDB\xplural are closed under $\raPlus$\cite{}.
-												a few fixes

											
										
										
											2021-09-17 22:17:30 -04:00
+								\AH{Is it a known result that \abbrNXPDB\xplural are closed under $\raPlus$ queries?}
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
 								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 								\begin{Proposition}\label{prop:semnx-pdbs-are-a-}
-												a few fixes

											
										
										
											2021-09-17 22:17:30 -04:00
+								\abbrNXPDB\xplural are a complete representation system for $\semN$-PDBs that is closed under $\raPlus$ queries.
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
+								\end{Proposition}
-												Minor tweaks

											
										
										
											2021-04-10 13:21:10 -04:00
+								%\subsection{Proof of~\Cref{prop:semnx-pdbs-are-a-}}
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
+								\begin{proof}
-												Rewrote S2.

											
										
										
											2021-09-18 12:16:56 -04:00
+								To prove that \abbrNXPDB\xplural are complete consider the following construction that for any $\semN$-PDB $\pdb = (\idb, \pd)$ produces an \abbrNXPDB $\pxdb = (\db_{\semNX}, \pd')$  such that $\rmod(\pxdb) = \pdb$. Let $\idb = \{D_1, \ldots, D_{\abs{\idb}}\}.$ %and let $max(D_i)$
-												More changes to notation, etc.

											
										
										
											2021-06-11 11:22:58 -04:00
+								\AH{What are we using $max(D_i)$ for?}
 								 denote $max_{\tup} D_i(\tup)$. For each world $D_i$ we create a corresponding variable $X_i$.
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
+								%variables $X_{i1}$, \ldots, $X_{im}$ where $m = max(D_i)$.
-												Rewrote S2.

											
										
										
											2021-09-18 12:16:56 -04:00
+								In $\db_{\semNX}$ we assign each tuple $\tup$ the polynomial:
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
+								%
 								  \[
-												Rewrote S2.

											
										
										
											2021-09-18 12:16:56 -04:00
+								 \db_{\semNX}(\tup) = \sum_{i=1}^{\abs{\idb}} D_i(\tup)\cdot X_{i}
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
+								  \]
-												Rewrote S2.

											
										
										
											2021-09-18 12:16:56 -04:00
+								The probability distribution $\pd'$ assigns all world vectors zero probability except for $\abs{\idb}$ world vectors (representing the possible worlds) $\vct{w}_i$. All elements of $\vct{w}_i$ are zero except for the position corresponding to variables $X_{i}$ which is set to $1$. Unfolding definitions it is trivial to show that $\rmod(\pxdb) = \pdb$. Thus, \abbrNXPDB\xplural are a complete representation system.
 								Since $\semNX$ is the free object in the variety of semirings, Birkhoff's HSP theorem implies that any assignment $\vct{X} \to \semN$, which includes as a special case the assignments $\assign_{\vct{w}}$ used here, uniquely extends to the semiring homomorphism alluded to above, $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup}: \semNX \to \semN$. For a polynomial $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup}$ substitutes variables based on $\vct{w}$ and then evaluates the resulting expression in $\semN$. For instance, consider the polynomial $\pxdb\inparen{\tup} = \poly = X + Y$ and assignment $\vct{w} := X = 0, Y=1$. We get $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup} = 0 + 1 = 1$. % It is trivial to show that an assignment  is a semiring homomorphism.
 								Closure under $\raPlus$ queries follows from this and from \cite{DBLP:conf/pods/GreenKT07}'s Proposition 3.5, which states that semiring homomorphisms commute with queries over $\semK$-relations.
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
-												More changes to notation, etc.

											
										
										
											2021-06-11 11:22:58 -04:00
-												Restructured file system for appendix.

											
										
										
											2021-04-06 11:43:34 -04:00
+								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\OK{Removed text here. c.f., Reviewer 1: "Proof of Proposition A.3. I seems the proof should end after l.687, since you already
 								proved everything from the statement of the proposition. I don’t understand what it is
 								that you do after this line."}
 								% Now let us consider computing the expected multiplicity of a tuple $\tup$ in the result of a query $\query$ over an $\semN$-PDB $\pdb$ using the annotation of $\tup$ in the result of evaluating $\query$ over an \abbrNXPDB $\pxdb$ for which $\rmod(\pxdb) = \pdb$. The expectation of the polynomial $\poly = \query(\pxdb)(\tup)$ based on the probability distribution of $\pxdb$ over the variables in $\pxdb$ is:
 								% \AH{The wording ``...over the variables...'' I {\emph think} can be misleading since we also discuss a probability distribution $\pd$ being induced by a vector of probability assignments $\vct{p}$ to each variable $\pVar_i$.}
% \begin{equation}
 								%   \expct_{\vct{W} \sim \pd}\pbox{\poly(\vct{W})} = \sum_{\vct{w} \in \{0,1\}^n} \assign_{\vct{w}}(\query(\pxdb)(\tup)) \cdot \probOf(\vct{w})\label{eq:expect-q-nx}
 								% \end{equation}
% Since \abbrNXPDB\xplural $\pxdb$ are a complete representation system for $\semN$-PDBs which are closed under $\raPlus$, computing the expectation of the  multiplicity of a tuple $t$ in the result of an $\raPlus$ query over the $\semN$-PDB $\rmod(\pxdb)$, is the same as computing the expectation of the polynomial $\query(\pxdb)(t)$.
\qed
 								\end{proof}
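
As a small illustration of the completeness construction used in the proof above (the two-world instance here is hypothetical and chosen only for concreteness), suppose $\idb = \{D_1, D_2\}$ and a tuple $\tup$ has multiplicities $D_1(\tup) = 2$ and $D_2(\tup) = 3$. The construction annotates $\tup$ with
\[
  \db_{\semNX}(\tup) = 2 X_1 + 3 X_2 ,
\]
and $\pd'$ places all probability mass on the world vectors $\vct{w}_1 = (1, 0)$ and $\vct{w}_2 = (0, 1)$. Evaluating the annotation under each of these vectors gives
\begin{align*}
  \assign_{\vct{w}_1}\inparen{\pxdb}\inparen{\tup} &= 2\cdot 1 + 3\cdot 0 = 2 = D_1(\tup), &
  \assign_{\vct{w}_2}\inparen{\pxdb}\inparen{\tup} &= 2\cdot 0 + 3\cdot 1 = 3 = D_2(\tup),
\end{align*}
so each world vector with non-zero probability recovers the multiplicity of $\tup$ in the corresponding possible world.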
\subsection{\tis and \bis in the \abbrNXPDB model}\label{subsec:supp-mat-ti-bi-def}
Two important subclasses of \abbrNXPDB\xplural that are of interest to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\prob_\tup$. The set of possible worlds represented by a \ti $\db$ is the set of all subsets of $\db$. The probability of each world is the product of the probabilities of all tuples that exist in the world and of one minus the probability of each tuple of $\db$ that is not part of the world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. Note then that the tuples sharing the same block are disjoint, and the sum of the probabilities of all the tuples in the same block $\block$ is at most $1$.  \AH{Reviewer complaint:  This is not true by definition.}
The probability of such a world is the product of the probabilities of all tuples present in the world.  %and one minus the sum of the probabilities of all tuples from blocks for which no  tuple is present in the world.
For bag \tis and \bis, we define the probability of a tuple to be the probability that the tuple exists with multiplicity at least $1$.
In this work, we define \tis and \bis as subclasses of \abbrNXPDB\xplural defined over variables $\vct{X}$ (\Cref{def:semnx-pdbs}) where $\vct{X}$ can be partitioned into blocks that satisfy the conditions of a \ti or \bi (stated formally in \Cref{subsec:tidbs-and-bidbs}).
We consider one further deviation from the standard: we use bag semantics for queries.
Even though tuples cannot occur more than once in the input \ti or \bi, they can occur with a multiplicity larger than one in the result of a query.
Since in \tis and \bis there is a one-to-one correspondence between tuples in the database and variables, we can interpret a vector $\vct{w} \in \{0,1\}^n$ as denoting which tuples exist in the possible world $\assign_{\vct{w}}(\pxdb)$ (the ones where $\vct{w}[j] = 1$).
For \bis specifically, note that at most one of the bits corresponding to tuples in each block will be set (i.e., for any pair of bits $w_j$, $w_{j'}$ that are part of the same block $b_i \supseteq \{t_{i,j}, t_{i,j'}\}$, at most one of them will be set).
Let $\vct{p}$ denote the vector whose elements are the individual probabilities $\prob_i$ of each tuple $\tup_i$, and let $\pd^{(\vct{p})}$ denote the distribution induced by $\vct{p}$.
 								%
 								\begin{align}\label{eq:tidb-expectation}
\expct_{\vct{W} \sim \pd^{(\vct{p})}}\pbox{\poly(\vct{W})}
  = \sum\limits_{\substack{\vct{w} \in \{0, 1\}^\numvar\\ s.t. w_j,w_{j'} = 1 \rightarrow \not \exists b_i \supseteq \{t_{i,j}, t_{i,j'}\}}} \poly(\vct{w})\prod_{\substack{j \in [\numvar]\\ s.t. \wElem_j = 1}}\prob_j \prod_{\substack{j \in [\numvar]\\s.t. w_j = 0}}\left(1 - \prob_j\right)
 								\end{align}
 								%
Recall that blocks in a \ti always have size 1, so the outer summation of \cref{eq:tidb-expectation} ranges over the full set of vectors $\{0, 1\}^\numvar$.
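
To make \cref{eq:tidb-expectation} concrete, consider a hypothetical \ti with $\numvar = 2$ tuples and the polynomial $\poly = X_1 + X_1 X_2$ (this instance is an assumption made purely for illustration). Since every block has size 1, the sum ranges over all four world vectors:
\begin{align*}
\expct_{\vct{W} \sim \pd^{(\vct{p})}}\pbox{\poly(\vct{W})}
  &= \underbrace{0 \cdot (1-\prob_1)(1-\prob_2)}_{\vct{w} = (0,0)}
   + \underbrace{1 \cdot \prob_1 (1-\prob_2)}_{\vct{w} = (1,0)}
   + \underbrace{0 \cdot (1-\prob_1)\prob_2}_{\vct{w} = (0,1)}
   + \underbrace{2 \cdot \prob_1 \prob_2}_{\vct{w} = (1,1)}\\
  &= \prob_1 + \prob_1 \prob_2 .
\end{align*}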
\AH{Have cut and pasted the subsequent text.  Need to verify this is the appropriate place for it.}
Let $\semNX$ denote the set of polynomials over variables $\vct{X}=(X_1,\dots,X_\numvar)$ with natural number coefficients and exponents.
We model incomplete relations using Green et al.'s $\semNX$-databases~\cite{DBLP:conf/pods/GreenKT07}, discussed in detail in \Cref{subsec:supp-mat-krelations}.
$\semNX$-databases are functions from tuples to elements of $\semNX$, typically called annotations.
Given an $\semNX$-database $\db$, it is common to use $\db(\tup)$ to denote the polynomial annotating tuple $\tup$ in $\db$.
 								%Note that based on this definition of $\rel$, $\rel(\tup)$ is the lineage polynomial for $\tup$.
Let $\numvar$ be the number of tuples in $\pdb$.  Then, each possible world is defined by an assignment of $\numvar$ binary values $\vct{\wElem} \in \{0, 1\}^{\numvar}$ to $\vct{X}$.
 								The multiplicity of $\tup \in \db$, denoted $\db(\tup)(\vct{\wElem})$, is obtained by evaluating the polynomial annotating $\tup$ on $\vct{\wElem}$.
 								$\semNX$-relations are closed under $\raPlus$ (\Cref{fig:nxDBSemantics}).
 								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We will use an \abbrNXPDB $\pxdb$, defined as the pair $(\idb_{\semNX}, \pd)$ of an $\semNX$-database $\idb_{\semNX}$ and a probability distribution $\pd$ over the assignments to $\vct{X}$.
 								We denote by $\polyForTuple$ the annotation of tuple $t$ in the result of $\query$ on an implicit \abbrNXPDB (i.e., $\polyForTuple = \query(\pxdb)(t)$ for some $\pxdb$) and as before, interpret it as a function $\polyForTuple: \{0,1\}^{\numvar} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
 								\abbrNXPDB\xplural and a function $\rmod$ (which transforms an \abbrNXPDB to  a classical bag-\abbrPDB, or $\semN$-\abbrPDB~\cite{DBLP:conf/pods/GreenKT07,feng:2019:sigmod:uncertainty}) are both formalized in \Cref{subsec:supp-mat-background}.
\BG{Oliver's conjecture: Bag-\tis + Q can express any finite bag-PDB:
 								A well-known result for set semantics PDBs is that while not all finite PDBs can be encoded as \tis, any finite PDB can be encoded using a \ti and a query. An analog result holds in our case: any finite $\semN$-PDB can be encoded as a bag \ti and a query (WHAT CLASS? ADD PROOF)
 								}
 								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Proof of~\Cref{prop:expection-of-polynom}}
\label{subsec:expectation-of-polynom-proof}
\begin{proof}
We need to prove for $\semN$-PDB $\pdb = (\idb,\pd)$ and \abbrNXPDB $\pxdb = (\db',\pd')$ where $\rmod(\pxdb) = \pdb$ that $\expct_{\randDB\sim \pd}[\query(\randDB)(t)] = \expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})}$.
By expanding $\polyForTuple$ and the expectation we have:
 								\begin{align*}
 								\expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})}
& = \sum_{\vct{w} \in \{0,1\}^n}\probOf(\vct{w}) \cdot \query(\pxdb)(t)(\vct{w})\\
\intertext{From $\rmod(\pxdb) = \pdb$, we have that the range of $\assign_{\vct{w}}(\pxdb)$ is $\idb$, so}
& = \sum_{\db \in \idb}\;\;\sum_{\vct{w} \in \{0,1\}^n : \assign_{\vct{w}}(\pxdb) = \db}\probOf(\vct{w}) \cdot \query(\pxdb)(t)(\vct{w})\\
\intertext{In the inner sum, $\assign_{\vct{w}}(\pxdb) = \db$, so $\query(\pxdb)(t)(\vct{w}) = \query(\db)(t)$ because assignments commute with queries; by distributivity of $\times$ over $+$}
& = \sum_{\db \in \idb}\query(\db)(t)\sum_{\vct{w} \in \{0,1\}^n : \assign_{\vct{w}}(\pxdb) = \db}\probOf(\vct{w})\\
\intertext{From the definition of $\pd$ in \cref{def:semnx-pdbs}, given $\rmod(\pxdb) = \pdb$, we get}
& = \sum_{\db \in \idb}\query(\db)(t) \cdot \probOf(\db) \quad = \expct_{\db \sim \pd}[\query(\db)(t)]
 								\end{align*}
 								\qed
 								\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{\Cref{lem:pre-poly-rpoly}}\label{app:subsec-pre-poly-rpoly}
\begin{Lemma}\label{lem:pre-poly-rpoly}
If
$\poly(X_1,\ldots, X_\numvar) = \sum\limits_{\vct{d} = (d_1,\ldots, d_\numvar)\in \domN^\numvar}c_{\vct{d}} \cdot \prod\limits_{\substack{i = 1\\s.t. d_i\geq 1}}^{\numvar}X_i^{d_i}$
then
$\rpoly(X_1,\ldots, X_\numvar) = \sum\limits_{\vct{d} = (d_1,\ldots, d_\numvar)\in \domN^\numvar} c_{\vct{d}}\cdot\prod\limits_{\substack{i = 1\\s.t. d_i\geq 1}}^{\numvar}X_i$.% \;\;\;  for some $\eta \subseteq \{0,\ldots, B\}^\numvar$
\end{Lemma}
 								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}%[Proof for~\Cref{lem:pre-poly-rpoly}]
Follows by the construction of $\rpoly$ in \cref{def:reduced-bi-poly}.
\qed
 								\end{proof}
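
For instance (a hypothetical polynomial, chosen only to illustrate the statement, with $X_1$ and $X_2$ assumed to lie in different blocks), if $\poly(X_1, X_2) = 2X_1^2X_2 + 3X_2^3$, then the non-zero coefficients are $c_{(2,1)} = 2$ and $c_{(0,3)} = 3$, and replacing every exponent $d_i \geq 1$ by $1$ yields
\[
  \rpoly(X_1, X_2) = 2X_1X_2 + 3X_2 .
\]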
 								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Proposition~\ref{proposition:q-qtilde}}\label{app:subsec-prop-q-qtilde}
\noindent Note the following fact:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Proposition}\label{proposition:q-qtilde} For any \bi-lineage polynomial $\poly(X_1, \ldots, X_\numvar)$ and all $\vct{w}$ such that $\probOf[\vct{W} = \vct{w}] > 0$, it holds that
$%  \[
 								    \poly(\vct{w}) = \rpoly(\vct{w}).
 								$%    \]
 								\end{Proposition}
\begin{proof}%[Proof for~\Cref{proposition:q-qtilde}]
Note that any $\poly$ in factorized form is equivalent to its \abbrSMB expansion.  For each term in the expanded form, further note that for all $b \in \{0, 1\}$ and all $e \geq 1$, $b^e = b$, so dropping exponents does not change the value of a monomial on any $\vct{w} \in \{0, 1\}^\numvar$.
Finally, note that there are exactly three cases where the expectation of a monomial term $\expct\left[c_{\vct{d}}\prod_{i \in [\numvar]\; s.t.\; \vct{d}_i \geq 1}X_i\right]$ is zero:
(i) when $c_{\vct{d}} = 0$,
(ii) when $p_i = 0$ for some $i$ where $\vct{d}_i \geq 1$, and
(iii) when $X_i$ and $X_j$ are in the same block for some $i \neq j$ where $\vct{d}_i, \vct{d}_j \geq 1$.
In case (iii), the monomial evaluates to $0$ on every $\vct{w}$ with $\probOf[\vct{W} = \vct{w}] > 0$, since such a $\vct{w}$ sets at most one variable per block to $1$; omitting these monomials from $\rpoly$ therefore does not change its value on such $\vct{w}$.
\qed
 								\end{proof}
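
As a sketch of why the restriction to world vectors with non-zero probability matters, consider a hypothetical \bi in which $X_1$ and $X_2$ belong to the same block, and let $\poly = X_1^2 + X_1X_2$. Assuming, as in the derivation in the proof of \cref{lem:exp-poly-rpoly}, that monomials containing two variables of the same block are dropped when forming $\rpoly$, we get $\rpoly = X_1$. On the three world vectors that can have non-zero probability the two polynomials agree:
\begin{align*}
  \poly(0,0) &= 0 = \rpoly(0,0), &
  \poly(1,0) &= 1 = \rpoly(1,0), &
  \poly(0,1) &= 0 = \rpoly(0,1).
\end{align*}
They differ only on $\vct{w} = (1,1)$, where $\poly(1,1) = 2$ and $\rpoly(1,1) = 1$, but this vector sets two tuples of the same block and therefore has probability $0$.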
\subsection{Proof for Lemma~\ref{lem:exp-poly-rpoly}}\label{subsec:proof-exp-poly-rpoly}
\begin{proof}
Let $\poly$ be the generalized polynomial, i.e., the polynomial of $\numvar$ variables in which every exponent is at most $B$: %, in which every possible monomial permutation appears,
\[\poly(X_1,\ldots, X_\numvar) = \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar X_i^{d_i}.\]
Let the boolean function $\isInd{\cdot}$ take $\vct{d}$ as input and return true if there are no dependent variables in $\vct{d}$, i.e., $\not\exists ~\block, i\neq j\suchthat d_{\block, i}, d_{\block, j} \geq 1$.\footnote{This \abbrBIDB notation is used and discussed in \cref{subsec:tidbs-and-bidbs}.}
 								Then in expectation we have
\begin{align}
\expct_{\vct{\randWorld}}\pbox{\poly(\vct{\randWorld})} &= \expct_{\vct{\randWorld}}\pbox{\sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \randWorld_i^{d_i} + \sum_{\substack{\vct{d} \in \{0,\ldots, B\}^\numvar\\\wedge ~\neg\isInd{\vct{d}}}} c_{\vct{d}}\cdot\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar\randWorld_i^{d_i}}\label{p1-s1a}\\
 								&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \expct_{\vct{\randWorld}}\pbox{\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \randWorld_i^{d_i}} + \sum_{\substack{\vct{d} \in \{0,\ldots, B\}^\numvar\\\wedge ~\neg\isInd{\vct{d}}}} c_{\vct{d}}\cdot\expct_{\vct{\randWorld}}\pbox{\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar\randWorld_i^{d_i}}\label{p1-s1b}\\
 								&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\~\wedge\isInd{\vct{d}}}}c_{\vct{d}}\cdot \expct_{\vct{\randWorld}}\pbox{\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \randWorld_i^{d_i}}\label{p1-s1c}\\
 								&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \expct_{\vct{\randWorld}}\pbox{\randWorld_i^{d_i}}\label{p1-s2}\\
 								&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \expct_{\vct{\randWorld}}\pbox{\randWorld_i}\label{p1-s3}\\
 								&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \prob_i\label{p1-s4}\\
&= \rpoly(\prob_1,\ldots, \prob_\numvar).\label{p1-s5}
\end{align}
\Cref{p1-s1a} is the result of substituting in the definition of $\poly$ given above.  Then we arrive at \cref{p1-s1b} by linearity of expectation.  Next, \cref{p1-s1c} is the result of the independence constraint of \abbrBIDB\xplural, specifically that any monomial composed of dependent variables, i.e., variables from the same block $\block$, has an expectation of $0$.  \Cref{p1-s2} is obtained by the fact that all variables in each remaining monomial are independent, which allows for the expectation to be pushed through the product.  In \cref{p1-s3}, since $\randWorld_i \in \{0, 1\}$ it is the case that for any exponent $e \geq 1$, $\randWorld_i^e = \randWorld_i$.  Next, in \cref{p1-s4} the expectation of $\randWorld_i$ is exactly the probability $\prob_i$ of its tuple.
Finally, it can be verified that \Cref{p1-s5} follows since \cref{p1-s4} satisfies the construction of \Cref{lem:pre-poly-rpoly}, i.e., $\rpoly(\prob_1,\ldots, \prob_\numvar)$ is exactly the sum, over all monomials, of the coefficient times the product of the probabilities of the variables in that monomial.
\qed
 								\end{proof}
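
As a sanity check of the derivation, consider a hypothetical \ti (every block has size 1) with $\numvar = 2$ and $\poly = (X_1 + X_2)^2 = X_1^2 + 2X_1X_2 + X_2^2$; this instance is an assumption made only for illustration. Dropping exponents gives $\rpoly = X_1 + 2X_1X_2 + X_2$, so the claim is that the expectation equals $\prob_1 + 2\prob_1\prob_2 + \prob_2$. Enumerating the worlds with non-zero polynomial value directly confirms this:
\begin{align*}
\expct_{\vct{\randWorld}}\pbox{\poly(\vct{\randWorld})}
  &= 1\cdot\prob_1(1-\prob_2) + 1\cdot(1-\prob_1)\prob_2 + 4\cdot\prob_1\prob_2\\
  &= \prob_1 + \prob_2 + 2\prob_1\prob_2
   = \rpoly(\prob_1, \prob_2).
\end{align*}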
\subsection{Proof for Corollary~\ref{cor:expct-sop}}
\begin{proof}
Note that \cref{lem:exp-poly-rpoly} shows that $\expct\pbox{\poly} =$ $\rpoly(\prob_1,\ldots, \prob_\numvar)$.  Therefore, if $\poly$ is already in \abbrSMB form, one only needs to compute $\poly(\prob_1,\ldots, \prob_\numvar)$ ignoring exponent terms (note that such a polynomial is $\rpoly(\prob_1,\ldots, \prob_\numvar)$), which indeed requires $\bigO{\size\inparen{\smbOf{\poly}}}$ operations.
\qed
\end{proof}
 								%%% Local Variables:
 								%%% mode: latex
 								%%% TeX-master: "main"
 								%%% End: