Made a pass on S2.

master
Aaron Huber 2020-12-16 12:38:21 -05:00
parent b4d0814133
commit ac684a8d47
2 changed files with 37 additions and 27 deletions

@ -5,7 +5,7 @@
Since we have shown that computing the expected multiplicity of a query result tuple is equivalent to computing the expectation of a polynomial (for that tuple) given a probability distribution over all possible assignments of variables in the polynomial to $\{0,1\}$, we focus on this problem exclusively from now on.
We now introduce some basic terminology for polynomials and then develop a reduced normal form for polynomials that preserves a polynomial expectation for probability distributions that stem from \bis or \tis.
Let us use the expression $(x + y)^2$ as a running example in this section.
Let us use the expression $(X + Y)^2$ as a running example in this section.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \begin{Definition}[Monomial]\label{def:monomial}
@ -28,7 +28,7 @@ where each $c_i$ is a positive integer and each $m_i$ is a monomial and $m_i \ne
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The \abbrSMB for the running example is $x^2 +2xy + y^2$. While $x^2 + xy + xy + y^2$ is an expanded form of the expression, it is not the standard monomial basis since $xy$ appears more than once.
The \abbrSMB for the running example is $X^2 +2XY + Y^2$. While $X^2 + XY + XY + Y^2$ is an expanded form of the expression, it is not the standard monomial basis since $XY$ appears more than once.
\BG{Maybe inline degree?}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -47,9 +47,10 @@ The degree of the running example polynomial is $2$. In this paper we consider o
% \end{Assumption}
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We call a polynomial $\query(\vct{X})$ a \emph{\bi-lineage polynomial} (\emph{\ti-lineage polynomial}), if there exists an n-ary $\raPlus$ query $\query$, \bi $\pxdb$ (\ti $\pxdb$), and n-ary tuple $\tup$ such that $\query(\vct{X}) = \query(\pxdb)(\tup)$. % Before proceeding, note that the following is assume that polynomials are \bis (which subsume \tis as a special case).
We call a polynomial $\query(\vct{X})$ a \emph{\bi-lineage polynomial} (\emph{\ti-lineage polynomial}), if
\AH{Why is it required for the tuple to be n-ary? I think this slightly confuses me since we have n tuples.} there exists an n-ary $\raPlus$ query $\query$, \bi $\pxdb$ (\ti $\pxdb$), and n-ary tuple $\tup$ such that $\query(\vct{X}) = \query(\pxdb)(\tup)$. % Before proceeding, note that the following is assume that polynomials are \bis (which subsume \tis as a special case).
Note that \tis are a special case of \bis and, thus, the following applies to \tis as well.
Recall that in a \bi $\pxdb$ with tuples $t_1, \ldots, t_n$, each input tuple $t_i$ is annotated with a unique variable $X_i$. The tuples of $\pxdb$ are partitioned into $\ell$ blocks $\block_1, \ldots, \block_\ell$ and each tuple $t_i$ is associated with a probability $\vct{p}(\tup_i) = \pd[X_i = 1]$. Together with the assumption that blocks are assumed to be independent and tuples from the same block are disjoint events, $\vct{p}$ and the blocks induce a the probability distribution $\pd$ of $\pxdb$.
Recall that in a \bi $\pxdb$ with tuples $t_1, \ldots, t_n$, each input tuple $t_i$ is annotated with a unique variable $X_i$. The tuples of $\pxdb$ are partitioned into $\ell$ blocks $\block_1, \ldots, \block_\ell$ and each tuple $t_i$ is associated with a probability $\prob(\tup_i) = \pd[X_i = 1]$. Together with the assumptions that blocks are independent and that tuples from the same block are disjoint events, $\prob$ and the blocks induce the probability distribution $\pd$ of $\pxdb$.
We will write a \bi-lineage polynomial $\poly(\vct{X})$ for a \bi with $\ell$ blocks as
$\poly(\vct{X})$ = $\poly(X_{\block_1, 1},\ldots, X_{\block_1, \abs{\block_1}},$ $\ldots, X_{\block_\ell, \abs{\block_\ell}})$, where $\abs{\block_i}$ denotes the size of $\block_i$, and $\block_{i, j}$ denotes tuple $j$ residing in block $i$ for $j$ in $[\abs{\block_i}]$.
% and the probability distribution of $\pxdb$ is uniquely determined based on a probability vector $\vct{p}$ that associates each tuple a probability
@ -87,11 +88,11 @@ Alternatively, one can think of $\rpoly$ as the \abbrSMB of $\poly(\vct{X})$ whe
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Example}\label{example:qtilde}
Consider $\poly(x, y) = (x + y)(x + y)$ where $x$ and $y$ are from different blocks. Then the expanded derivation for $\rpoly(x, y)$ is
Consider $\poly(X, Y) = (X + Y)(X + Y)$ where $X$ and $Y$ are from different blocks. Then the expanded derivation for $\rpoly(X, Y)$ is
\begin{align*}
(&x^2 + 2xy + y^2 \mod x^2 - x) \mod y^2 - y\\
= ~&x + 2xy + y^2 \mod y^2 - y\\
= ~& x + 2xy + y
(&X^2 + 2XY + Y^2 \mod X^2 - X) \mod Y^2 - Y\\
= ~&X + 2XY + Y^2 \mod Y^2 - Y\\
= ~& X + 2XY + Y
\end{align*}
\end{Example}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -140,7 +141,9 @@ Let $\pxdb$ be a \bi over variables $\vct{X} = \{X_1, \ldots, X_\numvar\}$ and w
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Note that in the preceding lemma, we have assigned $\vct{p}$ (introduced in \Cref{subsec:def-data}) to the variables $\vct{X}$. Intuitively, \Cref{lem:exp-poly-rpoly} states that when we replace each variable $X_i$ with its probability $\prob_i$ in the reduced form a \bi-lineage polynomial and evaluate the resulting expression in $\mathbb{R}$, then the result is the expectation of the polynomial.
Note that in the preceding lemma, we have assigned $\vct{p}$
%(introduced in \Cref{subsec:def-data})
to the variables $\vct{X}$. Intuitively, \Cref{lem:exp-poly-rpoly} states that when we replace each variable $X_i$ with its probability $\prob_i$ in the reduced form of a \bi-lineage polynomial and evaluate the resulting expression in $\mathbb{R}$, then the result is the expectation of the polynomial.
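As a sketch of how \Cref{lem:exp-poly-rpoly} is applied, consider the reduced polynomial $X + 2XY + Y$ from \Cref{example:qtilde}. The symbols $p_X$ and $p_Y$ below are illustrative names (not notation used elsewhere in the paper) for the probabilities of $X$ and $Y$, which are assumed to come from different blocks and hence to be independent; the value $\frac{1}{2}$ is likewise an assumed input.
\begin{align*}
\expct\pbox{(X + Y)^2} &= p_X + 2p_Xp_Y + p_Y\\
&= \tfrac{1}{2} + 2\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} + \tfrac{1}{2} = \tfrac{3}{2} && \text{for } p_X = p_Y = \tfrac{1}{2}
\end{align*}
Enumerating the four worlds directly gives the same value: $(X+Y)^2$ evaluates to $0$, $1$, $1$, and $4$ on $(0,0)$, $(1,0)$, $(0,1)$, and $(1,1)$, each with probability $\frac{1}{4}$, so the expectation is $\frac{6}{4} = \frac{3}{2}$.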
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}[Proof for Lemma ~\ref{lem:exp-poly-rpoly}]
@ -180,9 +183,10 @@ Finally, observe \Cref{p1-s5} by construction in \Cref{lem:pre-poly-rpoly}, that
\begin{Corollary}\label{cor:expct-sop}
If $\poly$ is a \bi-lineage polynomial, then the expectation of $\poly$, i.e., $\expct\pbox{\poly} = \rpoly\left(\prob_1,\ldots, \prob_\numvar\right)$ can be computed in $O(|\smbOf{\poly}|)$, where $|\poly|$ denotes the total number of multiplication/addition operators in $\poly$.
\end{Corollary}
\AH{What if $\poly$ is not in \abbrSMB form?}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}[Proof For Corollary ~\ref{cor:expct-sop}]
Note that \cref{lem:exp-poly-rpoly} shows that $\expct\pbox{\poly} =$ $\rpoly(\prob_1,\ldots, \prob_\numvar)$. Therefore, if $\poly$ is already in \abbrSMB form, one only needs to compute $\poly(\prob_1,\ldots, \prob_\numvar)$ ignoring exponent terms (note that such a polynomial is $\rpoly(\prob_1,\ldots, \prob_\numvar)$), which indeed has $O(\smb{|\poly|})$ computations.\qed
Note that \cref{lem:exp-poly-rpoly} shows that $\expct\pbox{\poly} =$ $\rpoly(\prob_1,\ldots, \prob_\numvar)$. Therefore, if $\poly$ is already in \abbrSMB form, one only needs to compute $\poly(\prob_1,\ldots, \prob_\numvar)$ ignoring exponent terms (note that such a polynomial is $\rpoly(\prob_1,\ldots, \prob_\numvar)$), which indeed has $O(\smbOf{|\poly|})$ computations.\qed
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

@ -66,11 +66,11 @@ Consider now the semiring $(\semNX, +, \cdot, 0, 1)$ whose domain is $\semNX$ an
As mentioned above we will use $\semNX$-databases paired with a probability distribution as a representation system.
We refer to such databases as $\semNX$-PDBs and use bold symbols to distinguish them from possible worlds (which are $\semN$-databases).
Formally, a $\semNX$-PDB is a $\semNX$-database $\db$ and a probability distribution $\pd$ over assignments $\assign$ of the variables $\vct{X} = \{X_1, \ldots, X_n\}$ occurring in annotations of $\db$ to $\{0,1\}$. Note that an assignment $\assign: \vct{X} \to \{0,1\}$ can be represented as a vector $\vct{w} \in \{0,1\}^n$ where $\vct{w}[i]$ records the value assigned to variable $X_i$. Thus, from now on we will solely use such vectors which we refer to as \emph{world vectors} and implicitly understand them to represent assignments. Given an assignment $\assign$ we use $\assign(\pxdb)$ to denote the semiring homomorphism $\semNX \to \semN$ that applies the assignment $\assign$ to all variables of a polynomial and evaluates the resulting expression in $\semN$.\BG{explain connection to homomorphism lifting in K-relations}
Formally, an $\semNX$-PDB is an $\semNX$-database $\db$ and a probability distribution $\pd$ over assignments $\assign$ of the variables $\vct{X} = \{X_1, \ldots, X_n\}$ occurring in annotations of $\db$ to $\{0,1\}$. Note that an assignment $\assign: \vct{X} \to \{0,1\}$ can be represented as a vector $\vct{w} \in \{0,1\}^n$ where $\vct{w}[i]$ records the value assigned to variable $X_i$. Thus, from now on we will solely use such vectors which we refer to as \emph{world vectors} and implicitly understand them to represent assignments. Given an assignment $\assign$ we use $\assign(\pxdb)$ to denote the semiring homomorphism $\semNX \to \semN$ that applies the assignment $\assign$ to all variables of a polynomial and evaluates the resulting expression in $\semN$.\BG{explain connection to homomorphism lifting in K-relations}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[$\semNX$-PDBs]\label{def:semnx-pdbs}
A $\semNX$-PDB $\pxdb$ over variables $\vct{X} = \{X_1, \ldots, X_n\}$ is a tuple $(\db,\pd)$ where $\db$ is an $\semNX$-database and $\pd$ is a probability distribution over $\vct{w} \in \{0,1\}^n$. We use $\assign_{\vct{w}}$ to denote the assignment corresponding to $\vct{w} \in \{0,1\}^n$. The $\semN$-PDB $\rmod(\pxdb) = (\idb, \pd')$ encoded by $\pxdb$ is defined as:
An $\semNX$-PDB $\pxdb$ over variables $\vct{X} = \{X_1, \ldots, X_n\}$ is a tuple $(\db,\pd)$ where $\db$ is an $\semNX$-database and $\pd$ is a probability distribution over $\vct{w} \in \{0,1\}^n$. We use $\assign_{\vct{w}}$ to denote the assignment corresponding to $\vct{w} \in \{0,1\}^n$. The $\semN$-PDB $\rmod(\pxdb) = (\idb, \pd')$ encoded by $\pxdb$ is defined as:
\begin{align*}
\idb & = \{ \assign_{\vct{w}}(\pxdb) \mid \vct{w} \in \{0,1\}^n \} \\
\pd'(\db) & = \sum_{\vct{w} \in \{0,1\}^n: \assign_{\vct{w}}(\pxdb) = \db} \pd(\vct{w})
@ -84,35 +84,38 @@ For instance, consider a $\pxdb$ consisting of a single tuple $\tup_1 = (1)$ ann
D_{[0,1]}(\tup_1) = 1 \hspace{0.3cm} \mathbf{and} \hspace{0.3cm} D_{[1,1]}(\tup_1) = 2
\]
%
Importantly, as the following proposition shows, any finite $\semN$-PDB can be encoded as a $\semNX$-PDBs and $\semNX$-PDBs are closed under positive relational algebra queries, the class of queries we are interested in in this work.
Importantly, as the following proposition shows, any finite $\semN$-PDB can be encoded as an $\semNX$-PDB and $\semNX$-PDBs are closed under positive relational algebra queries, the class of queries considered in this work.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Proposition}\label{prop:semnx-pdbs-are-a-}
$\semNX$-PDBs are a complete representation system for $\semN$-PDBs that is closed under $\raPlus$ queries.
\end{Proposition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\AH{I made small changes to the proof, notably the summation, the variable definition and the world subscript, the latter of which I am not sure if it is the best notation or not.}
\begin{proof}
To prove that $\semNX$-PDBs are complete consider the following construction that for any $\semN$-PDB $\pdb = (\idb, \pd)$ produces a $\semNX$-PDB $\pxdb = (\db, \pd')$ such that $\rmod(\pxdb) = \pdb$. Let $\idb = \{D_1, \ldots, D_n\}$ and let $max(D_i)$ denote $max_{\tup} D_i(\tup)$. For each world $D_i$ we create variables $X_{i1}$, \ldots, $X_{im}$ where $m = max(D_i)$. In $\db$ we assign each tuple $\tup$ the polynomial:
To prove that $\semNX$-PDBs are complete consider the following construction that for any $\semN$-PDB $\pdb = (\idb, \pd)$ produces an $\semNX$-PDB $\pxdb = (\db, \pd')$ such that $\rmod(\pxdb) = \pdb$. Let $\idb = \{D_1, \ldots, D_{\abs{\idb}}\}$ and let $max(D_i)$ denote $max_{\tup} D_i(\tup)$. For each world $D_i$ we create a corresponding variable $X_i$.
%variables $X_{i1}$, \ldots, $X_{im}$ where $m = max(D_i)$.
In $\db$ we assign each tuple $\tup$ the polynomial:
%
\[
\db(\tup) = \sum_{i=1}^{n} \sum_{j=1}^{D_i(\tup)} X_{ij}
\db(\tup) = \sum_{i=1}^{\abs{\idb}} D_i(\tup)\cdot X_{i}
\]
The probability distribution $\pd'$ assigns all world vectors zero probability except for $n$ world vectors (representing the possible worlds) $\vct{w_i}$. All elements of $\vct{w_i}$ are zero except for the positions corresponding to variables $X_{ij}$ for $j \in \{1, \ldots \}$ which are set to $1$. Unfolding definitions it is trivial to show that $\rmod(\pxdb) = \pdb$. Thus, $\semNX$ are a complete representation system.
The probability distribution $\pd'$ assigns all world vectors zero probability except for $\abs{\idb}$ world vectors (representing the possible worlds) $\vct{w_i}$. All elements of $\vct{w_i}$ are zero except for the position corresponding to variable $X_i$, which is set to $1$. Unfolding definitions it is trivial to show that $\rmod(\pxdb) = \pdb$. Thus, $\semNX$-PDBs are a complete representation system.
The closure under $\raPlus$ queries follows from the fact that an assignment $\vct{X} \to \{0,1\}$ is a semiring homomorphism and that semiring homomorphisms commute with queries over $\semK$-relations.
\end{proof}
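To illustrate the completeness construction, the following sketch assumes a hypothetical $\semN$-PDB with two possible worlds $D_1$ and $D_2$ and a single tuple $\tup$ with $D_1(\tup) = 2$ and $D_2(\tup) = 3$; these multiplicities are chosen for illustration only. We also assume, as the construction suggests but does not state explicitly, that $\pd'(\vct{w_i}) = \pd(D_i)$.
\begin{align*}
\db(\tup) &= 2\cdot X_1 + 3\cdot X_2\\
\vct{w_1} = (1,0) &\mapsto 2\cdot 1 + 3\cdot 0 = 2 = D_1(\tup)\\
\vct{w_2} = (0,1) &\mapsto 2\cdot 0 + 3\cdot 1 = 3 = D_2(\tup)
\end{align*}
Every other world vector receives probability zero, so under these assumptions only the worlds $D_1$ and $D_2$ receive nonzero probability in $\rmod(\pxdb)$, namely their original probabilities.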
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Now let us consider computing the expected multiplicity of a tuple $\tup$ in the result of a query $\query$ over a $\semN$-PDB $\pdb$ using the annotation of $\tup$ in the result of evaluating $\query$ over an $\semNX$-PDB $\pxdb$ for which $\rmod{\pxdb} = \pdb$. The expectation of the polynomial $\poly = \query(\pxdb)(\tup)$ based on the probability distribution of $\pxdb$ over the variables in $\pxdb$ is:
Now let us consider computing the expected multiplicity of a tuple $\tup$ in the result of a query $\query$ over an $\semN$-PDB $\pdb$ using the annotation of $\tup$ in the result of evaluating $\query$ over an $\semNX$-PDB $\pxdb$ for which $\rmod(\pxdb) = \pdb$. The expectation of the polynomial $\poly = \query(\pxdb)(\tup)$ based on the probability distribution of $\pxdb$ over the variables in $\pxdb$ is:
\[
\expct_{\vct{X} \sim \pd}\pbox{\poly(\vct{X})} = \sum_{\vct{w} \in \{0,1\}^n} \query(\assign_{\vct{w}}(\pxdb))(\tup) \cdot \pd(\vct{w})
\]
Since $\semNX$-PDBs $\pxdb$ are a complete representation system closed under $\raPlus$, computing the expectation of the multiplicity of a tuple $t$ in the result a $\raPlus$ query over the $\semN$-PDB $\rmod(\pxdb)$, is the same as computing the expectation of the polynomial $\query(\pxdb)(t)$.
Since $\semNX$-PDBs $\pxdb$ are a complete representation system for $\semN$-PDBs which are closed under $\raPlus$, computing the expectation of the multiplicity of a tuple $t$ in the result of an $\raPlus$ query over the $\semN$-PDB $\rmod(\pxdb)$ is the same as computing the expectation of the polynomial $\query(\pxdb)(t)$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Proposition}[Expectation of polynomials]\label{prop:expection-of-polynom}
Given a $\semN$-PDB $\pdb = (\idb,\pd)$ and $\semNX$-PDB $\pxdb = (\db,\pd')$ such that $\rmod(\pxdb) = \pdb$, we have:
Given an $\semN$-PDB $\pdb = (\idb,\pd)$ and $\semNX$-PDB $\pxdb = (\db,\pd')$ such that $\rmod(\pxdb) = \pdb$, we have:
\[ \expct_{\idb \sim \pd}[\query(\db)(t)] = \expct_{\vct{X} \sim \pd'}\pbox{\poly(\vct{X})} \]
\end{Proposition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -121,18 +124,20 @@ Since $\semNX$-PDBs $\pxdb$ are a complete representation system closed under $\
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Two important subclasses of $\semNX$-PDBs that are of interested to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\vct{p}(\tup)$. The set of possible worlds represented by a \ti $\db$ is all subsets of $\db$. The probability of each world is the product of the probabilities of all tuples that exist with one minus the probability of all tuples of $\db$ that are not part of this world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. The probability of such a world is the product of the probabilities of all tuples present in the world and one minus the sum of the probabilities of all tuples from blocks for which no tuple is present in the world. For bag \tis and \bis, we define the probability of a tuple to be the probability that the tuple exists with multiplicity at least $1$.
Two important subclasses of $\semNX$-PDBs that are of interest to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\prob(\tup)$. The set of possible worlds represented by a \ti $\db$ is all subsets of $\db$. The probability of each world is the product of the probabilities of all tuples that exist with one minus the probability of all tuples of $\db$ that are not part of this world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. Note then that the tuples sharing the same block are disjoint, and the sum of the probabilities of all the tuples in the same block $\block$ is $1$. The probability of such a world is the product of the probabilities of all tuples present in the world. %and one minus the sum of the probabilities of all tuples from blocks for which no tuple is present in the world.
For bag \tis and \bis, we define the probability of a tuple to be the probability that the tuple exists with multiplicity at least $1$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[\tis and \bis]\label{def:tidbs-and-bidbs}
A \emph{\ti} $\pxdb = (\db, \pd)$ is a $\semNX$-PDB such that (i) every tuple is annotated with either $0$ or a unique variable $X_i$ and (ii) the probability distribution $\pd$ is such that all variables are independent.
A \emph{\ti} $\pxdb = (\db, \pd)$ is an $\semNX$-PDB such that (i) every tuple is annotated with either $0$ or a unique variable $X_i$ and (ii) the probability distribution $\pd$ is such that all variables are independent.
A \emph{\bi} $\pxdb = (\db, \pd)$ is a $\semNX$-PDB such that (i) every tuple is annotated with either $0$ or a unique variable $X_i$ and (ii) that the tuples $\tup$ of $\pxdb$ for which $\pxdb(\tup) \neq 0$ can be partitioned into a set of blocks such that variables from separate blocks are independent of each other and variables from the same blocks are disjoint events.
A \emph{\bi} $\pxdb = (\db, \pd)$ is an $\semNX$-PDB such that (i) every tuple is annotated with either $0$ or a unique variable $X_i$ and (ii) that the tuples $\tup$ of $\pxdb$ for which $\pxdb(\tup) \neq 0$ can be partitioned into a set of blocks such that variables from separate blocks are independent of each other and variables from the same blocks are disjoint events.
\BG{Should this be written in math?}
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Note that the main difference from the standard definitions of \tis and \bis is that we define them as subclasses of $\semNX$-PDBs and that we use bag semantics. Even though tuples cannot occur more than once in the input \ti or \bi, they can occur with a multiplicity larger than one in the result of a query. Since in \tis and \bis, there is a one-to-one correspondence between tuples in the database and variables, we can interpret a vector $\vct{w} \in \{0,1\}^n$ as denoting which tuples exist in the possible world $\assign_{\vct{w}}(\pxdb)$ (the ones where $\vct{w}[i] = 1$). Let $\vct{p}$ denote the vector whose elements are the individual probabilities $\prob_i$ of each tuple $\tup_i$, and let $\pd^{(\vct{p})}$ denote the distribution induced by $\vct{p}$.
%
\begin{align}\label{eq:tidb-expectation}
\expct_{\vct{X} \sim \pd^{(\vct{p})}}\pbox{\poly(\vct{X})} = \sum\limits_{\vct{w} \in \{0, 1\}^\numvar} \poly(\vct{w})\prod_{\substack{i \in [\numvar]\\ s.t. \wElem_i = 1}}\prob_i \prod_{\substack{i \in [\numvar]\\s.t. \wElem_i = 0}}\left(1 - \prob_i\right).
@ -149,7 +154,7 @@ A well-known result for set semantics PDBs is that while not all finite PDBs can
\subsection{Expression Trees}\label{sec:expression-trees}
In the following we will make use of expression trees to encode polynomials which we define formally in this subsection.
For illustrative purposes consider the polynomial $\poly(\vct{X}) = 2x^2 + 3xy - 2y^2$ over $\vct{X} = (x,y)$.
For illustrative purposes consider the polynomial $\poly(\vct{X}) = 2X_1^2 + 3X_1X_2 - 2X_2^2$ over $\vct{X} = [X_1, X_2]$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Expression Tree]\label{def:express-tree}
@ -159,11 +164,11 @@ tree, whose internal nodes are from the set $\{+, \times\}$, with leaf nodes bei
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We ignore the remaining fields (\vari{partial} and \vari{weight}) for now. Their purpose will become clear in~\Cref{sec:approximation-algo}. Note that $\etree$ need not encode an expression in standard monomial basis. For instance, $\etree$ could represent a compressed form of the running example, such as $(x + 2y)(2x - y)$.
We ignore the remaining fields (\vari{partial} and \vari{weight}) for now. Their purpose will become clear in~\Cref{sec:algo}. Note that $\etree$ need not encode an expression in standard monomial basis. For instance, $\etree$ could represent a compressed form of the running example, such as $(X_1 + 2X_2)(2X_1 - X_2)$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[poly$(\cdot)$]\label{def:poly-func}
Denote $poly(\etree)$ to be the function that takes as input expression tree $\etree$ and outputs its corresponding polynomial. $poly(\cdot)$ is recursively defined on $\etree$ as follows, where $\etree_\lchild$ and $\etree_\rchild$ denote the left and right child of $\etree$ respectively.
Let $poly(\etree)$ denote the function that takes as input an expression tree $\etree$ and outputs its corresponding polynomial in \abbrSMB. $poly(\cdot)$ is recursively defined on $\etree$ as follows, where $\etree_\lchild$ and $\etree_\rchild$ denote the left and right child of $\etree$ respectively.
% \begin{align*}
% &\etree.\type = +\mapsto&& \polyf(\etree_\lchild) + \polyf(\etree_\rchild)\\
@ -190,7 +195,7 @@ Note that addition and multiplication above follow the standard interpretation o
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
For our running example, $\etreeset{\smb} = \{2x^2 + 3xy - 2y^2, (x + 2y)(2x - y), x(2x - y) + 2y(2x - y), 2x(x + 2y) - y(x + 2y)\}$. Note that \cref{def:express-tree-set} implies that $\etree \in \etreeset{poly(\etree)}$.
For our running example, $\etreeset{\smb} = \{2X_1^2 + 3X_1X_2 - 2X_2^2, (X_1 + 2X_2)(2X_1 - X_2), X_1(2X_1 - X_2) + 2X_2(2X_1 - X_2), 2X_1(X_1 + 2X_2) - X_2(X_1 + 2X_2)\}$. Note that \cref{def:express-tree-set} implies that $\etree \in \etreeset{poly(\etree)}$.
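As a quick check that the compressed form in the set above encodes the same polynomial, expanding the product and collecting like monomials recovers the \abbrSMB representation:
\begin{align*}
(X_1 + 2X_2)(2X_1 - X_2) &= 2X_1^2 - X_1X_2 + 4X_1X_2 - 2X_2^2\\
&= 2X_1^2 + 3X_1X_2 - 2X_2^2
\end{align*}
The two remaining trees in the set correspond to the intermediate steps of this expansion, in which only one of the two factors has been distributed over the other.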
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Problem Definition}\label{sec:problem-definition}
@ -201,8 +206,9 @@ We are now ready to formally state the main problem addressed in this work.
\begin{Definition}[The Expected Result Multiplicity Problem]\label{def:the-expected-multipl}
Let $\vct{X} = (X_1, \ldots, X_n)$, and $\pxdb$ be an $\semNX$-PDB over $\vct{X}$ with probability distribution $\pd$ over assignments $\vct{X} \to \{0,1\}$, $\query$ an n-ary query, and $t$ an n-ary tuple.
The \expectProblem is defined as follows:
\AH{I think we mean $\poly(\vct{X}) = \query(\pxdb)(t)$ instead of $\poly(\vct{X}) = \query(\pdb)(t)$. I changed the following to reflect this.}
\begin{itemize}
\item \textbf{Input}: Given an expression tree $\etree \in \etreeset{\smb}$ for $\poly(\vct{X}) = \query(\pdb)(t)$
\item \textbf{Input}: Given an expression tree $\etree \in \etreeset{\smb}$ for $\poly(\vct{X}) = \query(\pxdb)(t)$
\item \textbf{Output}: $\expct_{\vct{X} \sim \pd}[\poly(\vct{X})]$
\end{itemize}
\end{Definition}