%\AH{This section will involve the set of queries (RA+) that we are interested in, the probabilistic/incomplete models we address, and the outer aggregate functions we perform over the output \textit{annotation}
An incomplete database $\idb$ is a set of deterministic databases $\db_i$, each of which is called a possible world. Since $\idb$ models all the possible worlds of an uncertain database, every $\db_i \in \idb$ has the same set of named relations $\{\rel_1, \ldots, \rel_n\}$ (although the contents of these relations may differ from world to world), and the schema $\sch(\rel_i)$ of each relation is the same in every $\db_j$. When $\idb$ is a probabilistic database, it can be viewed as a pair $(\wSet, \pd)$, where $\wSet$ is the set of possible worlds and $\pd$ is a probability distribution over $\wSet$.
%Below may possibly need to be used again...we'll see.
%probability space $\left(\Omega, \mathcal{A}, P\right)$ over that set. \AR{I'm not sure why you are using the notation $\mathcal{A}$ and $P$, which you do not seem to use beyond this section. I would recommend that you only introduce a notation if you plan to use them later on.} Since the set of possible outcomes is the set of possible worlds, $\wSet$, and the set of outcomes is equivalent to the set of events, we will simplify notation and use $\left(\wSet, P\right)$ to denote the probability space of $\idb$. \AR{If you want to use $(\wSet,P)$ make sure you use the same notation in Sec 1.3 as well. If not, then use the notation from Sec 1.3 here}
We further define $\idb$ to be an $\mathbb{N}[\vct{X}]$ database, i.e., an incomplete/probabilistic database model in which each tuple $\tup$ of $\idb$ is annotated with a polynomial over the variables $X_1, \ldots, X_M$, for some value of $M$ that will be specified later. Intuitively, one can think of $\idb$ as a parameterized database whose abstract form can be instantiated as each deterministic $\db_i \in \idb$.
Since $\idb$ maps tuples to polynomials, it is customary to view an arbitrary relation $\rel$ of $\idb$ as a function $\rel: \tup \mapsto \rel(\tup) \in \mathbb{N}[\vct{X}]$, where $\rel(\tup)$ denotes the polynomial annotating tuple $\tup$ (tuples that do not appear in $\rel$ are annotated with $0$).
Previous work has shown that commutative semirings precisely model the translation of RA+ query operations into operations on tuple annotations. Since $\idb$ is an $\mathbb{N}[\vct{X}]$ database, we work in the commutative semiring $(\mathbb{N}[\vct{X}], +, \times, 0, 1)$, where $\mathbb{N}[\vct{X}]$ is the set from which all annotations are drawn.
Each query operation is translated into one of the two semiring operators: $\project$ and $\union$ of agreeing tuples correspond to the $+$ operator in the polynomial $\poly$, and $\join$ corresponds to the $\times$ operator. Finally, $\select$ is better modeled as a function that returns either $\rel(\tup)$ or $0$, depending on whether $\tup$ satisfies the selection predicate.
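To make the translation concrete, the following is a minimal Python sketch of RA+ evaluation over $\mathbb{N}[\vct{X}]$-annotated relations. It is an illustration under stated assumptions, not the paper's implementation: the polynomial encoding (a \texttt{Counter} from monomials to coefficients), the dictionary representation of relations, and all function names (\texttt{var}, \texttt{padd}, \texttt{pmul}, \texttt{union}, \texttt{project}, \texttt{join}, \texttt{select}) are hypothetical, and the join is restricted to a single equality condition for brevity.

```python
from collections import Counter
from itertools import product

# Hypothetical encoding (not the paper's): a monomial is a sorted tuple of
# variable names, with repeats encoding exponents, e.g. ("X1", "X1", "X2")
# stands for X1^2 * X2. A polynomial in N[X] is a Counter mapping monomials
# to positive integer coefficients; the empty Counter is the semiring 0.
def var(name):
    return Counter({(name,): 1})

def padd(p, q):
    # Semiring +: add the coefficients of matching monomials.
    return p + q

def pmul(p, q):
    # Semiring x: multiply every pair of monomials and merge their exponents.
    out = Counter()
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

# A relation is a dict from tuples to annotations; absent tuples are annotated 0.
def union(r, s):
    # Union of agreeing tuples adds their annotations.
    return {t: padd(r.get(t, Counter()), s.get(t, Counter()))
            for t in r.keys() | s.keys()}

def project(r, cols):
    # Tuples that agree on the projected columns have their annotations summed.
    out = {}
    for t, ann in r.items():
        key = tuple(t[i] for i in cols)
        out[key] = padd(out.get(key, Counter()), ann)
    return out

def join(r, s, i, j):
    # Equi-join on r's column i and s's column j multiplies annotations.
    return {t1 + t2: pmul(a1, a2)
            for (t1, a1), (t2, a2) in product(r.items(), s.items())
            if t1[i] == t2[j]}

def select(r, pred):
    # Keep R(t) when the predicate holds; otherwise t is annotated 0 (dropped).
    return {t: ann for t, ann in r.items() if pred(t)}

R = {("a", 1): var("X1"), ("b", 1): var("X2")}
S = {(1, "c"): var("X3")}
Q = project(join(R, S, 1, 0), [3])  # Q maps ("c",) to X1*X3 + X2*X3
```

Dropping tuples whose annotation is $0$ in \texttt{select} mirrors the convention that a tuple annotated with $0$ is absent from the relation.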
\AR{This is how this subsection should be structured. First you should connect the variables $X_1,\dots,X_M$ to $W$. Basically say that a vector in $\{0,1\}^M$ (so we only assign binary values to the $M$ variables) corresponds to a {\em potential} world $\vct{w}$ (for TIDB $N=M$ and there is a one to one correspondence between $W$ and $\{0,1\}^M$, but for say BI not every vector in $\{0,1\}^M$ would correspond to a world; some of them would not correspond to any world). Then a probability distribution over $\{0,1\}^M$ implies a distribution over $W$, which is how you connect back to the $P$ from Section 1.1. More specific comments follow.}
Define $\pd$ to be the probability distribution for $\idb$. \AR{You should connect $\pd$ back to the $P$ from Section 1.1} Let $\vct{w}$ be a binary vector of length $\numTup = \left\lceil\log_2\left(\left|\wSet\right|\right)\right\rceil$ that uniquely identifies a possible world $\db_i \in \idb$. \AR{The correspondence between $W$ and $\{0,1\}^N$ belongs to Sec 1.1} Let $\prob(X_i)$ (respectively, $\prob(\vct{X})$) denote the probability that a given variable (respectively, set of variables) occurs. \AR{This sentence has many issues: (1) the variables $X_1,\dots,X_M$ are just there-- it does not make sense to say if they ``occur"; (2) The probability should have $\pd$ explicitly in it and (3) $p(\cdot)$ conflicts with the $p$ that we will use in TIDB.
Here is my suggestion to fix this. First we need a notation for a {\em random} world. We are already using $\vct{w}$ to denote a {\em specific} world. So for now let's say we use $\overline{\vct{w}}$ to denote the random variable. Then to denote the probability that the randomly chosen $\overline{\vct{w}}$ is $\vct{w}$ use the notation $\text{Pr}_{\overline{\vct{w}}\sim\pd}[\overline{\vct{w}}=\vct{w}]$. I would like to stress that $\overline{\vct{w}}$ is just a suggestion-- there is probably a better notation for the random variable. {\bf Propagate} this notation change.} We can substitute $\wVec$ for $\vct{X}$, binding the $i^{th}$ bit of $\wVec$ to its corresponding variable $X_i$; it follows that $\prob(\wVec)$ denotes the probability that the given world occurs.
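Substituting a world vector for $\vct{X}$ amounts to evaluating each annotation with $X_i$ bound to the $i^{th}$ bit of $\wVec$. The Python sketch below illustrates this substitution under the standard semiring-homomorphism reading, in which the resulting natural number is the tuple's multiplicity in that world ($0$ meaning absent). The monomial encoding (tuples of 1-based variable indices) and the helper names \texttt{evaluate} and \texttt{worlds\_with\_tuple} are hypothetical choices made for illustration only.

```python
from collections import Counter
from itertools import product
from math import prod

# Hypothetical encoding: a monomial is a tuple of 1-based variable indices
# (repeats encode exponents), so (1, 3) stands for X1*X3; a polynomial is a
# Counter mapping monomials to natural-number coefficients.
def evaluate(poly, w):
    # Bind X_i to the i-th bit of the world vector w and evaluate the polynomial.
    return sum(c * prod(w[i - 1] for i in mono) for mono, c in poly.items())

def worlds_with_tuple(poly, m):
    # All w in {0,1}^m under which the annotation is nonzero (tuple present).
    return [w for w in product((0, 1), repeat=m) if evaluate(poly, w) > 0]

phi = Counter({(1, 3): 1, (2, 3): 1})  # X1*X3 + X2*X3
print(evaluate(phi, (1, 0, 1)))        # 1
print(evaluate(phi, (1, 1, 1)))        # 2
print(worlds_with_tuple(phi, 3))
```

Enumerating all of $\{0,1\}^M$ as in \texttt{worlds\_with\_tuple} is exponential in $M$ and is meant only to make the correspondence between bit vectors and worlds explicit.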
One specific probabilistic data model is the Tuple Independent Database (\ti). In this model, each table is a set of mutually independent tuples, and each tuple $\tup$ is present with its own probability $\prob_\tup$.
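Because the tuples of a \ti\ are independent, the probability of a world $\vct{w} \in \{0,1\}^\numTup$ factors as $\prod_{i} \prob_i^{w_i}(1-\prob_i)^{1-w_i}$. The Python sketch below computes this product and checks that it sums to $1$ over all $2^\numTup$ worlds; the function name \texttt{world\_probability} and the example probabilities are hypothetical, and exact rationals (\texttt{fractions.Fraction}) are used only so that the check is exact.

```python
from fractions import Fraction
from itertools import product

def world_probability(w, p):
    # Tuples are independent: tuple i contributes p_i if present, 1 - p_i if absent.
    result = Fraction(1)
    for bit, p_i in zip(w, p):
        result *= p_i if bit else 1 - p_i
    return result

p = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)]  # illustrative tuple probabilities
worlds = list(product((0, 1), repeat=len(p)))       # all 2^N worlds
print(len(worlds))                                   # 8
print(world_probability((1, 0, 1), p))               # 1/4
print(sum(world_probability(w, p) for w in worlds))  # 1
```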
The $\ti$ model has several features that we can exploit. A $\ti$ with $\numTup$ tuples has $2^\numTup$ possible worlds, each of which can be conveniently encoded as an $\numTup$-bit string whose $i^{th}$ bit indicates whether the $i^{th}$ tuple is present in the world $\wVec$. Given an $\numTup$-dimensional vector $\vct{p}$ whose $i^{th}$ element $\prob_i$ is the probability of the $i^{th}$ tuple, we can then write an equivalent expectation for $\ti$ models,