Done with pass on Sec 1.2

This commit is contained in:
Atri Rudra 2020-07-09 00:23:09 -04:00
parent 98d0f7b7b2
commit db5449b733

View file

@ -10,21 +10,21 @@
\subsection{Introduction}
An incomplete database $\idb$ is a set of deterministic databases $\db_i$ where each element is known as a possible world. Since $\idb$ is modeling all the possible worlds of an uncertain database, it follows that each $\db_i \in \idb$ has the same named set of relations, $\{\rel_1,\ldots, \rel_n\}$ (albeit not equivalent across all instances), whose schemas $(\sch(\rel_i))$ are unchanging across each $\db_j$. For the set of possible worlds, $\wSet$, i.e. each $\db_i \in \idb$, define an injective mapping to the set $\{0, 1\}^M$, where for each vector $\vct{w} \in \{0, 1\}^M$ there is at most one element $\db_i \in \idb$ mapped to $\vct{w}$. When $\idb$ is a probabilistic database, $\idb$ can be viewed as a two tuple $(\wSet, \pd)$, where $\wSet$ as noted, is the set of possible worlds, and $\pd$ is the probability distribution over $\wSet$.
An incomplete database $\idb$ is a set of deterministic databases $\db_i$ where each element is known as a possible world. Since $\idb$ is modeling all the possible worlds of an uncertain database, it follows that each $\db_i \in \idb$ has the same named set of relations, $\{\rel_1,\ldots, \rel_n\}$ (albeit not equivalent across all instances), whose schemas $(\sch(\rel_i))$ are unchanging across each $\db_j$. For the set of possible worlds, $\wSet$, i.e. the set of all $\db_i \in \idb$, define an injective mapping to the set $\{0, 1\}^M$, where for each vector $\vct{w} \in \{0, 1\}^M$ there is at most one element $\db_i \in \idb$ mapped to $\vct{w}$. When $\idb$ is a probabilistic database, $\idb$ can be viewed as a two tuple $(\wSet, \pd)$, where $\wSet$ as noted, is the set of possible worlds, and $\pd$ is the probability distribution over $\wSet$.
%Below may possibly need to be used again...we'll see.
%probability space $\left(\Omega, \mathcal{A}, P\right)$ over that set. \AR{I'm not sure why you are using the notation $\mathcal{A}$ and $P$, which you do not seem to use beyond this section. I would recommend that you only introduce a notation if you plan to use them later on.} Since the set of possible outcomes is the set of possible worlds, $\wSet$, and the set of outcomes is equivalent to the set of events, we will simplify notation and use $\left(\wSet, P\right)$ to denote the probability space of $\idb$. \AR{If you want to use $(\wSet,P)$ make sure you use the same notation in Sec 1.3 as well. If not, then use the notation from Sec 1.3 here}
\subsection{Modeling and Semantics}
Define $\vct{X}$ denote variables $X_1,\dots,X_M$.
Further define $\idb$ as an $\mathbb{N}[\vct{X}]$ database,\AR{There is a type error here: $\idb$ has alredy been defined as a PDB-- while here we are talking about an annotated DB: they are technically not the same thing so you cannot use the same notation. $\idb$ is used heavily in this sub-section so this change needs to be propagated. Am not sure if there is a standard notation-- if not $D(\vct{X})$ should work fine.} i.e., an incomplete/probabilistic database model where each tuple $\tup \in \idb$ is annotated with a polynomial over variables $X_1,\ldots, X_M$ for some value of $M$ that will be specified later. Intuitively, one can think of $\idb$ as a parameterized database, whose abstract form maps to each deterministic $\db_i \in \idb$.\AR{There is not need to connect back to possible world etc. in this sub-section.}
Further define $\idb$ as an $\mathbb{N}[\vct{X}]$ database, i.e., an incomplete/probabilistic database model where each tuple $\tup \in \idb$ is annotated with a polynomial over variables $X_1,\ldots, X_M$ for some value of $M$ that will be specified later. Intuitively, one can think of $\idb$ as a parameterized database, whose abstract form maps to each deterministic $\db_i \in \idb$.
Since $\idb$ is a database that maps tuples to polynomials, it is customary for arbitrary table $\rel$ to be viewed as a function $\rel: \tup \in \idb \mapsto \mathbb{N}[\vct{X}]$,\AR{function notation is always a map from domain to range. Also you need a notation for set of all tuples.} where $\rel(\tup)$ denotes the polynomial mapped to tuple $\tup$.
Since $\idb$ is a database that maps tuples to polynomials, it is customary for arbitrary table $\rel$ to be viewed as a function $\rel: \tup \in \idb \mapsto \mathbb{N}[\vct{X}]$, where $\rel(\tup)$ denotes the polynomial mapped to tuple $\tup$.
It has been shown in previous work that commutative semirings precisely model translations of RA+ query operations to set annotations. Since $\idb$ is an $\mathbb{N}[\vct{X}]$ database, we are then working with the commutative semiring $\{\mathbb{N}[\vct{X}], +, \times, 0, 1\}$, where $\mathbb{N}[\vct{X}]$ is the set from which all annotations originate.
It has been shown in previous work that commutative semirings precisely model translations of RA+ query operations to set annotations. Since $\idb$ is an $\mathbb{N}[\vct{X}]$ database, we are then working with the commutative semiring $\{\mathbb{N}[\vct{X}], +, \times, 0, 1\}$. %, where $\mathbb{N}[\vct{X}]$ is the set from which all annotations originate.
Given a query $\query$, operations in $\query$ are translated into the following polynomial operations.
\AR{Explicitly mention what $\llbracket \cdot \rrbracket$ notation means.}
\begin{align*}
@ -36,6 +36,7 @@ Given a query $\query$, operations in $\query$ are translated into the following
0 &\text{otherwise}.
\end{cases}
\end{align*}
\AR{You should have the base case of the reduction explicitly stated as well-- i.e. what the poly of a tuple is. Also, in the RHS of the equality should also have the evaluation notation. Finally why is the join not just the product of $R_1(t)$ and $R_2(t)$, or more precisely $\llbracket R_1\rrbracket(t)\times \llbracket R_2\rrbracket(t)$?}
Query operations are translated into one of the two semiring operators, with $\project$ and $\union$ of agreeing tuples being the equivalent of the '+' opertator in polynomial $\poly$, $\join$ translating into the $\times$ operator, and finally, $\select$ is better modeled as a function that returns either $\rel(\tup)$ or $0$ based on some predicate.