From db5449b733eed1db7ea7b6bbf0de732b6972f703 Mon Sep 17 00:00:00 2001
From: Atri Rudra
Date: Thu, 9 Jul 2020 00:23:09 -0400
Subject: [PATCH] Done with pass on Sec 1.2

---
 ra-to-poly.tex | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/ra-to-poly.tex b/ra-to-poly.tex
index 6fe2f95..1e7c020 100644
--- a/ra-to-poly.tex
+++ b/ra-to-poly.tex
@@ -10,21 +10,21 @@
 \subsection{Introduction}
-An incomplete database $\idb$ is a set of deterministic databases $\db_i$ where each element is known as a possible world. Since $\idb$ is modeling all the possible worlds of an uncertain database, it follows that each $\db_i \in \idb$ has the same named set of relations, $\{\rel_1,\ldots, \rel_n\}$ (albeit not equivalent across all instances), whose schemas $(\sch(\rel_i))$ are unchanging across each $\db_j$. For the set of possible worlds, $\wSet$, i.e. each $\db_i \in \idb$, define an injective mapping to the set $\{0, 1\}^M$, where for each vector $\vct{w} \in \{0, 1\}^M$ there is at most one element $\db_i \in \idb$ mapped to $\vct{w}$. When $\idb$ is a probabilistic database, $\idb$ can be viewed as a two tuple $(\wSet, \pd)$, where $\wSet$ as noted, is the set of possible worlds, and $\pd$ is the probability distribution over $\wSet$.
+An incomplete database $\idb$ is a set of deterministic databases $\db_i$ where each element is known as a possible world. Since $\idb$ is modeling all the possible worlds of an uncertain database, it follows that each $\db_i \in \idb$ has the same named set of relations, $\{\rel_1,\ldots, \rel_n\}$ (albeit not equivalent across all instances), whose schemas $(\sch(\rel_i))$ are unchanging across each $\db_j$. For the set of possible worlds, $\wSet$, i.e. the set of all $\db_i \in \idb$, define an injective mapping to the set $\{0, 1\}^M$, where for each vector $\vct{w} \in \{0, 1\}^M$ there is at most one element $\db_i \in \idb$ mapped to $\vct{w}$. When $\idb$ is a probabilistic database, $\idb$ can be viewed as a two tuple $(\wSet, \pd)$, where $\wSet$ as noted, is the set of possible worlds, and $\pd$ is the probability distribution over $\wSet$.
 %Below may possibly need to be used again...we'll see.
 %probability space $\left(\Omega, \mathcal{A}, P\right)$ over that set. \AR{I'm not sure why you are using the notation $\mathcal{A}$ and $P$, which you do not seem to use beyond this section. I would recommend that you only introduce a notation if you plan to use them later on.} Since the set of possible outcomes is the set of possible worlds, $\wSet$, and the set of outcomes is equivalent to the set of events, we will simplify notation and use $\left(\wSet, P\right)$ to denote the probability space of $\idb$. \AR{If you want to use $(\wSet,P)$ make sure you use the same notation in Sec 1.3 as well. If not, then use the notation from Sec 1.3 here}
 \subsection{Modeling and Semantics}
+Let $\vct{X}$ denote the variables $X_1,\dots,X_M$.
+Further define $\idb$ as an $\mathbb{N}[\vct{X}]$ database,\AR{There is a type error here: $\idb$ has already been defined as a PDB-- while here we are talking about an annotated DB: they are technically not the same thing so you cannot use the same notation. $\idb$ is used heavily in this sub-section so this change needs to be propagated. Am not sure if there is a standard notation-- if not $D(\vct{X})$ should work fine.} i.e., an incomplete/probabilistic database model where each tuple $\tup \in \idb$ is annotated with a polynomial over variables $X_1,\ldots, X_M$ for some value of $M$ that will be specified later. Intuitively, one can think of $\idb$ as a parameterized database, whose abstract form maps to each deterministic $\db_i \in \idb$.\AR{There is no need to connect back to possible world etc. in this sub-section.}
-Further define $\idb$ as an $\mathbb{N}[\vct{X}]$ database, i.e., an incomplete/probabilistic database model where each tuple $\tup \in \idb$ is annotated with a polynomial over variables $X_1,\ldots, X_M$ for some value of $M$ that will be specified later. Intuitively, one can think of $\idb$ as a parameterized database, whose abstract form maps to each deterministic $\db_i \in \idb$.
+Since $\idb$ is a database that maps tuples to polynomials, it is customary for arbitrary table $\rel$ to be viewed as a function $\rel: \tup \in \idb \mapsto \mathbb{N}[\vct{X}]$,\AR{function notation is always a map from domain to range. Also you need a notation for set of all tuples.} where $\rel(\tup)$ denotes the polynomial mapped to tuple $\tup$.
-Since $\idb$ is a database that maps tuples to polynomials, it is customary for arbitrary table $\rel$ to be viewed as a function $\rel: \tup \in \idb \mapsto \mathbb{N}[\vct{X}]$, where $\rel(\tup)$ denotes the polynomial mapped to tuple $\tup$.
-
-It has been shown in previous work that commutative semirings precisely model translations of RA+ query operations to set annotations. Since $\idb$ is an $\mathbb{N}[\vct{X}]$ database, we are then working with the commutative semiring $\{\mathbb{N}[\vct{X}], +, \times, 0, 1\}$, where $\mathbb{N}[\vct{X}]$ is the set from which all annotations originate.
+It has been shown in previous work that commutative semirings precisely model translations of RA+ query operations to set annotations. Since $\idb$ is an $\mathbb{N}[\vct{X}]$ database, we are then working with the commutative semiring $\{\mathbb{N}[\vct{X}], +, \times, 0, 1\}$. %, where $\mathbb{N}[\vct{X}]$ is the set from which all annotations originate.
 Given a query $\query$, operations in $\query$ are translated into the following polynomial operations.
-
+\AR{Explicitly mention what $\llbracket \cdot \rrbracket$ notation means.}
 \begin{align*}
@@ -36,6 +36,7 @@ Given a query $\query$, operations in $\query$ are translated into the following
 0 &\text{otherwise}.
 \end{cases}
 \end{align*}
+\AR{You should have the base case of the reduction explicitly stated as well-- i.e. what the poly of a tuple is. Also, the RHS of the equality should have the evaluation notation. Finally, why is the join not just the product of $R_1(t)$ and $R_2(t)$, or more precisely $\llbracket R_1\rrbracket(t)\times \llbracket R_2\rrbracket(t)$?}
 Query operations are translated into one of the two semiring operators, with $\project$ and $\union$ of agreeing tuples being the equivalent of the '+' operator in polynomial $\poly$, $\join$ translating into the $\times$ operator, and finally, $\select$ is better modeled as a function that returns either $\rel(\tup)$ or $0$ based on some predicate.
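The translation described in the last paragraph can be illustrated with a minimal Python sketch. It is not part of the patch and does not reproduce the elided `align*` rules; it only assumes the informal reading given in the text: projection and union add annotations, join multiplies them, and selection keeps $\rel(\tup)$ or yields $0$. All names (`var`, `add`, `mul`, `union`, `project`, `select`, `join`) are hypothetical. Polynomials in $\mathbb{N}[\vct{X}]$ are represented as `Counter` objects from monomials to coefficients, tuples are positional, and the join is a simple equality join on two given positions rather than a natural join over named attributes.

```python
from collections import Counter

# Hypothetical representation: a polynomial in N[X] is a Counter mapping a
# monomial (a sorted tuple of variable names) to a natural-number coefficient.
ZERO = Counter()          # the additive identity 0
ONE = Counter({(): 1})    # the multiplicative identity 1


def var(name):
    """Polynomial consisting of the single variable `name`."""
    return Counter({(name,): 1})


def add(p, q):
    """Semiring '+': add coefficients monomial by monomial."""
    out = Counter(p)
    for monomial, coeff in q.items():
        out[monomial] += coeff
    return out


def mul(p, q):
    """Semiring 'x': distribute and merge the variables of each monomial pair."""
    out = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out


# A relation is a dict from tuples to polynomials; an absent tuple means 0.
def union(r1, r2):
    """Union adds the annotations of agreeing tuples."""
    return {t: add(r1.get(t, ZERO), r2.get(t, ZERO)) for t in set(r1) | set(r2)}


def project(r, positions):
    """Projection adds the annotations of all tuples that collapse together."""
    out = {}
    for t, p in r.items():
        key = tuple(t[i] for i in positions)
        out[key] = add(out.get(key, ZERO), p)
    return out


def select(r, pred):
    """Selection keeps R(t) when pred(t) holds; otherwise the tuple gets 0 (dropped)."""
    return {t: p for t, p in r.items() if pred(t)}


def join(r1, r2, i, j):
    """Equality join on r1 position i and r2 position j; annotations multiply."""
    out = {}
    for t1, p1 in r1.items():
        for t2, p2 in r2.items():
            if t1[i] == t2[j]:
                out[t1 + t2] = mul(p1, p2)
    return out


# Toy instance with one variable per input tuple (an assumption of this sketch).
R = {("a", "b"): var("X1"), ("a", "c"): var("X2")}
S = {("b",): var("X3"), ("c",): var("X4")}

# Project the first attribute of R joined with S on R's second attribute.
result = project(join(R, S, 1, 0), [0])
```

In this toy instance the single output tuple `("a",)` ends up annotated with the polynomial $X_1 X_3 + X_2 X_4$: each join contributes a product, and the projection sums the two products.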