More work on background/notational/translation section

Aaron Huber 2020-06-30 15:31:06 -04:00
parent 432a3fb7e9
commit 9c52805d69
2 changed files with 11 additions and 3 deletions

View file

@ -6,8 +6,11 @@
%RA-to-Poly Notation
\newcommand{\rel}{R}
\newcommand{\prel}{\mathcal{\rel}}
\newcommand{\reli}{S}
\newcommand{\relii}{T}
\newcommand{\db}{D}
\newcommand{\idb}{\mathcal{\db}}
\newcommand{\query}{Q}
\newcommand{\join}{\Join}
\newcommand{\select}{\sigma}
@ -271,7 +274,7 @@
%
%borrowed from Su and Boris
%borrowed from Su and Boris; I have them here for reference purposes.
%needs to be cleaned up
%
@ -337,7 +340,6 @@
\newcommand{\relschema}{\mathbf{R}}
\newcommand{\schemaOf}{\textsc{Sch}}
\newcommand{\arity}[1]{arity({#1})}
\newcommand{\db}{D}
%\newcommand{\rel}{R}
%\newcommand{\query}{Q}
\newcommand{\qClass}{\mathcal{C}}

View file

@ -7,7 +7,13 @@
%2) DB (TIDB) notation
%3) How queries translate into polynomials
%}
\AH{I think I need to$\ldots$, possibly also the notion of $P(t \in \query(\idb))$, and then lead into discussing the semiring paper annotation of $\query(\rel)(t)$.}
An incomplete database $\idb$ is a set of deterministic databases $\db_i$, each of which is known as a possible world. Since $\idb$ models all the possible worlds of an uncertain database, each $\db_i \in \idb$ has the same set of relations, $\{\rel_1,\ldots, \rel_n\}$, whose schemas are fixed across all $\db_i$. When $\idb$ is a probabilistic database, $\idb$ can be viewed as having two components: the set of possible worlds and a probability space $\left(\Omega, \mathcal{A}, P\right)$ over that set. Since the set of possible outcomes $\Omega$ is the set of possible worlds, $\wSet$, and the set of events $\mathcal{A}$ is determined by the set of outcomes, we simplify notation and use $\left(\wSet, P\right)$ to denote the probability space of $\idb$.
More generally, $\idb$ can be viewed as the set of relations $\{\prel_1,\ldots, \prel_n\}$, where each $\prel_i$ consists of all tuples appearing in $\rel_i$ in any possible world $\db_j \in \idb$. Each tuple is annotated with a provenance polynomial from $\mathbb{N}[X]$, where $X$ is the set of variables of $\idb$. One can think of $\idb$ as a parameterized database whose abstract form maps to a deterministic $\db_j \in \idb$ according to the valuation assigned to the variables of $\idb$.
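As a small illustration of this parameterized view (a sketch only: the tuples $t_1, t_2$ and their annotations are hypothetical and not taken from any running example), suppose $X = \{X_1, X_2\}$ and some relation of $\idb$ holds two annotated tuples:
\[
\begin{array}{c|c}
\text{tuple} & \text{annotation} \\ \hline
t_1 & X_1 \\
t_2 & X_1 X_2
\end{array}
\]
Under the valuation $X_1 \mapsto 1, X_2 \mapsto 0$, the annotations evaluate to $1$ and $0$ respectively, so the corresponding deterministic world contains $t_1$ but not $t_2$. Under any valuation with $X_1 \mapsto 0$, both annotations evaluate to $0$ and the relation is empty in that world.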
Denote an arbitrary positive relational algebra query as $\query$, and let $\query(\idb)$ denote the query evaluated with $\idb$ as input. The operators in $\query$ are translated into the following operations.
\AH{Here is where to pick up the discussion again.}
Given tables $\rel, \reli$ and an arbitrary query $\query(\rel)$ over the positive relational operators (SPJU), we abuse notation slightly and denote the query polynomial as $\poly(X_1,\ldots, X_\numTup)$.
\OK{
Eventually, you probably want a little more background here, depending on the query notation you choose to use. The simplest approach would be to base it on the Green et al.\ Provenance Semirings paper. As we discussed, that would make $\query(\mathcal D)(t)$ the query polynomial.
@ -42,7 +48,7 @@ Consider the translation of relational operators to polynomial operators in grea
\end{align*}
For probabilistic databases, let $\prob(X_i)$ (respectively, $\prob(\vct{X})$) denote the probability that a given variable (respectively, set of variables) occurs. We can substitute $\wVec$ for $\vct{X}$, where the $i^{th}$ bit of $\wVec$ is bound to its corresponding variable $X_i$. Then $\prob(\wVec)$ denotes the probability that a given world occurs.
\OK{Might help to more precisely define $\wVec$ and its relation to the $X_i$s}
The desired output, computed over the tuple annotation (i.e., the polynomial $\poly(X_1,\ldots, X_\numTup)$), is its expectation:
\[\expct_{\wVec}\pbox{\poly(\wVec)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\cdot \prob(\wVec).\]
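As a sanity check of this definition, consider a small worked instance. The polynomial below, the choice $\numTup = 2$, and the symbols $p_1, p_2$ are hypothetical. The last two steps additionally assume that the variables are independent with $\prob(X_i = 1) = p_i$, which the definition above does not itself require.
\begin{align*}
\poly(X_1, X_2) &= X_1 + X_1 X_2\\
\expct_{\wVec}\pbox{\poly(\wVec)} &= 0\cdot\prob(0,0) + 0\cdot\prob(0,1) + 1\cdot\prob(1,0) + 2\cdot\prob(1,1)\\
&= p_1(1 - p_2) + 2\, p_1 p_2\\
&= p_1 + p_1 p_2.
\end{align*}
Under the independence assumption, the result coincides with substituting $p_i$ for $X_i$ in $\poly$. This holds here because $\poly$ is multilinear; it would not hold in general for a polynomial containing powers such as $X_1^2$.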