%root: main.tex

\section{Query translation into polynomials}
%\AH{This section will involve the set of queries (RA+) that we are interested in, the probabilistic/incomplete models we address, and the outer aggregate functions we perform over the output \textit{annotation}
%1) RA notation
%2) DB (TIDB) notation
%3) How queries translate into polynomials
%}

Given tables $\rel, \reli$, an arbitrary query $\query(\rel)$ over the positive relational operators (SPJU), abusing notation slightly denote the query polynomial as $\poly(X_1,\ldots, X_\numTup)$.  To be clear,  $\poly(X_1,\ldots, X_\numTup)$ is a polynomial whose variables represent the tuple annotations of an arbitrary query.The annotation for arbitrary tuple $\tup$ can be viewed as an element of the image of $\rel$, where relation $\rel$ can be thought of as a function with preimage of all tuples in $\rel$, such that $\rel(\tup) = \poly(X_1,\ldots, X_\numTup)$.  Further, it is known that the algebraic semiring structure aptly models the translation and computation of query operations into tuple annotation, aka polynomials.  
To make things more concrete, consider the $\{\mathbb{N}, \times, +, 1, 0\}$ bag semiring.  Here the set in which the tuple annotations (computed polynomials) exist is the natural numbers.  Query operations are translated into one of the two semiring operators, with $\project$ and $\union$ of agreeing tuples being the equivalent of the '+' opertator in polynomial $\poly$, $\join$ translating into the $\times$ operator, and finally, $\select$ is better modeled as a function that returns either $\rel(\tup)$ or $0$ based on some predicate.

Consider the translation of relational operators to polynomial operators in greater detail.

\begin{align*}
&\project_A(\rel)(\tup) = &&\sum_{\tup' s.t. \tup[A] = \tup'} \rel(\tup')\\
& (\rel_1 \union \rel_2)(\tup) = &&\rel_1(\tup) + \rel_2(\tup)\\
&(\rel_1 \join_\theta \rel_2)(\tup) = &&\begin{cases}
						\rel_1(\tup_1) \times \rel_2(\tup_2)	&\text{if }\theta(\tup_1, \tup_2)\\
						0						&\text{otherwise}
					 \end{cases} \\
&\select_\theta(\rel) = &&\begin{cases}
					\rel(\tup)	&\text{if }\theta(\tup) = 1\\
					0		&\text{otherwise}.
				\end{cases}
\end{align*}

Considering probabilistic databases, let $\prob(\wVec)$ denote the probability that a given world occurs.
The output we desire is over the tuple annotations, i.e. polynomial $\poly(X_1,\ldots, X_\numTup)$ is simply the expectation, i.e.
\[\expct_{\wVec}\pbox{\poly(\wVec)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\cdot \prob(\wVec).\]

A specific probabilistic data model is the Tuple Independent Database (\ti).  This is a database model in which each table is a set of tuples, each of which are independent of one another, and individually occur with a specific probability, $\prob_\tup$.

There are features of $\ti$ that we can exploit.  Note that a $\ti$ naturally has $2^\numTup$ possible worlds, each of which can be conveniently modeled by an $\numTup$ bit string.  The bit-string world value can be used as an index to determine which tuples are present in the $\wVec$ world.  We can then write and equivalent expectation for $\ti$ models,

\[\expct_{\wVec}\pbox{\poly(\wVec)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\prod_{\substack{i \in [\numTup]\\ s.t. \wElem_i = 1}}\prob_i \prod_{\substack{i \in [\numTup]\\s.t. w_i = 0}}\left(1 - \prob_i\right).\]