%root: main.tex \section{Query translation into polynomials} %\AH{This section will involve the set of queries (RA+) that we are interested in, the probabilistic/incomplete models we address, and the outer aggregate functions we perform over the output \textit{annotation} %1) RA notation %2) DB (TIDB) notation %3) How queries translate into polynomials %} Given tables $\rel, \reli$, an arbitrary query $\query(\rel)$ over the positive relational operators (SPJU), abusing notation slightly denote the query polynomial as $\poly(X_1,\ldots, X_\numTup)$. To be clear, $\poly(X_1,\ldots, X_\numTup)$ is a polynomial whose variables represent the tuple annotations of an arbitrary query.The annotation for arbitrary tuple $\tup$ can be viewed as an element of the image of $\rel$, where relation $\rel$ can be thought of as a function with preimage of all tuples in $\rel$, such that $\rel(\tup) = \poly(X_1,\ldots, X_\numTup)$. Further, it is known that the algebraic semiring structure aptly models the translation and computation of query operations into tuple annotation, aka polynomials. To make things more concrete, consider the $\{\mathbb{N}, \times, +, 1, 0\}$ bag semiring. Here the set in which the tuple annotations (computed polynomials) exist is the natural numbers. Query operations are translated into one of the two semiring operators, with $\project$ and $\union$ of agreeing tuples being the equivalent of the '+' opertator in polynomial $\poly$, $\join$ translating into the $\times$ operator, and finally, $\select$ is better modeled as a function that returns either $\rel(\tup)$ or $0$ based on some predicate. Consider the translation of relational operators to polynomial operators in greater detail. \begin{align*} &\project_A(\rel)(\tup) = &&\sum_{\tup' s.t. \tup[A] = \tup'} \rel(\tup')\\ & (\rel_1 \union \rel_2)(\tup) = &&\rel_1(\tup) + \rel_2(\tup)\\ &(\rel_1 \join_\theta \rel_2)(\tup) = &&\begin{cases} \rel_1(\tup_1) \times \rel_2(\tup_2) &\text{if }\theta(\tup_1, \tup_2)\\ 0 &\text{otherwise} \end{cases} \\ &\select_\theta(\rel) = &&\begin{cases} \rel(\tup) &\text{if }\theta(\tup) = 1\\ 0 &\text{otherwise}. \end{cases} \end{align*} Considering probabilistic databases, let $\prob(\wVec)$ denote the probability that a given world occurs. The output we desire is over the tuple annotations, i.e. polynomial $\poly(X_1,\ldots, X_\numTup)$ is simply the expectation, i.e. \[\expct_{\wVec}\pbox{\poly(\wVec)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\cdot \prob(\wVec).\] A specific probabilistic data model is the Tuple Independent Database (\ti). This is a database model in which each table is a set of tuples, each of which are independent of one another, and individually occur with a specific probability, $\prob_\tup$. There are features of $\ti$ that we can exploit. Note that a $\ti$ naturally has $2^\numTup$ possible worlds, each of which can be conveniently modeled by an $\numTup$ bit string. The bit-string world value can be used as an index to determine which tuples are present in the $\wVec$ world. We can then write and equivalent expectation for $\ti$ models, \[\expct_{\wVec}\pbox{\poly(\wVec)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\prod_{\substack{i \in [\numTup]\\ s.t. \wElem_i = 1}}\prob_i \prod_{\substack{i \in [\numTup]\\s.t. w_i = 0}}\left(1 - \prob_i\right).\]