paper-BagRelationalPDBsAreHard/ra-to-poly.tex

40 lines
3.3 KiB
TeX
Raw Normal View History

%root: main.tex
\section{Query translation into polynomials}
2020-06-23 15:49:19 -04:00
\AH{This section will involve the set of queries (RA+) that we are interested in, the probabilistic/incomplete models we address, and the outer aggregate functions we perform over the output \textit{annotation}
1) RA notation
2) DB (TIDB) notation
3) How queries translate into polynomials
}
Given tables $\rel, \reli$, an arbitrary query $\query(\rel)$ over the positive relational operators (SPJU), abusing notation slightly denote the query polynomial as $\poly(X_1,\ldots, X_\numTup)$. To be clear, $\poly(X_1,\ldots, X_\numTup)$ is a polynomial whose variables represent the tuple annotations of an arbitrary query.The annotation for arbitrary tuple $\tup$ can be viewed as an element of the image of $\rel$, where relation $\rel$ can be thought of as a function with preimage of all tuples in $\rel$, such that $\rel(\tup) = \poly(X_1,\ldots, X_\numTup)$. Further, it is known that the algebraic semiring structure aptly models the translation and computation of query operations into tuple annotation, aka polynomials.
2020-06-23 15:49:19 -04:00
To make things more concrete, consider the $\{\mathbb{N}, \times, +, 1, 0\}$ bag semiring. Here the set in which the tuple annotations (computed polynomials) exist is the natural numbers. Query operations are translated into one of the two semiring operators, with $\project$ and $\union$ of agreeing tuples being the equivalent of the '+' opertator in polynomial $\poly$, $\join$ translating into the $\times$ operator, and finally, $\select$ is better modeled as a function that returns either $\rel(\tup)$ or $0$ based on some predicate.
Consider the translation of relational operators to polynomial operators in greater detail.
\begin{align*}
&\project_A(\rel)(\tup) = &&\sum_{\tup' s.t. \tup[A] = \tup'} \rel(\tup')\\
& (\rel_1 \union \rel_2)(\tup) = &&\rel_1(\tup) + \rel_2(\tup)\\
&(\rel_1 \join_\theta \rel_2)(\tup) = &&\begin{cases}
\rel_1(\tup_1) \times \rel_2(\tup_2) &\text{if }\theta(\tup_1, \tup_2)\\
0 &\text{otherwise}
\end{cases} \\
&\select_\theta(\rel) = &&\begin{cases}
\rel(\tup) &\text{if }\theta(\tup) = 1\\
0 &\text{otherwise}.
\end{cases}
\end{align*}
Considering probabilistic databases, let $\prob(\wVec)$ denote the probability that a given world occurs.
The output we desire is over the tuple annotations, i.e. polynomial $\poly(X_1,\ldots, X_\numTup)$ is simply the expectation, i.e.
\[\expct_{\wVec}\pbox{\poly(\wVec)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\cdot \prob(\wVec).\]
A specific probabilistic data model is the Tuple Independent Database (\ti). This is a database model in which each table is a set of tuples, each of which are independent of one another, and individually occur with a specific probability, $\prob_\tup$.
There are features of $\ti$ that we can exploit. Note that a $\ti$ naturally has $2^\numTup$ possible worlds, each of which can be conveniently modeled by an $\numTup$ bit string. The bit-string world value can be used as an index to determine which tuples are present in the $\wVec$ world. We can then write and equivalent expectation for $\ti$ models,
\[\expct_{\wVec}\pbox{\poly(\wVec)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\prod_{\substack{i \in [\numTup]\\ s.t. \wElem_i = 1}}\prob_i \prod_{\substack{i \in [\numTup]\\s.t. w_i = 0}}\left(1 - \prob_i\right).\]
2020-06-23 09:57:35 -04:00