%root: main.tex
%!TEX root=./main.tex
\section{Query translation into polynomials}
%\AH{This section will involve the set of queries (RA+) that we are interested in, the probabilistic/incomplete models we address, and the outer aggregate functions we perform over the output \textit{annotation}
%1) RA notation
%2) DB (TIDB) notation
%3) How queries translate into polynomials
%}
Given tables $\rel, \reli$ and an arbitrary query $\query(\rel)$ over the positive relational operators (SPJU), we abuse notation slightly and denote the query polynomial by $\poly(X_1,\ldots, X_\numTup)$.
\OK{
Eventually, you probably want a little more background here, depending on the query notation you choose to use. The simplest approach would be basing it on the Green et al. Provenance Semirings paper. As we discussed, that would make $\query(\mathcal D)(t)$ the query polynomial.
}
To be clear, $\poly(X_1,\ldots, X_\numTup)$ is a polynomial whose variables represent the tuple annotations of an arbitrary query.
\OK{
I don't think we're on the same page here. From the Prov. Semirings perspective, the entire $\poly(X_i)$ is the annotation of a tuple in an arbitrary query over a $\mathbb R[x]$-relation (i.e., a relation whose tuples are annotated by polynomials over the reals). The $X_i$s are not annotations, they're the variables of that polynomial. (footnote: Presumably, there are tuples in the database whose annotations are just a single variable, but that's not the general case).
}
The annotation of an arbitrary tuple $\tup$ can be viewed as an element of the image of $\rel$, where the relation $\rel$ is treated as a function whose domain is the set of all tuples in $\rel$, such that $\rel(\tup) = \poly(X_1,\ldots, X_\numTup)$. Further, it is known that the algebraic semiring structure aptly models the translation of query operations into computations over tuple annotations, i.e., polynomials.
To make things more concrete, consider the bag semiring $\{\mathbb{N}, \times, +, 1, 0\}$. Here the set in which the tuple annotations (computed polynomials) take their values is the natural numbers. Each query operation translates into one of the two semiring operators: $\project$ and $\union$ of agreeing tuples correspond to the $+$ operator in the polynomial $\poly$, and $\join$ corresponds to the $\times$ operator. Finally, $\select$ is better modeled as a function that returns either $\rel(\tup)$ or $0$, depending on some predicate.
\OK{
A good summary to start. We'll need to make this more precise for the final paper though.
}
Consider the translation of relational operators to polynomial operators in greater detail.
\begin{align*}
&\project_A(\rel)(\tup) = &&\sum_{\tup' \text{ s.t. } \tup'[A] = \tup} \rel(\tup')\\
& (\rel_1 \union \rel_2)(\tup) = &&\rel_1(\tup) + \rel_2(\tup)\\
&(\rel_1 \join_\theta \rel_2)(\tup) = &&\begin{cases}
\rel_1(\tup_1) \times \rel_2(\tup_2) &\text{if }\theta(\tup_1, \tup_2)\\
0 &\text{otherwise}
\end{cases} \\
&\select_\theta(\rel)(\tup) = &&\begin{cases}
\rel(\tup) &\text{if }\theta(\tup) = 1\\
0 &\text{otherwise}.
\end{cases}
\end{align*}
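As a small illustration of these rules (the relations, attribute names, and annotations below are hypothetical and chosen only for this example), suppose $\rel_1$ has attributes $(A, B)$ and contains the tuples $(a, b)$ and $(a', b)$, annotated with $X_1$ and $X_2$ respectively, while $\rel_2$ has attributes $(B, C)$ and contains the single tuple $(b, c)$, annotated with $X_3$. Joining on $B$ and then projecting onto $C$ gives
\begin{align*}
&(\rel_1 \join \rel_2)(a, b, c) = &&X_1 \times X_3\\
&(\rel_1 \join \rel_2)(a', b, c) = &&X_2 \times X_3\\
&\project_C(\rel_1 \join \rel_2)(c) = &&X_1 X_3 + X_2 X_3.
\end{align*}
Under the bag semiring, setting $X_1 = 2$, $X_2 = 1$, and $X_3 = 3$ evaluates the last polynomial to $2 \cdot 3 + 1 \cdot 3 = 9$, which is the multiplicity of the tuple $(c)$ in the query result.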
For probabilistic databases, let $\prob(\wVec)$ denote the probability that a given world $\wVec$ occurs.
\OK{Might help to more precisely define $\wVec$ and its relation to the $X_i$s}
The output we desire is the expectation of the tuple annotation, i.e., of the polynomial $\poly(X_1,\ldots, X_\numTup)$, taken over all possible worlds:
\[\expct_{\wVec}\pbox{\poly(\wVec)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\cdot \prob(\wVec).\]
A specific probabilistic data model is the Tuple Independent Database (\ti). In this model each table is a set of tuples that are independent of one another, and each tuple $\tup$ occurs with its own probability $\prob_\tup$.
There are features of $\ti$ that we can exploit. Note that a $\ti$ naturally has $2^\numTup$ possible worlds, each of which can be conveniently modeled by an $\numTup$-bit string. The bits of the string indicate which tuples are present in the world $\wVec$. We can then write an equivalent expectation for $\ti$ models,
\[\expct_{\wVec}\pbox{\poly(\wVec)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\prod_{\substack{i \in [\numTup]\\ \text{s.t. } \wElem_i = 1}}\prob_i \prod_{\substack{i \in [\numTup]\\ \text{s.t. } \wElem_i = 0}}\left(1 - \prob_i\right).\]
\OK{
It would, again, be helpful here to have an explicitly stated mapping between $\wVec$ and the $X_i$s
}
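As a worked sketch of the $\ti$ expectation, assume that each variable $X_i$ is assigned the bit $\wElem_i$ (this mapping is our reading of $\poly(\wVec)$ and is an assumption of the example) and take the hypothetical polynomial $\poly(X_1, X_2) = X_1 + X_1 X_2$ with $\numTup = 2$. The worlds $(0,0)$ and $(0,1)$ evaluate $\poly$ to $0$, so only two terms of the sum remain:
\begin{align*}
\expct_{\wVec}\pbox{\poly(\wVec)} &= \poly(1,0)\cdot\prob_1\left(1 - \prob_2\right) + \poly(1,1)\cdot\prob_1\prob_2\\
&= 1\cdot\prob_1\left(1 - \prob_2\right) + 2\cdot\prob_1\prob_2\\
&= \prob_1 + \prob_1\prob_2.
\end{align*}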