
%root: main.tex

\section{Query translation into polynomials}
\AH{This section will involve the set of queries (RA+) that we are interested in, the probabilistic/incomplete models we address, and the outer aggregate functions we perform over the output \textit{annotation}
1) RA notation
2) DB (TIDB) notation
3) How queries translate into polynomials
}

Given tables $\rel, \reli$ and an arbitrary query $\query(\rel)$ over the positive relational operators (SPJU), we abuse notation slightly and denote the query polynomial by $\poly(X_1,\ldots, X_\numTup)$.  To be clear, $\poly(X_1,\ldots, X_\numTup)$ is a polynomial whose variables represent the annotations of the input tuples.  The annotation of an arbitrary tuple $\tup$ can be viewed as an element of the image of $\rel$, where the relation $\rel$ is thought of as a function whose domain is the set of all tuples in $\rel$, such that $\rel(\tup) = \poly(X_1,\ldots, X_\numTup)$.  Further, it is known that the algebraic structure of a semiring aptly models the translation of query operations into computations over tuple annotations, i.e., polynomials.
To make things more concrete, consider the bag semiring $\{\mathbb{N}, \times, +, 1, 0\}$.  Here the set in which the tuple annotations (the evaluated polynomials) live is the natural numbers.  Query operations are translated into the two semiring operators: $\project$ and $\union$ of agreeing tuples correspond to the $+$ operator in the polynomial $\poly$, $\join$ translates into the $\times$ operator, and $\select$ is better modeled as a function that returns either $\rel(\tup)$ or $0$ depending on a predicate.

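Consider the translation of relational operators to polynomial operators in greater detail.  In the join case, $\tup_1$ and $\tup_2$ denote the parts of the output tuple $\tup$ contributed by $\rel_1$ and $\rel_2$, respectively.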

\begin{align*}
&\project_A(\rel)(\tup) = &&\sum_{\tup' \text{ s.t. } \tup'[A] = \tup} \rel(\tup')\\
& (\rel_1 \union \rel_2)(\tup) = &&\rel_1(\tup) + \rel_2(\tup)\\
&(\rel_1 \join_\theta \rel_2)(\tup) = &&\begin{cases}
						\rel_1(\tup_1) \times \rel_2(\tup_2)	&\text{if }\theta(\tup_1, \tup_2)\\
						0						&\text{otherwise}
					 \end{cases} \\
&\select_\theta(\rel)(\tup) = &&\begin{cases}
					\rel(\tup)	&\text{if }\theta(\tup) = 1\\
					0		&\text{otherwise}.
				\end{cases}
\end{align*}
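
As a small illustration of these rules (a sketch with hypothetical relations that are not part of the development above), suppose $\rel_1$ contains two tuples annotated $X_1$ and $X_2$, and $\rel_2$ contains a single tuple annotated $X_3$.  Taking $\theta$ to be always true and projecting away every attribute, the only output tuple $\tup = ()$ receives the annotation
\[\project_{\emptyset}(\rel_1 \join_\theta \rel_2)(\tup) = X_1 \times X_3 + X_2 \times X_3,\]
since each pair of joined tuples contributes a product and the projection sums the two agreeing results.  Under the bag semiring, setting $X_1 = X_2 = X_3 = 1$ evaluates this polynomial to $2$, the multiplicity of $\tup$ in the output.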

Turning to probabilistic databases, let $\prob(\wVec)$ denote the probability that world $\wVec$ occurs.
The output we desire is the expectation of the tuple annotation, i.e., of the polynomial $\poly(X_1,\ldots, X_\numTup)$, over the possible worlds:
\[\expct_{\wVec}\pbox{\poly(\wVec)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\cdot \prob(\wVec).\]

A specific probabilistic data model is the Tuple Independent Database (\ti).  This is a database model in which each table is a set of tuples, each of which is independent of the others and occurs with its own probability $\prob_\tup$.

There are features of $\ti$ that we can exploit.  Note that a $\ti$ with $\numTup$ tuples naturally has $2^\numTup$ possible worlds, each of which can be conveniently modeled by an $\numTup$-bit string.  The bits of a world $\wVec$ serve as an index that determines which tuples are present in that world.  We can then write an equivalent expectation for $\ti$ models,

\[\expct_{\wVec}\pbox{\poly(\wVec)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\prod_{\substack{i \in [\numTup]\\ \text{s.t. } \wElem_i = 1}}\prob_i \prod_{\substack{i \in [\numTup]\\ \text{s.t. } \wElem_i = 0}}\left(1 - \prob_i\right).\]
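
To see the formula at work on a small case (an illustrative polynomial chosen here, not one derived from a specific query above), let $\numTup = 2$ and $\poly(X_1, X_2) = X_1 + X_1 X_2$.  Only the worlds with $\wElem_1 = 1$ contribute:
\begin{align*}
\expct_{\wVec}\pbox{\poly(\wVec)} &= \poly(1, 0)\cdot\prob_1\left(1 - \prob_2\right) + \poly(1, 1)\cdot\prob_1\prob_2\\
&= 1\cdot\prob_1\left(1 - \prob_2\right) + 2\cdot\prob_1\prob_2\\
&= \prob_1 + \prob_1\prob_2,
\end{align*}
which agrees with applying linearity of expectation term by term.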