%!TEX root=./main.tex
We can use $\semK$-relations to model bags. A \emph{$\semK$-relation}~\cite{DBLP:conf/pods/GreenKT07} is a relation whose tuples are annotated with elements from a commutative semiring $\semK = \inset{\domK, \addK, \multK, \zeroK, \oneK}$. A commutative semiring is a structure with a domain $\domK$ and associative and commutative binary operations $\addK$ and $\multK$ such that $\multK$ distributes over $\addK$, $\zeroK$ is the identity of $\addK$, $\oneK$ is the identity of $\multK$, and $\zeroK$ annihilates all elements of $\domK$ when combined by $\multK$.
Let $\udom$ be a countable domain of values.
Formally, an $n$-ary $\semK$-relation $\rel$ over $\udom$ is a function $\rel: \udom^n \to \domK$ with finite support $\support{\rel} = \{ \tup \mid \rel(\tup) \neq \zeroK \}$. A $\semK$-database is defined similarly; like a $\semK$-relation, we view it as a function mapping tuples to their respective annotations.
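For illustration, consider a hypothetical binary $\semK$-relation $\rel$ (the values $a, b, c \in \udom$ and the annotations $k_1, k_2 \in \domK \setminus \{\zeroK\}$ are assumptions made only for this example):
\[
  \rel(a,b) = k_1, \qquad \rel(b,c) = k_2, \qquad \rel(\tup) = \zeroK \mbox{ for every other } \tup \in \udom^2 .
\]
Here $\support{\rel} = \{(a,b), (b,c)\}$ is finite, even though $\udom^2$ itself may be infinite.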
$\raPlus$ query semantics over $\semK$-relations are analogous to the lineage construction semantics of \Cref{fig:nxDBSemantics}, with the exception of replacing $+$ with $\addK$ and $\cdot$ with $\multK$.
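As a sketch of what this replacement means, and assuming the standard $\semK$-relational semantics of~\cite{DBLP:conf/pods/GreenKT07} (the relations $\rel_1$ and $\rel_2$ below are hypothetical and are assumed to share the same schema), union combines annotations with $\addK$ and join combines them with $\multK$:
\[
  (\rel_1 \cup \rel_2)(\tup) = \rel_1(\tup) \addK \rel_2(\tup), \qquad
  (\rel_1 \bowtie \rel_2)(\tup) = \rel_1(\tup) \multK \rel_2(\tup) .
\]
For instance, when $\addK$ and $\multK$ are ordinary addition and multiplication of natural numbers, $\rel_1(\tup) = 2$ and $\rel_2(\tup) = 3$ yield annotation $5$ under union and $6$ under join.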
Consider the semiring $\semN = \inset{\domN,+,\times,0,1}$ of natural numbers. $\semN$-databases model bag semantics by annotating each tuple with its multiplicity. A probabilistic $\semN$-database ($\semN$-PDB) is a PDB where each possible world is an $\semN$-database. We study the problem of computing statistical moments for query results over such databases. Given an $\semN$-\abbrPDB $\pdb = (\idb, \pd)$, an ($\raPlus$) query $\query$, and a possible result tuple $\tup$, we compute the expected multiplicity of $\tup$ by summing $\query(\db)(\tup)\cdot\pd\inparen{\db}$ over all $\db \in \idb$. Intuitively, the expectation of $\query(\db)(\tup)$ is the number of duplicates of $\tup$ we expect to find in the result of query $\query$.
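To make this sum concrete, consider a hypothetical $\semN$-\abbrPDB with two possible worlds (the worlds, probabilities, and multiplicities are assumptions chosen only for illustration): $\idb = \{\db_1, \db_2\}$ with $\pd\inparen{\db_1} = 0.4$ and $\pd\inparen{\db_2} = 0.6$, where $\query(\db_1)(\tup) = 2$ and $\query(\db_2)(\tup) = 5$. The expected multiplicity of $\tup$ is then
\[
  \sum_{\db \in \idb} \query(\db)(\tup)\cdot\pd\inparen{\db} = 2 \cdot 0.4 + 5 \cdot 0.6 = 3.8 .
\]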
Let $\semNX$ denote the set of polynomials over variables $\vct{X}=(X_1,\dots,X_n)$ with natural number coefficients and exponents.
Consider now the semiring (abusing notation) $\semNX = \inset{\semNX, +, \cdot, 0, 1}$ whose domain is $\semNX$, with the standard addition and multiplication of polynomials.
We define an \abbrNXPDB $\pxdb$ as the tuple $(\db_{\semNX}, \pd)$, where the $\semNX$-database $\db_{\semNX}$ is paired with the probability distribution $\pd$ across the set of possible worlds \emph{represented} by $\db_{\semNX}$, i.e., the one induced from $\mathcal{P}_{\semNX}$, the probability distribution over $\vct{X}$. Note that this slightly abuses notation, since the first element of the pair is an encoding of a set of possible worlds, i.e., $\db_{\semNX}$ is the \dbbaseName.
We denote by $\nxpolyqdt$ the annotation $\query(\db_{\semNX})(t)$ of tuple $t$ in the result of $\query$ over $\db_{\semNX}$ and, as before, interpret it as a function $\nxpolyqdt: \{0,1\}^{|\vct X|} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
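As a small illustration, suppose (hypothetically; this polynomial is not taken from any query in the paper) that $\vct{X} = (X_1, X_2, X_3)$ and $\nxpolyqdt = X_1 X_2 + X_3^2$. Read as a function of the assignment to $(X_1, X_2, X_3)$, it gives
\[
  \nxpolyqdt(1,1,0) = 1 \cdot 1 + 0^2 = 1, \qquad
  \nxpolyqdt(1,1,1) = 1 \cdot 1 + 1^2 = 2, \qquad
  \nxpolyqdt(0,1,0) = 0 \cdot 1 + 0^2 = 0 ,
\]
so every assignment in $\{0,1\}^{3}$ is mapped to a natural number.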
\abbrNXPDB\xplural and a function $\rmod$ (which transforms an \abbrNXPDB to an equivalent $\semN$-PDB) are both formalized next.