paper-BagRelationalPDBsAreHard/k-relations.tex

19 lines
2.9 KiB
TeX
Raw Normal View History

2021-04-09 00:07:33 -04:00
%!TEX root=./main.tex
2021-09-17 19:06:35 -04:00
We can use $\semK$-relations to model bags. A \emph{$\semK$-relation}~\cite{DBLP:conf/pods/GreenKT07} is a relation whose tuples are annotated with elements from a commutative semiring $\semK = (\domK, \addK, \multK, \zeroK, \oneK)$. A commutative semiring is a structure with a domain $\domK$ and associative and commutative binary operations $\addK$ and $\multK$ such that $\multK$ distributes over $\addK$, $\zeroK$ is the identity of $\addK$, $\oneK$ is the identity of $\multK$, and $\zeroK$ annihilates all elements of $\domK$ when combined by $\multK$.
2021-04-09 00:07:33 -04:00
Let $\udom$ be a countable domain of values.
2021-04-10 00:19:16 -04:00
Formally, an n-ary $\semK$-relation over $\udom$ is a function $\rel: \udom^n \to \domK$ with finite support $\support{\rel} = \{ \tup \mid \rel(\tup) \neq \zeroK \}$.
2021-04-09 00:07:33 -04:00
A $\semK$-database is a set of $\semK$-relations. It will be convenient to also interpret a $\semK$-database as a function from tuples to annotations. Thus, $\rel(t)$ (resp., $\db(t)$) denotes the annotation associated by $\semK$-relation $\rel$ ($\semK$-database $\db$) to $t$.
2021-09-17 19:06:35 -04:00
The semantics for $\raPlus$ queries over $\semK$-relations are analogous to the lineage construction semantics of \Cref{fig:nxDBSemantics}, with the exception of replacing $+$ with $\addK$ and $\cdot$ with $\multK$.
2021-04-09 00:07:33 -04:00
2021-09-17 19:06:35 -04:00
Consider the semiring $\semN = (\domN,+,\times,0,1)$ of natural numbers. $\semN$-databases model bag semantics by annotating each tuple with its multiplicity. A probabilistic $\semN$-database ($\semN$-PDB) is a PDB where each possible world is an $\semN$-database. We study the problem of computing statistical moments for query results over such databases. Given a probabilistic $\semN$-database $\pdb = (\idb, \pd)$, ($\raPlus$) query $\query$, and possible result tuple $\tup$, we sum $\query(\db)(\tup)\cdot\pd\inparen{\db}$ for all $\db \in \idb$ to compute the expected multiplicity of $\tup$. Intuitively, the expectation of $\query(\db)(t)$ is the number of duplicates of $t$ we expect to find in result of query $\query$.
2021-04-09 00:07:33 -04:00
Let $\semNX$ denote the set of polynomials over variables $\vct{X}=(X_1,\dots,X_n)$ with natural number coefficients and exponents.
Consider now the semiring $(\semNX, +, \cdot, 0, 1)$ whose domain is $\semNX$, with the standard addition and multiplication of polynomials.
2021-09-17 22:17:30 -04:00
We define an \abbrNXPDB $\pxdb$ as the tuple $(\idb_{\semNX}, \pd)$, where $\semNX$-database $\idb_{\semNX}$ is paired with probability distribution $\pd$.
We denote by $\polyForTuple$ the annotation of tuple $t$ in the result of $\query$ on an implicit \abbrNXPDB (i.e., $\polyForTuple = \query(\pxdb)(t)$ for some $\pxdb$) and as before, interpret it as a function $\polyForTuple: \{0,1\}^{|\vct X|} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
\abbrNXPDB\xplural and a function $\rmod$ (which transforms an \abbrNXPDB to an equivalent $\semN$-PDB) are both formalized next.
2021-04-09 22:00:34 -04:00