poly
parent
0f704e7377
commit
a089fe3a23
|
@ -47,8 +47,7 @@ The reduced form of a lineage polynomial can be obtained but requires a linear s
|
||||||
\subsection{Probabilistic Databases (PDBs)}
|
\subsection{Probabilistic Databases (PDBs)}
|
||||||
|
|
||||||
An \textit{incomplete database} $\idb$ is a set of deterministic databases $\db$ called possible worlds.
|
An \textit{incomplete database} $\idb$ is a set of deterministic databases $\db$ called possible worlds.
|
||||||
Denote the schema of $\db$ as $\sch(\db)$. A \textit{probabilistic database} $\pdb$ is a pair $(\idb, \pd)$ where $\idb$ is an incomplete database and $\pd$ is a probability distribution over $\idb$. Queries over probabilistic databases are evaluated using the so-called possible world semantics. Under possible world semantics, the result of a query $\query$ over an incomplete database $\idb$ is the set of query answers produced by evaluating $\query$ over each possible world:
|
Denote the schema of $\db$ as $\sch(\db)$. A \textit{probabilistic database} $\pdb$ is a pair $(\idb, \pd)$ where $\idb$ is an incomplete database and $\pd$ is a probability distribution over $\idb$. Queries over probabilistic databases are evaluated using the so-called possible world semantics. Under possible world semantics, the result of a query $\query$ over an incomplete database $\idb$ is the set of query answers produced by evaluating $\query$ over each possible world: $\query(\idb) = \comprehension{\query(\db)}{\db \in \idb}$
|
||||||
\[\query(\idb) = \comprehension{\query(\db)}{\db \in \idb}\]
|
|
||||||
|
|
||||||
For a probabilistic database $\pdb = (\idb, \pd)$, the result of a query is the pair $(\query(\idb), \pd')$ where $\pd'$ is a probability distribution over $\query(\idb)$ that assigns to each possible query result the sum of the probabilities of the worlds that produce this answer:
|
For a probabilistic database $\pdb = (\idb, \pd)$, the result of a query is the pair $(\query(\idb), \pd')$ where $\pd'$ is a probability distribution over $\query(\idb)$ that assigns to each possible query result the sum of the probabilities of the worlds that produce this answer:
|
||||||
\[\forall \db \in \query(\idb): \pd'(\db) = \sum_{\db' \in \idb: \query(\db') = \db} \pd(\db') \]
|
\[\forall \db \in \query(\idb): \pd'(\db) = \sum_{\db' \in \idb: \query(\db') = \db} \pd(\db') \]
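As a small illustration of the definition above (the worlds, probabilities, and query results here are hypothetical and chosen only for this sketch), suppose $\idb = \{\db_1, \db_2, \db_3\}$ with $\pd(\db_1) = 0.5$, $\pd(\db_2) = 0.3$, and $\pd(\db_3) = 0.2$, and suppose that $\query(\db_1) = \query(\db_2) = \db_a$ while $\query(\db_3) = \db_b$ with $\db_a \neq \db_b$. Then $\query(\idb) = \{\db_a, \db_b\}$ and
\[\pd'(\db_a) = \pd(\db_1) + \pd(\db_2) = 0.8, \qquad \pd'(\db_b) = \pd(\db_3) = 0.2.\]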
|
||||||
|
@ -77,16 +76,16 @@ Intuitively, the expectation of $\query(\db)(t)$ is the number of duplicates of
|
||||||
For completeness, we briefly review the semantics for $\raPlus$ queries over $\semK$-relations~\cite{DBLP:conf/pods/GreenKT07}.
|
For completeness, we briefly review the semantics for $\raPlus$ queries over $\semK$-relations~\cite{DBLP:conf/pods/GreenKT07}.
|
||||||
We use $\evald{\cdot}{\db}$ to denote the result of evaluating query $\query$ over $\semK$-database $\db$. Below, we assume that tuples are of appropriate arity, use $\sch(\rel)$ to denote the attributes of $\rel$, and use $\project_A(\tup)$ to denote the projection of tuple $\tup$ on a list of attributes $A$. Furthermore, $\theta(\tup)$ denotes the (Boolean) result of evaluating condition $\theta$ over $\tup$.
|
We use $\evald{\cdot}{\db}$ to denote the result of evaluating query $\query$ over $\semK$-database $\db$. Below, we assume that tuples are of appropriate arity, use $\sch(\rel)$ to denote the attributes of $\rel$, and use $\project_A(\tup)$ to denote the projection of tuple $\tup$ on a list of attributes $A$. Furthermore, $\theta(\tup)$ denotes the (Boolean) result of evaluating condition $\theta$ over $\tup$.
|
||||||
\begin{align*}
|
\begin{align*}
|
||||||
\evald{\project_A(\rel)}{\db}(\tup) &= \sum_{\tup': \project_A(\tup') = \tup} \evald{\rel}{\db}(\tup') &
|
\evald{\project_A(\rel)}{\db}(\tup) &= \sum_{\tup': \project_A(\tup') = \tup} \evald{\rel}{\db}(\tup') &
|
||||||
\evald{(\rel_1 \union \rel_2)}{\db}(\tup) &= \evald{\rel_1}{\db}(\tup) \addK \evald{\rel_2}{\db}(\tup)\\
|
\evald{(\rel_1 \union \rel_2)}{\db}(\tup) &= \evald{\rel_1}{\db}(\tup) \addK \evald{\rel_2}{\db}(\tup)\\
|
||||||
\evald{\select_\theta(\rel)}{\db}(\tup) &= \begin{cases}
|
\evald{\select_\theta(\rel)}{\db}(\tup) &= \begin{cases}
|
||||||
\evald{\rel}{\db}(\tup) & \text{if }\theta(\tup) \\
|
\evald{\rel}{\db}(\tup) & \text{if }\theta(\tup) \\
|
||||||
\zeroK & \text{otherwise}.
|
\zeroK & \text{otherwise}.
|
||||||
\end{cases} &
|
\end{cases} &
|
||||||
\evald{(\rel_1 \join \rel_2)}{\db}(\tup) &=
|
\evald{(\rel_1 \join \rel_2)}{\db}(\tup) &=
|
||||||
\begin{aligned}
|
\begin{aligned}
|
||||||
\evald{\rel_1}{\db}(\project_{\sch(\rel_1)}(\tup)) \multK \\
|
\evald{\rel_1}{\db}(\project_{\sch(\rel_1)}(\tup)) \multK \\
|
||||||
\evald{\rel_2}{\db}(\project_{\sch(\rel_2)}(\tup))
|
\evald{\rel_2}{\db}(\project_{\sch(\rel_2)}(\tup))
|
||||||
\end{aligned}\\
|
\end{aligned}\\
|
||||||
& & \evald{R}{\db}(\tup) &= \rel(\tup)
|
& & \evald{R}{\db}(\tup) &= \rel(\tup)
|
||||||
\end{align*}
|
\end{align*}
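To illustrate these rules, consider a sketch over the semiring $\semN$ of natural numbers (bag semantics), where $\addK$ and $\multK$ are ordinary addition and multiplication. The relations and multiplicities below are hypothetical. Let $\sch(\rel_1) = (A)$ with $\rel_1(a) = 2$, and let $\sch(\rel_2) = (A,B)$ with $\rel_2(a,b) = 3$ and $\rel_2(a,c) = 1$; all other tuples are annotated with $0$. Then
\begin{align*}
\evald{(\rel_1 \join \rel_2)}{\db}((a,b)) &= 2 \cdot 3 = 6, & \evald{(\rel_1 \join \rel_2)}{\db}((a,c)) &= 2 \cdot 1 = 2,\\
\evald{\project_A(\rel_1 \join \rel_2)}{\db}(a) &= 6 + 2 = 8.
\end{align*}
That is, the join multiplies the multiplicities of the matching input tuples, and the projection sums the multiplicities of all tuples that project onto $a$.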
|
||||||
|
@ -96,18 +95,18 @@ We use $\evald{\cdot}{\db}$ to denote the result of evaluating query $\query$ ov
|
||||||
\subsubsection{$\semNX$ as a Representation System}\label{sec:semnx-as-repr}
|
\subsubsection{$\semNX$ as a Representation System}\label{sec:semnx-as-repr}
|
||||||
|
|
||||||
Let $\semNX$ denote the set of polynomials over variables $\vct{X}=(X_1,\dots,X_n)$ with natural number coefficients and exponents.
|
Let $\semNX$ denote the set of polynomials over variables $\vct{X}=(X_1,\dots,X_n)$ with natural number coefficients and exponents.
|
||||||
Consider now the semiring $(\semNX, +, \cdot, 0, 1)$ whose domain is $\semNX$, with the standard addition and multiplication of polynomials.
|
Consider now the semiring $(\semNX, +, \cdot, 0, 1)$ whose domain is $\semNX$, with the standard addition and multiplication of polynomials.
|
||||||
We will use $\semNX$-PDB $\pxdb$, defined as the tuple $(\idb_{\semNX}, \pd)$, where $\semNX$-database $\idb_{\semNX}$ is paired with probability distribution $\pd$.
|
We will use $\semNX$-PDB $\pxdb$, defined as the tuple $(\idb_{\semNX}, \pd)$, where $\semNX$-database $\idb_{\semNX}$ is paired with probability distribution $\pd$.
|
||||||
We denote by $\polyForTuple$ the annotation of tuple $t$ in the result of $\query$ on an implicit $\semNX$-PDB (i.e., $\polyForTuple = \query(\pxdb)(t)$ for some $\pxdb$) and as before, interpret it as a function $\polyForTuple: \{0,1\}^{|\vct X|} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
|
We denote by $\polyForTuple$ the annotation of tuple $t$ in the result of $\query$ on an implicit $\semNX$-PDB (i.e., $\polyForTuple = \query(\pxdb)(t)$ for some $\pxdb$) and as before, interpret it as a function $\polyForTuple: \{0,1\}^{|\vct X|} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
|
||||||
$\semNX$-PDBs and a function $\rmod$ (which transforms an $\semNX$-PDB to an equivalent $\semN$-PDB) are both formalized in \Cref{subsec:supp-mat-background}. \AR{Boris/Oliver: Should we mention that the proposition is obvious but has not been stated in literature for bags?}
|
$\semNX$-PDBs and a function $\rmod$ (which transforms an $\semNX$-PDB to an equivalent $\semN$-PDB) are both formalized in \Cref{subsec:supp-mat-background}. \AR{Boris/Oliver: Should we mention that the proposition is obvious but has not been stated in literature for bags?}
|
||||||
|
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\begin{Proposition}[Expectation of polynomials]\label{prop:expection-of-polynom}
|
\begin{Proposition}[Expectation of polynomials]\label{prop:expection-of-polynom}
|
||||||
Given an $\semN$-PDB $\pdb = (\idb,\pd)$ and $\semNX$-PDB $\pxdb = (\idb_{\semNX}',\pd')$ where $\rmod(\pxdb) = \pdb$, we have:
|
Given an $\semN$-PDB $\pdb = (\idb,\pd)$ and $\semNX$-PDB $\pxdb = (\idb_{\semNX}',\pd')$ where $\rmod(\pxdb) = \pdb$, we have:
|
||||||
\[ \expct_{\idb \sim \pd}[\query(\idb)(t)] = \expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})}. \]
|
$ \expct_{\idb \sim \pd}[\query(\idb)(t)] = \expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})}. $
|
||||||
\end{Proposition}
|
\end{Proposition}
|
||||||
\noindent A formal proof of \Cref{prop:expection-of-polynom} is given in \Cref{subsec:expectation-of-polynom-proof}.
|
\noindent A formal proof of \Cref{prop:expection-of-polynom} is given in \Cref{subsec:expectation-of-polynom-proof}.
|
||||||
This proposition shows that computing expected tuple multiplicities is equivalent to computing the expectation of a polynomial (for that tuple) from a probability distribution over all possible assignments of variables in the polynomial to $\{0,1\}$.
|
This proposition shows that computing expected tuple multiplicities is equivalent to computing the expectation of a polynomial (for that tuple) from a probability distribution over all possible assignments of variables in the polynomial to $\{0,1\}$.
|
||||||
We focus on this problem from now on, assume an implicit result tuple, and so drop the subscript from $\polyForTuple$ (i.e., $\poly$ will denote a polynomial).
|
We focus on this problem from now on, assume an implicit result tuple, and so drop the subscript from $\polyForTuple$ (i.e., $\poly$ will denote a polynomial).
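The following worked example sketches such an expectation. It assumes, purely for illustration, that $\pd'$ treats the variables as independent with hypothetical marginals $p_1 = 0.5$ and $p_2 = 0.4$ for $W_1 = 1$ and $W_2 = 1$, respectively. Take $\poly(X_1, X_2) = (X_1 + X_2)^2 = X_1^2 + 2X_1X_2 + X_2^2$. Since $W_i^2 = W_i$ for $W_i \in \{0,1\}$, linearity of expectation gives
\[\expct_{\vct{W} \sim \pd'}\pbox{\poly(\vct{W})} = p_1 + 2p_1p_2 + p_2 = 0.5 + 0.4 + 0.4 = 1.3.\]
Enumerating the four assignments yields the same value: $(0,0)$, $(1,0)$, $(0,1)$, and $(1,1)$ have probabilities $0.3$, $0.3$, $0.2$, and $0.2$ and polynomial values $0$, $1$, $1$, and $4$, so the expectation is $0.3 + 0.2 + 0.8 = 1.3$.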
|
||||||
|
|
||||||
|
@ -122,3 +121,7 @@ A \emph{\ti} is a \bi where each block contains exactly one tuple.
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
|
|
||||||
|
|
||||||
|
%%% Local Variables:
|
||||||
|
%%% mode: latex
|
||||||
|
%%% TeX-master: "main"
|
||||||
|
%%% End:
|
||||||
|
|