Minor typo fixes

Oliver Kennedy 2020-12-15 17:46:35 -05:00
parent 56b47078a0
commit 9e101082c7
Signed by: okennedy
GPG key ID: 3E5F9B3ABD3FDB60
2 changed files with 6 additions and 8 deletions


@ -3,8 +3,8 @@
%\onecolumn
\subsection{Reduced Polynomials and Equivalences}
Since we have shown that computing the expected multiplicity of a query result tuple is equivalent to computing the expectation of a polynomial (for that tuple) given a probability distribution over all possible assignments of variables in the polynomial to $\{0,1\}$, we from now on focus on this problem exclusively.
We now introduce some basic terminology for polynomials and then develop a reduced normal form for polynomials that preserves a polynomial expectation for probability distributions that stems from \bis or \tis.
Since we have shown that computing the expected multiplicity of a query result tuple is equivalent to computing the expectation of a polynomial (for that tuple) given a probability distribution over all possible assignments of variables in the polynomial to $\{0,1\}$, we focus on this problem exclusively from now on.
We now introduce some basic terminology for polynomials and then develop a reduced normal form for polynomials that preserves a polynomial expectation for probability distributions that stem from \bis or \tis.
Let us use the expression $(x + y)^2$ as a running example in this section.
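As a brief illustration of why this running example is instructive, assume (for this sketch only) that $x$ and $y$ are independent $\{0,1\}$-valued random variables with $\Pr[x = 1] = p_x$ and $\Pr[y = 1] = p_y$, as would be the case for variables of two distinct tuples in a \ti. Expanding the polynomial gives
\[(x + y)^2 = x^2 + 2xy + y^2.\]
Since $x^2 = x$ and $y^2 = y$ whenever $x, y \in \{0,1\}$, linearity of expectation together with the assumed independence yields
\[\expct\left[(x + y)^2\right] = \expct[x] + 2\,\expct[x]\,\expct[y] + \expct[y] = p_x + 2 p_x p_y + p_y.\]
The symbols $p_x$ and $p_y$ are introduced here only for illustration.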
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -101,7 +101,7 @@ Consider $\poly(x, y) = (x + y)(x + y)$ where $x$ and $y$ are from different blo
%When considering $\bi$ input, it becomes necessary to redefine $\rpoly(\vct{X})$.
The usefulness of this reduction become clear in \Cref{lem:exp-poly-rpoly}.
The usefulness of this reduction will become clear in \Cref{lem:exp-poly-rpoly}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:pre-poly-rpoly}


@ -8,11 +8,9 @@
An \textit{incomplete database} $\idb$ is a set of deterministic databases $\db$ called possible worlds.
Denote the schema of $\db$ as $\sch(\db)$. A \textit{probabilistic database} $\pdb$ is a pair $(\idb, \pd)$ where $\idb$ is an incomplete database and $\pd$ is a probability distribution over $\idb$. Queries over probabilistic databases are evaluated using the so-called possible world semantics. Under possible world semantics, the result of a query $\query$ over an incomplete database $\idb$ is the set of query answers produced by evaluating $\query$ over each possible world:
\[\query(\idb) = \comprehension{\query(\db)}{\db \in \idb}\]
For a probabilistic database $\pdb = (\idb, \pd)$, the result of a query is the pair $(\query(\idb), \pd')$ where $\pd'$ is a probability distribution over $\query(\idb)$ that assigns to each possible query result the sum of the probabilities of the worlds that produce this answer:
\[\forall \db \in \query(\idb): \pd'(\db) = \sum_{\db' \in \idb: \query(\db') = \db} \pd(\db') \]
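As a small hypothetical example (the worlds and probabilities are chosen purely for illustration), let $\idb = \{\db_1, \db_2, \db_3\}$ with $\pd(\db_1) = 0.5$, $\pd(\db_2) = 0.3$, and $\pd(\db_3) = 0.2$, and suppose that $\query(\db_1) = \query(\db_2) = \db_a$ while $\query(\db_3) = \db_b$ with $\db_a \neq \db_b$. Then $\query(\idb) = \{\db_a, \db_b\}$ and
\[\pd'(\db_a) = \pd(\db_1) + \pd(\db_2) = 0.8, \qquad \pd'(\db_b) = \pd(\db_3) = 0.2.\]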
Note that in this work we consider multisets, i.e., each possible world is a set of multiset relations and queries are evaluated using bag semantics. We will use K-relations to model multisets. A \emph{K-relation}~\cite{DBLP:conf/pods/GreenKT07} is a relation whose tuples are annotated with elements from a commutative semiring $\semK = (\domK, \addK, \multK, \zeroK, \oneK)$. A commutative semiring is a structure with a domain $\domK$ and associative and commutative binary operations $\addK$ and $\multK$ such that $\multK$ distributes over $\addK$, $\zeroK$ is the identity of $\addK$, $\oneK$ is the identity of $\multK$, and $\zeroK$ annihilates all elements of $\domK$ when being combined with $\multK$.
@ -21,7 +19,7 @@ Formally, an n-ary $\semK$-relation over $\udom$ is a function $\rel: \udom^n \t
A $\semK$-database is a set of $\semK$-relations. It will be convenient to also interpret a $\semK$-database as a function from tuples to annotations. Thus, $\rel(t)$ ($\db(t)$) denotes the annotation associated by $\semK$-relation $\rel$ ($\semK$-database $\db$) to tuple $t$.
We review the semantics of positive relational algebra queries over $\semK$-relations below.
Consider the semiring $\semN = (\domN,+,\times,0,1)$ of natural number. $\semN$-databases are used to model bag semantics by annotating each tuple with its multiplicity. A probabilistic $\semN$-databases ($\semN$-PDB) is a PDB where each possible world is a $\semN$-database. We will study the problem of evaluating statical moments of query results over such databases. Specifically, given a probabilistic $\semN$-database $\pdb = (\idb, \pd)$, query $\query$, and possible result tuple $t$, we treat $\query(\db)(t)$ as a random $\semN$-valued variable and are interested in computing its expectation $\expct_{\idb \sim \pd}[\query(\db)(t)]$:
Consider the semiring $\semN = (\domN,+,\times,0,1)$ of natural numbers. $\semN$-databases are used to model bag semantics by annotating each tuple with its multiplicity. A probabilistic $\semN$-database ($\semN$-PDB) is a PDB where each possible world is an $\semN$-database. We will study the problem of evaluating statistical moments of query results over such databases. Specifically, given a probabilistic $\semN$-database $\pdb = (\idb, \pd)$, query $\query$, and possible result tuple $t$, we treat $\query(\db)(t)$ as a random $\semN$-valued variable and are interested in computing its expectation $\expct_{\idb \sim \pd}[\query(\db)(t)]$:
\begin{align}\label{eq:bag-expectation}
\expct_{\idb \sim \pd}[\query(\db)(t)] = \sum_{\db \in \idb} \query(\db)(t) \cdot \pd(\db)
@ -50,7 +48,7 @@ We use $\evald{\cdot}{\db}$ to denote the result of evaluating query $\query$ ov
\subsection{$\semNX$ as a Representation System}\label{sec:semnx-as-repr}
Let $\semNX$ denote the set of polynomials over variables $\vct{X}$ with natural number coefficients and exponents.
Consider now the semiring $(\semNX, +, \cdot, 0, 1)$ whose domain is $\semNX$ and addition and multiplication are standard addition and multiplication of polynomials. We will utilize $\semNX$-databases $\db$ paired with probability distributions to represent $\semN$-PDBs.\BG{Need more motivation?} To justify the use of $\semNX$-databases, we need to show that we can encode any $\semN$-PDBs in this way and that the query semantics over this representation coincides with query semantics over $\semN$-PDB. For that it will be opportune to define representation systems for $\semN$-PDBs.\BG{cite}
Consider now the semiring $(\semNX, +, \cdot, 0, 1)$ whose domain is $\semNX$ and addition and multiplication are standard addition and multiplication of polynomials. We will utilize $\semNX$-databases $\db$ paired with probability distributions to represent $\semN$-PDBs.\BG{Need more motivation?} To justify the use of $\semNX$-databases, we need to show that we can encode any $\semN$-PDB in this way and that the query semantics over this representation coincides with query semantics over $\semN$-PDBs. For that it will be opportune to define representation systems for $\semN$-PDBs.\BG{cite}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Representation System]\label{def:representation-syste}
@ -122,7 +120,7 @@ Since $\semNX$-PDBs $\pxdb$ are a complete representation system closed under $\
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Two important subclasses of $\semNX$-PDBs that are of interested to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\vct{p}(\tup)$. The set of possible worlds represented by a \ti $\db$ are all subsets of $\db$. The probability of such a world is the product of the probabilities of all tuples that exist with one minus the probability of all tuples of $\db$ that are not part of this world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. The probability of such a world is the product of the probabilities of all tuples present in the world and one minus the sum of the probabilities of all tuples from blocks for which no tuple is present in the world. For bag \tis and \bis, we define the probability of a tuple to be the probability that the tuple exists with multiplicity $1$.
Two important subclasses of $\semNX$-PDBs that are of interest to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\vct{p}(\tup)$. The set of possible worlds represented by a \ti $\db$ is all subsets of $\db$. The probability of each world is the product of the probabilities of all tuples that exist in the world and of one minus the probability of every tuple of $\db$ that is not part of this world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. The probability of such a world is the product of the probabilities of all tuples present in the world and one minus the sum of the probabilities of all tuples from blocks for which no tuple is present in the world. For bag \tis and \bis, we define the probability of a tuple to be the probability that the tuple exists with multiplicity at least $1$.
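To make the set-semantics description above concrete, consider a hypothetical \ti $\db = \{\tup_1, \tup_2\}$ with $\vct{p}(\tup_1) = p_1$ and $\vct{p}(\tup_2) = p_2$ (the tuples and the symbols $p_1, p_2$ are assumed only for this sketch). Its four possible worlds have the probabilities
\begin{align*}
\Pr[\{\tup_1, \tup_2\}] &= p_1 p_2, & \Pr[\{\tup_1\}] &= p_1 (1 - p_2),\\
\Pr[\{\tup_2\}] &= (1 - p_1) p_2, & \Pr[\emptyset] &= (1 - p_1)(1 - p_2),
\end{align*}
which sum to $1$. If instead $\tup_1$ and $\tup_2$ form a single block of a \bi with $p_1 + p_2 \leq 1$, only three worlds remain:
\begin{align*}
\Pr[\{\tup_1\}] &= p_1, & \Pr[\{\tup_2\}] &= p_2, & \Pr[\emptyset] &= 1 - (p_1 + p_2).
\end{align*}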
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[\tis and \bis]\label{def:tidbs-and-bidbs}