Incorporated all of Oliver's 113020 suggestions.

This commit is contained in:
Aaron Huber 2020-12-03 10:32:09 -05:00
parent c204c9fc61
commit c20aec43fa
4 changed files with 23 additions and 17 deletions

@ -42,11 +42,6 @@ The degree of polynomial $\poly(\vct{X})$ is the maximum sum of the exponents of
The degree of $\poly(\vct{X})$ in the above example is $2$. In this paper we consider only finite degree polynomials.
\AH{We need to verify that this definition is consistent with the rest of the paper. Also, it might be useful to specify coefficients are 1?}
\begin{Definition}[Monomial]\label{def:monomial}
A monomial is a product of a fixed set of variables, each raised to a non-negative integer power.
\end{Definition}
For example, the expression $xy$ is a monomial from the term $3xy$ of $\poly(\vct{X})$, produced from the set of variables $\vct{X} = \{x, y\}$.
%\begin{Definition}[$|\vct{X}|$]\label{def:num-vars}

@ -4,6 +4,7 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%RA-to-Poly Notation
\newcommand{\polyinput}[2]{\left(#1,\ldots, #2\right)}
\newcommand{\numvar}{n}

@ -6,6 +6,19 @@
Before proceeding, note that the following assumes $\ti$s in the setting of \textit{bag} semantics.
Throughout the note, we also make the following \textit{assumption}.
\begin{Definition}[Monomial]\label{def:monomial}
A monomial is a product of a fixed set of variables, each raised to a non-negative integer power.
\end{Definition}
For the term $2xy$, by~\cref{def:monomial} the monomial is $xy$.
\begin{Definition}[Standard Monomial Basis]
A polynomial is in standard monomial basis when it is fully expanded so that no product of sums exists and each unique monomial appears exactly once.
\end{Definition}
For example, consider the expression $(x + y)^2$. The standard monomial basis for this expression is $x^2 + 2xy + y^2$. While $x^2 + xy + xy + y^2$ is an expanded form of the expression, it is not the standard monomial basis since $xy$ appears more than once.
\begin{Assumption}
All polynomials considered are in standard monomial basis, i.e., $\poly(\vct{X}) = \sum\limits_{\vct{d} \in \mathbb{N}^\numvar}q_{\vct{d}} \cdot \prod\limits_{i = 1, d_i \geq 1}^{\numvar}X_i^{d_i}$, where $q_{\vct{d}}$ is the coefficient of the monomial encoded by $\vct{d}$ and $d_i$ is the $i^{th}$ element of $\vct{d}$.
\end{Assumption}
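As a hedged illustration of the assumption (the identification $X_1 = x$, $X_2 = y$ is introduced here only for this example), take $\numvar = 2$ and the polynomial $(x + y)^2$. Its only nonzero coefficients belong to the exponent vectors $(2, 0)$, $(1, 1)$, and $(0, 2)$, with values $1$, $2$, and $1$ respectively, so that
\[
(x + y)^2 = 1 \cdot X_1^2 + 2 \cdot X_1 X_2 + 1 \cdot X_2^2,
\]
and every other exponent vector in $\mathbb{N}^2$ has coefficient $0$.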
@ -131,7 +144,7 @@ For any graph $G$, the following formulas compute $\numocc{G}{H}$ for their resp
&\numocc{G}{\twopathdis} + 3\numocc{G}{\threedis} = \sum_{(i, j) \in E} \binom{\numedge - d_i - d_j + 1}{2}\label{eq:2pd-3d}
\end{align}
A quick argument to why \cref{eq:2m} is true. Note that for edge $(i, j)$ connecting arbitrary vertices $i$ and $j$, finding all other edges in $G$ disjoin to $(i, j)$ is equivalent to finding all edges that are not connected to either vertex $i$ or $j$. The number of such edges is $m - d_i - d_j + 1$, where we add $1$ since edge $(i, j)$ is removed twice when subtracting both $d_i$ and $d_j$. Since the summation is iterating over all edges, division by $2$ eliminates the double counting.
A quick argument for why \cref{eq:2m} is true: for an edge $(i, j)$ connecting arbitrary vertices $i$ and $j$, finding all other edges in $G$ disjoint from $(i, j)$ is equivalent to finding all edges that are incident to neither vertex $i$ nor vertex $j$. The number of such edges is $m - d_i - d_j + 1$, where we add $1$ since edge $(i, j)$ is removed twice when subtracting both $d_i$ and $d_j$. Since the summation iterates over all edges, a pair $\left((i, j), (k, \ell)\right)$ is also counted as $\left((k, \ell), (i, j)\right)$; division by $2$ eliminates this double counting.
\AH{The formula ~\cref{eq:2pd-3d} has been fixed to reflect the triple counting of 3-matchings. Notice the factor of 3 on the right term (3-matchings) in the LHS. 110220}
Equation~\ref{eq:2pd-3d} is true for similar reasons. For edge $(i, j)$, it is necessary to find two additional edges, disjoint or connected. As in~\cref{eq:2m}, once the number of edges disjoint from $(i, j)$ has been computed, we only need to consider all possible combinations of two edges from the set of disjoint edges, since it does not matter whether the two edges are connected. The factor $3$ of $\threedis$ is necessary to account for the triple counting of $3$-matchings. By contrast, occurrences of $\twopathdis$ are not over-counted: the summation `disconnects' the current edge from the two edges it is combined with, so each occurrence is counted only when the current edge is its disjoint edge, never when the current edge lies on its two-path. The sum over all such edge combinations is then precisely $\numocc{G}{\twopathdis} + 3\numocc{G}{\threedis}$.
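As a sanity check on both identities, consider a small hypothetical graph (not taken from the text) consisting of three pairwise disjoint edges, so that $m = 3$ and every vertex has degree $1$. Each edge $(i, j)$ then has $m - d_i - d_j + 1 = 3 - 1 - 1 + 1 = 2$ edges disjoint from it. Assuming \cref{eq:2m} has the form described in the argument above (the per-edge counts summed and then halved), it gives
\[
\frac{1}{2}\sum_{(i, j) \in E} \left(m - d_i - d_j + 1\right) = \frac{1}{2}\left(2 + 2 + 2\right) = 3,
\]
which matches the $\binom{3}{2} = 3$ two-matchings of this graph. For \cref{eq:2pd-3d}, the right-hand side is
\[
\sum_{(i, j) \in E} \binom{m - d_i - d_j + 1}{2} = 3 \cdot \binom{2}{2} = 3,
\]
while the left-hand side is $\numocc{G}{\twopathdis} + 3\numocc{G}{\threedis} = 0 + 3 \cdot 1 = 3$: the graph contains no two-path, and its single $3$-matching is counted once from each of its three edges.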

@ -12,10 +12,10 @@
An incomplete database $\idb$ is a set of deterministic databases $\db$ where each element is known as a possible world. %Since $\idb$ is modeling all the possible worlds of an uncertain database, it follows that each $\db \in \idb$ has the same named set of relations, $\{\rel_1,\ldots, \rel_n\}$ (albeit not equivalent across all instances), whose schemas $(\sch(\rel_i))$are unchanging across each $\db_j$.
Denote the schema of $\db$ as $\sch(\db)$. When $\idb$ is a probabilistic database, $\idb$ can be viewed as a two tuple $(\wSet, \pd)$, where $\wSet$ as noted, is the set of possible worlds, and $\pd$ is a probability distribution over $\wSet$.
Denote the schema of $\db$ as $\sch(\db)$. When $\idb$ is a probabilistic database, $\idb$ can be viewed as a two-tuple $(\wSet, \pd)$, where $\wSet$, as noted, is the set of possible worlds, and $\pd$ is a probability distribution over $\wSet$.
The possible worlds semantics gives a framework for how to think about running queries over $\idb$. Given a query $\query$, $\query$ is deterministically run over each $\db \in \idb$, and the output of $\query(\idb)$ is defined as the set of results (worlds) from running $\query$ over each $\db_i \in \idb$. We write this formally as,
\[\query(\idb) = \{\query(\db) | \db \in \idb\}\]
\[\query(\idb) = \comprehension{\query(\db)}{\db \in \idb}.\]
@ -24,16 +24,13 @@ Define $\vct{X}$ to be the variables $X_1,\dots,X_M$. We emphasize that formal
\subsubsection{K-relations}\label{subsubsec:k-rel}
A K-relation~\cite{DBLP:conf/pods/GreenKT07} is a relation whose tuples are each annotated with an expression whose values come from its respective commutative K-semiring, denoted $\{K, \oplus, \otimes, \mathbbold{0}, \mathbbold{1}\}$. A commutative $K$-semiring has associative and commutative operators $\oplus$ and $\otimes$, with $\otimes$ distributing over $\oplus$, $\mathbbold{0}$ the identity of $\oplus$, $\mathbbold{1}$ likewise of $\otimes$, and element $\mathbbold{0}$ anihilates all elements of $K$ when being combined with $\otimes$. The information encoded in the annotation depends on the underlying semiring of the relation.
A K-relation~\cite{DBLP:conf/pods/GreenKT07} is a relation whose tuples are each annotated with an expression whose values come from a commutative $K$-semiring, denoted $\{K, \oplus, \otimes, \mathbbold{0}, \mathbbold{1}\}$. A commutative $K$-semiring has associative and commutative operators $\oplus$ and $\otimes$, with $\otimes$ distributing over $\oplus$, $\mathbbold{0}$ the identity of $\oplus$, $\mathbbold{1}$ likewise of $\otimes$, and element $\mathbbold{0}$ annihilating every element of $K$ when combined with $\otimes$. The information encoded in the annotation depends on the underlying semiring of the relation.
As noted in \cite{DBLP:conf/pods/GreenKT07}, the $\mathbb{N}[\vct{X}]$-semiring is a semiring over the set $\mathbb{N}[\vct{X}]$ of all polynomials, whose variables can then be substituted with $K$-values from other semirings, evaluating the operators with the operators of the substituted semiring, to produce varying semantics such as set, bag, and security annotations.
When used with $\mathbb B$-typed variables, an $\mathbb{N}[\vct{X}]$ relation is effectively a C-Table \cite{DBLP:conf/pods/GreenKT07}, since all first order formulas can be equivalently modeled by polynomials, where disjunction is equivalent to the addition operator and conjunction is equivalent to the multiplication operator.
Using $\mathbb B$-typed variables in an $\mathbb{N}[\vct{X}]$ relation would correspond to substituting values and operators from the $\{\mathbb{B}, \vee, \wedge, \bot, \top\}$ semiring.
Further define $\nxdb$ as an $\mathbb{N}[\vct{X}]$ database where each tuple $\tup \in \db$ is annotated with a polynomial over variables $X_1,\ldots, X_M$ for some value of $M$ that will be specified later.
Further define $\nxdb$ as an $\mathbb{N}[\vct{X}]$ database where each tuple $\tup \in \db$ is annotated with a polynomial over variables $X_1,\ldots, X_M$.
Since $\nxdb$ is a database that maps tuples to polynomials, it is customary for arbitrary table $\rel$ to be viewed as a function $\rel: \tset \mapsto \mathbb{N}[\vct{X}]$, where $\rel(\tup)$ denotes the polynomial annotating tuple $\tup$.
It has been shown in previous work that commutative semirings precisely model translations of RA+ query operations to $k$-annotations.
It has been shown in previous work that commutative semirings precisely model translations of RA+ query operations to $K$-annotations.
The evaluation semantics notation $\llbracket \cdot \rrbracket = x$ simply means that the result of evaluating expression $\cdot$ is given by following the semantics $x$. Given a query $\query$, operations in $\query$ are translated into the following polynomial expressions.
\begin{align*}
@ -47,11 +44,11 @@ The evalution semantics notation $\llbracket \cdot \rrbracket = x$ simply mean t
&\eval{R}(\tup) && = &&\rel(\tup)
\end{align*}
The above semantics show us how to obtain the $k$-annotation on a tuple in the result of query $\query$ from the annotations on the tuples in the input of $\query$.
The above semantics show us how to obtain the $K$-annotation on a tuple in the result of query $\query$ from the annotations on the tuples in the input of $\query$. When used with $\mathbb B$-typed variables, an $\mathbb{N}[\vct{X}]$ relation is effectively a C-Table \cite{DBLP:conf/pods/GreenKT07}, since all first-order formulas can be equivalently modeled by polynomials, where $\oplus$ is disjunction and $\otimes$ is conjunction.
Using $\mathbb B$-typed variables in an $\mathbb{N}[\vct{X}]$ relation would correspond to substituting values and operators from the $\{\mathbb{B}, \vee, \wedge, \bot, \top\}$ semiring. Similarly, when using variables from the $\mathbb{N}$ domain, the annotations effectively model bag semantics, where the variables and the $\oplus$ and $\otimes$ operations come from the natural numbers semiring $\{\mathbb{N}, +, \times, 0, 1\}$.
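To make the two substitutions concrete, here is a small illustrative example; the annotation and the chosen values are assumptions made only for this illustration. Suppose a result tuple $\tup$ carries the annotation $X_1 X_2 + X_1 X_3$. Substituting from $\{\mathbb{B}, \vee, \wedge, \bot, \top\}$ with $X_1 = \top$, $X_2 = \bot$, $X_3 = \top$ gives
\[
(\top \wedge \bot) \vee (\top \wedge \top) = \bot \vee \top = \top,
\]
so the tuple is present in the result. Substituting from $\{\mathbb{N}, +, \times, 0, 1\}$ with $X_1 = 2$, $X_2 = 0$, $X_3 = 3$ gives
\[
2 \cdot 0 + 2 \cdot 3 = 6,
\]
so the tuple appears with multiplicity $6$ in the result.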
\subsection{Defining the Data}\label{subsec:def-data}
For the set of possible worlds, $\wSet$, i.e., the set of all $\db_i \in \idb$, define an injective mapping to the set $\{0, 1\}^M$, where for each vector $\vct{w} \in \{0, 1\}^M$ there is at most one element $\db_i \in \idb$ mapped to $\vct{w}$.
In the general case, the binary value of $\vct{w}$ uniquely identifies a potential possible world. For example, consider the case of the Tuple Independent Database $(\ti)$ data model in which each table is a set of tuples, each of which is independent of one another, and individually occur with a specific probability $\prob_\tup$. Because of independence, a $\ti$ with $\numTup$ tuples naturally has $2^\numTup$ possible worlds, thus $\numTup = M$, and the injective mapping for each $\vct{w} \in \{0, 1\}^M$ is trivial. In the Block Independent Disjoint data model ($\bi$), because of the disjoint condition on tuples within the same block, a $\bi$ may not have exactly $2^M$ possible worlds. Excess $\vct{w}$'s are assigned a probability of $0$.
In the general case, the binary value of $\vct{w}$ uniquely identifies a potential possible world. For example, consider the Tuple Independent Database $(\ti)$ data model, in which each table is a set of tuples, each of which is independent of the others and occurs with a specific probability $\prob_\tup$. Because of independence, a $\ti$ with $\numTup$ tuples naturally has $2^\numTup$ possible worlds, thus $\numTup = M$, and the injective mapping for each $\vct{w} \in \{0, 1\}^M$ is trivial. In the Block Independent Disjoint data model ($\bi$), because of the disjoint condition on tuples within the same block, a $\bi$ may not have exactly $2^M$ possible worlds, since some combinations of tuples representable in the encoding cannot occur together.
Denote a random variable selecting a world according to distribution $P$ to be $\rw$. Provided that for any non-possible world $\vct{w} \in \{0, 1\}^M, \pd[\rw = \vct{w}] = 0$, a probability distribution over $\{0, 1\}^M$ is a distribution over $\Omega$, which we have already defined as $\pd$.
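For a small illustration (the concrete instance is an assumption made only for this example), let $M = 2$. In a $\ti$ with two tuples, all four vectors $(0, 0)$, $(0, 1)$, $(1, 0)$, and $(1, 1)$ correspond to possible worlds. In a $\bi$ whose two tuples lie in the same block, at most one of them can be present, so only $(0, 0)$, $(1, 0)$, and $(0, 1)$ correspond to possible worlds, and the condition above requires $\pd[\rw = (1, 1)] = 0$.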