a few fixes

master
Oliver Kennedy 2021-09-17 22:17:30 -04:00
parent 317a05ec3f
commit c379f14af6
Signed by: okennedy
GPG Key ID: 3E5F9B3ABD3FDB60
5 changed files with 57 additions and 42 deletions

View File

@ -19,14 +19,14 @@ To justify the use of $\semNX$-databases, we need to show that we can encode any
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
As mentioned above we will use $\semNX$-databases paired with a probability distribution as a representation system, referring to such databases as $\semNX$-PDBs.
Formally, an $\semNX$-PDB is an $\semNX$-database $\idb_{\semNX}$ and a probability distribution $\pd$ over assignments $\assign$ of the variables $\vct{X} = \{X_1, \ldots, X_\numvar\}$ occurring in annotations of $\idb_{\semNX}$ to $\{0,1\}$.
As mentioned above we will use $\semNX$-databases paired with a probability distribution as a representation system, referring to such databases as \abbrNXPDB\xplural.
Formally, an \abbrNXPDB is an $\semNX$-database $\idb_{\semNX}$ and a probability distribution $\pd$ over assignments $\assign$ of the variables $\vct{X} = \{X_1, \ldots, X_\numvar\}$ occurring in annotations of $\idb_{\semNX}$ to $\{0,1\}$.
\AH{There was a big ICDT reviewer complaint in this section, but I don't know that I think it confuses things to think of them both an assignment and/or a vector of variables.}
Note that an assignment $\assign: \vct{X} \to \{0,1\}^\numvar$ can be represented as a vector $\vct{w} \in \{0,1\}^n$ where $\vct{w}[i]$ records the value assigned to variable $X_i$. Thus, from now on we will solely use such vectors which we refer to as \emph{world vectors} and implicitly understand them to represent assignments. Given an assignment $\assign$ we use $\assign(\pxdb)$ to denote the semiring homomorphism $\semNX \to \semN$ that applies the assignment $\assign$ to all variables of a polynomial and evaluates the resulting expression in $\semN$.\BG{explain connection to homomorphism lifting in K-relations}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[$\semNX$-PDBs]\label{def:semnx-pdbs}
An $\semNX$-PDB $\pxdb$ over variables $\vct{X} = \{X_1, \ldots, X_n\}$ is a tuple $(\idb_{\semNX},\pd)$ where $\db$ is an $\semNX$-database and $\pd$ is a probability distribution over $\vct{w} \in \{0,1\}^n$. We use $\assign_{\vct{w}}$ to denote the assignment corresponding to $\vct{w} \in \{0,1\}^n$. The $\semN$-PDB $\rmod(\pxdb) = (\idb, \pd')$ encoded by $\pxdb$ is defined as:
\begin{Definition}[\abbrNXPDB\xplural]\label{def:semnx-pdbs}
An \abbrNXPDB$\pxdb$ over variables $\vct{X} = \{X_1, \ldots, X_n\}$ is a tuple $(\idb_{\semNX},\pd)$ where $\db$ is an $\semNX$-database and $\pd$ is a probability distribution over $\vct{w} \in \{0,1\}^n$. We use $\assign_{\vct{w}}$ to denote the assignment corresponding to $\vct{w} \in \{0,1\}^n$. The $\semN$-PDB $\rmod(\pxdb) = (\idb, \pd')$ encoded by $\pxdb$ is defined as:
\begin{align*}
\idb & = \{ \assign_{\vct{w}}(\pxdb) \mid \vct{w} \in \{0,1\}^n \} \\
\forall \db \in \idb: \probOf(\db) & = \sum_{\vct{w} \in \{0,1\}^n: \assign_{\vct{w}}(\pxdb) = \db} \probOf(\vct{w})
@ -34,24 +34,24 @@ Note that an assignment $\assign: \vct{X} \to \{0,1\}^\numvar$ can be represente
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
For instance, consider a $\pxdb$ consisting of a single tuple $\tup_1 = (1)$ annotated with $X_1 + X_2$ with probability distribution $\probOf([0,0]) = 0$, $\probOf([0,1]) = 0$, $\probOf([1,0]) = 0.3$ and $\probOf([1,1]) = 0.7$. This $\semNX$-PDB encodes two possible worlds (with non-zero probability) that we denote using their world vectors.
For instance, consider a $\pxdb$ consisting of a single tuple $\tup_1 = (1)$ annotated with $X_1 + X_2$ with probability distribution $\probOf([0,0]) = 0$, $\probOf([0,1]) = 0$, $\probOf([1,0]) = 0.3$ and $\probOf([1,1]) = 0.7$. This \abbrNXPDB encodes two possible worlds (with non-zero probability) that we denote using their world vectors.
%
\[
D_{[0,1]}(\tup_1) = 1 \hspace{0.3cm} \mathbf{and} \hspace{0.3cm} D_{[1,1]}(\tup_1) = 2
\]
%
\AH{I get the notation above, but we never formally introduced it.}
Importantly, as the following proposition shows, any finite $\semN$-PDB can be encoded as an $\semNX$-PDB and $\semNX$-PDBs are closed under positive relational algebra queries, the class of queries we are interested in in this work. \AH{We need to stick with one formalism.}
\AH{Is it a known result that $\semNX$-\abbrPDB\xplural are closed under $\raPlus$ queries?}
Importantly, as the following proposition shows, any finite $\semN$-PDB can be encoded as an \abbrNXPDB and \abbrNXPDB\xplural are closed under positive relational algebra queries, the class of queries we are interested in in this work. \AH{We need to stick with one formalism.}
\AH{Is it a known result that \abbrNXPDB\xplural are closed under $\raPlus$ queries?}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Proposition}\label{prop:semnx-pdbs-are-a-}
$\semNX$-PDBs are a complete representation system for $\semN$-PDBs that is closed under $\raPlus$ queries.
\abbrNXPDB\xplural are a complete representation system for $\semN$-PDBs that is closed under $\raPlus$ queries.
\end{Proposition}
%\subsection{Proof of~\Cref{prop:semnx-pdbs-are-a-}}
\begin{proof}
To prove that $\semNX$-PDBs are complete consider the following construction that for any $\semN$-PDB $\pdb = (\idb, \pd)$ produces an $\semNX$-PDB $\pxdb = (\idb_{\semNX}, \pd')$ such that $\rmod(\pxdb) = \pdb$. Let $\idb = \{D_1, \ldots, D_{\abs{\idb}}\}$ and let $max(D_i)$
To prove that \abbrNXPDB\xplural are complete consider the following construction that for any $\semN$-PDB $\pdb = (\idb, \pd)$ produces an \abbrNXPDB $\pxdb = (\idb_{\semNX}, \pd')$ such that $\rmod(\pxdb) = \pdb$. Let $\idb = \{D_1, \ldots, D_{\abs{\idb}}\}$ and let $max(D_i)$
\AH{What are we using $max(D_i)$ for?}
denote $max_{\tup} D_i(\tup)$. For each world $D_i$ we create a corresponding variable $X_i$.
%variables $X_{i1}$, \ldots, $X_{im}$ where $m = max(D_i)$.
@ -61,7 +61,8 @@ In $\idb_{\semNX}$ we assign each tuple $\tup$ the polynomial:
\idb_{\semNX}(\tup) = \sum_{i=1}^{\abs{\idb}} D_i(\tup)\cdot X_{i}
\]
The probability distribution $\pd'$ assigns all world vectors zero probability except for $\abs{\idb}$ world vectors (representing the possible worlds) $\vct{w}_i$. All elements of $\vct{w}_i$ are zero except for the position corresponding to variables $X_{i}$ which is set to $1$. Unfolding definitions it is trivial to show that $\rmod(\pxdb) = \pdb$. Thus, $\semNX$ are a complete representation system.
The closure under $\raPlus$ queries follows from the fact that an assignment $\vct{X} \to \{0,1\}$ is a semiring homomorphism and that semiring homomorphisms commute with queries over $\semK$-relations.
It is trivial to show that an assignment $\vct{X} \to \{0,1\}$ is a semiring homomorphism.
Closure under $\raPlus$ queries follows from this, and from \cite{DBLP:conf/pods/GreenKT07}'s Proposition 3.5, which states that semiring homomorphisms commute with queries over $\semK$-relations.
@ -70,24 +71,27 @@ The closure under $\raPlus$ queries follows from the fact that an assignment $\v
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Now let us consider computing the expected multiplicity of a tuple $\tup$ in the result of a query $\query$ over an $\semN$-PDB $\pdb$ using the annotation of $\tup$ in the result of evaluating $\query$ over an $\semNX$-PDB $\pxdb$ for which $\rmod(\pxdb) = \pdb$. The expectation of the polynomial $\poly = \query(\pxdb)(\tup)$ based on the probability distribution of $\pxdb$ over the variables in $\pxdb$ is:
\AH{The wording ``...over the variables...'' I {\emph think} can be misleading since we also discuss a probability distribution $\pd$ being induced by a vector of probability assignments $\vct{p}$ to each variable $\pVar_i$.}
\OK{Removed text here. c.f., Reviewer 1: "Proof of Proposition A.3. I seems the proof should end after l.687, since you already
proved everything from the statement of the proposition. I dont understand what it is
that you do after this line."}
% Now let us consider computing the expected multiplicity of a tuple $\tup$ in the result of a query $\query$ over an $\semN$-PDB $\pdb$ using the annotation of $\tup$ in the result of evaluating $\query$ over an \abbrNXPDB $\pxdb$ for which $\rmod(\pxdb) = \pdb$. The expectation of the polynomial $\poly = \query(\pxdb)(\tup)$ based on the probability distribution of $\pxdb$ over the variables in $\pxdb$ is:
% \AH{The wording ``...over the variables...'' I {\emph think} can be misleading since we also discuss a probability distribution $\pd$ being induced by a vector of probability assignments $\vct{p}$ to each variable $\pVar_i$.}
\begin{equation}
\expct_{\vct{W} \sim \pd}\pbox{\poly(\vct{W})} = \sum_{\vct{w} \in \{0,1\}^n} \assign_{\vct{w}}(\query(\pxdb)(\tup)) \cdot \probOf(\vct{w})\label{eq:expect-q-nx}
\end{equation}
% \begin{equation}
% \expct_{\vct{W} \sim \pd}\pbox{\poly(\vct{W})} = \sum_{\vct{w} \in \{0,1\}^n} \assign_{\vct{w}}(\query(\pxdb)(\tup)) \cdot \probOf(\vct{w})\label{eq:expect-q-nx}
% \end{equation}
Since $\semNX$-PDBs $\pxdb$ are a complete representation system for $\semN$-PDBs which are closed under $\raPlus$, computing the expectation of the multiplicity of a tuple $t$ in the result of an $\raPlus$ query over the $\semN$-PDB $\rmod(\pxdb)$, is the same as computing the expectation of the polynomial $\query(\pxdb)(t)$.
% Since \abbrNXPDB\xplural $\pxdb$ are a complete representation system for $\semN$-PDBs which are closed under $\raPlus$, computing the expectation of the multiplicity of a tuple $t$ in the result of an $\raPlus$ query over the $\semN$-PDB $\rmod(\pxdb)$, is the same as computing the expectation of the polynomial $\query(\pxdb)(t)$.
\qed
\end{proof}
\subsection{\tis and \bis in the $\semNX$-PDB model}\label{subsec:supp-mat-ti-bi-def}
Two important subclasses of $\semNX$-PDBs that are of interest to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\prob_\tup$. The set of possible worlds represented by a \ti $\db$ is all subsets of $\db$. The probability of each world is the product of the probabilities of all tuples that exist with one minus the probability of all tuples of $\db$ that are not part of this world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. Note then that the tuples sharing the same block are disjoint, and the sum of the probabilitites of all the tuples in the same block $\block$ is at most $1$. \AH{Reviewer complaint: This is not true by definition.}
\subsection{\tis and \bis in the \abbrNXPDB model}\label{subsec:supp-mat-ti-bi-def}
Two important subclasses of \abbrNXPDB\xplural that are of interest to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\prob_\tup$. The set of possible worlds represented by a \ti $\db$ is all subsets of $\db$. The probability of each world is the product of the probabilities of all tuples that exist with one minus the probability of all tuples of $\db$ that are not part of this world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. Note then that the tuples sharing the same block are disjoint, and the sum of the probabilitites of all the tuples in the same block $\block$ is at most $1$. \AH{Reviewer complaint: This is not true by definition.}
The probability of such a world is the product of the probabilities of all tuples present in the world. %and one minus the sum of the probabilities of all tuples from blocks for which no tuple is present in the world.
For bag \tis and \bis, we define the probability of a tuple to be the probability that the tuple exists with multiplicity at least $1$.
As already noted above, in this work, we define \tis and \bis as subclasses of $\semNX$-PDBs.
As already noted above, in this work, we define \tis and \bis as subclasses of \abbrNXPDB\xplural.
In this work, we consider one further deviation from the standard: We use bag semantics for queries.
Even though tuples cannot occur more than once in the input \ti or \bi, they can occur with a multiplicity larger than one in the result of a query.
Since in \tis and \bis, there is a one-to-one correspondence between tuples in the database and variables, we can interpret a vector $\vct{w} \in \{0,1\}^n$ as denoting which tuples exist in the possible world $\assign_{\vct{w}}(\pxdb)$ (the ones where $\vct{w}[j] = 1$).
@ -112,9 +116,9 @@ $\semNX$-relations are closed under $\raPlus$ (\Cref{fig:nxDBSemantics}).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We will use $\semNX$-\abbrPDB $\pxdb$, defined as the tuple $(\idb_{\semNX}, \pd)$, where $\semNX$-database $\idb_{\semNX}$ is paired with probability distribution $\pd$ over the assignments to $\vct{X}$.
We denote by $\polyForTuple$ the annotation of tuple $t$ in the result of $\query$ on an implicit $\semNX$-\abbrPDB (i.e., $\polyForTuple = \query(\pxdb)(t)$ for some $\pxdb$) and as before, interpret it as a function $\polyForTuple: \{0,1\}^{\numvar} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
$\semNX$-\abbrPDB\xplural and a function $\rmod$ (which transforms an $\semNX$-\abbrPDB to a classical bag-\abbrPDB, or $\semN$-\abbrPDB~\cite{DBLP:conf/pods/GreenKT07,feng:2019:sigmod:uncertainty}) are both formalized in \Cref{subsec:supp-mat-background}.
We will use \abbrNXPDB $\pxdb$, defined as the tuple $(\idb_{\semNX}, \pd)$, where $\semNX$-database $\idb_{\semNX}$ is paired with probability distribution $\pd$ over the assignments to $\vct{X}$.
We denote by $\polyForTuple$ the annotation of tuple $t$ in the result of $\query$ on an implicit \abbrNXPDB (i.e., $\polyForTuple = \query(\pxdb)(t)$ for some $\pxdb$) and as before, interpret it as a function $\polyForTuple: \{0,1\}^{\numvar} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
\abbrNXPDB\xplural and a function $\rmod$ (which transforms an \abbrNXPDB to a classical bag-\abbrPDB, or $\semN$-\abbrPDB~\cite{DBLP:conf/pods/GreenKT07,feng:2019:sigmod:uncertainty}) are both formalized in \Cref{subsec:supp-mat-background}.
\BG{Oliver's conjecture: Bag-\tis + Q can express any finite bag-PDB:
A well-known result for set semantics PDBs is that while not all finite PDBs can be encoded as \tis, any finite PDB can be encoded using a \ti and a query. An analog result holds in our case: any finite $\semN$-PDB can be encoded as a bag \ti and a query (WHAT CLASS? ADD PROOF)
@ -124,7 +128,7 @@ A well-known result for set semantics PDBs is that while not all finite PDBs can
\subsection{Proof of~\Cref{prop:expection-of-polynom}}
\label{subsec:expectation-of-polynom-proof}
\begin{proof}
We need to prove for $\semN$-PDB $\pdb = (\idb,\pd)$ and $\semNX$-PDB $\pxdb = (\db',\pd')$ where $\rmod(\pxdb) = \pdb$ that $\expct_{\randDB\sim \pd}[\query(\db)(t)] = \expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})}$
We need to prove for $\semN$-PDB $\pdb = (\idb,\pd)$ and \abbrNXPDB $\pxdb = (\db',\pd')$ where $\rmod(\pxdb) = \pdb$ that $\expct_{\randDB\sim \pd}[\query(\db)(t)] = \expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})}$
By expanding $\polyForTuple$ and the expectation we have:
\begin{align*}
\expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})}
@ -155,19 +159,12 @@ Follows by the construction of $\rpoly$ in \cref{def:reduced-bi-poly}.
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Valid Worlds]
For probability distribution $\pd$, % and its corresponding probability mass function $\probOf$,
the set of valid worlds $\valworlds$ consists of all the worlds with probability value greater than $0$; i.e., for random world variable vector $\vct{W}$
\[
\valworlds = \comprehension{\vct{w}}{\probOf[\vct{W} = \vct{w}] > 0}
\]
\end{Definition}
\subsection{Proposition~\ref{proposition:q-qtilde}}\label{app:subsec-prop-q-qtilde}
\noindent Note the following fact:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Proposition}\label{proposition:q-qtilde} For any \bi-lineage polynomial $\poly(X_1, \ldots, X_\numvar)$ and all $\vct{w} \in \eta$, it holds that
\begin{Proposition}\label{proposition:q-qtilde} For any \bi-lineage polynomial $\poly(X_1, \ldots, X_\numvar)$ and all $\vct{w}$ such that $\probOf[\vct{W} = \vct{w}] > 0$, it holds that
$% \[
\poly(\vct{w}) = \rpoly(\vct{w}).
$% \]
@ -175,6 +172,10 @@ $% \]
\begin{proof}%[Proof for~\Cref{proposition:q-qtilde}]
Note that any $\poly$ in factorized form is equivalent to its \abbrSMB expansion. For each term in the expanded form, further note that for all $b \in \{0, 1\}$ and all $e \geq 1$, $b^e = b$.
Finally, note that there are exactly three cases where the expectation of a monomial term $\expct\left[c_{\vct{d}}\prod_{i = n\; s.t.\; \vct{d}_i \geq 1}X_i\right]$ is zero:
(i) when $c_{\vct{d}} = 0$,
(ii) when $p_i = 0$ for some $i$ where $\vct{d}_i \geq 1$, and
(iii) when $X_i$ and $X_j$ are in the same block for some $i,j$ where $\vct{d}_i, \vct{d}_j \geq 1$.
\qed
\end{proof}

View File

@ -3,7 +3,7 @@
\section{Missing details from Section~\ref{sec:background}}\label{sec:proofs-background}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{$\semK$-relations and $\semNX$-PDBs}\label{subsec:supp-mat-background}\label{subsec:supp-mat-krelations}
\subsection{$\semK$-relations and \abbrNXPDB\xplural}\label{subsec:supp-mat-background}\label{subsec:supp-mat-krelations}
\input{app_notation-background}
@ -27,13 +27,13 @@
\label{sec:circuits-formal}
We now formalize circuits and the construction of circuits for SPJU queries.
As mentioned earlier, we represent lineage polynomials as arithmetic circuits over $\mathbb N$-valued variables with $+$, $\times$.
A circuit for query $Q$ and $\semNX$-PDB $\pxdb$ is a directed acyclic graph $\tuple{V_{Q,\pxdb}, E_{Q,\pxdb}, \phi_{Q,\pxdb}, \ell_{Q,\pxdb}}$ with vertices $V_{Q,\pxdb}$ and directed edges $E_{Q,\pxdb} \subset {V_{Q,\pxdb}}^2$.
A circuit for query $Q$ and \abbrNXPDB $\pxdb$ is a directed acyclic graph $\tuple{V_{Q,\pxdb}, E_{Q,\pxdb}, \phi_{Q,\pxdb}, \ell_{Q,\pxdb}}$ with vertices $V_{Q,\pxdb}$ and directed edges $E_{Q,\pxdb} \subset {V_{Q,\pxdb}}^2$.
The sink function $\phi_{Q,\pxdb} : \udom^n \rightarrow V_{Q,\pxdb}$ is a partial function that maps the tuples of the $n$-ary relation $Q(\pxdb)$ to vertices.
We require that $\phi_{Q,\pxdb}$'s range be limited to sink vertices (i.e., vertices with out-degree 0).
%We call a sink vertex not in the range of $\phi_R$ a \emph{dead sink}.
A function $\ell_{Q,\pxdb} : V_{Q,\pxdb} \rightarrow \{\;+,\times\;\}\cup \mathbb N \cup \vct X$ assigns a label to each node: Source nodes (i.e., vertices with in-degree 0) are labeled with constants or variables (i.e., $\mathbb N \cup \vct X$), while the remaining nodes are labeled with the symbol $+$ or $\times$.
We require that vertices have an in-degree of at most two.
%For the specifics on how to construct a circuit to encode the polynomials of all result tuples for a query and $\semNX$-PDB see \Cref{app:subsec-rep-poly-lin-circ}.
%For the specifics on how to construct a circuit to encode the polynomials of all result tuples for a query and \abbrNXPDB see \Cref{app:subsec-rep-poly-lin-circ}.
Note that we can construct circuits for \bis in time linear in the time required for deterministic query processing over a possible world of the \bi under the aforementioned assumption that $\abs{\pxdb} \leq c \cdot \abs{\db}$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -43,7 +43,7 @@ Note that we can construct circuits for \bis in time linear in the time required
\newcommand{\bagdbof}{\textsc{bag}(\pxdb)}
We now connect the size of a circuit (where the size of a circuit is the number of vertices in the corresponding DAG) %\footnote{since each node has indegree at most two, this also is the same up to constants to counting the number of edges in the DAG.})
for a given SPJU query $Q$ and $\semNX$-PDB $\pxdb$ to
for a given SPJU query $Q$ and \abbrNXPDB $\pxdb$ to
the runtime $\qruntime{Q,\dbbase}$ of the PDB's \dbbaseName $\dbbase$.
We do this formally by showing that the size of the circuit is asymptotically no worse than the corresponding runtime of a large class of deterministic query processing algorithms.
@ -122,7 +122,7 @@ There are $|{Q_1} \bowtie \ldots \bowtie {Q_k}|$ such vertices, so the corrected
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:circ-model-runtime}
\label{lem:circuits-model-runtime}
Given a $\semNX$-PDB $\pxdb$ with \dbbaseName $\dbbase$, and query plan $Q$, the runtime of $Q$ over $\dbbase$ has the same or greater complexity as the size of the lineage of $Q(\pxdb)$. That is, we have $\abs{V_{Q,\pxdb}} \leq (k-1)\qruntime{Q, \dbbase}$, where $k$ is the maximal degree of any polynomial in $Q(\pxdb)$.
Given a \abbrNXPDB $\pxdb$ with \dbbaseName $\dbbase$, and query plan $Q$, the runtime of $Q$ over $\dbbase$ has the same or greater complexity as the size of the lineage of $Q(\pxdb)$. That is, we have $\abs{V_{Q,\pxdb}} \leq (k-1)\qruntime{Q, \dbbase}$, where $k$ is the maximal degree of any polynomial in $Q(\pxdb)$.
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\noindent The proof is shown in \Cref{app:subsec-lem-lin-vs-qplan}.

View File

@ -10,9 +10,9 @@ Consider the semiring $\semN = (\domN,+,\times,0,1)$ of natural numbers. $\semN$
Let $\semNX$ denote the set of polynomials over variables $\vct{X}=(X_1,\dots,X_n)$ with natural number coefficients and exponents.
Consider now the semiring $(\semNX, +, \cdot, 0, 1)$ whose domain is $\semNX$, with the standard addition and multiplication of polynomials.
We will use $\semNX$-PDB $\pxdb$, defined as the tuple $(\idb_{\semNX}, \pd)$, where $\semNX$-database $\idb_{\semNX}$ is paired with probability distribution $\pd$.
We denote by $\polyForTuple$ the annotation of tuple $t$ in the result of $\query$ on an implicit $\semNX$-PDB (i.e., $\polyForTuple = \query(\pxdb)(t)$ for some $\pxdb$) and as before, interpret it as a function $\polyForTuple: \{0,1\}^{|\vct X|} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
$\semNX$-PDBs and a function $\rmod$ (which transforms an $\semNX$-PDB to an equivalent $\semN$-PDB) are both formalized next.
We define an \abbrNXPDB $\pxdb$ as the tuple $(\idb_{\semNX}, \pd)$, where $\semNX$-database $\idb_{\semNX}$ is paired with probability distribution $\pd$.
We denote by $\polyForTuple$ the annotation of tuple $t$ in the result of $\query$ on an implicit \abbrNXPDB (i.e., $\polyForTuple = \query(\pxdb)(t)$ for some $\pxdb$) and as before, interpret it as a function $\polyForTuple: \{0,1\}^{|\vct X|} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
\abbrNXPDB\xplural and a function $\rmod$ (which transforms an \abbrNXPDB to an equivalent $\semN$-PDB) are both formalized next.

View File

@ -143,6 +143,7 @@
\newcommand{\tis}{TIDBs\xspace}
\newcommand{\bi}{BIDB\xspace}
\newcommand{\bis}{BIDBs\xspace}
\newcommand{\abbrNXPDB}{$\semNX$-encoded PDB\xspace}
%not sure if we use these; arguably the above abbrev macros should have a name change
\newcommand{\tiabb}{ti}

View File

@ -62,7 +62,9 @@ To make the paper more accessible and general, we found it better to not use $\s
We have done as the reviewer has suggested. All material in \Cref{sec:background} that is proof related is in the appendix, while \Cref{sec:background} is itself now self-contained.
\RCOMMENT{l.622 and l.632-634: so a N-PDB is a PDB where each possible world is an N-database, but an N[X]-PDB is not a PDB where each possible world is an N[X]-database... Confusing notation}
\AH{We need to make sure this is taken care of in the appendix.}
The text now refers to latter as an \abbrNXPDB\xplural.
\RCOMMENT{If you want to be in the setting of bag PDBs, why not consider that the value of the variables are integers rather that Boolean? I.e., consider valuations $\nu: X \rightarrow$ N (or even to R, why not?) instead of $X \rightarrow \{0,1\}$; this would seem more natural to me than having this ad-hoc "mix" of Boolean and non-Boolean setting. If you consider this then your "reduced polynomial" trick does not seem to work anymore.}
@ -93,7 +95,9 @@ We have reserved $\query$ to mean an $\raPlus$ query and nothing else.
The semantics for the polynomial as seen in \Cref{eq:sop-form} is specified indeed as the reviewer has pointed out.
\RCOMMENT{Proof of Proposition A.3. I seems the proof should end after l.687, since you already proved everything from the statement of the proposition. I don't understand what it is that you do after this line.}
\AH{This needs to be verified.}
This text is an informal proof of \Cref{prop:expection-of-polynom} originally intended to motivate \Cref{prop:semnx-pdbs-are-a-}.
We agree that this should not be part of the proof of the later, and have removed the text.
\RCOMMENT{l.686 "The closure of ... over K-relations": you should give more details on this part. It is not obvious to me that the relations from l.646 hold.}
\AH{This too needs to be looked at.}
@ -149,6 +153,7 @@ As alluded to previously, we have followed the reviewer's suggestion and have fo
We have added a reference. Please see \Cref{lem:val-ub}.
\RCOMMENT{Generally speaking, I think I don't understand much about Section 4, and the convolutedness of the appendix does not help to understand. I don't even see in which result you get a linear runtime and to which queries the linear runtime applies. Somewhere there should be a corollary that clearly states a linear time approximation algorithm for some queries.}
\AH{Needs to be addressed.}
\RCOMMENT{In section 5, it seems you are arguing that we can compute lineages as arithmetic circuits at the same time as we would be running an ordinary query evaluation plan. How is that different from using the relations in Figure 2 for computing the lineage?}
@ -219,7 +224,15 @@ Our revision has eliminated this statement.
\AH{Need to elaborate on this.}
\RCOMMENT{The coverage of related work is adequate. Fink et. al seems as the closest related work to me and I would appreciate a more elaborate comparison with this paper. My understanding is that Fink et. al considers exact evaluation only and focuses on knowledge compilation techniques based on decompositions. They also note that "Expected values can lead to unintuitive query answers, for instance when data values and their probabilities follow skewed and non-aligned distributions" attributed to [2]. Does this apply to the current work? Can you please comment on this?}
\AH{Need to comment on this if possible.}
The work is indeed quite close to our own.
It targets a broader class of queries (aggregates include COUNT/SUM/MIN/MAX, rather than our more narrow focus on COUNT), but has significantly less general theoretical results.
Most notably, their proof of linear runtime in the size of the input polynomial is based on a tree-based encoding of the polynomial.
Tree-based representation representation (and hence the Fink et. al. algorithm's runtime) is, as we note several times, superlinear in $\qruntime{\query, \dbbase}$.
This result is also limited to a specific class of (hierarchical) queries, devolving to exponential time (as in \cite{FH13}) in general.
By contrast, our results apply to all of $\raPlus$.
Our revised related work section now addresses both points.
\RCOMMENT{I assume the authors focus on parametrised complexity throughout the paper, and even this is not stated unambiguously. The authors should make an extra effort to make the paper more accessible by using the explanations and examples from the appendix in the body of the paper. It is also important to highlight the differences with the complexity of standard query evaluation over PDBs.}
Our revision has focused on explicitly mentioning the complexity metrics we are interested in. This can be seen in e.g. \Cref{sec:intro} and formal statement (theorems, lemmas, etc.), which have been rewritten to eliminate ambiguities.