Finished rebuttal document.

2021-09-17 11:44:04 -04:00 · 2021-09-17 11:44:04 -04:00 · 6294207b50
parent b91bcf4cc8
commit 6294207b50
2 changed files with 48 additions and 11 deletions
--- a/mult_distinct_p.tex
+++ b/mult_distinct_p.tex
@@ -9,7 +9,7 @@ In this section, we will prove the hardness results claimed in Table~\ref{tab:lb
 %Furthermore, we demonstrate in \Cref{sec:single-p} that the problem remains hard, even if $\probOf[X_i=1] = \prob$ for all $X_i$ and any fixed value $\prob \in (0, 1)$, as long as certain popular hardness conjectures in fine-grained complexity hold. 

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsection{Preliminaries}
+\subsection{Preliminaries}\label{sec:hard:sub:pre}
 Our hardness results are based on (exactly) counting the number of (not necessarily induced) subgraphs in $G$ isomorphic to $H$. Let $\numocc{G}{H}$ denote this quantity.  We can think of $H$ as being of constant size and $G$ as growing.  %In query processing, $H$ can be viewed as the query while $G$ as the database instance.
 In particular, we will consider the problems of computing the following counts (given $G$ in its adjacency list representation): $\numocc{G}{\tri}$ (the number of triangles), $\numocc{G}{\threedis}$ (the number of $3$-matchings), and the latter's generalization $\numocc{G}{\kmatch}$ (the number of $k$-matchings).  We use $\kmatchtime$ to denote the optimal runtime of computing $\numocc{G}{\kmatch}$.  Our hardness results in \Cref{sec:multiple-p} are based on the following hardness results/conjectures:

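To make the counted quantities $\numocc{G}{\tri}$ and $\numocc{G}{\kmatch}$ concrete, here is a minimal brute-force sketch in Python. It is our illustration only, not part of the paper or its algorithms. The symmetric adjacency-list dictionary format and the function names `edge_set`, `count_triangles` and `count_k_matchings` are hypothetical. The sketch counts subgraphs of $G$ isomorphic to $H$ that need not be induced: a $k$-matching is any set of $k$ pairwise vertex-disjoint edges, whatever other edges run between their endpoints. It enumerates all $k$-subsets of edges, so it says nothing about the optimal runtime $\kmatchtime$.

```python
from itertools import combinations


def edge_set(adj):
    """Undirected edges of a graph given as a symmetric adjacency-list dict."""
    return {frozenset((u, v)) for u, nbrs in adj.items() for v in nbrs if u != v}


def count_triangles(adj):
    """Count triangles: unordered vertex triples whose three pairs are all edges."""
    edges = edge_set(adj)
    vertices = list(adj)  # assumes every vertex appears as a key
    return sum(
        1
        for a, b, c in combinations(vertices, 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edges
    )


def count_k_matchings(adj, k):
    """Count k-matchings: sets of k pairwise vertex-disjoint edges.

    Other edges among the matched vertices are ignored, so the counted
    copies of the k-matching need not be induced subgraphs.
    """
    edges = list(edge_set(adj))
    return sum(
        1
        for chosen in combinations(edges, k)
        # k edges are pairwise disjoint iff together they touch exactly 2k vertices
        if len(set().union(*chosen)) == 2 * k
    )


# Example graphs: the complete graph K4 and the 6-cycle C6.
K4 = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
C6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

print(count_triangles(K4), count_k_matchings(K4, 2))  # 4 3
```

On $K_4$ the sketch reports three $2$-matchings, even though the only induced subgraph of $K_4$ on four vertices is $K_4$ itself. This is exactly the "not necessarily induced" distinction.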
--- a/rebuttal.tex
+++ b/rebuttal.tex
@@ -126,6 +126,7 @@ We clarify this overloaded notation immediately after \Cref{def:positive-circuit

 \RCOMMENT{Sometimes you consider UCQs, sometimes RA+ queries. I think it would be  simpler if you stick to one formalism (probably UCQs is cleaner?)}	
 As alluded to previously, we have followed the reviewer's suggestion to use a single formalism, and have found $\raPlus$ queries to be the most amenable to this work.
+
 \RCOMMENT{l.432 what is an FAQ query?}
 We have added a reference. Please see \Cref{lem:val-ub}.

@@ -159,18 +160,54 @@ We have implemented the reviewer's ad-hoc suggestion in light of Reviewer 1's si
 \RCOMMENT{the paper uses three notations (UCQ, RA+, SPJU) for the same thing, and never defines formally any of them.}
 We have chosen $\raPlus$ for consistent use throughout the paper.  We have included \Cref{footnote:ra-def} on \Cpageref{footnote:ra-def} for an explicit definition of $\raPlus$ queries.

-\RCOMMENT{}
+\RCOMMENT{$G^{\ell}$ is used in Lemma 3.8 but defined only in the Appendix (Def. B.2), without even a forward pointer.  This is a major omission: Lemma 3.8 is a key step for a key result, but it is impossible to read.}
+We have fixed this mistake.
+
+\RCOMMENT{Definition 2.7.  "valid worlds $\eta$".  This is confusing.  A "possible world" is an element of $\idb$: this is not stated explicitly in the paper, but it is implicit on line 163, so I assumed that possible worlds refer to elements of $\idb$.  If I assumed correctly, then calling $\eta$ a "world" in Def. 2.7 is misleading, because $\eta$ is not an element of $\idb$.  More, it is unclear to me why this definition is needed: it is used right below, in Lemma 2.8, but that lemma seems to continue to hold even if w is not restricted.}
+\AH{Needs to be addressed.}
+
+\RCOMMENT{line 305: please define what is an "occurrence of H in G".  It could mean: a homomorphic image, a subgraph of G isomorphic to H, an induced subgraph of G isomorphic to H, or maybe something else.}
+We agree with the reviewer and have rephrased the wording to make clear that we count (not necessarily induced) subgraphs of $G$ isomorphic to $H$.  Please see the beginning of \Cref{sec:hard:sub:pre}.
+
+\RCOMMENT{If the proofs are given in the appendix, please say so.  Lemmas 3.5 and 3.8 are stated without any mention, and one has to guess whether they are obvious, or proven somewhere else.  On this note: I found Lemma 3.5 quite easy, since the number of k-matching is the coefficient of the leading monomial (of degree 2k) in $Q^k(p,p,...,p)$, while Lemma 3.8 appears much harder.  It would help to briefly mention this in the main body of the paper.}
+We have implemented the reviewer's suggestion.  Please see the last sentence of \Cref{sec:intro}.
+
+\RCOMMENT{line 177: what is $\Omega_{\semNX}$?}
+We have eliminated the use of $\semNX$-DBs in the paper proper, using them only where necessary in the proofs in the appendix.
+\AH{Need to address what is $\idb_\semNX$}
+
+\RCOMMENT{line 217.  The polynomial $X^2 + 2XY + Y^2$ is a poor choice to illustrate the degree. There are two standard definitions of the degree of a multivariate polynomial, and one has to always clarify which one is meant.  One definition is the total degree (which is Def. 2.3 in the paper), the other is the maximum degree of any single variable.  It is nice that you are trying to clarify for the reader which definition you are using, but the polynomial $X^2 + 2XY + Y^2$ is worst choice, since here the two coincide.}
+We have adjusted the example in \Cref{def:degree} to account for the reviewer's correct observation.
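The two degree notions the reviewer distinguishes can be illustrated with a minimal Python sketch. It assumes a nonzero polynomial is given in sum-of-monomials form as a list of (coefficient, exponent-map) pairs, with like monomials already combined. The representation, the function names `total_degree` and `max_variable_degree`, and the second polynomial are hypothetical illustrations, not necessarily the example now used in the paper.

```python
def total_degree(poly):
    """Total degree: the largest sum of exponents over all monomials.

    Assumes at least one coefficient is nonzero.
    """
    return max(sum(exps.values()) for coeff, exps in poly if coeff != 0)


def max_variable_degree(poly):
    """Largest exponent of any single variable in any monomial."""
    return max(
        (e for coeff, exps in poly if coeff != 0 for e in exps.values()),
        default=0,
    )


# X^2 + 2XY + Y^2: the two notions coincide (the reviewer's point).
SQUARE = [(1, {"X": 2}), (2, {"X": 1, "Y": 1}), (1, {"Y": 2})]
# X^2 Y + 2XY (hypothetical): the two notions differ.
SKEWED = [(1, {"X": 2, "Y": 1}), (2, {"X": 1, "Y": 1})]

print(total_degree(SQUARE), max_variable_degree(SQUARE))  # 2 2
print(total_degree(SKEWED), max_variable_degree(SKEWED))  # 3 2
```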
+
+\RCOMMENT{line 220.  "we consider only finite degree polynomials".  This is a surprise.  Polynomials, by definition, are of finite degree; there are extensions (I'm aware of powerseries, maybe you have other extensions in mind), but they are usually not called polynomials, plus, nothing in the paper so far suggests that it might refer to those extensions.}
+We have removed the redundant terminology the reviewer pointed out and refined the discussion surrounding (and including) \Cref{eq:sop-form} to make explicit to the novice reader that polynomials, by definition, have finite degree.
+
+\RCOMMENT{"Note that our hardness results even hold for the expression trees".  At this point we haven't seen the hardness results, nor their proofs, and we don't know what expression trees are.  It's unclear what we can note.}
+We have addressed the reviewer's concern in our rewrite of \Cref{sec:hard}, adjusting the prose accordingly.
+
+\RCOMMENT{paragraph at the top of pp.10 is confusing.  My guess is that it is trying to this say: "there exists a query Q, such that, for each graph G, there exists a database D s.t. the lineage of Q on D is the polynomial $Q_G$."}
+Our revision has eliminated this statement.
+
+\subsection{Reviewer 3}
+\RCOMMENT{The overall study is then extended to a multiplicative approximation algorithm for the expectation of polynomial circuits in linear time in the size of the polynomial. It was much harder to read this part, and I found the examples and flow in the appendix quite helpful. I suggest to include these examples into the body of the paper. }
+\AH{Need to address this.}
+
+\RCOMMENT{While ApproximateQ is linear in the size of the circuit, it is quadratic in epsilon and so we need quadratically many samples for the desired accuracy --  overall runtime is not linear therefore and it may be better to elaborate this.  It may also be helpful to comment on how this relates to Karp, Luby, Madras algorithm [1] for \#DNF which is also quadratic in epsilon.}
+\AH{Need to elaborate on this.}
+
+\RCOMMENT{The coverage of related work is adequate. Fink et. al seems as the closest related work to me and I would appreciate a more elaborate comparison with this paper. My understanding is that Fink et. al considers exact evaluation only and focuses on knowledge compilation techniques based on decompositions. They also note that "Expected values can lead to unintuitive query answers, for instance when data values and their probabilities follow skewed and non-aligned distributions" attributed to [2]. Does this apply to the current work? Can you please comment on this?}
+\AH{Need to comment on this if possible.}
+
+\RCOMMENT{I assume the authors focus on parametrised complexity throughout the paper, and even this is not stated unambiguously. The authors should make an extra effort to make the paper more accessible by using the explanations and examples from the appendix in the body of the paper. It is also important to highlight the differences with the complexity of standard query evaluation over PDBs.}
+Our revision explicitly states the complexity metrics we are interested in.  This can be seen in, e.g., \Cref{sec:intro} and in the formal statements (theorems, lemmas, etc.), which have been rewritten to eliminate ambiguities.
+We have also taken pains to promote accessibility by keeping the paper self-contained and by using examples for difficult or potentially unclear concepts.  This can be seen in the elimination of unnecessary machinery (e.g., the $\semNX$-DB machinery from the paper proper), in new or modified examples (cf. \Cref{def:expand-circuit}, \Cref{def:degree}), and in consistent notation, e.g., a single query evaluation formalism ($\raPlus$).
+
+\subsection{Reviewer 4}
+\RCOMMENT{I wonder whether the writing could be revisited to give the reader a better overview of the technical challenges, motivation, and the high level ideas of the algorithm and hardness results. The current exposition seems slightly too tailored for the expert in probabilistic databases rather than the average ICDT attendee. Also the current exposition is structured such that the reader needs to get through quite a few definitions and technical lemmas until they get to the new ideas in the paper.}
+As noted throughout this section, we have revised the writing to state more precisely and clearly both the problem we study and the results we obtain.  As part of this revision, we completely rewrote \Cref{sec:intro}, using precise language and a series of problem statements to help the reader understand the problem we study, how we approach it, and why this approach is valid.  We have also sought to make the paper more accessible by writing for the average ICDT attendee, defining or explaining concepts that may be unfamiliar to them.

 \RCOMMENT{}

 \RCOMMENT{}

-\RCOMMENT{}
-
-\RCOMMENT{}
-
-\RCOMMENT{}
-
-\RCOMMENT{}
-
-
+\RCOMMENT{}