Changes to S4; more rebuttals.

master
Aaron Huber 2021-09-17 10:06:03 -04:00
parent 8d8e369962
commit cc90b8799a
2 changed files with 76 additions and 5 deletions

@ -5,7 +5,7 @@
In \Cref{sec:hard}, we showed that the answer to \Cref{prob:intro-stmt} is no.
With this result, we now design an approximation algorithm for our problem that runs in $\bigO{\abs{\circuit}}$.\footnote{For a very broad class of circuits: please see the discussion after \Cref{lem:val-ub} for more.}
The following approximation algorithm applies to \bi, though our bounds are more meaningful for a non-trivial subclass of queries over \bis that contains all queries on \tis, as well as the queries of the PDBench benchmark~\cite{pdbench}. As before, all proofs and pseudocode can be found in \Cref{sec:proofs-approx-alg}.
The following approximation algorithm applies to \abbrBIDB lineage polynomials (over $\raPlus$ queries), though our bounds are more meaningful for a non-trivial subclass of queries over \bis that contains all queries on \tis, as well as the queries of the PDBench benchmark~\cite{pdbench}. As before, all proofs and pseudocode can be found in \Cref{sec:proofs-approx-alg}.
%it is then desirable to have an algorithm to approximate the multiplicity in linear time, which is what we describe next.
\subsection{Preliminaries and some more notation}
@ -75,10 +75,11 @@ Finally, we use the following notation for the complexity of multiplying integer
In a RAM model with a word size of $W$ bits, $\multc{M}{W}$ denotes the complexity of multiplying two integers represented with $M$ bits. (We will assume that for input of size $N$, $W=O(\log{N})$.)
\end{Definition}
\subsection{Our main result}
\subsection{Our main result}\label{sec:algo:sub:main-result}
The following results assume input circuit \circuit computed from an arbitrary $\raPlus$ query $\query$ and arbitrary \abbrBIDB $\pdb$. We refer to \circuit as a \abbrBIDB circuit.
\AH{Verify that the proof for \Cref{lem:approx-alg} doesn't rely on properties of $\raPlus$ or \abbrBIDB.}
\begin{Theorem}\label{lem:approx-alg}
Let \circuit be an arbitrary circuit from a \abbrBIDB %for a UCQ over \bi
Let \circuit be an arbitrary \abbrBIDB circuit %for a UCQ over \bi
and define $\poly(\vct{X})=\polyf(\circuit)$ and let $k=\degree(\circuit)$.
Then an estimate $\mathcal{E}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ can be computed in time
{\small
@ -96,7 +97,7 @@ $\expansion{\circuit}$
that need to be `canceled' when monomials with dependent variables are removed (\Cref{def:reduced-bi-poly}). %def:hen it is modded with $\mathcal{B}$ (\Cref{def:mod-set-polys}).
Let $\isInd{\cdot}$ be a boolean function returning true if monomial $\encMon$ is composed of independent variables and false otherwise; further, let $\indicator{\theta}$ also be a boolean function returning true if $\theta$ evaluates to true.
\begin{Definition}[Parameter $\gamma$]\label{def:param-gamma}
Given a circuit $\circuit$ from a \abbrBIDB, define
Given a \abbrBIDB circuit $\circuit$, define
\AH{Technically, $\monom$ is a set of variables rather than a monomial. Perhaps we don't need the $\var(\cdot)$ function and can replace it with a function that returns the monomial represented by a set of variables. FIXED: need to propagate this to the appendix ($\encMon$)}
\AH{To add, this is an issue on line 1073, 1117 of app C.}
\[\gamma(\circuit)=\frac{\sum_{(\monom, \coef)\in \expansion{\circuit}} \abs{\coef}\cdot \indicator{\neg\isInd{\encMon}} }%\encMon\mod{\mathcal{B}}\equiv 0}}
@ -121,7 +122,7 @@ $\abs{\circuit}(1,\ldots, 1)\le 2^{2^k\cdot \size(\circuit)}.$
Further, under either of the following conditions:
\begin{enumerate}
\item $\circuit$ is a tree,
\item $\circuit$ encodes the run of the algorithm in~\cite{DBLP:conf/pods/KhamisNR16} on a FAQ\AH{AJAR citation.} query,
\item $\circuit$ encodes the run of the algorithm on a FAQ~\cite{DBLP:conf/pods/KhamisNR16}\AH{AJAR citation.} query,
\end{enumerate}
we have $\abs{\circuit}(1,\ldots, 1)\le \size(\circuit)^{O(k)}.$
\end{Lemma}

@ -65,6 +65,76 @@ We have made effort to be deliberately consistent with the use of notation, foll
that...": I don't understand why you need a database to extend an assignment to its semiring homomorphism from $\semNX \rightarrow \semN$}
\AH{Need to make sure that the reason for this is clear.}
\RCOMMENT{Figure 2, K is undefined}
We have updated \Cref{fig:nxDBSemantics} (originally figure 2) to not necessitate $K$.
\RCOMMENT{l.178 "$Q_t$", l.189 "Q will denote a polynomial": this is a very poor choice of notation}
\RCOMMENT{l.242 "and query Q": is Q a query or a lineage?}
We have reserved $\query$ to mean an $\raPlus$ query and nothing else.
\RCOMMENT{Section 2.1.1: here you are considering set semantics no? Otherwise, one would think that for bag semantics the annotation of a tuple could be 0 or something of the form c $\times$ X, where X is a variable and c is a natural number}
The semantics of the polynomial in \Cref{eq:sop-form} is indeed specified as the reviewer has pointed out.
\RCOMMENT{Proof of Proposition A.3. I seems the proof should end after l.687, since you already proved everything from the statement of the proposition. I don't understand what it is that you do after this line.}
\AH{This needs to be verified.}
\RCOMMENT{l.686 "The closure of ... over K-relations": you should give more details on this part. It is not obvious to me that the relations from l.646 hold.}
\AH{This too needs to be looked at.}
\RCOMMENT{l.711 "As already noted...": ah? I don't see where you define which subclass of N[X]-PDBs define bag version of TIDBs. If this is supposed to be in Section 2.1.1 this is not clear, since the world "bag" does not even appear there (and as already mentioned everything seems to be set semantics in this section). I fact, nowhere in the article can I see a definition of what are bag TIDBs/BIDBs}
\AH{This needs to be taken care of in the appendix.}
\RCOMMENT{- l.707 "the sum of the probabilities of all the tuples in the same block b is 1": no, traditionally it can be less than 1, which means that there could be no tuple in the block.}
The reviewer is correct and we have updated our appendix text accordingly.
\RCOMMENT{it is not clear to me how you can go from l.733 to l.736, which is sad because this is actually the whole point of this proof. If I understand correctly, in l.733, Q(D)(t) is the polynomial annotation of t when you use the semantics of Figure 2 with the semiring K being N[X], so I don't see how you go from this to l.736}
\AH{Needs to be verified. I have looked at this previously, and the proof iirc.}
\RCOMMENT{l.209-227: so you define what is a polynomial and what is the degree of a polynomial (things that everyone knows), but you don't bother explaining what "taking the mod of Q(X) over all polynomials in S" means? This is a bit weird.}
Based on this and other reviewer comments, we removed the formal definition of $\rpoly\inparen{\vct{X}}$ and have defined it in a more ad-hoc manner, as suggested by the reviewers, including the comment immediately following.
\RCOMMENT{Definition 2.6: to me, using polynomial long division to define $\tilde{Q}$(X) seems like a pedantic way of reformulating something similar to Definition 1.3, which was perfectly fine and understandable already! You could just define $\tilde{Q}$(X) to set all exponents in the SOP that are >1 to 1 and to remove all monomials with variables from the same block, or using Lemma A.4 as a definition?}
As alluded to above, we have incorporated the reviewer's suggestion; see \Cref{def:reduced-poly} and \Cref{def:reduced-bi-poly}.
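As a small illustration of the ad-hoc definition (a sketch of our own, assuming two $\{0,1\}$-valued variables $X$ and $Y$ that are not drawn from the paper's running examples), consider $\poly(X,Y)=(X+Y)^2$. Expanding into SOP form and setting every exponent greater than $1$ to $1$ gives
\[
\poly(X,Y)=X^2+2XY+Y^2 \quad\Longrightarrow\quad \rpoly(X,Y)=X+2XY+Y .
\]
If $X$ and $Y$ are independent, as in a \abbrTIDB-style setting, no monomial is removed. If instead we assume that $X$ and $Y$ belong to the same block, the monomial $2XY$ is dropped as well, leaving $\rpoly(X,Y)=X+Y$. In both cases, evaluating $\rpoly$ at the marginal probabilities agrees with taking the expectation of $\poly$ term by term, since $X^2=X$ for $\{0,1\}$-valued variables and $XY=0$ whenever $X$ and $Y$ are mutually exclusive.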
\RCOMMENT{Definition 2.14. It is not clear what is the input exactly. Are the query Q and database D fixed? Moreover, I have the impression that your hardness results have nothing to do with lineages and that you don't need them to express your results. I think the problem you should consider is simply the following: Expected Multiplicity Problem: Input: query Q, N[X]-database D, tuple t. Output: expected multiplicity of t in Q(D). Your main hardness result would then look like this: the Expected
Multiplicity problem restricted to conjunctive queries is \#W[1]-hard, parameterized by query size. Indeed if I look at the proof, all you need is the queries $Q^k_G$. The problem is \#W[1]-hard and it should not matter how one tries to solve it: using an approach with lineages or using anything else.
Currently it is confusing because you make it look like the problem is hard only when you consider general arithmetic circuits, but your hardness proof has nothing to do with circuits. Moreover, it is not surprising that computing the expected output of an arithmetic circuit is hard: it is trivial, given a CNF $\phi$, to build an arithmetic circuit C such that for any valuation $\nu$ of the variables the formula $\phi$ evaluates to True under $\nu$ if C evaluates to 1 and the formula $\phi$ evaluates to False under $\nu$ if C evaluates to 0, so this problem is \sharpphard anyways.}
We have rewritten \Cref{sec:intro} with a series of refined problem statements to show that the problem we explore and the results we obtain directly involve lineage polynomials. The reviewer is correct that the output is the expected multiplicity, and we hope that our updated presentation of the paper makes it clear that $\expct_{\vct{\randWorld}\sim\pdassign}\pbox{\apolyqdt\inparen{\vct{\randWorld}}}$ is indeed the expected multiplicity spoken of. We have also addressed the ambiguity in the complexity we are focusing on, both explicitly in the intro and in the revised definition, \Cref{def:the-expected-multipl}.
Regarding the use of circuits, it is true that our hardness results do not require circuits, while our approximation algorithm and cost model both rely on them. We have adjusted our presentation (e.g. the segue between \Cref{prob:informal} and \Cref{prob:big-o-joint-steps}) to make this distinction clear and eliminate any confusion.
\RCOMMENT{Section 3.3. It seems to me the important part of this section is not so much the fact that we have fixed values of p but that the query is now fixed and that you are looking at the fine-grained complexity. If what you really cared about was having fixed value of p, then the result of this section should be exactly like the one in Theorem 3.4, but starting with "fix p". So something like "Fix p. Computing $\tilde{Q}^k_G$ for arbitrary G is \#W1-hard".}
\AH{Need help in responding to this one.}
\RCOMMENT{General remark: The story of the paper I think should be this: we can always compute the expected multiplicity for a UCQ Q and N[X]-database D and tuple t by first computing the lineage in SOP form and then using linearity of expectation, which gives an upper bound of (roughly) $O(|D|^|Q|)$. We show that this exponential dependence in |Q| is unavoidable by proving that this problem is \#W1 hard parameterized by |Q| (which implies that we cannot solve it in $f(|Q|) |D|^c$ ). Furthermore we obtain fine-grained superlinear lower bounds for a fix conjunctive query Q. (Observe how up to here, there is no need to talk about lineages at all). We then obtain an approximation algorithm for this problem for [this class of queries] and [that class of bag PDBs] with [that running time (Q,D)]. The method is to first compute the lineage as an arithmetic circuit C in [this running time (Q,D)], and then from the arithmetic circuit C compute in [running time(C)] an approximation of its expected output. Currently I don't understand to which queries your approximation algorithm can be applied (see later comments).}
We have followed the reviewer's suggestion to delineate between the `coarse' polynomial-time analysis and the fine-grained complexity analysis. We found it necessary to introduce lineage polynomials early, since our hard query, hardness results, and their proofs are easier to present with them than without (and, we feel, this makes the paper more accessible).
We have taken pains to be very clear that this work only considers $\raPlus$ queries, adding a reminder to this end in the first paragraph of \Cref{sec:algo}.
\AH{We need to address the last line of the reviewer's comment. Also, not sure if I answered the comment perfectly.}
\RCOMMENT{l.381: Here again, I think it would be simpler to consider that the input of the problem is the query, the database and a tuple and claim that you can compute an approximation of the expected multiplicity in linear time. The algo is to first compute the lineage as an arithmetic circuit, and then to use what you currently use (which could be put in a lemma or in a proposition).}
Our approximation algorithm assumes an input circuit \circuit that has been computed via an arbitrary $\raPlus$ query $\query$ and arbitrary \abbrBIDB $\pdb$. We have included prose to describe this at the beginning of \Cref{sec:algo:sub:main-result}.
\RCOMMENT{Definition 4.2: would you mind giving an intuition of what this is? It is not OK to define something and just tell the reader to refer the appendix to understand what this is and why this is needed; the article should be understandable without having to look at the appendix. It is simply something that gives the coefficient of each monomial in the reduced polynomial?}
We have provided an intuitive example directly after \Cref{def:expand-circuit}.
\RCOMMENT{- l.409: how does it matter that the circuit C is the lineage of a UCQ? Doesn't this work for any arithmetic circuit?}
The reviewer is correct that our approximation results apply to $\raPlus$ queries over \abbrBIDB\xplural. We specify this in the formal statements of \Cref{sec:algo}; see, e.g., \Cref{def:param-gamma} and \Cref{cor:approx-algo-const-p}.
\RCOMMENT{l.411: what are $|C|^2(1,...,1)$ and $|C|(1,...,1)$? }
We clarify this overloaded notation immediately after \Cref{def:positive-circuit}.
\RCOMMENT{Sometimes you consider UCQs, sometimes RA+ queries. I think it would be simpler if you stick to one formalism (probably UCQs is cleaner?)}
As alluded to previously, we have followed the reviewer's suggestion and have found $\raPlus$ queries to be most amenable for this work.
\RCOMMENT{l.432 what is an FAQ query?}
We have added a reference. Please see \Cref{lem:val-ub}.
\RCOMMENT{Generally speaking, I think I don't understand much about Section 4, and the convolutedness of the appendix does not help to understand. I don't even see in which result you get a linear runtime and to which queries the linear runtime applies. Somewhere there should be a corollary that clearly states a linear time approximation algorithm for some queries.}
\AH{Needs to be addressed.}
\RCOMMENT{In section 5, it seems you are arguing that we can compute lineages as arithmetic circuits at the same time as we would be running an ordinary query evaluation plan. How is that different from using the relations in Figure 2 for computing the lineage?}
There is not a major difference between the two. We explicitly focus on circuits since our approximation results rely on them. We have taken pains to be clear that our hardness results do not rely on them, while our approximation results do. This can be seen, e.g., in the progression of problem statements in the revised introduction (\Cref{sec:intro}).
\RCOMMENT{}