More additions.

This commit is contained in:
Aaron Huber 2021-07-14 15:30:04 -04:00
parent 6e165011b7
commit 102a890cf9

@@ -6,8 +6,8 @@ The lesser problem of simply computing $\query$ over a deterministic database is
The model of computation in \cref{fig:two-step} views \abbrPDB query processing as two steps. As depicted, the first step consists of computing $\query$ over a \abbrPDB, which is essentially the deterministic computation of both the query output and $\poly(\vct{X})$\footnote{Note that, assuming standard $\raPlus$ query algorithms, the cost of computing the lineage polynomial of $\tup$ is upper bounded by the runtime of deterministic query evaluation of $\tup$.}. The second step consists of computing $\expct\pbox{\poly(\vct{X})}$. This model of computation is followed by set-\abbrPDB semantics \cite{DBLP:series/synthesis/2011Suciu} (where, e.g., intensional evaluation is itself a separate computational step; further, computing $\expct\pbox{\poly\inparen{\vct{X}}}$ in extensional evaluation occurs as a separate step of each operator in the query tree, which implies that both concerns can be separated) and also by semiring provenance \cite{DBLP:conf/pods/GreenKT07} (where the $\semNX$-DB first computes the annotation via the query, and then the polynomial is evaluated on a specific valuation). In this work, the model further lends itself to separating the deterministic computation from the probability computation.
The problem of computing $\query(\pdb)$ has been extensively studied in the context of \emph{set}-\abbrPDB\xplural, where the lineage is represented as a propositional formula rather than a polynomial.\footnote{For the case when $\query$ is in the class of $\raPlus$ and $\pdb$ is a \abbrTIDB, the lineage propositional formula is essentially a polynomial with conjunction ($\wedge$) as the polynomial multiplication operator and disjunction ($\vee$) as the polynomial addition operator.} The semantics of evaluating $\query(\pdb)$ in this setting require that each output tuple in $\query(\pdb)$ appear at most once in the result, together with its corresponding marginal probability $\expct\pbox{\poly\inparen{\vct{X}}}$. Dalvi and Suciu showed that the complexity of the query computation problem over set-\abbrPDB\xplural is \sharpphard in general, and proved that a dichotomy exists for this problem, where the runtime of $\query(\pdb)$ is either polynomial or \sharpphard for any polynomial step one. Since the hardness is in the size of the input ($\numvar$), fine-grained complexity analysis of step two will not reduce the hardness results from the \sharpphard complexity class for any parameterized complexity class. To overcome this result, one can allow for approximation, which reduces the problem to a quadratic upper bound.
The problem of computing $\query(\pdb)$ has been extensively studied in the context of \emph{set}-\abbrPDB\xplural, where the lineage polynomial is a propositional formula.\footnote{For the case when $\query$ is in the class of $\raPlus$ and $\pdb$ is a \abbrTIDB, a propositional formula is a special case of the general polynomial, with conjunction ($\wedge$) as the polynomial multiplication operator and disjunction ($\vee$) as the polynomial addition operator.} The semantics of evaluating $\query(\pdb)$ in this setting require that each output tuple in $\query(\pdb)$ appear at most once in the result, together with its corresponding marginal probability $\expct\pbox{\poly\inparen{\vct{X}}}$. Dalvi and Suciu showed that the complexity of the query computation problem over set-\abbrPDB\xplural is \sharpphard in general, and proved that a dichotomy exists for this problem, where the runtime of $\query(\pdb)$ is either polynomial or \sharpphard for any polynomial step one. Since the hardness is in data complexity (the size of the input, $\numvar$), fine-grained complexity analysis of step two will not reduce the hardness results from the \sharpphard complexity class for any parameterized complexity class. To overcome this result, one can allow for approximation, which reduces the problem to a quadratic upper bound.
There exist some queries for which \emph{bag}-\abbrPDB\xplural are a more natural fit. One such query is the count query, where one might desire to compute the expected multiplicity ($\expct\pbox{\poly\inparen{\vct{X}}}$) of a result tuple $\tup$. The semantics of $\query(\pdb)$ in bag-\abbrPDB\xplural allow for output tuples to appear \emph{more} than once.
There exist some queries for which \emph{bag}-\abbrPDB\xplural are a more natural fit. One such query is the count query, where one might desire to compute the expected multiplicity ($\expct\pbox{\poly\inparen{\vct{X}}}$) of a result tuple $\tup$. The semantics of $\query(\pdb)$ in bag-\abbrPDB\xplural allow for output tuples to appear \emph{more} than once, which is naturally captured by a lineage polynomial with the standard polynomial addition and multiplication operators. In this setting, linearity of expectation holds over the lineage polynomial, and the complexity of computing step two is linear in the size of the lineage polynomial.
If we allow for approximation in the setting of bag-\abbrPDB\xplural, this paper shows that we can \emph{guarantee} the runtime of $\query(\pdb)$ to be linear in the deterministic runtime of $\query$. We further show that, in general, the intensional step of $\query(\pdb)$ is \emph{not} linear in the runtime of the deterministic query $\query$.
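
As a small illustration of how step two differs between the two semantics discussed above, consider a hypothetical lineage polynomial over three independent \abbrTIDB variables. The polynomial, the variables $X_1, X_2, X_3$, and the probabilities $p_i$ are assumptions chosen for this sketch and do not correspond to a specific query in this paper. Let $\poly\inparen{\vct{X}} = X_1X_2 + X_1X_3$ with each $X_i \in \{0, 1\}$ and $\expct\pbox{X_i} = p_i$. Under bag semantics, linearity of expectation followed by independence of the variables gives the expected multiplicity term by term:
\[
\expct\pbox{X_1X_2 + X_1X_3} = \expct\pbox{X_1X_2} + \expct\pbox{X_1X_3} = p_1p_2 + p_1p_3.
\]
Under set semantics, the corresponding propositional formula is $(X_1 \wedge X_2) \vee (X_1 \wedge X_3)$, and its marginal probability cannot be read off term by term because the two clauses share $X_1$:
\[
\expct\pbox{(X_1 \wedge X_2) \vee (X_1 \wedge X_3)} = p_1\inparen{1 - (1 - p_2)(1 - p_3)}.
\]
For instance, with the assumed values $p_1 = 0.5$, $p_2 = 0.4$, and $p_3 = 0.2$, the expected multiplicity is $0.5 \cdot 0.4 + 0.5 \cdot 0.2 = 0.3$, while the marginal probability is $0.5 \cdot (1 - 0.6 \cdot 0.8) = 0.26$.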