This commit is contained in:
Boris Glavic 2021-09-02 16:58:09 -05:00
parent 680cb7e227
commit 7cdc1f775b

@@ -18,7 +18,7 @@ of tuple $\tup$.
\end{Problem}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We are mostly interested in the data complexity of this problem (i.e., we think of $Q$ as being of constant size). Unless stated otherwise, we implicitly assume the probability distribution $\pd$, and for notational convenience use $\expct\pbox{\cdot}$ instead of $\expct_\pd\pbox{\cdot}$. It has been shown that the problem of computing the marginal probability of a query result tuple can be reduced to the problem of computing the probability that the lineage formula of the tuple evaluates to true. The lineage formula of a tuple $\tup$ is a propositional formula over Boolean random variables (whose joint probability distribution encodes which tuple exists in which world) representing the tuples of $\pdb$; the formula encodes how the existence of $\tup$ depends on the existence of the input tuples. The bag semantics analog of a lineage formula is a provenance polynomial $\apolyqdt$, a polynomial with integer coefficients and exponents over integer random variables encoding the multiplicity of input tuples.
We are mostly interested in the data complexity of this problem (i.e., we think of $Q$ as being of constant size). Unless stated otherwise, we implicitly assume the probability distribution $\pd$, and for notational convenience use $\expct\pbox{\cdot}$ instead of $\expct_\pd\pbox{\cdot}$. It has been shown that the problem of computing the marginal probability of a query result tuple can be reduced to the problem of computing the probability that the lineage formula of the tuple evaluates to true. The lineage formula of a tuple $\tup$ is a propositional formula over Boolean random variables (whose joint probability distribution encodes which tuple exists in which world) representing the tuples of $\pdb$; the formula encodes how the existence of $\tup$ depends on the existence of the input tuples. The bag semantics analog of a lineage formula is a provenance polynomial $\apolyqdt$, a polynomial with integer coefficients and exponents over integer random variables encoding the multiplicity of input tuples. Note that we drop $Q$, $\pdb$, and $\tup$ from $\apolyqdt$ if they are clear from the context or irrelevant to the discussion.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Problem}[Expected Multiplicity of Lineage Polynomials]\label{prob:bag-pdb-poly-expected}
@@ -55,10 +55,10 @@ This in turn implies that $\expct\pbox{\query\inparen{\pdb}\inparen{\tup}} = \ex
Thanks to linearity of expectation, simple polynomial-time algorithms exist
% The algo is trivial so I think putting in a 2010 cite seems like bit too much
%\cite{kennedy:2010:icde:pip})
for computing exact results for bag-probabilistic count queries $Q$ over \abbrTIDB{}s. On the other hand, it is also known that since we are considering data complexity, the {\em deterministic} query processing for the same query $Q$ can also be done in polynomial time. If our notion of efficiency was polynomial time algorithms, then we would be done. However, in practice (and in theory), we care about the {\em fine-grained} complexity of deterministic query processing (i.e. we care about the exact exponent in our polynomial runtime). Given that there is a huge literature on fine grained complexity of determinitic query complexity, here is a natural (informal) specialization of~\cref{prob:bag-pdb-query-eval}:
for computing exact results for bag-probabilistic count queries $Q$ over \abbrTIDB{}s. It is also known that, since we are considering data complexity, {\em deterministic} query processing for the same query $Q$ can also be done in polynomial time. If our notion of efficiency were polynomial-time algorithms, then we would be done. However, in practice (and in theory), we care about the {\em fine-grained} complexity of deterministic query processing (i.e., we care about the exact exponent in our polynomial runtime). Given that there is a huge literature on the fine-grained complexity of deterministic query processing, here is a natural (informal) specialization of~\cref{prob:bag-pdb-query-eval}:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Problem}[Informal problem statement]
For any query $Q$, is it the case that the {\em fine-grained complexity} of bag-PDB processing of $Q$ can be asymptotically as fast as the `best' deterministic query processing of $Q$?
For any query $Q$, can the expected multiplicities of the result tuples of $Q$ be computed, in terms of {\em fine-grained complexity}, asymptotically as fast as the `best' deterministic query processing of $Q$?
\label{prob:informal}
\end{Problem}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -177,7 +177,7 @@ as the representation system of $\poly_\tup(\vct{X})$, which are a natural fit t
% \end{aligned}\\
% & & \evald{R}{\db}(\tup) =& \rel(\tup)
\end{align*}\\[-10mm]
\caption{Construction of the lineage (polynomial) for $\raPlus$ over \abbrBPDB} % Evaluation semantics $\evald{\cdot}{\db}$ for $\semNX$-DBs~\cite{DBLP:conf/pods/GreenKT07}.}
\caption{Construction of the lineage (polynomial) for an $\raPlus$ query over a \abbrBPDB} % Evaluation semantics $\evald{\cdot}{\db}$ for $\semNX$-DBs~\cite{DBLP:conf/pods/GreenKT07}.}
\label{fig:nxDBSemantics}
\end{figure}