Restructuring Intro based on 070921 conversation.

master
Aaron Huber 2021-07-13 14:00:42 -04:00
parent e3e5f6ee13
commit c9f0f8689c
2 changed files with 14 additions and 0 deletions

intro-rewrite-070921.tex Normal file

@ -0,0 +1,13 @@
%root: main.tex
\section{Introduction (Rewrite - 070921)}
A probabilistic database (\abbrPDB) $\pdb$ is a probability distribution $\pd$ over a (multi-)set of $\numvar$ tuples in a deterministic database $\db$. A tuple-independent probabilistic database (\abbrTIDB) $\pdb$ further restricts $\pd$ by treating each tuple in $\db$ as an independent Bernoulli-distributed random event. Given a query $\query$ from the class of positive relational algebra queries ($\raPlus$), the goal is to compute the expectation $\expct\pbox{\poly\inparen{\vct{X}}}$ for each output tuple $\tup$, where $\poly\inparen{\vct{X}}$ is the lineage polynomial of $\tup$, parameterized by $\vct{X}$, the set of $\numvar$ variables annotating the base tuples of $\pdb$.
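To make the target quantity concrete, consider a purely illustrative lineage polynomial (not derived from any particular query in this paper) over two base tuples, and assume the notation $p_i$ for the probability that $X_i = 1$. Since each $X_i$ is an independent $\{0,1\}$-valued Bernoulli variable, $X_i^2 = X_i$, and linearity of expectation gives
\[
\expct\pbox{\inparen{X_1 + X_2}^2} = \expct\pbox{X_1^2} + 2\,\expct\pbox{X_1 X_2} + \expct\pbox{X_2^2} = p_1 + 2 p_1 p_2 + p_2 .
\]
Under these assumptions the result differs from $\inparen{p_1 + p_2}^2$, so in general the expectation cannot be obtained by simply substituting the probabilities into the polynomial.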
The lesser problem of simply computing $\query$ over a deterministic database is itself known to be \sharpwonehard in the general case. This is seen in queries such as counting $k$-cliques and $k$-way joins, whose superlinear runtime is parameterized by $k$. This result is unsatisfying when considering query runtime over \abbrPDB\xplural, since it \emph{entirely} ignores the complexity of intensional evaluation (computing $\expct\pbox{\poly\inparen{\vct{X}}}$). A natural question is whether we can quantify the runtime of the intensional evaluation of $\poly\inparen{\vct{X}}$ separately from the complexity of deterministic query evaluation. \Cref{fig:two-step} illustrates one way to do this.
The model of computation in \cref{fig:two-step} views \abbrPDB query processing as two steps. As depicted, the first step consists of computing $\query$ over a \abbrPDB, which is essentially the deterministic computation of both the query output and $\poly\inparen{\vct{X}}$\footnote{Note that, assuming standard $\raPlus$ query algorithms, the runtime of computing the lineage polynomial of $\tup$ is upper bounded by the runtime of deterministic query evaluation of $\tup$.}. The second step consists of computing $\expct\pbox{\poly\inparen{\vct{X}}}$. Such a model of computation is followed by set-\abbrPDB semantics \cite{DBLP:series/synthesis/2011Suciu}, where, e.g., intensional evaluation is itself a separate computational step; further, in extensional evaluation, computing $\expct\pbox{\poly\inparen{\vct{X}}}$ occurs as a separate step of each operator in the query tree, which implies that both concerns can be separated. The model is also followed by semiring provenance \cite{DBLP:conf/pods/GreenKT07}, where the $\semNX$-DB first computes the annotation via the query, and the polynomial is then evaluated on a specific valuation. In this work, the model lends itself to separating the deterministic computation from the probability computation.
%%%%%%%%%%%%%%%%%%%%
The problem of computing an $\raPlus$ query $\query$ over a \emph{set}-\abbrPDB has been extensively studied and has been shown to be \sharpphard in the general case, with the complexity bottleneck being the probability computation, otherwise known as intensional evaluation or the intensional step. These results suggest a natural compartmentalized computational model for the query evaluation problem over \abbrPDB\xplural. When one allows for approximation in set-\abbrPDB\xplural, the problem admits a quadratic upper bound on runtime.
For bag-\abbrPDB\xplural, this paper shows that, if we allow for approximation, we can \emph{guarantee} the runtime of $\query(\pdb)$ to be linear in the deterministic runtime of $\query$. We further show that it is \emph{not} the case in general that the intensional step of $\query(\pdb)$ is linear in the runtime of the deterministic query $\query$.


@ -119,6 +119,7 @@ sensitive=true
\maketitle
\input{abstract}
\input{intro-rewrite-070921}
\input{intro-rewrite2}%ICDT 2nd Round submission
%\input{outline-intro-new}
%\input{intro-new}%ICDT 1st Round submission