Minor tweaks to the 1st iteration.

2021-06-24 11:24:49 -04:00 · 2021-06-24 11:24:49 -04:00 · 0213a07ec2
parent bb378db2aa
commit 0213a07ec2
1 changed file with 1 addition and 1 deletion
--- a/intro-rewrite2.tex
+++ b/intro-rewrite2.tex
@@ -47,7 +47,7 @@ A tuple independent database (\abbrTIDB) is a \abbrPDB whose tuples are treated

 Traditionally, bag-\abbrPDB\xplural have long been considered to be bottlenecked in step one only, or linear in the size of query.  This may partially be due to the prevalence that exists in using a sum of products (\abbrSOP) representation of the lineage polynomial amongst many of the most well-known implementations of set-\abbrPDB\xplural.  Such a representation used in the bag-\abbrPDB setting \emph{indeed} allows for step two to be linear in the \emph{size} of the \abbrSOP representation, a result due to linearity of expectation.  

-However, it is not necessarily satisfying to stop here.  Since typical implementations of \abbrPDB\xplural compute the representation of the lineage polynomial in sync with the particular choice of query plan, it is important that optimizations are allowed if we want to have a true comparison between step one and step two in bag-\abbrPDB queries.  Optimizations like projection push-down produce factorized or non-\abbrSOP representations of the lineage polynomial.  Our work explores whether or not step two in the computation model is \emph{always} linear in the \emph{size} of the representation of the lineage polynomial when step one of $\query(\pdb)$ is easy.\footnote{It is known that, in general, there exist queries that are \emph{not} linear in the size of the data.  Such queries as multiple joins and counting cliques are specific examples of this.  We are considering cases where the query is linear in the size of the data.}
+However, it is not necessarily satisfying to stop here.  Since typical implementations of \abbrPDB\xplural compute the representation of the lineage polynomial in sync with the particular choice of query plan, it is important that optimizations are allowed if we want to have a true comparison between step one and step two in bag-\abbrPDB queries.  Optimizations like projection push-down produce factorized or non-\abbrSOP representations of the lineage polynomial.  Our work explores whether or not step two in the computation model is \emph{always} linear in the \emph{size} of the representation of the lineage polynomial when step one of $\query(\pdb)$ is easy.\footnote{It is known that, in general, there exist queries that are \emph{not} linear in the size of the data.  Such queries as multiple joins and counting cliques are specific examples of this.  We are considering cases where the query is linear in the size of the data.}  Indeed, if $\prob_i = 1$ for all $i \in [\numvar]$, the computation is essentially that of a deterministic query and is dominated by the first step.  This changes, however, when $\prob_i < 1$, and for this case our work shows that the problem is not linear in the size of the representation.\AH{Not sure of the wording...this is \emph{true} for all representations, correct?  Also, should I be more precise and say ``when $\forall i \in [\numvar]$, $\prob_i < 1$''?  I don't think we consider the case of a mixture of probabilities, with some equal to 1 and some less than 1.}

 Our work focuses on the following setting for query computation.  Inputs of $\query$ are set-\abbrPDB\xplural, while the output of $\query$ is a bag-\abbrPDB.  This setting, however, is not limiting as a simple generalization exists, which involves assigning a unique id to each tuple of bag-\abbrPDB inputs.