Finished first draft of new Intro.

master
Aaron Huber 2022-01-27 11:14:31 -05:00
parent 2dfd9c7452
commit 13193d8365
1 changed files with 8 additions and 8 deletions

View File

@ -247,7 +247,10 @@ $\expct\limits_{\vct{\randWorld}\sim\pdassign}\pbox{\poly^2\inparen{\vct{\randWo
For any polynomial $\poly(\vct{X})$ corresponding to a \abbrCTIDB (henceforth, \abbrCTIDB-lineage polynomial),
%\BG{Better introduce the notion of TIDB lin poly before here, then it iis more clear?},
%Atri: Done
define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by setting all exponents $e > 1$ in the standard monomial basis (\abbrSMB) form of $\poly(\vct{X})$ to $1$.
define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by setting all exponents $e > 1$ in the standard monomial basis (\abbrSMB) \footnote{
This is the representation, typically used in set-\abbrPDB\xplural, where the polynomial is reresented as sum of `pure' products. See \Cref{def:smb} for a formal definition.
}
form of $\poly(\vct{X})$ to $1$.
\end{Definition}
With $\poly^2\inparen{A, B, C, E, X, Y, Z}$ as an example, we have:
\begin{align*}
@ -296,9 +299,6 @@ We adopt the two-step intensional model of query evaluation used in set-\abbrPDB
(ii) \termStepTwo (\abbrStepTwo): Given $\poly(\vct{X})$ for each tuple, compute $\expct\pbox{\poly(\vct{\randWorld})}$.
Let $\timeOf{\abbrStepOne}(Q,\dbbase,\circuit)$ denote the runtime of \abbrStepOne when it outputs $\circuit$ (which is a representation of $\poly$ as an arithmetic circuit --- more on this representation shortly).
Denote by $\timeOf{\abbrStepTwo}(\circuit)$ (recall $\circuit$ is the output of \abbrStepOne) the runtime of \abbrStepTwo, allowing us to formally define our objective:
\AHchange{
Our next question is whether or not there exists a $\inparen{1\pm\epsilon}$-approximation algorithm that is linear to the deterministic query? If so, we have shown that approximation of bag \abbrPDB\xplural is comparable to deterministic query processing.
}
\begin{Problem}[Bag-\abbrCTIDB linear time approximation]\label{prob:big-o-joint-steps}
Given \abbrCTIDB $\pdb$, $\raPlus$ query $\query$,
@ -308,9 +308,7 @@ $\exists \circuit : \timeOf{\abbrStepOne}(Q,\tupset, \circuit) + \timeOf{\abbrSt
We show in \Cref{sec:circuit-depth} an $O(\qruntime{Q, \tupset})$ algorithm for constructing the lineage polynomial for all result tuples of an $\raPlus$ query $\query$ (or more more precisely, a single circuit $\circuit$ with one sink per tuple representing the tuple's lineage).
A key insight of this paper is that the representation of $\circuit$ matters.
For example, if we insist that $\circuit$ represent the lineage polynomial in the standard monomial basis (henceforth, \abbrSMB)\footnote{
This is the representation, typically used in set-\abbrPDB\xplural, where the polynomial is reresented as sum of `pure' products. See \Cref{def:smb} for a formal definition.
}, the answer to the above question in general is no, since then we will need $\abs{\circuit}\ge \Omega\inparen{\inparen{\qruntime{Q, \tupset}}^k}$,
For example, if we insist that $\circuit$ represent the lineage polynomial in \abbrSMB, the answer to the above question in general is no, since then we will need $\abs{\circuit}\ge \Omega\inparen{\inparen{\qruntime{Q, \tupset}}^k}$,
and hence, just $\timeOf{\abbrStepOne}(Q,\tupset,\circuit)$ will be too large.
However, systems can directly emit compact, factorized representations of $\poly(\vct{X})$ (e.g., as a consequence of the standard projection push-down optimization~\cite{DBLP:books/daglib/0020812}).
@ -323,11 +321,13 @@ as the representation system of $\poly(\vct{X})$.
Given that there exists a representation $\circuit^*$ such that $\timeOf{\abbrStepOne}(\query,\tupset,\circuit^*)\le O(\qruntime{\query, \tupset})$, we can now focus on the complexity of \abbrStepTwo.
We can represent the factorized lineage polynomial by its correspoding arithmetic circuit $\circuit$ (whose size we denote by $|\circuit|$).
As we also show in \Cref{sec:circuit-runtime}, this size is also bounded by $\qruntime{Q, \tupset}$ (i.e., $|\circuit^*| \le O(\qruntime{Q, \tupset})$).
Thus, \AHchange{the question of approximation} %\Cref{prob:big-o-joint-steps}
Thus, the question of approximation %\Cref{prob:big-o-joint-steps}
can be reframed as:
\begin{Problem}[\Cref{prob:big-o-joint-steps} reframed]\label{prob:intro-stmt}
Given one circuit $\circuit$ that encodes $\apolyqdt$ for all result tuples $\tup$ (one sink per $\tup$) for \abbrBPDB $\pdb$ and $\raPlus$ query $\query$, does there exist an algorithm that computes a $(1\pm\epsilon)$-approximation of $\expct_{\vct{\db}\sim\pd}\pbox{\query\inparen{\vct{\db}}\inparen{\tup}}$ (for all result tuples $\tup$) in $\bigO{|\circuit|}$ time?
\end{Problem}
\rule[1cm]{\textwidth}{1.5pt}
}
\AHchange{