diff --git a/intro-rewrite-070921.tex b/intro-rewrite-070921.tex index e3f616e..75a8c85 100644 --- a/intro-rewrite-070921.tex +++ b/intro-rewrite-070921.tex @@ -222,6 +222,8 @@ The second step, \termStepTwo (\abbrStepTwo) consists of computing $\expct\pbox For \abbrBPDB $\pdb$, query $\query$, let $\timeOf{\abbrStepOne}(Q,\dbbase,\circuit)$ denote the runtime of \abbrStepOne, when it outputs $\circuit$ (which is a representation of $\poly$-- more on this representation shortly). Let us denote by $\timeOf{\abbrStepTwo}(\circuit)$ (recall $\circuit$ is the output of \abbrStepOne) the runtime of \abbrStepTwo, allowing us to formally define our objective: +\input{two-step-model} + \begin{Problem}\label{prob:big-o-joint-steps} Given \abbrBPDB $\pdb$, $\raPlus$ query $\query$ and output tuple $\tup$, does there exist a $(1\pm\epsilon)$-approximation of $\expct_{\db\sim\pd}\pbox{\query\inparen{\db}\inparen{\tup}}$ (for all resuult tuples $\tup$) for some $\circuit$ such that @@ -229,10 +231,14 @@ $\timeOf{\abbrStepOne}(Q,\dbbase,\circuit) + \timeOf{\abbrStepTwo}(\circuit) \le \end{Problem} Note that if the answer to the above problem is yes, then we have shown that the answer to \Cref{prob:informal} is yes (when we are interested in approximating the expected muktiplities). -We show in \Cref{sec:circuit-runtime}\OK{confirm this ref} an $O(\qruntime{Q, \dbbase})$ algorithm for constructing the lineage polynomial of the singleton result tuple of a count query. +We show in \Cref{sec:gen} +%\OK{confirm this ref} +%Atri: fixed the ref + an $O(\qruntime{Q, \dbbase})$ algorithm for constructing the lineage polynomials of the result tuples of an $\raPlus$ query $\query$ (or, more precisely, their representation $\circuit$). % , and by extension the first step is in \sharpwonehard\AH{\sharpwonehard is not defined.}. 
-A key insight of this paper is that the representation matters; -One can have compact representations of $\poly(\vct{X})$ (e.g., resulting from optimizations like projection push-down~\cite{DBLP:books/daglib/0020812}, which produce factorized representations of $\poly(\vct{X})$. +A key insight of this paper is that the representation of $\circuit$ matters. For example, if we insist that $\circuit$ represent the lineage polynomial in the standard monomial basis (henceforth, \abbrSMB)\footnote{This is the representation in which the polynomial is represented as a sum of `pure' products; see \Cref{def:smb} for a formal definition.}, then in general the answer to the above question is no: we would need $\abs{\circuit}\ge \Omega\inparen{\inparen{\qruntime{Q, \dbbase}}^k}$, and hence $\timeOf{\abbrStepOne}(Q,\dbbase,\circuit)$ alone would be too large. + +However, $\poly(\vct{X})$ can also have compact representations, e.g., the factorized representations produced by optimizations like projection push-down~\cite{DBLP:books/daglib/0020812}. For example, in~\Cref{fig:two-step}, $B(Y+Z)$ is a factorized representation of the SMB-form $BY+BZ$. To capture such factorizations, this work uses (arithmetic) circuits\footnote{An arithmetic circuit has variable and/or numeric inputs, with internal nodes representing either an addition or multiplication operator.} as the representation system of $\poly(\vct{X})$. @@ -248,7 +254,6 @@ as the representation system of $\poly(\vct{X})$. % In this case, we have for any output tuple $\tup$, $\expct\pbox{\poly(\vct{W})}=\Phi(1,\dots,1)$. % Thus, we have another case where $\timeOf{\abbrStepTwo}(Q,\pdb)$ is $\bigO{\timeOf{\abbrStepOne}(Q,\pdb)}$ and we again achieve deterministic query runtime for $\query\inparen{\pdb}$ (up to a constant factor). 
These observations introduce our first formalization of~\Cref{prob:informal}: -\input{two-step-model} Given $\timeOf{\abbrStepOne}(Q,\pdb) = O(\qruntime{Q, \dbbase})$, we can now focus on the complexity of \abbrStepTwo. We can represent the factorized lineage polynomial by the size of its correspoding arithmetic circuit $\circuit$ (which we denote by $|\circuit|$). diff --git a/macros.tex b/macros.tex index 03b6c9b..f2b96c2 100644 --- a/macros.tex +++ b/macros.tex @@ -109,7 +109,7 @@ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Incomplete DB/PDBs % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\newcommand{\idb}{\Omega} +\newcommand{\idb}{{\overline{\Omega}}} \newcommand{\pd}{{\mathcal{P}_{\idb}}}%pd for probability distribution \newcommand{\pdassign}{\mathcal{P}} \newcommand{\pdb}{\mathcal{D}}
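The claim that the representation of $\circuit$ matters can be made concrete with a small sketch. This is a minimal Python illustration under stated assumptions, not the paper's construction. It uses a hypothetical encoding in which a circuit is a tree whose leaves are variable names and whose internal nodes are `("+", children)` or `("*", children)`. The helper names `size`, `expand_smb`, and `product_of_sums` are likewise hypothetical: `size` counts nodes, and `expand_smb` multiplies the circuit out into the standard monomial basis. For $B(Y+Z)$ from \Cref{fig:two-step}, the expansion yields $BY+BZ$. For a product of $k$ binary sums, the circuit has $3k+1$ nodes, while the \abbrSMB form has $2^k$ monomials.

```python
from collections import Counter

# Hypothetical circuit encoding (illustration only, not the paper's definition):
#   a leaf is a variable name (str);
#   an internal node is ("+", [children]) or ("*", [children]).


def size(node):
    """Count the nodes of a circuit given as a tree (no shared subcircuits)."""
    if isinstance(node, str):
        return 1
    _, children = node
    return 1 + sum(size(child) for child in children)


def expand_smb(node):
    """Multiply a circuit out into the standard monomial basis (SMB).

    Returns a Counter mapping each monomial, encoded as a sorted tuple of
    variable names (repeats allowed, so ("X", "X") is X^2), to its coefficient.
    """
    if isinstance(node, str):
        return Counter({(node,): 1})
    op, children = node
    parts = [expand_smb(child) for child in children]
    if op == "+":
        total = Counter()
        for part in parts:
            total.update(part)  # Counter.update adds coefficients
        return total
    if op == "*":
        result = Counter({(): 1})  # the empty monomial, i.e., the constant 1
        for part in parts:
            step = Counter()
            # Distribute: every monomial of the running product times every
            # monomial of the next factor.
            for m1, c1 in result.items():
                for m2, c2 in part.items():
                    step[tuple(sorted(m1 + m2))] += c1 * c2
            result = step
        return result
    raise ValueError(f"unknown operator {op!r}")


def product_of_sums(k):
    """Circuit for (X0 + Y0) * (X1 + Y1) * ... * (X{k-1} + Y{k-1})."""
    return ("*", [("+", [f"X{i}", f"Y{i}"]) for i in range(k)])


if __name__ == "__main__":
    # B(Y+Z) from the running example expands to BY + BZ.
    print(dict(expand_smb(("*", ["B", ("+", ["Y", "Z"])]))))
    for k in (1, 4, 8, 12):
        circuit = product_of_sums(k)
        # circuit size grows as 3k + 1, SMB monomial count as 2^k
        print(k, size(circuit), len(expand_smb(circuit)))
```

A tree is used instead of a DAG to keep the sketch short. Allowing shared subcircuits, with each shared node counted once, could only reduce the node count. It would never reduce the number of \abbrSMB monomials, which depends only on the polynomial.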