Boris Glavic 2021-09-17 13:04:41 -05:00
parent 007f36bfb5
commit f657c94086


@ -226,8 +226,8 @@ In the remainder of this work, we demonstrate that a $(1\pm\epsilon)$ (multiplic
Like set-probabilistic databases, our approach adopts the two-step intensional model of query evaluation, as illustrated in \Cref{fig:two-step}:
(i) \termStepOne (\abbrStepOne): Given input $\dbbase$ and $\query$, output every tuple $\tup$ that possibly satisfies $\query$, annotated with its lineage polynomial ($\poly(\vct{X})=\apolyqdt\inparen{\vct{X}}$);
(ii) \termStepTwo (\abbrStepTwo): Given $\poly(\vct{X})$ for each tuple, compute $\expct\pbox{\poly(\vct{\randWorld})}$.
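As a toy illustration of \abbrStepOne, the Python sketch below computes lineage polynomials for the hypothetical query $\pi_b\inparen{R \bowtie S}$ over a hypothetical two-relation database. Each input tuple in that database is annotated with its own variable. The sketch assumes the standard $\mathbb{N}[\vct{X}]$-annotation semantics, in which a join multiplies annotations and a projection adds them. It stores each polynomial in the standard monomial basis as a dictionary that maps monomials to coefficients. A monomial is a sorted tuple of variable names, with each name repeated once per unit of exponent. All relation, tuple, and function names are illustrative assumptions and are not part of our formalism.

\begin{lstlisting}[language=Python]
from collections import defaultdict

def var(name):
    """Annotation of an input tuple: the single monomial `name` with coefficient 1."""
    return {(name,): 1}

def p_add(p, q):
    """Sum of two polynomials given as {monomial: coefficient}."""
    out = defaultdict(int)
    for poly in (p, q):
        for mono, c in poly.items():
            out[mono] += c
    return dict(out)

def p_mul(p, q):
    """Product of two polynomials; a monomial is a sorted tuple of variable
    names, with a name repeated once per unit of exponent."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return dict(out)

# Hypothetical input relations R(a) and S(a, b), one variable per tuple.
R = {("a1",): var("X1"), ("a2",): var("X2")}
S = {("a1", "b1"): var("Y1"), ("a2", "b1"): var("Y2"), ("a2", "b2"): var("Y3")}

def step_one(R, S):
    """Lineage of every result tuple of pi_b(R join S):
    joining multiplies annotations, projecting onto b adds them."""
    lineage = {}
    for (a,), r_ann in R.items():
        for (a2, b), s_ann in S.items():
            if a == a2:
                lineage[(b,)] = p_add(lineage.get((b,), {}), p_mul(r_ann, s_ann))
    return lineage

print(step_one(R, S))
\end{lstlisting}

Here the result tuple $(b_1)$ receives the lineage $X_1Y_1 + X_2Y_2$, and $(b_2)$ receives $X_2Y_3$. \abbrStepTwo would then compute the expectation of each such polynomial.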
Let $\timeOf{\abbrStepOne}(Q,\dbbase,\circuit)$ denote the runtime of \abbrStepOne when it outputs $\circuit$, a representation of $\poly$ as an arithmetic circuit (we discuss this representation shortly).
Similarly, let $\timeOf{\abbrStepTwo}(\circuit)$ denote the runtime of \abbrStepTwo on input $\circuit$ (recall that $\circuit$ is the output of \abbrStepOne). We can now formally define our objective:
\begin{Problem}\label{prob:big-o-joint-steps}
@ -240,7 +240,7 @@ Note that if the answer to the above problem is yes, then we have shown that the
We show in \Cref{sec:gen}
%\OK{confirm this ref}
%Atri: fixed the ref
an $O(\qruntime{Q, \dbbase})$ algorithm for constructing the lineage polynomial for all result tuples of an $\raPlus$ query $\query$ (or, more precisely, a single circuit $\circuit$ with one sink per tuple representing the lineage).
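The following Python sketch makes the ``one sink per result tuple'' representation concrete. It models a circuit as a DAG of variable, addition, and multiplication gates. The sketch is an illustrative assumption and is not the construction of \Cref{sec:gen}. Two hypothetical result tuples $t_1$ and $t_2$ have lineage $X\cdot(Y+Z)$ and $W\cdot(Y+Z)$. They share the subcircuit for $Y+Z$, so that subcircuit is stored once and evaluated once. The hypothetical helper \texttt{circuit\_size} counts distinct gates and serves here only as a stand-in for $|\circuit|$.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass

@dataclass(eq=False)  # eq=False keeps identity-based hashing, so shared gates are memoized once
class Gate:
    kind: str            # "var", "+", or "*"
    name: str = ""       # variable name, used only when kind == "var"
    inputs: tuple = ()   # child gates of "+" and "*" gates

def var(name):
    return Gate("var", name)

def add(a, b):
    return Gate("+", inputs=(a, b))

def mul(a, b):
    return Gate("*", inputs=(a, b))

def eval_circuit(sinks, assignment):
    """Evaluate every sink; a gate shared by several sinks is computed once."""
    memo = {}

    def visit(g):
        if g not in memo:
            if g.kind == "var":
                memo[g] = assignment[g.name]
            else:
                left, right = (visit(c) for c in g.inputs)
                memo[g] = left + right if g.kind == "+" else left * right
        return memo[g]

    return {t: visit(g) for t, g in sinks.items()}

def circuit_size(sinks):
    """Number of distinct gates reachable from the sinks."""
    seen, stack = set(), list(sinks.values())
    while stack:
        g = stack.pop()
        if g not in seen:
            seen.add(g)
            stack.extend(g.inputs)
    return len(seen)

shared = add(var("Y"), var("Z"))  # Y + Z, stored once
sinks = {"t1": mul(var("X"), shared), "t2": mul(var("W"), shared)}
print(eval_circuit(sinks, {"X": 2, "Y": 3, "Z": 4, "W": 5}))
print(circuit_size(sinks))
\end{lstlisting}

With $X=2$, $Y=3$, $Z=4$, and $W=5$, the circuit evaluates $t_1$ to $2\cdot 7=14$ and $t_2$ to $5\cdot 7=35$. The circuit has $7$ distinct gates.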
% , and by extension the first step is in \sharpwonehard\AH{\sharpwonehard is not defined.}.
A key insight of this paper is that the representation of $\circuit$ matters.
For example, if we insist that $\circuit$ represent the lineage polynomial in the standard monomial basis (henceforth, \abbrSMB)\footnote{
@ -268,7 +268,7 @@ as the representation system of $\poly(\vct{X})$.
Given that there exists a representation $\circuit$ such that $\timeOf{\abbrStepOne}(\query,\dbbase,\circuit)\le O(\qruntime{\query, \dbbase})$, we can now focus on the complexity of \abbrStepTwo.
We measure the size of the factorized lineage polynomial by the size of its corresponding arithmetic circuit $\circuit$, which we denote by $|\circuit|$.\BG{This sentence didn't parse for me. What do we mean by representing a polynomial by a size?}
As we show in \Cref{sec:circuit-runtime}, this size is also bounded by $\qruntime{Q, \dbbase}$ (i.e., $|\circuit| = O(\qruntime{Q, \dbbase})$).
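The Python sketch below illustrates why the representation of $\circuit$ matters. It uses the hypothetical polynomial $\prod_{i=1}^{m}(X_i+Y_i)$, which is not one of our query lineages. For this polynomial it compares two quantities. The first is the gate count of the obvious factorized circuit, which has $2m$ input gates, $m$ addition gates, and $m-1$ multiplication gates. The second is the number of monomials in its \abbrSMB expansion, which is $2^m$. The function names are assumptions made for illustration.

\begin{lstlisting}[language=Python]
from itertools import product

def smb_monomials(m):
    """Monomials of (X1 + Y1)(X2 + Y2)...(Xm + Ym) after full expansion:
    one monomial per way of picking a variable from every factor."""
    factors = [(f"X{i}", f"Y{i}") for i in range(1, m + 1)]
    return {tuple(sorted(choice)) for choice in product(*factors)}

def circuit_size(m):
    """Gate count of the factorized circuit: 2m input gates,
    m addition gates, and m - 1 multiplication gates."""
    return 2 * m + m + (m - 1)

for m in (2, 5, 10):
    print(m, circuit_size(m), len(smb_monomials(m)))
\end{lstlisting}

For $m=10$, the factorized circuit has $39$ gates, while the \abbrSMB expansion has $1024$ monomials.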
Thus, \Cref{prob:big-o-joint-steps} can be reframed as:
@ -278,7 +278,7 @@ Thus, \Cref{prob:big-o-joint-steps} can be reframed as:
% Suppose, on the contrary, that \circuit is not in \abbrSMB but rather in some factorized form. Then to naively compute \abbrStepTwo, one needs to convert \circuit into \circuit' such that \circuit' is in \abbrSMB, and then compute $\expct\pbox{\poly_\tup\inparen{\vct{\randWorld}}}$, which takes $\bigO{|\circuit|^k}$ time for the case that $k$ is the degree of the polynomial $\Phi_\tup(\vct{X})$. Since $|\circuit'|$ lies between $\bigO{|\circuit|}$ and $\bigO{|\circuit|^k}$, it behooves us to determine which of these extremes is true for the general \circuit. This leads us to the main problem statement of our paper:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Problem}\label{prob:intro-stmt}
Given one circuit $\circuit$ that encodes $\apolyqdt$ for all result tuples $\tup$ (one sink per $\tup$) for \abbrBPDB $\pdb$ and $\raPlus$ query $\query$, does there exist an algorithm that computes a $(1\pm\epsilon)$-approximation of $\expct_{\db\sim\pd}\pbox{\query\inparen{\db}\inparen{\tup}}$ (for all result tuples $\tup$) in $\bigO{|\circuit|}$ time?
%\OK{This doesn't parse. What is $\bigO{\abbrStepOne}$? Should this be $\bigO{\poly}$?}
\end{Problem}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -286,7 +286,7 @@ Given one circuit $\circuit$ that encodes $\apolyqdt$ for all result tuples $\tu
%%%%%%%%%%%%%%%%%%%%%%%%%
%Contributions, Overview, Paper Organization
%%%%%%%%%%%%%%%%%%%%%%%%%
\mypar{Our upper bound results} We show that the answer to \Cref{prob:intro-stmt} (and hence the answer to \Cref{prob:big-o-joint-steps}) is yes. In particular, we show the following upper bound results.
%In this paper we tackle~\Cref{prob:bag-pdb-query-eval} to~\Cref{prob:intro-stmt}.
%Concretely, we make the following contributions:
%(i) %Under fine grained hardness assumption,
@ -301,7 +301,7 @@ Given one circuit $\circuit$ that encodes $\apolyqdt$ for all result tuples $\tu
%graph query for the special case of all $\prob_i = \prob$ for some $\prob$ in $(0, 1)$;
%(ii) To complement our hardness results, we consider an approximate version of~\Cref{prob:intro-stmt}, where instead of computing the expected multiplicity exactly, we allow for a $(1\pm\epsilon)$-\emph{multiplicative} approximation of the expected multiplicity.
(i) We show that for typical database usage patterns\BG{Not sure what we mean by that?} (e.g., when the circuit is a tree or is generated by recent worst-case optimal join algorithms or their Functional Aggregate Query (FAQ)/Aggregations and Joins over Annotated Relations (AJAR) followups~\cite{DBLP:conf/pods/KhamisNR16, ajar}), where there is a single result tuple\BG{This sounds like we are restricting the discussion to queries that return a single tuple}, the answer to \Cref{prob:intro-stmt} for \abbrTIDB is {\em yes}.\footnote{We can approximate the expected result tuple multiplicities for all result tuples {\em simultaneously} with only $O(\log{Z})=O_k(\log{n})$ overhead (where $Z$ is the number of result tuples) over the runtime of a broad class of query processing algorithms (see \Cref{app:sec-cicuits}).}
% the approximation algorithm has runtime linear in the size of the compressed lineage encoding (
In contrast, known approximation techniques in set-\abbrPDB\xplural are at most quadratic in the size of the compressed lineage encoding~\cite{DBLP:conf/icde/OlteanuHK10,DBLP:journals/jal/KarpLM89}.
%Atri: The footnote below does not add much
@ -318,7 +318,7 @@ SELECT 1 FROM OnTime a, Route r, OnTime b
WHERE a.city = r.city1 AND b.city = r.city2
\end{lstlisting}
%$Q()\dlImp$$OnTime(\text{City}), Route(\text{City}, \text{City}'),$ $OnTime(\text{City}')$
It can be verified that $\poly\inparen{A, B, C, E, X, Y, Z}$ for the sole result tuple (i.e. the count) of $\query$ is $AXB + BYE + BZC$. Now consider the product query $\query^2 = \query \times \query$.\BG{$\query(\db)$ is a query result, so I changed it to this}
The lineage polynomial for $Q^2$ is given by $\poly^2\inparen{A, B, C, E, X, Y, Z}$:\AR{Changed the variable $D$ to $E$ to avoid conflict with use of $D$ as a DB.}
$$%\begin{multline*}
@ -347,7 +347,7 @@ $\expct\limits_{\vct{\randWorld}\sim\pdassign}\pbox{\poly^2\inparen{\vct{\randWo
\end{footnotesize}
\noindent This property leads us to consider a structure related to the lineage polynomial.
\begin{Definition}\label{def:reduced-poly}
For any polynomial $\poly(\vct{X})$ corresponding to a \abbrTIDB\BG{Better introduce the notion of TIDB lin poly before here, then it is more clear?}, define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by setting all exponents $e > 1$ in the \abbrSMB form of $\poly(\vct{X})$ to $1$.
\end{Definition}
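Once a polynomial is in \abbrSMB form, the reduction in \Cref{def:reduced-poly} is mechanical. The Python sketch below shows one possible encoding, which is an assumption and not the paper's implementation. A monomial is a sorted tuple of variable names in which each variable appears once per unit of exponent, so the tuple of names A, A, B stands for $A^2B$. The sketch expands $\poly^2 = (AXB + BYE + BZC)^2$. It then caps every exponent at $1$ and merges monomials that become equal.

\begin{lstlisting}[language=Python]
from collections import defaultdict

def p_mul(p, q):
    """Product of two SMB polynomials given as {monomial: coefficient};
    a monomial is a sorted tuple of names, e.g. ("A", "A", "B") is A^2 B."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return dict(out)

def reduce_poly(p):
    """Reduced polynomial: cap every exponent at 1, then merge equal monomials."""
    out = defaultdict(int)
    for mono, c in p.items():
        out[tuple(sorted(set(mono)))] += c
    return dict(out)

phi = {("A", "B", "X"): 1, ("B", "E", "Y"): 1, ("B", "C", "Z"): 1}  # AXB + BYE + BZC
phi_sq = p_mul(phi, phi)
phi_sq_reduced = reduce_poly(phi_sq)
print(phi_sq_reduced)
\end{lstlisting}

The reduced polynomial has six monomials. Three of them, $AXB$, $BYE$, and $BZC$, have coefficient $1$. The other three, $AXBYE$, $AXBZC$, and $BYEZC$, have coefficient $2$.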
With $\poly^2\inparen{A, B, C, E, X, Y, Z}$ as an example, we have:
\begin{align*}
@ -360,7 +360,7 @@ Note that we have argued that for our specific example the expectation that we w
\begin{Lemma}\label{lem:tidb-reduce-poly}
Let $\pdb$ be a \abbrTIDB over $n$ input tuples
such that the probability distribution $\pdassign$ over $\vct{W}\in\{0,1\}^\numvar$ (the set of possible worlds) is induced by the probability vector $\probAllTup = \inparen{\prob_1,\ldots,\prob_\numvar}$ where $\prob_i=\probOf\pbox{W_i=1}$.
For any \abbrTIDB-lineage polynomial\BG{Term has not been introduced yet.} $\poly\inparen{\vct{X}}=\apolyqdt(\vct{X})$, it holds that $
\expct_{\vct{W} \sim \pdassign}\pbox{\poly\inparen{\vct{W}}} = \rpoly\inparen{\probAllTup}.
$
\end{Lemma}
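\Cref{lem:tidb-reduce-poly} can be sanity-checked on the running example. The Python sketch below assumes a \abbrTIDB over the seven variables $A, B, C, E, X, Y, Z$ with hypothetical marginal probabilities chosen only for illustration. It computes $\expct\pbox{\poly^2(\vct{W})}$ by brute-force enumeration of all $2^7$ possible worlds, using exact rational arithmetic. It then compares this value against the reduced polynomial of $\poly^2$ evaluated at the probability vector. Enumeration takes time exponential in $\numvar$, so the sketch is only a check of the identity and not an algorithm we propose.

\begin{lstlisting}[language=Python]
from fractions import Fraction
from itertools import product

VARS = ("A", "B", "C", "E", "X", "Y", "Z")

def phi(w):
    """Lineage of the running example: AXB + BYE + BZC."""
    return w["A"] * w["X"] * w["B"] + w["B"] * w["Y"] * w["E"] + w["B"] * w["Z"] * w["C"]

def reduced_phi_sq(p):
    """Reduced polynomial of phi^2 (every exponent capped at 1), evaluated at p."""
    a, b, c, e, x, y, z = (p[v] for v in VARS)
    return (a * x * b + b * y * e + b * z * c
            + 2 * a * x * b * y * e + 2 * a * x * b * z * c + 2 * b * y * e * z * c)

def expected_phi_sq(p):
    """E[phi(W)^2] over all 2^7 worlds; variable v is 1 independently with probability p[v]."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(VARS)):
        world = dict(zip(VARS, bits))
        weight = Fraction(1)
        for v in VARS:
            weight *= p[v] if world[v] else 1 - p[v]
        total += weight * phi(world) ** 2
    return total

# Hypothetical tuple probabilities, chosen only for illustration.
probs = dict(zip(VARS, (Fraction(9, 10), Fraction(1, 2), Fraction(3, 10),
                        Fraction(7, 10), Fraction(1, 5), Fraction(3, 5), Fraction(2, 5))))
assert expected_phi_sq(probs) == reduced_phi_sq(probs)
\end{lstlisting}

The identity holds exactly for two reasons. First, $W_i^2 = W_i$ for $W_i\in\{0,1\}$, so $\poly^2$ and its reduced polynomial agree in every world. Second, the expectation of a multilinear polynomial over independent variables is that polynomial evaluated at the marginal probabilities.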