Done with intro

master
Atri Rudra 2021-09-18 00:24:28 -04:00
parent a60e4fdf93
commit f6ba1e8a2a
2 changed files with 42 additions and 28 deletions

View File

@ -29,7 +29,7 @@ A common encoding of probabilistic databases (e.g., in \cite{IL84a,Imielinski198
%Each valuation of the random variables appearing in this formula corresponds to one possible world.
%Given a joint probability distribution over such assignments, the marginal probability of a query result tuple $\tup$ is the probability that the lineage formula of $\tup$ evaluates to true. Given a \abbrBPDB $\pdb$, we refer to the above encoding of $\pdb$ as \dbbaseName and denote it as $\dbbase$.
%
The bag semantics analog is a provenance/lineage polynomial $\apolyqdt$~\cite{DBLP:conf/pods/GreenKT07} (see~\Cref{fig:nxDBSemantics} for a definition), a polynomial with integer coefficients and exponents, over integer variables $\vct{X}$ encoding input tuple multiplicities.
The bag semantics analog is a provenance/lineage polynomial $\apolyqdt$~\cite{DBLP:conf/pods/GreenKT07} (see~\Cref{fig:nxDBSemantics} for a definition), a polynomial with non-zero integer coefficients and exponents, over integer variables $\vct{X}$ encoding input tuple multiplicities.
\begin{figure}
\begin{align*}
\polyqdt{\project_A(\query)}{\dbbase}{\tup} =& \sum_{\tup': \project_A(\tup') = \tup} \polyqdt{\query}{\dbbase}{\tup'} &
@ -84,7 +84,7 @@ We note that \Cref{prob:bag-pdb-query-eval} is equivalent to \Cref{prob:bag-pdb-
In this work, we study the complexity of \Cref{prob:bag-pdb-poly-expected} for several models of probabilistic databases and various encodings of such polynomials.
\mypar{\abbrTIDB\xplural}
We initially focus on tuple-independent probabilistic bag-databases\footnote{See \cite{DBLP:series/synthesis/2011Suciu} for a survey of set-\abbrTIDBs; The bag encoding is analogous~\cite{DBLP:conf/pods/GreenKT07}.} (\abbrTIDB\xplural), a compressed encoding of probabilistic databases where the presence of each individual tuple (out of a total of $\numvar$ input tuples) in a possible world is modeled as an independent probabilistic event.\footnote{
We initially focus on tuple-independent probabilistic bag-databases\footnote{See \cite{DBLP:series/synthesis/2011Suciu} for a survey of set-\abbrTIDBs; the bag encoding is analogous~\cite{DBLP:conf/pods/GreenKT07}.} (\abbrTIDB\xplural), a compressed encoding of probabilistic databases where the presence of each individual tuple (out of a total of $\numvar$ input tuples) in a possible world is modeled as an independent probabilistic event.\footnote{
This model is exactly the definition of \abbrTIDB{}s \cite{VS17} under classical set semantics.
Mirroring the implementation of bag relations in production database systems (e.g., Postgresql, DB2), tuple multiplicities are modeled by retaining copies of each tuple (up to its largest possible multiplicity).
% To make each duplicate tuple unique in a set-\abbrTIDB we can assign unique keys across all duplicates.
@ -117,13 +117,13 @@ Thanks to linearity of expectation, simple polynomial-time algorithms (for fixed
% The algo is trivial so I think putting in a 2010 cite seems like bit too much
%\cite{kennedy:2010:icde:pip})
% for computing exact results for bag-probabilistic count queries $Q$ over \abbrTIDB{}s.
However, it is also known that since we are considering data complexity, that {\em deterministic} query processing for the same query $Q$ (over the deterministic database instance $\dbbase$) can also be done in polynomial time.
However, it is also known that since we are considering parameterized complexity, that {\em deterministic} query processing for the same query $Q$ (over the deterministic database instance $\dbbase$) can also be done in polynomial time.
If our notion of efficiency were simply achieving a polynomial time algorithm, then we would be done.
However, in practice (and in theory), we care about the {\em fine-grained} complexity of deterministic query processing (i.e. we care about the exact exponent in our polynomial runtime).
For \abbrBPDB $\pdb$ and query $Q$, let $\timeOf{}^*(Q,\pdb)$ denote the (optimal) runtime complexity of \Cref{prob:bag-pdb-query-eval} (over all result tuples $\tup$).\AR{Am changing these runtime definitions to include the runtime for all result tuples $\tup$.}
Denote by $\qruntime{Q, \db}$ the `runtime' of query $Q$ on deterministic database $\db$ under a cost model that is satisfied by a wide range of query processing algorithms, including those based on the recent work on worst-case optimal join algorithms (we make this runtime concrete in \Cref{sec:gen}\AR{We need to move the definition of $\qruntime{}$ to \Cref{sec:background} because among others we now need it in our lower bound arguments as well.}).
For \abbrBPDB $\pdb$ and query $Q$, let $\timeOf{}^*(Q,\pdb)$ denote the (optimal) runtime complexity of \Cref{prob:bag-pdb-query-eval} (over all result tuples $\tup$). %\AR{Am changing these runtime definitions to include the runtime for all result tuples $\tup$.}
Denote by $\qruntime{Q, \db}$ the `runtime' of query $Q$ on deterministic database $\db$ under a cost model that is satisfied by a wide range of query processing algorithms, including those based on the recent work on worst-case optimal join algorithms (we make this runtime concrete in \Cref{sec:gen}. %\AR{We need to move the definition of $\qruntime{}$ to \Cref{sec:background} because among others we now need it in our lower bound arguments as well.}).
%Denoting by $\dbbase = \bigcup_{\db \in \idb} \db$ the set of all possible tuples in \abbrPDB $\pdb = \inparen{\idb, \pd}$,
We finally have all the pieces to state a formal specification of our problem:
@ -155,7 +155,7 @@ $\Omega\inparen{\inparen{\qruntime{\query, \dbbase}}^{1+\eps_0}}$ for {\em some}
%\hline
$\omega\inparen{\inparen{\qruntime{\query, \dbbase}}^{C_0}}$ for {\em all} $C_0>0$ & Multiple &$\sharpwzero\ne\sharpwone$\\
%\hline
$\Omega\inparen{\inparen{\qruntime{\query, \dbbase}}^{c_0\cdot k}}$ for {\em some} $c_0>0$ & Multiple & Current $k$-matching algorithms\\
$\Omega\inparen{\inparen{\qruntime{\query, \dbbase}}^{c_0\cdot k}}$ for {\em some} $c_0>0$ & Multiple & \Cref{conj:known-algo-kmatch}\\ %Multiple & Current $k$-matching algorithms\\
\hline
\end{tabular}
\caption{Our lower bounds for a specific hard query $Q$ parameterized by $k$. The $\pdb$ is over the same $\dbbase$ and those with `Multiple' in the second column need the algorithm to be able to handle multiple $\pd$. The last column states the hardness assumptions that imply the lower bounds in the first column ($\eps_o,C_0,c_0$ are constants that are independent of $k$).}
@ -166,7 +166,7 @@ To make some sense of the other lower bounds in Table~\ref{tab:lbs}, we note tha
%\footnote{
% We note similar hardness results for determinsitic query processing that apply lower bounds in terms of $\abs{\dbbase}$. Our lower bounds are in terms of $\qruntime{Q,\dbbase}$, which in general can be super-linear in $\abs{\dbbase}$.
%}
However, this result assumes a hardness conjecture that is not as well studied as those in the first two rows of the table (see \Cref{sec:hard} for more discussion on the hardness assumptions). Further, we note that existing results already imply the claimed lower bounds if we were to replace the $\qruntime{\query, \dbbase}$ by just $\abs{\dbbase}$ (indeed these results follow from known lower bound for deterministic query processing), our contribution is to then identify a family of hard query where deterministic query procedding is `easy' but computing the expected multiplicities is hard. To put these hardness results in context, we will next take a short detour to review the existing hardness results for \abbrPDB\xplural under set semantics.
However, this result assumes a hardness conjecture that is not as well studied as those in the first two rows of the table (see \Cref{sec:hard} for more discussion on the hardness assumptions). Further, we note that existing results already imply the claimed lower bounds if we were to replace the $\qruntime{\query, \dbbase}$ by just $\abs{\dbbase}$ (indeed these results follow from known lower bound for deterministic query processing). Our contribution is to then identify a family of hard query where deterministic query procedding is `easy' but computing the expected multiplicities is hard. To put these hardness results in context, we will next take a short detour to review the existing hardness results for \abbrPDB\xplural under set semantics.
% Atri: Converting sub-section to para since it saves space
@ -193,7 +193,7 @@ We note that the \sharpphard lower bound is much stronger than what one can hope
%Atri: Removed the para above since the above does not seem to add much to the current intro flow.
%Such a guarantee is not possible
For queries on the hard side of the dichotomy, the best known algorithmic approach is the \emph{intensional} query evaluation~\cite{DBLP:series/synthesis/2011Suciu}, where one explicitly computes the lineage polynomial and then its expectation --- we will come back this framework shortly. % as in \Cref{prob:bag-pdb-poly-expected}. % , a two step process that first computes the lineage of the query result --- a representation of $\Phi_\tup$ --- which it then uses to compute the desired probability.
For queries on the hard side of the dichotomy, the best known algorithmic approach is the \emph{intensional} query evaluation~\cite{DBLP:series/synthesis/2011Suciu}, where one explicitly computes the lineage formula and then its expectation --- we will come back this framework shortly. % as in \Cref{prob:bag-pdb-poly-expected}. % , a two step process that first computes the lineage of the query result --- a representation of $\Phi_\tup$ --- which it then uses to compute the desired probability.
%The complexity of this approach is, in general, dominated by computing the expectation $\expct\pbox{\apolyqdt(\vct{\randWorld})}$, a problem known to be \sharpphard~\cite{DS07}.
@ -217,11 +217,11 @@ For queries on the hard side of the dichotomy, the best known algorithmic approa
%The first step, which we will refer to as \termStepOne (\abbrStepOne), consists of computing both $\query\inparen{\db}$ and $\poly_\tup(\vct{X})$.\footnote{Assuming standard $\raPlus$ query processing algorithms, computing the lineage polynomial of $\tup$ is upperbounded by the runtime of deterministic query evaluation of $\tup$, as we show in \Cref{sec:circuit-runtime}.} The second step is \termStepTwo (\abbrStepTwo), which consists of computing $\expct\pbox{\poly_\tup(\vct{\randWorld})}$. Such a model of computation is nicely followed in set-\abbrPDB semantics \cite{DBLP:series/synthesis/2011Suciu}, where $\poly_\tup\inparen{\vct{X}}$ must be computed separate from deterministic query evaluation to obtain exact output when $\query(\pdb)$ is hard since evaluating the probability inline with query operators (extensional evaluation) will only approximate the actual probability in such a case. The paradigm of \Cref{fig:two-step} is also analogous to semiring provenance, where $\semNX$-DB\footnote{An $\semNX$-DB is a database whose tuples are annotated with elements from the set of polynomials with variables in $\vct{X}$ and natural number coeficients and exponents.} query processing \cite{DBLP:conf/pods/GreenKT07} first computes the query and polynomial, and the $\semNX$-polynomial can then subsequently evaluated over a semantically appropriate semiring, e.g. $\semN$ to model bag semantics. Further, in this work, the intensional model lends itself nicely in separating the concerns of deterministic computation and the probability computation.
\mypar{Approximating the expected multiplicities}
\AR{Have done my pass till here}
Our negative results indicate that \abbrBPDB{}s can not achieve comparable performance to deterministic databases for exact results (under standard complexity assumptions). In fact, under plausible hardness conjectures, one cannot improve upon the trivial algorithm to exactly compute the expected multiplicities for \abbrTIDB. A natural followup is whether we can do better if we are willing to settle for an approximation to the expected multiplities.
%\AR{Have done my pass till here}
Our negative results indicate that \abbrBPDB{}s can not achieve comparable performance to deterministic databases for exact results (under complexity assumptions). In fact, under plausible hardness conjectures, one cannot improve upon the trivial algorithm to exactly compute the expected multiplicities for \abbrTIDB. A natural followup is whether we can do better if we are willing to settle for an approximation to the expected multiplities.
In the remainder of this work, we demonstrate that a $(1\pm\epsilon)$ (multiplicative) approximation with competitive performance is achievable.
\input{two-step-model}
\input{two-step-model}\AR{In \Cref{fig:two-step}, perhaps in caption give f/w pointer to definition of $Q$?}
We adopt the two-step intensional model of query evaluation used in set-\abbrPDB\xplural, as illustrated in \Cref{fig:two-step}:
(i) \termStepOne (\abbrStepOne): Given input $\dbbase$ and $\query$, output every tuple $\tup$ that possibly satisfies $\query$, annotated with its lineage polynomial ($\poly(\vct{X})=\apolyqdt\inparen{\vct{X}}$);
@ -237,7 +237,7 @@ $\exists \circuit : \timeOf{\abbrStepOne}(Q,\dbbase,\circuit) + \timeOf{\abbrSte
\end{Problem}
Note that if the answer to the above problem is yes, then we have shown that the answer to \Cref{prob:informal} is yes (when we are interested in approximating the expected multiplicities).
We show in \Cref{sec:gen}
We show in \Cref{sec:gen}\AR{Refs needs to be updated}
%\OK{confirm this ref}
%Atri: fixed the ref
an $O(\qruntime{Q, \dbbase})$ algorithm for constructing the lineage polynomial for all result tuples of an $\raPlus$ query $\query$ (or more more precisely, a single circuit $\circuit$ with one sink per tuple representing the lineage).
@ -245,7 +245,10 @@ We show in \Cref{sec:gen}
A key insight of this paper is that the representation of $\circuit$ matters.
For example, if we insist that $\circuit$ represent the lineage polynomial in the standard monomial basis (henceforth, \abbrSMB)\footnote{
This is the representation, typically used in set-\abbrPDB\xplural, where the polynomial is reresented as sum of `pure' products. See \Cref{def:smb} for a formal definition.
}, the answer to the above question in general is no, since then we will need $\abs{\circuit}\ge \Omega\inparen{\inparen{\qruntime{Q, \dbbase}}^k}$\BG{should be $|\idb |$?}, and hence, just $\timeOf{\abbrStepOne}(Q,\dbbase,\circuit)$ will be too large.
}, the answer to the above question in general is no, since then we will need $\abs{\circuit}\ge \Omega\inparen{\inparen{\qruntime{Q, \dbbase}}^k}$,
%\BG{should be $|\idb |$?},
%Atri: No, this is fine
and hence, just $\timeOf{\abbrStepOne}(Q,\dbbase,\circuit)$ will be too large.
However, systems can directly emit compact, factorized representations of $\poly(\vct{X})$ (e.g., as a consequence of the standard projection push-down optimization~\cite{DBLP:books/daglib/0020812}).
For example, in~\Cref{fig:two-step}, $B(Y+Z)$ is a factorized representation of the SMB-form $BY+BZ$.
@ -268,8 +271,10 @@ as the representation system of $\poly(\vct{X})$.
Given that there exists a representation $\circuit$ such that $\timeOf{\abbrStepOne}(\query,\dbbase,\circuit)\le O(\qruntime{\query, \dbbase})$, we can now focus on the complexity of \abbrStepTwo.
We can represent the factorized lineage polynomial by the size of its correspoding arithmetic circuit $\circuit$ (which we denote by $|\circuit|$).\BG{This sentence didn't parse for me. What do we mean by representing a polynomial by a size?}
As we also show in \Cref{sec:circuit-runtime}, this size is also bounded by $\qruntime{Q, \dbbase}$ (i.e., $|\circuit| = O(\qruntime{Q, \dbbase})$).
We can represent the factorized lineage polynomial by its correspoding arithmetic circuit $\circuit$ (whose size we denote by $|\circuit|$).
%\BG{This sentence didn't parse for me. What do we mean by representing a polynomial by a size?}
%Atri: fixed
As we also show in \Cref{sec:circuit-runtime}, this size is also bounded by $\qruntime{Q, \dbbase}$ (i.e., $|\circuit| \le O(\qruntime{Q, \dbbase})$).
Thus, \Cref{prob:big-o-joint-steps} can be reframed as:
%Atri: Replaced the text below by the above. I know I had talked about $|\circuit|^k$ but I think the stuff below breaks the flow a bit
@ -301,15 +306,16 @@ Given one circuit $\circuit$ that encodes $\apolyqdt$ for all result tuples $\tu
%graph query for the special case of all $\prob_i = \prob$ for some $\prob$ in $(0, 1)$;
%(ii) To complement our hardness results, we consider an approximate version of~\Cref{prob:intro-stmt}, where instead of computing the expected multiplicity exactly, we allow for an $(1\pm\epsilon)$-\emph{multiplicative} approximation of the expected multiplicitly.
(i) We show that for typical database usage patterns\BG{Not sure what we mean by that?} (e.g. when the circuit is a tree or is generated by recent worst-case optimal join algorithms or their Functional Aggregate Query (FAQ)/Aggregations and Joins over Annotated Relations (AJAR) followups~\cite{DBLP:conf/pods/KhamisNR16, ajar}), where there is a single result tuple\BG{This sounds like we restricting the discussion to queries that return a single tuple}, the answer to \Cref{prob:intro-stmt} for \abbrTIDB is {\em yes}.\footnote{We can approximate the expected result tuple multiplicities (for all result tuples {\em simultanesouly} with only $O(\log{Z})=O_k(\log{n})$ overhead (where $Z$ is the number of result tuples) over the runtime of a broad class of query processing algorithms (see \Cref{app:sec-cicuits}).}
(i) We show that for typical database usage patterns\BG{Not sure what we mean by that?} (e.g. when the circuit is a tree or is generated by recent worst-case optimal join algorithms or their Functional Aggregate Query (FAQ)/Aggregations and Joins over Annotated Relations (AJAR) followups~\cite{DBLP:conf/pods/KhamisNR16, ajar}), where there is a single\footnote{We can approximate the expected result tuple multiplicities (for all result tuples {\em simultanesouly}) with only $O(\log{Z})=O_k(\log{n})$ overhead (where $Z$ is the number of result tuples) over the runtime of a broad class of query processing algorithms (see \Cref{app:sec-cicuits}).}
result tuple\BG{This sounds like we restricting the discussion to queries that return a single tuple}\AR{Does moving the footnote help?}, the answer to \Cref{prob:intro-stmt} for \abbrTIDB is {\em yes}.
% the approximation algorithm has runtime linear in the size of the compressed lineage encoding (
In contrast, known approximation techniques in set-\abbrPDB\xplural are at most quadratic in the size of the compressed lineage encoding~\cite{DBLP:conf/icde/OlteanuHK10,DBLP:journals/jal/KarpLM89}.
%Atri: The footnote below does not add much
%\footnote{Note that this doesn't rule out queries for which approximation is linear});
(ii) We generalize the \abbrPDB data model considered by the approximation algorithm to a class of bag-Block Independent Disjoint Databases (see \Cref{subsec:tidbs-and-bidbs}) (\abbrBIDB\xplural);
(ii) We generalize the \abbrPDB data model considered by the approximation algorithm to a class of bag-Block Independent Disjoint Databases (see \Cref{subsec:tidbs-and-bidbs}) (\abbrBIDB\xplural).
%\AH{This point \emph{\Large seems} weird to me. I thought we just said that the approximation complexity is linear in step one, but now it's as if we're saying that it's $\log{\text{step one}} + $ the runtime of step one. Where am I missing it?}
%\OK{Atri's (and most theoretician's) statements about complexity always need to be suffixed with ``to within a log factor''}
(iii) We finally observe that our results trivially extend to higher moments of the tuple multiplicity (instead of just the expectation).
%(iii) We finally observe that our results trivially extend to higher moments of the tuple multiplicity (instead of just the expectation).
\mypar{Overview of our Techniques} All of our results rely on working with a {\em reduced} form of the lineage polynomial $\poly$. In fact, it turns out that for the \abbrTIDB (and \abbrBIDB) case, computing the expected multiplicity is {\em exactly} the same as evaluating this reduced polynomial over the probabilities that define the \abbrTIDB/\abbrBIDB. Next, we motivate this reduced polynomial.
Consider the query $\query$ defined as follows over the bag relations of \Cref{fig:two-step}:
@ -318,16 +324,18 @@ SELECT 1 FROM OnTime a, Route r, OnTime b
WHERE a.city = r.city1 AND b.city = r.city2
\end{lstlisting}
%$Q()\dlImp$$OnTime(\text{City}), Route(\text{City}, \text{City}'),$ $OnTime(\text{City}')$
It can be verified that $\poly\inparen{A, B, C, E, X, Y, Z}$ for the sole result tuple (i.e. the count) of $\query$ is $AXB + BYE + BZC$. Now consider the product query $\query^2 = \query \times \query$.\BG{$\query(\db)$ is a query result, so I changed it to this}
It can be verified that $\poly\inparen{A, B, C, E, X, Y, Z}$ for the sole result tuple (i.e. the count) of $\query$ is $AXB + BYE + BZC$. Now consider the product query $\query^2 = \query \times \query$.
%\BG{$\query(\db)$ is a query result, so I changed it to this}
%Atri: Sounds good.
The lineage polynomial for $Q^2$ is given by $\poly^2\inparen{A, B, C, E, X, Y, Z}$:\AR{Changed the variable $D$ to $E$ to avoid conflict with use of $D$ as a DB.}
The lineage polynomial for $Q^2$ is given by $\poly^2\inparen{A, B, C, E, X, Y, Z}$:%\AR{Changed the variable $D$ to $E$ to avoid conflict with use of $D$ as a DB.}
$$%\begin{multline*}
%\inparen{AXB + BYE + BZC}^2\\
=A^2X^2B^2 + B^2Y^2E^2 + B^2Z^2C^2 + 2AXB^2YE + 2AXB^2ZC + 2B^2YEZC.
$$%\end{multline*}
By exploiting linearity of expectation, further pushing expectation through independent \abbrTIDB variables and observing that for any $\randWorld\in\{0, 1\}$, we have $\randWorld^2=\randWorld$, the expectation
\AH{If we choose to use $\pd$ in \Cref{prob:bag-pdb-poly-expected}, then we either need to follow the same convention here OR introduce the notation $\pdassign$ before using it.}
$\expct\limits_{\vct{\randWorld}\sim\pdassign}\pbox{\poly^2\inparen{\vct{\randWorld}}}$ (Where $\randWorld_A$ is the random variable corresponding to $A$, distributed as $\pdassign$).
By exploiting linearity of expectation, further pushing expectation through independent \abbrTIDB variables and observing that for any $\randWorld\in\{0, 1\}$, we have $\randWorld^2=\randWorld$, the expectation is
%\AH{If we choose to use $\pd$ in \Cref{prob:bag-pdb-poly-expected}, then we either need to follow the same convention here OR introduce the notation $\pdassign$ before using it.}
$\expct\limits_{\vct{\randWorld}\sim\pdassign}\pbox{\poly^2\inparen{\vct{\randWorld}}}$ (where $\randWorld_A$ is the random variable corresponding to $A$, distributed as $\pdassign$).
%Atri: Combined the the first step below with the next one to save space.
%\begin{footnotesize}
@ -347,7 +355,10 @@ $\expct\limits_{\vct{\randWorld}\sim\pdassign}\pbox{\poly^2\inparen{\vct{\randWo
\end{footnotesize}
\noindent This property leads us to consider a structure related to the lineage polynomial.
\begin{Definition}\label{def:reduced-poly}
For any polynomial $\poly(\vct{X})$ corresponding to a \abbrTIDB\BG{Better introduce the notion of TIDB lin poly before here, then it iis more clear?}, define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by setting all exponents $e > 1$ in the \abbrSMB form of $\poly(\vct{X})$ to $1$.
For any polynomial $\poly(\vct{X})$ corresponding to a \abbrTIDB (henceforth, \abbrTIDB-lineage polynomial),
%\BG{Better introduce the notion of TIDB lin poly before here, then it iis more clear?},
%Atri: Done
define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by setting all exponents $e > 1$ in the \abbrSMB form of $\poly(\vct{X})$ to $1$.
\end{Definition}
With $\poly^2\inparen{A, B, C, E, X, Y, Z}$ as an example, we have:
\begin{align*}
@ -360,7 +371,10 @@ Note that we have argued that for our specific example the expectation that we w
\begin{Lemma}\label{lem:tidb-reduce-poly}
Let $\pdb$ be a \abbrTIDB over $n$ input tuples
such that the probability distribution $\pdassign$ over $\vct{W}\in\{0,1\}^\numvar$ (the set of possible worlds) is induced by the probability vector $\probAllTup = \inparen{\prob_1,\ldots,\prob_\numvar}$ where $\prob_i=\probOf\pbox{W_i=1}$.
For any \abbrTIDB-lineage polynomial\BG{Term has not been introduced yet.} $\poly\inparen{\vct{X}}=\apolyqdt(\vct{X})$, it holds that $
For any \abbrTIDB-lineage polynomial
%\BG{Term has not been introduced yet.}
%Atri: fixed
$\poly\inparen{\vct{X}}=\apolyqdt(\vct{X})$, it holds that $
\expct_{\vct{W} \sim \pdassign}\pbox{\poly\inparen{\vct{W}}} = \rpoly\inparen{\probAllTup}.
$
\end{Lemma}
@ -386,11 +400,11 @@ we get that $\poly^2\inparen{\vct{\prob}}$ is in the range $[\inparen{p_0}^3\cdo
To get an $(1\pm \epsilon)$-multiplicative approximation we uniformly sample monomials from the \abbrSMB representation of $\poly$ and `adjust' their contribution to $\widetilde{\poly}\left(\cdot\right)$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\mypar{Paper Organization} We present relevant background and notation in \Cref{sec:background}. We then prove our main hardness results in \Cref{sec:hard} and present our approximation algorithm in \Cref{sec:algo}. We present some (easy) generalizations of our results in \Cref{sec:gen}.
\mypar{Paper Organization} We present relevant background and notation in \Cref{sec:background}. We then prove our main hardness results in \Cref{sec:hard} and present our approximation algorithm in \Cref{sec:algo}. %We present some (easy) generalizations of our results in \Cref{sec:gen}.
%and also discuss extensions from computing expectations of polynomials to the expected result multiplicity problem
%\AH{I don't think I understand what the sentence (about extensions) is saying.}
% (\Cref{def:the-expected-multipl}).
Finally, we discuss related work in \Cref{sec:related-work} and conclude in \Cref{sec:concl-future-work}. All proofs are in the appendix.
Finally, we discuss related work in \Cref{sec:related-work} and conclude in \Cref{sec:concl-future-work}. All proofs and our responses to ICDT first cycle reviewer comments are in the appendix.\AR{Would be good to have a specific app ref to rebuttal}
%%% Local Variables:

View File

@ -126,7 +126,7 @@
\node[below=0.2cm of rrect]{{\LARGE $\expct\pbox{\poly(\vct{X})}$}};
\end{tikzpicture}
}
\caption{Intensional Query ($\query$) Evaluation Model}
\caption{Intensional Query ($\query$) Evaluation Model.}
\label{fig:two-step}
\end{figure}