diff --git a/intro-rewrite-070921.tex b/intro-rewrite-070921.tex
index d5b1390..1227328 100644
--- a/intro-rewrite-070921.tex
+++ b/intro-rewrite-070921.tex
@@ -12,7 +12,7 @@ $\pdb = \inparen{\inset{0,\ldots, c}^\numvar, \mathcal{P}}$ is a bag of tuples s
 %Since each tuple in $\pdb$ has a mutually exclusive probability distribution over its possible multiplicities, it is natural to reduce a \abbrCTIDB to traditional (set) block independent database (\abbrBIDB). We refer to the reduced \abbrBIDB as a $1$-\abbrBIDB, as it is the case that each tuple can appear in a possible world at most $c = 1$ time. \Cref{fig:ctidb-red} shows an example of this reduction.
 %}
 \secrev{
-Allowing for $\leq c$ multiplicities across all tuples gives rise to having $\leq \inparen{c+1}^\numvar$ possible worlds instead of the usual $2^\numvar$ possible worlds of a (set) $1$-\abbrTIDB.
+Allowing for $\leq c$ multiplicities across all tuples gives rise to having $\leq \inparen{c+1}^\numvar$ possible worlds instead of the usual $2^\numvar$ possible worlds of the traditional set \abbrTIDB.
 In this work, it is natural to be specifically considering bag query semantics.

 We can formally state this problem as:
@@ -157,7 +157,10 @@ Further, we generalize the \abbrPDB data model considered by the approximation a
 }
 \secrev{
 \subsection{Polynomial Equivalence}
-A common encoding of probabilistic databases (e.g., in \cite{IL84a,Imielinski1989IncompleteII,Antova_fastand,DBLP:conf/vldb/AgrawalBSHNSW06} and many others) relies on annotating tuples with lineages, propositional formulas that describe the set of possible worlds that the tuple appears in. The bag semantics analog is a provenance/lineage polynomial $\apolyqdt$~\cite{DBLP:conf/pods/GreenKT07} (see~\Cref{fig:nxDBSemantics} for a definition), a polynomial with non-zero integer coefficients and exponents, over integer variables $\vct{X}$ encoding input tuple multiplicities.
+A common encoding of probabilistic databases (e.g., in \cite{IL84a,Imielinski1989IncompleteII,Antova_fastand,DBLP:conf/vldb/AgrawalBSHNSW06} and many others) relies on annotating tuples with lineages, propositional formulas that describe the set of possible worlds that the tuple appears in. The bag semantics analog is a provenance/lineage polynomial $\apolyqdt$~\cite{DBLP:conf/pods/GreenKT07}, a polynomial with non-zero integer coefficients and exponents, over integer variables $\vct{X}$ encoding input tuple multiplicities.
+
+Intuitively, a \abbrCTIDB lends itself to a useful reduction to a specific type of block independent database (\abbrBIDB) which we refer to as a $1$-\abbrBIDB. A $1$-\abbrBIDB is a \abbrBIDB in the traditional sense of allowing no duplicate tuples, \emph{but} where we use bag query semantics instead of the usual set query semantics.
+(see~\Cref{fig:nxDBSemantics} for a definition)
 \begin{figure}
 \begin{align*}
 \polyqdt{\project_A(\query)}{\dbbase}{\tup} =& \sum_{\tup': \project_A(\tup') = \tup} \polyqdt{\query}{\dbbase}{\tup'} &
@@ -182,19 +185,20 @@ A common encoding of probabilistic databases (e.g., in \cite{IL84a,Imielinski198
 We drop $\query$, $\dbbase$, and $\tup$ from $\apolyqdt$ when they are clear from the context or irrelevant to the discussion. We now specify the problem of computing the expectation of tuple multiplicity in the language of lineage polynomials:
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{Problem}[Expected Multiplicity of Lineage Polynomials]\label{prob:bag-pdb-poly-expected}
-Given an $\raPlus$ query $\query$,
-\AHchange{
-\abbrCTIDB $\pdb$
-}
-and result tuple $\tup$, compute the expected
-multiplicity of the polynomial $\apolyqdt$ (i.e., $\expct_{\vct{W}\sim \pdassign}\pbox{\apolyqdt(\vct{W})}$).,
-where $\pdassign$ is the distribution induced by $\pd$ on the relevant assignments $\vct{W}$ to variables of $\apolyqdt$.
+Given an $\raPlus$ query $\query$, \abbrCTIDB $\pdb$ and result tuple $\tup$, compute the expected
+multiplicity of the polynomial $\apolyqdt$ (i.e., $\expct_{\vct{W}\sim \pdassign}\pbox{\apolyqdt(\vct{W})}$).
+%,
+%where $\pdassign$ is the distribution induced by $\pd$ on the relevant assignments $\vct{W}$ to variables of $\apolyqdt$.
 \end{Problem}

 We note that computing \Cref{prob:expect-mult} is equivalent to computing \Cref{prob:bag-pdb-poly-expected} (see \Cref{prop:expection-of-polynom}).
 In this work, we study the complexity of \Cref{prob:bag-pdb-poly-expected} for several models of probabilistic databases and various encodings of such polynomials.
 }

+\AHchange{
+\LARGE Old Stuff
+}
+
 A probabilistic database (PDB) $\pdb$ is a pair $\inparen{\idb, \pd}$, where $\idb$ is a set of deterministic database instances called possible worlds and $\pd$ is a probability distribution over $\idb$.
 \AHchange{
 A tuple independent database (\abbrTIDB) (to which we will refer to later) is a \abbrPDB such that each tuple is an independent random event.