Changes I apparently hadn't commited.

2021-08-23 09:01:25 -04:00 · 2021-08-23 09:01:25 -04:00 · f7cc75ef6c
parent a077d52e3e
commit f7cc75ef6c
2 changed files with 4 additions and 2 deletions
--- a/intro-rewrite-070921.tex
+++ b/intro-rewrite-070921.tex
@ -1,5 +1,5 @@
 %root: main.tex
-\section{Introduction (Rewrite - 070921)}
+\section{Introduction (Rewrite - 070921)}\label{sec:intro-rewrite-070921}
 \input{two-step-model}
 A tuple independent probabilistic database\footnote{In \cref{sec:background} and beyond, we generalize the data model.} (\abbrTIDB) $\pdb$ is a tuple $\inparen{\idb, \pd}$ such that $\idb$ is the set of deterministic instances of $\pdb$ (possible worlds) and $\pd$ is the probability distribution over $\idb$.  $\pdb$ can equivalently encoded as deterministic database with $\numvar$ tuples, with $\pd$ 
 %with a deterministic table $\encodedDB$ which is a set of $\numvar$ tuples, encoding the set of possible worlds $\idb$.  The probability distribution $\pd$ over the set of database instances (possible worlds) is the one
@ -25,7 +25,7 @@ There exist some queries for which \emph{bag}-\abbrPDB\xplural are a more natura
 %END Needs to be noted.

 A natural question is whether or not we can quantify the complexity of computing $\expct\pbox{\poly_\tup\inparen{\vct{\randWorld}}}$ separately from the complexity of deterministic query evaluation, effectively dividing \abbrPDB query evaluation into two steps: deterministic query evaluation\footnote{Given input $\pdb$, this step includes outputting every tuple $\tup$ that satisfies $\query$, annotated with its lineage polynomial ($\poly_\tup$) which is computed inline across the query operators of $\query$.\cite{Imielinski1989IncompleteII}\cite{DBLP:conf/pods/GreenKT07}} and computing expectation.  Viewing  \abbrPDB query evaluation as these two seperate steps is also known as intensional evaluation \cite{DBLP:series/synthesis/2011Suciu}, illustrated in \cref{fig:two-step}.  
-The first step, which we will refer to as \termStepOne (\abbrStepOne), consists of computing both $\query\inparen{\db}$ and $\poly_\tup(\vct{X})$.\footnote{Assuming standard $\raPlus$ query processing algorithms, computing the lineage polynomial of $\tup$ is upperbounded by the runtime of deterministic query evaluation of $\tup$, as we show in \cref{sec:circuit-runtime}.}  The second step is \termStepTwo (\abbrStepTwo), which consists of computing $\expct\pbox{\poly_\tup(\vct{\randWorld})}$. Such a model of computation is nicely followed in set-\abbrPDB semantics \cite{DBLP:series/synthesis/2011Suciu}, where $\poly_\tup\inparen{\vct{X}}$ must be computed separate from deterministic query evaluation to obtain exact output when $\query(\pdb)$ is hard since evaluating the probability inline with query operators (extensional evaluation) will only approximate the actual probability in such a case.  The paradigm of \cref{fig:two-step} is also analogous to semiring provenance, where $\semNX$-DB\footnote{An $\semNX$-DB is a database whose tuples are annotated with standard polynomials, i.e. elements from $\semNX$ connected by multiplication and addition operators.} query processing \cite{DBLP:conf/pods/GreenKT07} first computes the query and polynomial, and the $\semNX$-polynomial can then subsequently evaluated over a semantically appropriate semiring, e.g. $\semN$ to model bag semantics.  Further, in this work, the intensional model lends itself nicely in separating the concerns of deterministic computation and the probability computation.  
+The first step, which we will refer to as \termStepOne (\abbrStepOne), consists of computing both $\query\inparen{\db}$ and $\poly_\tup(\vct{X})$.\footnote{Assuming standard $\raPlus$ query processing algorithms, computing the lineage polynomial of $\tup$ is upperbounded by the runtime of deterministic query evaluation of $\tup$, as we show in \cref{sec:circuit-runtime}.}  The second step is \termStepTwo (\abbrStepTwo), which consists of computing $\expct\pbox{\poly_\tup(\vct{\randWorld})}$. Such a model of computation is nicely followed in set-\abbrPDB semantics \cite{DBLP:series/synthesis/2011Suciu}, where $\poly_\tup\inparen{\vct{X}}$ must be computed separate from deterministic query evaluation to obtain exact output when $\query(\pdb)$ is hard since evaluating the probability inline with query operators (extensional evaluation) will only approximate the actual probability in such a case.  The paradigm of \cref{fig:two-step} is also analogous to semiring provenance, where $\semNX$-DB\footnote{An $\semNX$-DB is a database whose tuples are annotated with elements from the set of polynomials with variables in $\vct{X}$ and natural number coeficients and exponents.} query processing \cite{DBLP:conf/pods/GreenKT07} first computes the query and polynomial, and the $\semNX$-polynomial can then subsequently evaluated over a semantically appropriate semiring, e.g. $\semN$ to model bag semantics.  Further, in this work, the intensional model lends itself nicely in separating the concerns of deterministic computation and the probability computation.  

 Let $\timeOf{\abbrStepOne}$ denote the runtime of \abbrStepOne and similarly for $\timeOf{\abbrStepTwo}$. 
 Given bag-\abbrPDB query $\query$ and \abbrTIDB $\pdb$  with $\numvar$ tuples, let us go a step further and assume that computing $\poly_\tup$ is lower bounded by the runtime of determistic query computation of $\query$ for the following situations, i.e. when $\abs{\textnormal{input}} \leq \abs{\textnormal{output}}$.  When $\poly_\tup$ is in standard monomial basis (\abbrSMB)\footnote{A polynomial is in \abbrSMB when it consists of a sum of unique products.}, by linearity of expectation and independence of \abbrTIDB, it follows that $\timeOf{\abbrStepTwo}$ is indeed $\bigO{\timeOf{\abbrStepOne}}$.  Let $\prob_i$ denote the probability of tuple $\tup_i$ ($\probOf\pbox{X_i = 1}$) for $i \in [\numvar]$.  Consider another special case when for all $i$ in $[\numvar]$, $\prob_i = 1$.  For output tuple $\tup'$ of $\query\inparen{\pdb}$, computing $\expct\pbox{\poly_{\tup'}\inparen{\vct{\randWorld}}}$ is linear in 
--- a/ra-to-poly.tex
+++ b/ra-to-poly.tex
@ -5,6 +5,8 @@

 \subsection{Probabilistic Databases}

+We focus primarily on set-\abbrPDB inputs in this section, but as noted in \cref{sec:intro-rewrite-070921}, this is not limiting.  
+
 An \textit{incomplete database} $\idb$ is a set of deterministic databases $\db$ called possible worlds.
 Denote the schema of $\db$ as $\sch(\db)$. A \textit{probabilistic database} $\pdb$ is a pair $(\idb, \pd)$ where $\idb$ is an incomplete database and $\pd$ is a probability distribution over $\idb$. Queries over probabilistic databases are evaluated using the so-called possible world semantics. Under the possible world semantics, the result of a query $\query$ over an incomplete database $\idb$ is the set of query answers produced by evaluating $\query$ over each possible world: $\query(\idb) = \comprehension{\query(\db)}{\db \in \idb}$.