Addressing the deterministic database issue.

2022-06-07 10:10:19 -04:00 · 2022-06-07 10:10:19 -04:00 · 5a18732c08
parent 07ea722712
commit 5a18732c08
4 changed files with 5 additions and 5 deletions
--- a/binarybidb.tex
+++ b/binarybidb.tex
@@ -39,7 +39,7 @@ Define a \emph{\abbrOneBIDB} to be the pair $\pdb' = \inparen{\bigtimes_{\tup\in
 %\footnote{We slightly abuse notation here, denoting a world vector as $W$ rather than $\worldvec$ to distinguish between the random variable and the world instance.  When there is no ambiguity, we will denote a world vector as $\worldvec$.}
 \end{Definition}

-Lineage polynomials for arbitrary deterministic $\gentupset'$ are constructed in a manner analogous to $1$-\abbrTIDB\xplural (see \Cref{fig:nxDBSemantics}), differing only in the base case.  
+Lineage polynomials for arbitrary \dbbaseName $\gentupset'$ are constructed in a manner analogous to $1$-\abbrTIDB\xplural (see \Cref{fig:nxDBSemantics}), differing only in the base case.  
 In a $1$-\abbrTIDB, each tuple contributes a multiplicity of 0 or 1, and $\polyqdt{\rel}{\gentupset}{\tup} = X_\tup$. %\textcolor{red}{CHANGE}
 In a \abbrOneBIDB, each tuple $\tup\in\tupset'$ contributes its corresponding multiplicity: %\textcolor{red}{CHANGE}
 $\polyqdt{\rel}{\gentupset}{\tup} = c_\tup\cdot X_\tup$.  These semantics are fully detailed in \Cref{fig:lin-poly-bidb}.
--- a/circuits-model-runtime.tex
+++ b/circuits-model-runtime.tex
@ -1,4 +1,4 @@
-%!TEX root=./main.tex
+%!TEX root= prob-def.tex

 \subsection{Deterministic Query Runtimes}\label{sec:gen}
 %We formalize our claim from \Cref{sec:intro} that a linear approximation algorithm for our problem implies that PDB queries (under bag semantics) can be answered (approximately) in the same runtime as deterministic queries under reasonable assumptions.
--- a/conclusions.tex
+++ b/conclusions.tex
@@ -5,7 +5,7 @@ We have studied the problem of calculating the expected multiplicity of a bag-qu
 a problem that has a practical application in probabilistic databases over multisets. 
 We show that under various parameterized complexity hardness results/conjectures computing the expected multiplicities exactly is not possible in time linear in the corresponding deterministic query processing time.
 We prove that it is possible to approximate the expectation of a lineage polynomial in linear time
- in the deterministic query processing  over TIDBs and BIDBs (assuming that there are few cancellations).
+ in the deterministic query processing over TIDBs and BIDBs (assuming that there are few cancellations).
 Interesting directions for future work include development of a dichotomy for bag \abbrPDB\xplural.  While we can handle higher moments (this follows fairly easily from our existing results-- see \Cref{sec:momemts}), more general approximations are an interesting area for exploration, including those for more general data models. 

 %%% Local Variables:
--- a/introduction.tex
+++ b/introduction.tex
@@ -51,7 +51,7 @@ An $\raPlus$ query is a query expressed in positive relational algebra, i.e., us
 	\end{align*}%\\[-10mm]
 	%\setlength{\abovecaptionskip}{-0.25cm}
 	\savecaptionspace{
-	\caption{Lineage polynomial semantics given $\raPlus$ query $\query$, arbitrary deterministic database $\gentupset$ with variables $\inparen{X_\tup}_{\tup \in\gentupset}$, where for $\rel\in\gentupset$, $\tup\in\rel$, the base case is $\polyqdt{\rel}{\gentupset}{\tup} = X_\tup$.}% for any $\rel\in\gentupset$ and $\tup\in\rel$.}% consists of all $X_\tup$ over all $\rel$ in $\gentupset$ and $\tup$ in $\rel$, such that the base case $\polyqdt{\rel}{\gentupset}{\tup} = X_\tup$.} %Here $\gentupset.\rel$ denotes the instance of relation $\rel$ in $\gentupset$.  Please note, after we introduce the reduction to $1$-\abbrBIDB, the base case will be expressed alternatively.  The base case is $\polyqdt{\rel}{\gentupset}{\tup} = X_\tup$}
+	\caption{Lineage polynomial semantics given $\raPlus$ query $\query$, arbitrary \dbbaseName $\gentupset$ with variables $\inparen{X_\tup}_{\tup \in\gentupset}$, where for $\rel\in\gentupset$, $\tup\in\rel$, the base case is $\polyqdt{\rel}{\gentupset}{\tup} = X_\tup$.}% for any $\rel\in\gentupset$ and $\tup\in\rel$.}% consists of all $X_\tup$ over all $\rel$ in $\gentupset$ and $\tup$ in $\rel$, such that the base case $\polyqdt{\rel}{\gentupset}{\tup} = X_\tup$.} %Here $\gentupset.\rel$ denotes the instance of relation $\rel$ in $\gentupset$.  Please note, after we introduce the reduction to $1$-\abbrBIDB, the base case will be expressed alternatively.  The base case is $\polyqdt{\rel}{\gentupset}{\tup} = X_\tup$}
 	\label{fig:nxDBSemantics}
 	}{\abovecapshrink}{\belowcapshrink}
 	%\vspace{-0.53cm}
@@ -112,7 +112,7 @@ Those with `Multiple' in the second column need the algorithm to be able to hand
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \mypar{Our lower bound results}
 %
-Let $\qruntime{\query,\gentupset,\bound}$ (see~\Cref{sec:gen} for further details) denote the runtime for query $\query$ over a deterministic database $\gentupset$ where the maximum multiplicity of any tuple is less than or equal to $\bound$.  % This paper considers $\raPlus$ queries, for which order of operations is \emph{explicit}, as opposed to other query languages, e.g. Datalog, UCQ.  Thus, since order of operations affects runtime, we denote the optimized $\raPlus$ query picked by an arbitrary production system as $\optquery{\query} \approx \min_{\query'\in\raPlus, \query'\equiv\query}\qruntime{\query', \gentupset, \bound}$.  Then $\qruntime{\optquery{\query}, \gentupset,\bound}$ is the runtime for the optimized query.\footnote{The upper bounds on runtime that we derive apply pointwise to any $\query \in\raPlus$, allowing us to abstract away the specific heuristics for choosing an optimized query (i.e., Any deterministic query optimization heuristic is equally useful for \abbrCTIDB queries).}\BG{Rewrite: since an optimized Q is also a Q this also applies in the case where there is a query optimizer the rewrites Q}
+Let $\qruntime{\query,\gentupset,\bound}$ (see~\Cref{sec:gen} for further details) denote the runtime for query $\query$ over a \dbbaseName $\gentupset$ where the maximum multiplicity of any tuple is less than or equal to $\bound$.  % This paper considers $\raPlus$ queries, for which order of operations is \emph{explicit}, as opposed to other query languages, e.g. Datalog, UCQ.  Thus, since order of operations affects runtime, we denote the optimized $\raPlus$ query picked by an arbitrary production system as $\optquery{\query} \approx \min_{\query'\in\raPlus, \query'\equiv\query}\qruntime{\query', \gentupset, \bound}$.  Then $\qruntime{\optquery{\query}, \gentupset,\bound}$ is the runtime for the optimized query.\footnote{The upper bounds on runtime that we derive apply pointwise to any $\query \in\raPlus$, allowing us to abstract away the specific heuristics for choosing an optimized query (i.e., Any deterministic query optimization heuristic is equally useful for \abbrCTIDB queries).}\BG{Rewrite: since an optimized Q is also a Q this also applies in the case where there is a query optimizer the rewrites Q}
 Our question is whether or not it is always true that for every $\query$,  $\timeOf{}^*\inparen{\query, \pdb, \bound}\leq \bigO{\qruntime{\optquery{\query}, \tupset, \bound}}$.  We remark that the issue of query optimization is orthogonal to this question (recall that an $\raPlus$ query explicitly encodes order of operations) since we want to answer the above question for all $\query$. \emph{Specifically, if there is an equivalent query $\query'$ that is more efficient to evaluate, we allow both deterministic and probabilistic query processing access to $\query'$}.

 Unfortunately the the answer to the above question  is no--