Rearranged some figure/table.

2022-03-14 16:29:49 -04:00 · a9aea5ecc3
parent 160e959c73
commit a9aea5ecc3
3 changed files with 38 additions and 22 deletions
--- a/introduction.tex
+++ b/introduction.tex
@@ -19,6 +19,39 @@ An $\raPlus$ query is a query expressed in positive relational algebra, i.e., us
 }
 $\query$, and result tuple $\tup$, compute the expected multiplicity of $\tup$: $\expct_{\rvworld\sim\bpd}\pbox{\query\inparen{\rvworld}\inparen{\tup}}$.
 \end{Problem}
+
+\begin{figure}[t!]
+  \begin{align*}
+  	&\begin{aligned}[t]
+	  &\polyqdt{\project_A(\query)}{\gentupset}{\tup} =\\
+	  &~~\sum_{\tup': \project_A(\tup') = \tup} \polyqdt{\query}{\gentupset}{\tup'}
+	  \end{aligned} 
+	  &
+	  &\begin{aligned}[t]
+		  &\polyqdt{\query_1 \union \query_2}{\gentupset}{\tup} =\\
+		  &\qquad \polyqdt{\query_1}{\gentupset}{\tup} + \polyqdt{\query_2}{\gentupset}{\tup}\\
+	  \end{aligned}\\
+	  &\begin{aligned}
+		  &\polyqdt{\select_\theta(\query)}{\gentupset}{\tup} =\\
+		  &~~ \begin{cases}
+		    \polyqdt{\query}{\gentupset}{\tup} & \text{if }\theta(\tup) \\
+		    0                       & \text{otherwise}.
+		    \end{cases}
+	  \end{aligned}
+	  &
+       &\begin{aligned}
+	          &\polyqdt{\query_1 \join \query_2}{\gentupset}{\tup} =\\	
+	          &\qquad\polyqdt{\query_1}{\gentupset}{\project_{\attr{\query_1}}{\tup}}\\
+	          &\qquad\cdot\polyqdt{\query_2}{\gentupset}{\project_{\attr{\query_2}}{\tup}}
+          \end{aligned}\\
+	  &&&\polyqdt{\rel}{\gentupset}{\tup} = X_\tup
+	\end{align*}%\\[-10mm]
+	\setlength{\abovecaptionskip}{-0.25cm}
+	\caption{Construction of the lineage (polynomial) for an $\raPlus$ query $\query$ over an arbitrary deterministic database $\gentupset$, where $\vct{X}$ consists of all $X_\tup$ over all $\rel$ in $\gentupset$ and $\tup$ in $\rel$. Here $\gentupset.\rel$ denotes the instance of relation $\rel$ in $\gentupset$.  Please note, after we introduce the reduction to $1$-\abbrBIDB, the base case will be expressed alternatively.}
+	\label{fig:nxDBSemantics}
+	\vspace{-0.53cm}
+\end{figure}
+
 It is natural to consider the expected multiplicity of a result tuple, since it is the bag analog of the marginal probability of a tuple in a set \abbrPDB.
 In this work we will assume that $c =\bigO{1}$ since this is what is typically seen in practice.
 Allowing for unbounded $c$ is an interesting open problem.
@@ -32,7 +65,8 @@ Specifically, in this work we ask if~\Cref{prob:expect-mult} can be solved in ti

 Let $\qruntime{\query,\gentupset,\bound}$ (see~\Cref{sec:gen} for further details) denote the runtime for query $\query$, deterministic database $\gentupset$, and multiplicity bound $\bound$.  This paper considers $\raPlus$ queries, for which the order of operations is \emph{explicit}, as opposed to other query languages, e.g., Datalog or UCQ.  Since the order of operations affects runtime, we denote the optimized $\raPlus$ query picked by an arbitrary production system as $\optquery{\query} = \operatorname*{arg\,min}_{\query'\in\raPlus, \query'\equiv\query}\qruntime{\query', \gentupset, \bound}$.  Then $\qruntime{\optquery{\query}, \gentupset,\bound}$ is the runtime for the optimized query.\footnote{Note that our work applies to any $\query \in\raPlus$, which implies that specific heuristics for choosing an optimized query can be abstracted away, i.e., our work does not consider heuristic techniques.}

-\begin{table}[t!]
+\begin{table*}[t!]
+\centering
 \begin{tabular}{|p{0.43\textwidth}|p{0.12\textwidth}|p{0.35\textwidth}|}
 \hline
 \textbf{Lower bound on $\timeOf{}^*(\qhard,\pdb)$} & \textbf{Num.} $\bpd$s
@@ -45,7 +79,9 @@ $\Omega\inparen{\inparen{\qruntime{\optquery{\qhard}, \tupset, \bound}}^{c_0\cdo
 \end{tabular}
 \caption{Our lower bounds for a specific hard query $\qhard$ parameterized by $k$.  For $\pdb = \inset{\worlds, \bpd}$, rows with `Multiple' in the second column require the algorithm to handle multiple $\bpd$, i.e., multiple probability distributions (for a given $\tupset$). The last column states the hardness assumptions that imply the lower bounds in the first column ($\eps_o,C_0,c_0$ are constants that are independent of $k$).}
 \label{tab:lbs}
-\end{table}
+\vspace{-0.73cm}
+\end{table*}
+
 \mypar{Our lower bound results}
 We ask whether $\timeOf{}^*\inparen{\query, \pdb}\leq\qruntime{\optquery{\query}, \tupset, \bound}$ always holds.  Unfortunately, this is not the case.
 \Cref{tab:lbs} shows our results.
@@ -64,26 +100,6 @@ Further, our approximation algorithm works for a more general notion of bag \abb

 \subsection{Polynomial Equivalence}\label{sec:intro-poly-equiv}
 A common encoding of probabilistic databases (e.g., in \cite{IL84a,Imielinski1989IncompleteII,Antova_fastand,DBLP:conf/vldb/AgrawalBSHNSW06} and many others) relies on annotating tuples with lineages or propositional formulas that describe the set of possible worlds that the tuple appears in.  The bag semantics analog is a provenance/lineage polynomial (see~\Cref{fig:nxDBSemantics}) $\apolyqdt$~\cite{DBLP:conf/pods/GreenKT07}, a polynomial with non-zero integer coefficients and exponents, over variables $\vct{X}$ encoding input tuple multiplicities. Evaluating the lineage polynomial of a query result tuple $\tup_{out}$ on a possible world, i.e., setting each variable $X_{\tup_{in}}$ to the multiplicity of the input tuple $\tup_{in}$ in that world, yields the multiplicity of $\tup_{out}$ in the query result for this world.
-\begin{figure}[b!]
-  \begin{align*}
-	  \polyqdt{\project_A(\query)}{\gentupset}{\tup} =& \sum_{\tup': \project_A(\tup') = \tup} \polyqdt{\query}{\gentupset}{\tup'} &
-	  \polyqdt{\query_1 \union \query_2}{\gentupset}{\tup} =& \polyqdt{\query_1}{\gentupset}{\tup} + \polyqdt{\query_2}{\gentupset}{\tup}\\
-	  \polyqdt{\select_\theta(\query)}{\gentupset}{\tup} =& \begin{cases}
-	    \polyqdt{\query}{\gentupset}{\tup} & \text{if }\theta(\tup) \\
-	    0                       & \text{otherwise}.
-	    \end{cases} &
-	       \begin{aligned}
-	          \polyqdt{\query_1 \join \query_2}{\gentupset}{\tup} =\\ ~
-	        \end{aligned}&
-	          \begin{aligned}
-	            &\polyqdt{\query_1}{\gentupset}{\project_{\attr{\query_1}}{\tup}}  \\
-	            &~~~\cdot\polyqdt{\query_2}{\gentupset}{\project_{\attr{\query_2}}{\tup}}
-	          \end{aligned}\\
-	                                           & & & \polyqdt{\rel}{\gentupset}{\tup} = X_\tup
-	\end{align*}\\[-10mm]
-	\caption{Construction of the lineage (polynomial) for an $\raPlus$ query $\query$ over an arbitrary deterministic database $\gentupset$, where $\vct{X}$ consists of all $X_\tup$ over all $\rel$ in $\gentupset$ and $\tup$ in $\rel$. Here $\gentupset.\rel$ denotes the instance of relation $\rel$ in $\gentupset$.  Please note, after we introduce the reduction to $1$-\abbrBIDB, the base case will be expressed alternatively.}
-	\label{fig:nxDBSemantics}
-\end{figure}

 We drop $\query$, $\tupset$, and $\tup$ from $\apolyqdt$ when they are clear from the context or irrelevant to the discussion. We now specify the problem of computing the expectation of tuple multiplicity in the language of lineage polynomials:
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
--- a/main.pdf
+++ b/main.pdf
--- a/main.synctex.gz
+++ b/main.synctex.gz