Changes to 2-step intensional figure.

2022-01-31 15:39:13 -05:00 · 14f0eb9adf
parent 5e00f09da6
commit 14f0eb9adf
3 changed files with 26 additions and 28 deletions
--- a/intro-rewrite-070921.tex
+++ b/intro-rewrite-070921.tex
@ -4,20 +4,21 @@

 \secrev{
 This work explores the problem of computing the expectation of a tuple's multiplicity in an important special case of bag \abbrTIDB, which we call a \abbrCTIDB.  A \abbrCTIDB,
-$\pdb = \inparen{\worlds, \bpd}$ encodes a bag of uncertain tuples such that each tuple in $\pdb$ has a multiplicity of at most $\bound$.  The set of all worlds is encoded in $\worlds$, which is the set of all vectors of length $\abs{\tupset}$ such that each index corresponds to a distinct $\tup \in \tupset$ storing its multiplicity. $\bpd$ is a product distribution over the set of all worlds.  A given world $\worldvec = \inset{0,\ldots, \bound}^{\abs{\tupset}}$ can be interpreted such that, for each $\tup \in \tupset$, $\worldvec\pbox{\tup}$ is the multiplicity of $\tup$ in $\worldvec$.  The resulting product distribution can then be encoded as $\prob_{\tup} = \probOf\pbox{W\pbox{i} = j}$ (for $j \in\pbox{\bound}$), where each distribution is independent for $\tup \in \tupset$.
+$\pdb = \inparen{\worlds, \bpd}$ encodes a bag of uncertain tuples such that each tuple in $\pdb$ has a multiplicity of at most $\bound$.  The set of all worlds is encoded in $\worlds$, which is the set of all vectors of length $\abs{\tupset}$ such that each index corresponds to a distinct $\tup \in \tupset$ storing its multiplicity. $\bpd$ is a product distribution over the set of all worlds.  A given world $\worldvec \in \inset{0,\ldots, \bound}^{\abs{\tupset}}$ can be interpreted such that, for each $\tup \in \tupset$, $\worldvec\pbox{\tup}$ is the multiplicity of $\tup$ in $\worldvec$.  The resulting product distribution can then be encoded as $\prob_{\tup} = \probOf\pbox{W\pbox{\tup} = j}$ (for $j \in\pbox{\bound}$), where each %distribution 
+$\tup$ is an independent random event. %for $\tup \in \tupset$.
 }  
 %\mypar{For a later section}
 %\sout{
 %Since each tuple in $\pdb$ has a mutually exclusive probability distribution over its possible multiplicities, it is natural to reduce a \abbrCTIDB to traditional (set) block independent database (\abbrBIDB).  We refer to the reduced \abbrBIDB as a $1$-\abbrBIDB, as it is the case that each tuple can appear in a possible world at most $c = 1$ time.  \Cref{fig:ctidb-red} shows an example of this reduction.
 %}  
 \secrev{
-Allowing for $\leq \bound$ multiplicities across all tuples gives rise to having $\leq \inparen{\bound+1}^\numvar$ possible worlds instead of the usual $2^\numvar$ possible worlds of a $1$-\abbrTIDB$, which (assuming set query semantics), is the same as the traditional set \abbrTIDB. 
-In this work, since we are generally considering bag query input, we will only be considering bag query semantics.
+Allowing for $\leq \bound$ multiplicities across all tuples gives rise to $\leq \inparen{\bound+1}^\numvar$ possible worlds instead of the usual $2^\numvar$ possible worlds of a $1$-\abbrTIDB, which (assuming set query semantics) is the same as the traditional set \abbrTIDB. 
+In this work, since we are generally considering bag query input, we will only be considering bag query semantics.  We denote by $\query\inparen{\vct{W}}\inparen{\tup}$ the multiplicity of $\tup$ in query $\query$ over possible world $\vct{W}\in\worlds$.

 We can formally state this problem as:

 \begin{Problem}\label{prob:expect-mult}
-Given a \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, $\raPlus$ query $\query$, and result tuple $\tup$, compute the expected multiplicity of $\tup$: $\expct_{\randDB\sim\bpd}\pbox{\query\inparen{\randDB}\inparen{\tup}}$.
+Given a \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, $\raPlus$ query $\query$, and result tuple $\tup$, compute the expected multiplicity of $\tup$: $\expct_{\vct{W}\sim\bpd}\pbox{\query\inparen{\vct{W}}\inparen{\tup}}$.
 \end{Problem}
 \AH{I \emph{think} we use $\randDB$ to denote something different in one of the proofs.  Have to keep an eye open for this to avoid overloading notation.}

@ -144,7 +145,7 @@ $\Omega\inparen{\inparen{\qruntime{\query, \gentupset}}^{c_0\cdot k}}$ for {\em
 \caption{Our lower bounds for a specific hard query $Q$ parameterized by $k$. The $\pdb$ is over the same (family of) $\gentupset$ and those with `Multiple' in the second column need the algorithm to be able to handle multiple $\pd$ (for a given $\gentupset$). The last column states the hardness assumptions that imply the lower bounds in the first column ($\eps_o,C_0,c_0$ are constants that are independent of $k$).}
 \label{tab:lbs}
 \end{table}
-\mypar{Our lower bound results} In table~\ref{tab:lbs} we show that depending on what hardness result/conjecture we assume, we get various emphatic versions of {\em no} as an answer to our question.  To make some sense of the other lower bounds in Table~\ref{tab:lbs}, we note that it is not too hard to show that $\timeOf{}^*(Q,\pdb) \le  O\inparen{\inparen{\qruntime{Q, \gentupset}}^k}$, where $k$ is the largest degree of the query $\query$ (i.e., join width) over all result tuples $\tup$ (and the parameter that defines our family of hard queries).
+\mypar{Our lower bound results} In Table~\ref{tab:lbs} we show that depending on what hardness result/conjecture we assume, we get various emphatic versions of {\em no} as an answer to our question.  To make some sense of the other lower bounds in Table~\ref{tab:lbs}, we note that it is not too hard to show that $\timeOf{}^*(Q,\pdb) \le  O\inparen{\inparen{\qruntime{Q, \gentupset}}^k}$, where $k$ is the join width (our notion of join width follows from~\cref{def:degree-of-poly} and~\cref{fig:nxDBSemantics}) of the query $\query$ over all result tuples $\tup$ (and the parameter that defines our family of hard queries).

 What our lower bound in the third row says is that one cannot get more than a polynomial improvement over essentially the trivial algorithm for~\cref{prob:expect-mult}.
 However, this result assumes a hardness conjecture that is not as well studied as those in the first two rows of the table (see \Cref{sec:hard} for more discussion on the hardness assumptions). Further, we note that existing results already imply the claimed lower bounds if we were to replace the $\qruntime{\query, \gentupset}$ by just $\abs{\gentupset}$ (indeed these results follow from known lower bounds for deterministic query processing). Our contribution is then to identify a family of hard queries where deterministic query processing is `easy' but computing the expected multiplicities is hard. 
@ -243,10 +244,7 @@ $\expct\limits_{\vct{\randWorld}\sim\pdassign}\pbox{\poly^2\inparen{\vct{\randWo
 \end{footnotesize}
 \noindent This property leads us to consider a structure related to the lineage polynomial.
 \begin{Definition}\label{def:reduced-poly}
-For any polynomial $\poly(\vct{X})$ corresponding to a \abbrCTIDB (henceforth, \abbrCTIDB-lineage polynomial),
-%\BG{Better introduce the notion of TIDB lin poly before here, then it iis more clear?},
-%Atri: Done
- define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by setting all exponents $e > 1$ in the standard monomial basis (\abbrSMB) \footnote{
+For any polynomial $\poly(\vct{X})$ define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by setting all exponents $e > 1$ in the standard monomial basis (\abbrSMB) \footnote{
  This is the representation, typically used in set-\abbrPDB\xplural, where the polynomial is represented as a sum of `pure' products. See \Cref{def:smb} for a formal definition.
 }
 form of $\poly(\vct{X})$ to $1$.
--- a/poly-form.tex
+++ b/poly-form.tex
@ -22,7 +22,7 @@ Unless otherwise noted, we consider all polynomials to be in \abbrSMB representat
 When it is unclear, we use $\smbOf{\poly}$ to denote the \abbrSMB form of a polynomial $\poly$.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\begin{Definition}[Degree]\label{def:degree}
+\begin{Definition}[Degree]\label{def:degree-of-poly}
 The degree of polynomial $\poly(\vct{X})$ is the largest $\sum_{i=1}^n d_i$ such that $c_{(d_1,\dots,d_n)}\ne 0$. % maximum sum of exponents, over all monomials in $\smbOf{\poly(\vct{X})}$.
 \end{Definition}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
--- a/two-step-model.tex
+++ b/two-step-model.tex
@ -15,26 +15,26 @@
 		\node[cylinder, text width=0.28\textwidth, align=center, draw=black, text=black, cylinder uses custom fill, cylinder body fill=blue!10, aspect=0.12, minimum height=5cm, minimum width=2.5cm, cylinder end fill=blue!50, shape border rotate=90] (cylinder) at (0, 0) {
 		\tabcolsep=0.1cm
 		\begin{tabular}{>{\small}c | >{\small}c | >{\small}c}
-				\multicolumn{3}{c}{$\boldsymbol{OnTime}$}\\
+				\multicolumn{3}{c}{$\boldsymbol{T}$}\\
 				%\toprule
-				City & $\Phi$  & \textbf{p}\\
+				Point & $\Phi$  & $\semN$\\
 				\midrule
-	                     Buffalo     & $A$ & 0.9 \\
-	                     Chicago     & $B$ & 0.5\\
-	                     Bremen      & $C$ & 0.5\\
-	                     Zurich      & $E$ & 1.0\\	                     
+	                     $e_1$    & $A$ & 1 \\
+	                     $e_2$     & $B$ & 1\\
+	                     $e_3$      & $C$ & 1\\
+	                     $e_4$      & $E$ & 1\\	                     
 			\end{tabular}\\
 			\tabcolsep=0.05cm
 			%\captionof{table}{Route}
 			\begin{tabular}{>{\footnotesize}c | >{\footnotesize}c | >{\footnotesize}c | >{\footnotesize}c}
-				\multicolumn{4}{c}{$\boldsymbol{Route$}}\\
+				\multicolumn{4}{c}{$\boldsymbol{R}$}\\
 				%\toprule
-				$\text{City}_1$ & $\text{City}_2$ & $\Phi$ & \textbf{p} \\
+				$\text{Point}_1$ & $\text{Point}_2$ & $\Phi$ & $\semN$ \\
 				\midrule
-	                    Buffalo         & Chicago         & $X$          & 1.0        \\
-	                    Chicago         & Zurich          & $Y$          & 1.0        \\
+	                    $e_1$         & $e_2$         & $X$          & 2       \\
+	                    $e_2$         & $e_4$          & $Y$          & 4        \\
 	                    %& $\cdots$        & $\cdots$        & $\cdots$     & $\cdots$   \\
-	                    Chicago         & Bremen          & $Z$          & 1.0        \\
+	                    $e_2$         & $e_3$          & $Z$          & 3       \\
 			\end{tabular}};
 			%label below cylinder
 			\node[below=0.2 cm of cylinder]{{\LARGE$ \tupset$}};
@ -51,11 +51,11 @@
 		 \begin{tabular}{>{\normalsize}c | >{\centering\arraybackslash\normalsize}m{1.95cm} | >{\centering\arraybackslash\small}m{1.95cm}}
 	            %\multicolumn{3}{c}{$\boldsymbol{\query(\pdb)}$}\\[1mm]
 	            %\toprule
-	            City    & $\Phi$ & Circuit\\%                          & $\expct_{\idb \sim \probDist}[\query(\db)(t)]$ \\ \hline
+	            Point    & $\Phi$ & Circuit\\%                          & $\expct_{\idb \sim \probDist}[\query(\db)(t)]$ \\ \hline
 			\midrule
 	          	%\hline 
 	          	%\\\\[-3.5\medskipamount]
-	                 Buffalo & $AX$ &\resizebox{!}{10mm}{
+	                 $e_1$ & $AX$ &\resizebox{!}{10mm}{
 	                       \begin{tikzpicture}[thick]
 	                       		\node[gen_tree_node](sink) at (0.5, 0.8){$\boldsymbol{\circmult}$};
 	                       		\node[gen_tree_node](source1) at (0, 0){$A$};
@ -64,7 +64,7 @@
 	                       		\draw[->] (source2)--(sink);
 					\end{tikzpicture}% & $0.5 \cdot 1.0 + 0.5 \cdot 1.0 = 1.0$   
 					}\\%                 & $0.9$                                            \\
-	                       Chicago & $B(Y + Z)$\newline \text{Or}\newline $BY+ BZ$&
+	                       $e_2$ & $B(Y + Z)$\newline \text{Or}\newline $BY+ BZ$&
 	                       \resizebox{!}{16mm} {
 						\begin{tikzpicture}[thick]
 							\node[gen_tree_node] (a1) at (1, 0){$Y$};
@ -116,17 +116,17 @@
 		\begin{tabular}{>{\small}c | >{\centering\arraybackslash\small}m{1.95cm}}
 			%\multicolumn{2}{c}{$\expct\pbox{\poly(\vct{X})}$}\\[1mm]
 			%\toprule
-			City & $\mathbb{E}[\poly(\vct{X})]$\\
+			Point & $\mathbb{E}[\poly(\vct{X})]$\\
 			\midrule%[0.05pt]
-			Buffalo & $1.0 \cdot 0.9 = 0.9$\\[3mm]
-			Chicago & $(0.5 \cdot 1.0) + $\newline $\hspace{0.2cm}(0.5 \cdot 1.0)$\newline $= 1.0$\\
+			$e_1$ & $A\cdot\probOf\pbox{A = 1}\inparen{X\cdot\probOf\pbox{X = 1} + X\cdot\probOf\pbox{X = 2}}$\\[2mm]%$1.0 \cdot 0.9 = 0.9$\\[3mm]
+			$e_2$ & $(0.5 \cdot 1.0) + $\newline $\hspace{0.2cm}(0.5 \cdot 1.0)$\newline $= 1.0$\\
 		\end{tabular}
 			};
 		%label of rounded rectangle
 		\node[below=0.2cm of rrect]{{\LARGE $\expct\pbox{\poly(\vct{X})}$}};
 	\end{tikzpicture}
 	}
-	\caption{Intensional Query Evaluation Model ($\query = \project_{\text{City}}\inparen{Route\join_{\text{City}_1 = City}OnTime}$).}
+	\caption{Intensional Query Evaluation Model ($\query = \project_{\text{Point}}\inparen{T\join_{\text{Point} = \text{Point}_1}R}$).}
 	\label{fig:two-step}
 \end{figure}
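
As a sanity check on the last panel of the figure, the following minimal sketch computes the expected multiplicity of the lineage polynomials $AX$ (for $e_1$) and $B(Y+Z)$ (for $e_2$) by brute-force enumeration of possible worlds, and compares the result with the closed form obtained from independence and linearity of expectation. The bound `c = 2` and every probability in `dist` are hypothetical values chosen for illustration; they do not come from the figure or the paper, and the helper names `expectation` and `mean` are likewise assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical multiplicity bound and per-tuple distributions (not from the paper):
# dist[v][j] = Pr[W[v] = j] for j in 0..c, with each variable independent.
c = 2
half = Fraction(1, 2)
dist = {
    "A": [Fraction(1, 10), Fraction(9, 10), Fraction(0)],
    "B": [half, half, Fraction(0)],
    "X": [Fraction(0), half, half],
    "Y": [Fraction(1, 4), Fraction(1, 4), half],
    "Z": [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
}


def expectation(poly, variables):
    """Brute-force E[poly(W)] by enumerating all (c+1)^n worlds over `variables`."""
    total = Fraction(0)
    for world in product(range(c + 1), repeat=len(variables)):
        prob = Fraction(1)
        for v, j in zip(variables, world):
            prob *= dist[v][j]  # product distribution: tuples are independent
        total += prob * poly(dict(zip(variables, world)))
    return total


def mean(v):
    """Expected multiplicity of a single input tuple."""
    return sum(j * p for j, p in enumerate(dist[v]))


# Lineage polynomials from the middle panel of the figure.
e1 = expectation(lambda w: w["A"] * w["X"], ["A", "X"])
e2 = expectation(lambda w: w["B"] * (w["Y"] + w["Z"]), ["B", "Y", "Z"])
```

Because no variable is raised to a power above one in either polynomial, the enumeration agrees with `mean("A") * mean("X")` and `mean("B") * (mean("Y") + mean("Z"))`. The enumeration itself visits $(c+1)^n$ worlds, so it only serves as a check on small examples.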