Changes to 2-step intensional figure.

This commit is contained in:
Aaron Huber 2022-01-31 15:39:13 -05:00
parent 5e00f09da6
commit 14f0eb9adf
3 changed files with 26 additions and 28 deletions

@ -4,20 +4,21 @@
\secrev{
This work explores the problem of computing the expectation of a tuple's multiplicity in an important special case of bag \abbrTIDB, which we call a \abbrCTIDB. A \abbrCTIDB,
$\pdb = \inparen{\worlds, \bpd}$ encodes a bag of uncertain tuples such that each tuple in $\pdb$ has a multiplicity of at most $\bound$. The set of all worlds is encoded in $\worlds$, which is the set of all vectors of length $\abs{\tupset}$ such that each index corresponds to a distinct $\tup \in \tupset$ storing its multiplicity. $\bpd$ is a product distribution over the set of all worlds. A given world $\worldvec = \inset{0,\ldots, \bound}^{\abs{\tupset}}$ can be interpreted such that, for each $\tup \in \tupset$, $\worldvec\pbox{\tup}$ is the multiplicity of $\tup$ in $\worldvec$. The resulting product distribution can then be encoded as $\prob_{\tup} = \probOf\pbox{W\pbox{i} = j}$ (for $j \in\pbox{\bound}$), where each distribution is independent for $\tup \in \tupset$.
$\pdb = \inparen{\worlds, \bpd}$ encodes a bag of uncertain tuples such that each tuple in $\pdb$ has a multiplicity of at most $\bound$. The set of all worlds is encoded in $\worlds$, which is the set of all vectors of length $\abs{\tupset}$ whose entries each correspond to a distinct $\tup \in \tupset$ and store its multiplicity. $\bpd$ is a product distribution over the set of all worlds. A given world $\worldvec \in \inset{0,\ldots, \bound}^{\abs{\tupset}}$ can be interpreted such that, for each $\tup \in \tupset$, $\worldvec\pbox{\tup}$ is the multiplicity of $\tup$ in $\worldvec$. The resulting product distribution can then be encoded as $\prob_{\tup} = \probOf\pbox{W\pbox{\tup} = j}$ (for $j \in\pbox{\bound}$), where each %distribution
$\tup$ is an independent random event. %for $\tup \in \tupset$.
}
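\noindent As a small illustration (the instance and probabilities here are hypothetical and not part of our running example), take $\tupset = \inset{\tup_1, \tup_2}$ and $\bound = 2$. Independence across tuples means that the probability of a world factors into per-tuple terms; for instance, for the world $\inparen{2, 0}$,
\[
\probOf\pbox{\vct{W} = \inparen{2, 0}} = \probOf\pbox{W\pbox{\tup_1} = 2}\cdot\probOf\pbox{W\pbox{\tup_2} = 0}.
\]
With the assumed values $\probOf\pbox{W\pbox{\tup_1} = 2} = 0.3$ and $\probOf\pbox{W\pbox{\tup_2} = 0} = 0.5$, this world has probability $0.3 \cdot 0.5 = 0.15$.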
%\mypar{For a later section}
%\sout{
%Since each tuple in $\pdb$ has a mutually exclusive probability distribution over its possible multiplicities, it is natural to reduce a \abbrCTIDB to traditional (set) block independent database (\abbrBIDB). We refer to the reduced \abbrBIDB as a $1$-\abbrBIDB, as it is the case that each tuple can appear in a possible world at most $c = 1$ time. \Cref{fig:ctidb-red} shows an example of this reduction.
%}
\secrev{
Allowing for $\leq \bound$ multiplicities across all tuples gives rise to having $\leq \inparen{\bound+1}^\numvar$ possible worlds instead of the usual $2^\numvar$ possible worlds of a $1$-\abbrTIDB$, which (assuming set query semantics), is the same as the traditional set \abbrTIDB.
In this work, since we are generally considering bag query input, we will only be considering bag query semantics.
Allowing for multiplicities of at most $\bound$ across all tuples gives rise to at most $\inparen{\bound+1}^\numvar$ possible worlds, instead of the usual $2^\numvar$ possible worlds of a $1$-\abbrTIDB, which (assuming set query semantics) is the same as the traditional set \abbrTIDB.
Since this work generally considers bag query input, we consider only bag query semantics. We denote by $\query\inparen{\vct{W}}\inparen{\tup}$ the multiplicity of $\tup$ in query $\query$ over possible world $\vct{W}\in\worlds$.
We can formally state this problem as:
\begin{Problem}\label{prob:expect-mult}
Given a \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, $\raPlus$ query $\query$, and result tuple $\tup$, compute the expected multiplicity of $\tup$: $\expct_{\randDB\sim\bpd}\pbox{\query\inparen{\randDB}\inparen{\tup}}$.
Given a \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, $\raPlus$ query $\query$, and result tuple $\tup$, compute the expected multiplicity of $\tup$: $\expct_{\vct{W}\sim\bpd}\pbox{\query\inparen{\vct{W}}\inparen{\tup}}$.
\end{Problem}
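\noindent To illustrate with hypothetical numbers, suppose $\query$ simply returns a base relation, so that $\query\inparen{\vct{W}}\inparen{\tup} = \vct{W}\pbox{\tup}$, and let $\bound = 2$ with the assumed probabilities $\probOf\pbox{\vct{W}\pbox{\tup} = 0} = 0.2$, $\probOf\pbox{\vct{W}\pbox{\tup} = 1} = 0.5$, and $\probOf\pbox{\vct{W}\pbox{\tup} = 2} = 0.3$. Then
\[
\expct_{\vct{W}\sim\bpd}\pbox{\query\inparen{\vct{W}}\inparen{\tup}} = \sum_{j = 0}^{\bound} j\cdot\probOf\pbox{\vct{W}\pbox{\tup} = j} = 0\cdot 0.2 + 1\cdot 0.5 + 2\cdot 0.3 = 1.1.
\]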
\AH{I \emph{think} we use $\randDB$ to denote something different in one of the proofs. Have to keep an eye open for this to avoid overloading notation.}
@ -144,7 +145,7 @@ $\Omega\inparen{\inparen{\qruntime{\query, \gentupset}}^{c_0\cdot k}}$ for {\em
\caption{Our lower bounds for a specific hard query $Q$ parameterized by $k$. The $\pdb$ is over the same (family of) $\gentupset$ and those with `Multiple' in the second column need the algorithm to be able to handle multiple $\pd$ (for a given $\gentupset$). The last column states the hardness assumptions that imply the lower bounds in the first column ($\eps_o,C_0,c_0$ are constants that are independent of $k$).}
\label{tab:lbs}
\end{table}
\mypar{Our lower bound results} In table~\ref{tab:lbs} we show that depending on what hardness result/conjecture we assume, we get various emphatic versions of {\em no} as an answer to our question. To make some sense of the other lower bounds in Table~\ref{tab:lbs}, we note that it is not too hard to show that $\timeOf{}^*(Q,\pdb) \le O\inparen{\inparen{\qruntime{Q, \gentupset}}^k}$, where $k$ is the largest degree of the query $\query$ (i.e., join width) over all result tuples $\tup$ (and the parameter that defines our family of hard queries).
\mypar{Our lower bound results} In Table~\ref{tab:lbs} we show that depending on what hardness result/conjecture we assume, we get various emphatic versions of {\em no} as an answer to our question. To make some sense of the other lower bounds in Table~\ref{tab:lbs}, we note that it is not too hard to show that $\timeOf{}^*(Q,\pdb) \le O\inparen{\inparen{\qruntime{Q, \gentupset}}^k}$, where $k$ is the join width of the query $\query$ over all result tuples $\tup$ (and the parameter that defines our family of hard queries); our notion of join width follows from~\cref{def:degree-of-poly} and~\cref{fig:nxDBSemantics}.
What our lower bound in the third row says is that one cannot get more than a polynomial improvement over essentially the trivial algorithm for~\cref{prob:expect-mult}.
However, this result assumes a hardness conjecture that is not as well studied as those in the first two rows of the table (see \Cref{sec:hard} for more discussion on the hardness assumptions). Further, we note that existing results already imply the claimed lower bounds if we were to replace the $\qruntime{\query, \gentupset}$ by just $\abs{\gentupset}$ (indeed these results follow from known lower bounds for deterministic query processing). Our contribution is then to identify a family of hard queries where deterministic query processing is `easy' but computing the expected multiplicities is hard.
@ -243,10 +244,7 @@ $\expct\limits_{\vct{\randWorld}\sim\pdassign}\pbox{\poly^2\inparen{\vct{\randWo
\end{footnotesize}
\noindent This property leads us to consider a structure related to the lineage polynomial.
\begin{Definition}\label{def:reduced-poly}
For any polynomial $\poly(\vct{X})$ corresponding to a \abbrCTIDB (henceforth, \abbrCTIDB-lineage polynomial),
%\BG{Better introduce the notion of TIDB lin poly before here, then it iis more clear?},
%Atri: Done
define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by setting all exponents $e > 1$ in the standard monomial basis (\abbrSMB) \footnote{
For any polynomial $\poly(\vct{X})$ define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by setting all exponents $e > 1$ in the standard monomial basis (\abbrSMB) \footnote{
This is the representation, typically used in set-\abbrPDB\xplural, where the polynomial is represented as a sum of `pure' products. See \Cref{def:smb} for a formal definition.
}
form of $\poly(\vct{X})$ to $1$.

@ -22,7 +22,7 @@ Unless otherwise noted, we consider all polynomials to be in \abbrSMB representat
When it is unclear, we use $\smbOf{\poly}$ to denote the \abbrSMB form of a polynomial $\poly$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Degree]\label{def:degree}
\begin{Definition}[Degree]\label{def:degree-of-poly}
The degree of polynomial $\poly(\vct{X})$ is the largest $\sum_{i=1}^n d_i$ such that $c_{(d_1,\dots,d_n)}\ne 0$. % maximum sum of exponents, over all monomials in $\smbOf{\poly(\vct{X})}$.
\end{Definition}
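\noindent For instance (an illustrative polynomial, not one from our running example), $\poly(X, Y) = X^2 + 2XY^3$ has nonzero coefficients $c_{(2,0)} = 1$ and $c_{(1,3)} = 2$, with exponent sums $2 + 0 = 2$ and $1 + 3 = 4$, so its degree is $4$.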
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

@ -15,26 +15,26 @@
\node[cylinder, text width=0.28\textwidth, align=center, draw=black, text=black, cylinder uses custom fill, cylinder body fill=blue!10, aspect=0.12, minimum height=5cm, minimum width=2.5cm, cylinder end fill=blue!50, shape border rotate=90] (cylinder) at (0, 0) {
\tabcolsep=0.1cm
\begin{tabular}{>{\small}c | >{\small}c | >{\small}c}
\multicolumn{3}{c}{$\boldsymbol{OnTime}$}\\
\multicolumn{3}{c}{$\boldsymbol{T}$}\\
%\toprule
City & $\Phi$ & \textbf{p}\\
Point & $\Phi$ & $\semN$\\
\midrule
Buffalo & $A$ & 0.9 \\
Chicago & $B$ & 0.5\\
Bremen & $C$ & 0.5\\
Zurich & $E$ & 1.0\\
$e_1$ & $A$ & 1 \\
$e_2$ & $B$ & 1\\
$e_3$ & $C$ & 1\\
$e_4$ & $E$ & 1\\
\end{tabular}\\
\tabcolsep=0.05cm
%\captionof{table}{Route}
\begin{tabular}{>{\footnotesize}c | >{\footnotesize}c | >{\footnotesize}c | >{\footnotesize}c}
\multicolumn{4}{c}{$\boldsymbol{Route$}}\\
        \multicolumn{4}{c}{$\boldsymbol{R}$}\\
%\toprule
$\text{City}_1$ & $\text{City}_2$ & $\Phi$ & \textbf{p} \\
$\text{Point}_1$ & $\text{Point}_2$ & $\Phi$ & $\semN$ \\
\midrule
Buffalo & Chicago & $X$ & 1.0 \\
Chicago & Zurich & $Y$ & 1.0 \\
$e_1$ & $e_2$ & $X$ & 2 \\
$e_2$ & $e_4$ & $Y$ & 4 \\
%& $\cdots$ & $\cdots$ & $\cdots$ & $\cdots$ \\
Chicago & Bremen & $Z$ & 1.0 \\
$e_2$ & $e_3$ & $Z$ & 3 \\
\end{tabular}};
%label below cylinder
\node[below=0.2 cm of cylinder]{{\LARGE$ \tupset$}};
@ -51,11 +51,11 @@
\begin{tabular}{>{\normalsize}c | >{\centering\arraybackslash\normalsize}m{1.95cm} | >{\centering\arraybackslash\small}m{1.95cm}}
%\multicolumn{3}{c}{$\boldsymbol{\query(\pdb)}$}\\[1mm]
%\toprule
City & $\Phi$ & Circuit\\% & $\expct_{\idb \sim \probDist}[\query(\db)(t)]$ \\ \hline
Point & $\Phi$ & Circuit\\% & $\expct_{\idb \sim \probDist}[\query(\db)(t)]$ \\ \hline
\midrule
%\hline
%\\\\[-3.5\medskipamount]
Buffalo & $AX$ &\resizebox{!}{10mm}{
$e_1$ & $AX$ &\resizebox{!}{10mm}{
\begin{tikzpicture}[thick]
\node[gen_tree_node](sink) at (0.5, 0.8){$\boldsymbol{\circmult}$};
\node[gen_tree_node](source1) at (0, 0){$A$};
@ -64,7 +64,7 @@
\draw[->] (source2)--(sink);
\end{tikzpicture}% & $0.5 \cdot 1.0 + 0.5 \cdot 1.0 = 1.0$
}\\% & $0.9$ \\
Chicago & $B(Y + Z)$\newline \text{Or}\newline $BY+ BZ$&
$e_2$ & $B(Y + Z)$\newline \text{Or}\newline $BY+ BZ$&
\resizebox{!}{16mm} {
\begin{tikzpicture}[thick]
\node[gen_tree_node] (a1) at (1, 0){$Y$};
@ -116,17 +116,17 @@
\begin{tabular}{>{\small}c | >{\centering\arraybackslash\small}m{1.95cm}}
%\multicolumn{2}{c}{$\expct\pbox{\poly(\vct{X})}$}\\[1mm]
%\toprule
City & $\mathbb{E}[\poly(\vct{X})]$\\
Point & $\mathbb{E}[\poly(\vct{X})]$\\
\midrule%[0.05pt]
Buffalo & $1.0 \cdot 0.9 = 0.9$\\[3mm]
Chicago & $(0.5 \cdot 1.0) + $\newline $\hspace{0.2cm}(0.5 \cdot 1.0)$\newline $= 1.0$\\
$e_1$ & $A\cdot\probOf\pbox{A = 1}\inparen{X\cdot\probOf\pbox{X = 1} + X\cdot\probOf\pbox{X = 2}}$\\[2mm]%$1.0 \cdot 0.9 = 0.9$\\[3mm]
$e_2$ & $(0.5 \cdot 1.0) + $\newline $\hspace{0.2cm}(0.5 \cdot 1.0)$\newline $= 1.0$\\
\end{tabular}
};
%label of rounded rectangle
\node[below=0.2cm of rrect]{{\LARGE $\expct\pbox{\poly(\vct{X})}$}};
\end{tikzpicture}
}
\caption{Intensional Query Evaluation Model ($\query = \project_{\text{City}}\inparen{Route\join_{\text{City}_1 = City}OnTime}$).}
\caption{Intensional Query Evaluation Model ($\query = \project_{\text{Point}}\inparen{T\join_{\text{Point} = \text{Point}_1}R}$).}
\label{fig:two-step}
\end{figure}
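\noindent As a sketch of the second step for the lineage $AX$ in \Cref{fig:two-step}, assume (this is one reading of the figure, not something it states explicitly) that $A$ ranges over $\inset{0, 1}$, that $X$ ranges over $\inset{0, 1, 2}$, and that the two variables are independent. Under these assumptions,
\[
\expct\pbox{AX} = \expct\pbox{A}\cdot\expct\pbox{X} = \probOf\pbox{A = 1}\cdot\inparen{1\cdot\probOf\pbox{X = 1} + 2\cdot\probOf\pbox{X = 2}}.
\]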