Changes to 2-step intensional figure.
parent 5e00f09da6
commit 14f0eb9adf
|
@ -4,20 +4,21 @@
|
||||||
|
|
||||||
\secrev{
|
\secrev{
|
||||||
This work explores the problem of computing the expectation of a tuple's multiplicity in an important special case of bag \abbrTIDB, which we call a \abbrCTIDB. A \abbrCTIDB,
|
This work explores the problem of computing the expectation of a tuple's multiplicity in an important special case of bag \abbrTIDB, which we call a \abbrCTIDB. A \abbrCTIDB,
|
||||||
$\pdb = \inparen{\worlds, \bpd}$ encodes a bag of uncertain tuples such that each tuple in $\pdb$ has a multiplicity of at most $\bound$. The set of all worlds is encoded in $\worlds$, which is the set of all vectors of length $\abs{\tupset}$ such that each index corresponds to a distinct $\tup \in \tupset$ storing its multiplicity. $\bpd$ is a product distribution over the set of all worlds. A given world $\worldvec \in \inset{0,\ldots, \bound}^{\abs{\tupset}}$ can be interpreted such that, for each $\tup \in \tupset$, $\worldvec\pbox{\tup}$ is the multiplicity of $\tup$ in $\worldvec$. The resulting product distribution can then be encoded as $\prob_{\tup} = \probOf\pbox{W\pbox{i} = j}$ (for $j \in\pbox{\bound}$), where each distribution is independent for $\tup \in \tupset$.
|
$\pdb = \inparen{\worlds, \bpd}$ encodes a bag of uncertain tuples such that each tuple in $\pdb$ has a multiplicity of at most $\bound$. The set of all worlds is encoded in $\worlds$, which is the set of all vectors of length $\abs{\tupset}$ such that each index corresponds to a distinct $\tup \in \tupset$ storing its multiplicity. $\bpd$ is a product distribution over the set of all worlds. A given world $\worldvec \in \inset{0,\ldots, \bound}^{\abs{\tupset}}$ can be interpreted such that, for each $\tup \in \tupset$, $\worldvec\pbox{\tup}$ is the multiplicity of $\tup$ in $\worldvec$. The resulting product distribution can then be encoded as $\prob_{\tup} = \probOf\pbox{W\pbox{\tup} = j}$ (for $j \in\pbox{\bound}$), where each %distribution
|
||||||
|
$\tup$ is an independent random event. %for $\tup \in \tupset$.
|
||||||
}
|
}
|
||||||
%\mypar{For a later section}
|
%\mypar{For a later section}
|
||||||
%\sout{
|
%\sout{
|
||||||
%Since each tuple in $\pdb$ has a mutually exclusive probability distribution over its possible multiplicities, it is natural to reduce a \abbrCTIDB to traditional (set) block independent database (\abbrBIDB). We refer to the reduced \abbrBIDB as a $1$-\abbrBIDB, as it is the case that each tuple can appear in a possible world at most $c = 1$ time. \Cref{fig:ctidb-red} shows an example of this reduction.
|
%Since each tuple in $\pdb$ has a mutually exclusive probability distribution over its possible multiplicities, it is natural to reduce a \abbrCTIDB to traditional (set) block independent database (\abbrBIDB). We refer to the reduced \abbrBIDB as a $1$-\abbrBIDB, as it is the case that each tuple can appear in a possible world at most $c = 1$ time. \Cref{fig:ctidb-red} shows an example of this reduction.
|
||||||
%}
|
%}
|
||||||
\secrev{
|
\secrev{
|
||||||
Allowing for $\leq \bound$ multiplicities across all tuples gives rise to having $\leq \inparen{\bound+1}^\numvar$ possible worlds instead of the usual $2^\numvar$ possible worlds of a $1$-\abbrTIDB$, which (assuming set query semantics), is the same as the traditional set \abbrTIDB.
|
Allowing for $\leq \bound$ multiplicities across all tuples gives rise to having $\leq \inparen{\bound+1}^\numvar$ possible worlds instead of the usual $2^\numvar$ possible worlds of a $1$-\abbrTIDB, which (assuming set query semantics), is the same as the traditional set \abbrTIDB.
|
||||||
In this work, since we are generally considering bag query input, we will only be considering bag query semantics.
|
In this work, since we are generally considering bag query input, we will only be considering bag query semantics. We denote by $\query\inparen{\vct{W}}\inparen{\tup}$ the multiplicity of $\tup$ in query $\query$ over possible world $\vct{W}\in\worlds$.
|
||||||
|
|
||||||
We can formally state this problem as:
|
We can formally state this problem as:
|
||||||
|
|
||||||
\begin{Problem}\label{prob:expect-mult}
|
\begin{Problem}\label{prob:expect-mult}
|
||||||
Given a \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, $\raPlus$ query $\query$, and result tuple $\tup$, compute the expected multiplicity of $\tup$: $\expct_{\randDB\sim\bpd}\pbox{\query\inparen{\randDB}\inparen{\tup}}$.
|
Given a \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, $\raPlus$ query $\query$, and result tuple $\tup$, compute the expected multiplicity of $\tup$: $\expct_{\vct{W}\sim\bpd}\pbox{\query\inparen{\vct{W}}\inparen{\tup}}$.
|
||||||
\end{Problem}
|
\end{Problem}
|
||||||
\AH{I \emph{think} we use $\randDB$ to denote something different in one of the proofs. Have to keep an eye open for this to avoid overloading notation.}
|
\AH{I \emph{think} we use $\randDB$ to denote something different in one of the proofs. Have to keep an eye open for this to avoid overloading notation.}
|
||||||
|
|
||||||
|
@ -144,7 +145,7 @@ $\Omega\inparen{\inparen{\qruntime{\query, \gentupset}}^{c_0\cdot k}}$ for {\em
|
||||||
\caption{Our lower bounds for a specific hard query $Q$ parameterized by $k$. The $\pdb$ is over the same (family of) $\gentupset$ and those with `Multiple' in the second column need the algorithm to be able to handle multiple $\pd$ (for a given $\gentupset$). The last column states the hardness assumptions that imply the lower bounds in the first column ($\eps_o,C_0,c_0$ are constants that are independent of $k$).}
|
\caption{Our lower bounds for a specific hard query $Q$ parameterized by $k$. The $\pdb$ is over the same (family of) $\gentupset$ and those with `Multiple' in the second column need the algorithm to be able to handle multiple $\pd$ (for a given $\gentupset$). The last column states the hardness assumptions that imply the lower bounds in the first column ($\eps_o,C_0,c_0$ are constants that are independent of $k$).}
|
||||||
\label{tab:lbs}
|
\label{tab:lbs}
|
||||||
\end{table}
|
\end{table}
|
||||||
\mypar{Our lower bound results} In Table~\ref{tab:lbs} we show that depending on what hardness result/conjecture we assume, we get various emphatic versions of {\em no} as an answer to our question. To make some sense of the other lower bounds in Table~\ref{tab:lbs}, we note that it is not too hard to show that $\timeOf{}^*(Q,\pdb) \le O\inparen{\inparen{\qruntime{Q, \gentupset}}^k}$, where $k$ is the largest degree of the query $\query$ (i.e., join width) over all result tuples $\tup$ (and the parameter that defines our family of hard queries).
|
\mypar{Our lower bound results} In Table~\ref{tab:lbs} we show that depending on what hardness result/conjecture we assume, we get various emphatic versions of {\em no} as an answer to our question. To make some sense of the other lower bounds in Table~\ref{tab:lbs}, we note that it is not too hard to show that $\timeOf{}^*(Q,\pdb) \le O\inparen{\inparen{\qruntime{Q, \gentupset}}^k}$, where $k$ is the join width of the query $\query$ over all result tuples $\tup$ (and the parameter that defines our family of hard queries); our notion of join width follows from~\cref{def:degree-of-poly} and~\cref{fig:nxDBSemantics}.
|
||||||
|
|
||||||
What our lower bound in the third row says is that one cannot get more than a polynomial improvement over essentially the trivial algorithm for~\cref{prob:expect-mult}.
|
What our lower bound in the third row says is that one cannot get more than a polynomial improvement over essentially the trivial algorithm for~\cref{prob:expect-mult}.
|
||||||
However, this result assumes a hardness conjecture that is not as well studied as those in the first two rows of the table (see \Cref{sec:hard} for more discussion on the hardness assumptions). Further, we note that existing results already imply the claimed lower bounds if we were to replace the $\qruntime{\query, \gentupset}$ by just $\abs{\gentupset}$ (indeed these results follow from known lower bounds for deterministic query processing). Our contribution is to then identify a family of hard queries where deterministic query processing is `easy' but computing the expected multiplicities is hard.
|
However, this result assumes a hardness conjecture that is not as well studied as those in the first two rows of the table (see \Cref{sec:hard} for more discussion on the hardness assumptions). Further, we note that existing results already imply the claimed lower bounds if we were to replace the $\qruntime{\query, \gentupset}$ by just $\abs{\gentupset}$ (indeed these results follow from known lower bounds for deterministic query processing). Our contribution is to then identify a family of hard queries where deterministic query processing is `easy' but computing the expected multiplicities is hard.
|
||||||
|
@ -243,10 +244,7 @@ $\expct\limits_{\vct{\randWorld}\sim\pdassign}\pbox{\poly^2\inparen{\vct{\randWo
|
||||||
\end{footnotesize}
|
\end{footnotesize}
|
||||||
\noindent This property leads us to consider a structure related to the lineage polynomial.
|
\noindent This property leads us to consider a structure related to the lineage polynomial.
|
||||||
\begin{Definition}\label{def:reduced-poly}
|
\begin{Definition}\label{def:reduced-poly}
|
||||||
For any polynomial $\poly(\vct{X})$ corresponding to a \abbrCTIDB (henceforth, \abbrCTIDB-lineage polynomial),
|
For any polynomial $\poly(\vct{X})$ define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by setting all exponents $e > 1$ in the standard monomial basis (\abbrSMB) \footnote{
|
||||||
%\BG{Better introduce the notion of TIDB lin poly before here, then it iis more clear?},
|
|
||||||
%Atri: Done
|
|
||||||
define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by setting all exponents $e > 1$ in the standard monomial basis (\abbrSMB) \footnote{
|
|
||||||
This is the representation, typically used in set-\abbrPDB\xplural, where the polynomial is represented as a sum of `pure' products. See \Cref{def:smb} for a formal definition.
|
This is the representation, typically used in set-\abbrPDB\xplural, where the polynomial is represented as a sum of `pure' products. See \Cref{def:smb} for a formal definition.
|
||||||
}
|
}
|
||||||
form of $\poly(\vct{X})$ to $1$.
|
form of $\poly(\vct{X})$ to $1$.
|
||||||
|
|
|
@ -22,7 +22,7 @@ Unless otherwise noted, we consider all polynomials to be in \abbrSMB representat
|
||||||
When it is unclear, we use $\smbOf{\poly}$ to denote the \abbrSMB form of a polynomial $\poly$.
|
When it is unclear, we use $\smbOf{\poly}$ to denote the \abbrSMB form of a polynomial $\poly$.
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\begin{Definition}[Degree]\label{def:degree}
|
\begin{Definition}[Degree]\label{def:degree-of-poly}
|
||||||
The degree of polynomial $\poly(\vct{X})$ is the largest $\sum_{i=1}^n d_i$ such that $c_{(d_1,\dots,d_n)}\ne 0$. % maximum sum of exponents, over all monomials in $\smbOf{\poly(\vct{X})}$.
|
The degree of polynomial $\poly(\vct{X})$ is the largest $\sum_{i=1}^n d_i$ such that $c_{(d_1,\dots,d_n)}\ne 0$. % maximum sum of exponents, over all monomials in $\smbOf{\poly(\vct{X})}$.
|
||||||
\end{Definition}
|
\end{Definition}
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
|
|
|
@ -15,26 +15,26 @@
|
||||||
\node[cylinder, text width=0.28\textwidth, align=center, draw=black, text=black, cylinder uses custom fill, cylinder body fill=blue!10, aspect=0.12, minimum height=5cm, minimum width=2.5cm, cylinder end fill=blue!50, shape border rotate=90] (cylinder) at (0, 0) {
|
\node[cylinder, text width=0.28\textwidth, align=center, draw=black, text=black, cylinder uses custom fill, cylinder body fill=blue!10, aspect=0.12, minimum height=5cm, minimum width=2.5cm, cylinder end fill=blue!50, shape border rotate=90] (cylinder) at (0, 0) {
|
||||||
\tabcolsep=0.1cm
|
\tabcolsep=0.1cm
|
||||||
\begin{tabular}{>{\small}c | >{\small}c | >{\small}c}
|
\begin{tabular}{>{\small}c | >{\small}c | >{\small}c}
|
||||||
\multicolumn{3}{c}{$\boldsymbol{OnTime}$}\\
|
\multicolumn{3}{c}{$\boldsymbol{T}$}\\
|
||||||
%\toprule
|
%\toprule
|
||||||
City & $\Phi$ & \textbf{p}\\
|
Point & $\Phi$ & $\semN$\\
|
||||||
\midrule
|
\midrule
|
||||||
Buffalo & $A$ & 0.9 \\
|
$e_1$ & $A$ & 1 \\
|
||||||
Chicago & $B$ & 0.5\\
|
$e_2$ & $B$ & 1\\
|
||||||
Bremen & $C$ & 0.5\\
|
$e_3$ & $C$ & 1\\
|
||||||
Zurich & $E$ & 1.0\\
|
$e_4$ & $E$ & 1\\
|
||||||
\end{tabular}\\
|
\end{tabular}\\
|
||||||
\tabcolsep=0.05cm
|
\tabcolsep=0.05cm
|
||||||
%\captionof{table}{Route}
|
%\captionof{table}{Route}
|
||||||
\begin{tabular}{>{\footnotesize}c | >{\footnotesize}c | >{\footnotesize}c | >{\footnotesize}c}
|
\begin{tabular}{>{\footnotesize}c | >{\footnotesize}c | >{\footnotesize}c | >{\footnotesize}c}
|
||||||
\multicolumn{4}{c}{$\boldsymbol{Route}$}\\
|
\multicolumn{4}{c}{$\boldsymbol{R}$}\\
|
||||||
%\toprule
|
%\toprule
|
||||||
$\text{City}_1$ & $\text{City}_2$ & $\Phi$ & \textbf{p} \\
|
$\text{Point}_1$ & $\text{Point}_2$ & $\Phi$ & $\semN$ \\
|
||||||
\midrule
|
\midrule
|
||||||
Buffalo & Chicago & $X$ & 1.0 \\
|
$e_1$ & $e_2$ & $X$ & 2 \\
|
||||||
Chicago & Zurich & $Y$ & 1.0 \\
|
$e_2$ & $e_4$ & $Y$ & 4 \\
|
||||||
%& $\cdots$ & $\cdots$ & $\cdots$ & $\cdots$ \\
|
%& $\cdots$ & $\cdots$ & $\cdots$ & $\cdots$ \\
|
||||||
Chicago & Bremen & $Z$ & 1.0 \\
|
$e_2$ & $e_3$ & $Z$ & 3 \\
|
||||||
\end{tabular}};
|
\end{tabular}};
|
||||||
%label below cylinder
|
%label below cylinder
|
||||||
\node[below=0.2 cm of cylinder]{{\LARGE$ \tupset$}};
|
\node[below=0.2 cm of cylinder]{{\LARGE$ \tupset$}};
|
||||||
|
@ -51,11 +51,11 @@
|
||||||
\begin{tabular}{>{\normalsize}c | >{\centering\arraybackslash\normalsize}m{1.95cm} | >{\centering\arraybackslash\small}m{1.95cm}}
|
\begin{tabular}{>{\normalsize}c | >{\centering\arraybackslash\normalsize}m{1.95cm} | >{\centering\arraybackslash\small}m{1.95cm}}
|
||||||
%\multicolumn{3}{c}{$\boldsymbol{\query(\pdb)}$}\\[1mm]
|
%\multicolumn{3}{c}{$\boldsymbol{\query(\pdb)}$}\\[1mm]
|
||||||
%\toprule
|
%\toprule
|
||||||
City & $\Phi$ & Circuit\\% & $\expct_{\idb \sim \probDist}[\query(\db)(t)]$ \\ \hline
|
Point & $\Phi$ & Circuit\\% & $\expct_{\idb \sim \probDist}[\query(\db)(t)]$ \\ \hline
|
||||||
\midrule
|
\midrule
|
||||||
%\hline
|
%\hline
|
||||||
%\\\\[-3.5\medskipamount]
|
%\\\\[-3.5\medskipamount]
|
||||||
Buffalo & $AX$ &\resizebox{!}{10mm}{
|
$e_1$ & $AX$ &\resizebox{!}{10mm}{
|
||||||
\begin{tikzpicture}[thick]
|
\begin{tikzpicture}[thick]
|
||||||
\node[gen_tree_node](sink) at (0.5, 0.8){$\boldsymbol{\circmult}$};
|
\node[gen_tree_node](sink) at (0.5, 0.8){$\boldsymbol{\circmult}$};
|
||||||
\node[gen_tree_node](source1) at (0, 0){$A$};
|
\node[gen_tree_node](source1) at (0, 0){$A$};
|
||||||
|
@ -64,7 +64,7 @@
|
||||||
\draw[->] (source2)--(sink);
|
\draw[->] (source2)--(sink);
|
||||||
\end{tikzpicture}% & $0.5 \cdot 1.0 + 0.5 \cdot 1.0 = 1.0$
|
\end{tikzpicture}% & $0.5 \cdot 1.0 + 0.5 \cdot 1.0 = 1.0$
|
||||||
}\\% & $0.9$ \\
|
}\\% & $0.9$ \\
|
||||||
Chicago & $B(Y + Z)$\newline \text{Or}\newline $BY+ BZ$&
|
$e_2$ & $B(Y + Z)$\newline \text{Or}\newline $BY+ BZ$&
|
||||||
\resizebox{!}{16mm} {
|
\resizebox{!}{16mm} {
|
||||||
\begin{tikzpicture}[thick]
|
\begin{tikzpicture}[thick]
|
||||||
\node[gen_tree_node] (a1) at (1, 0){$Y$};
|
\node[gen_tree_node] (a1) at (1, 0){$Y$};
|
||||||
|
@ -116,17 +116,17 @@
|
||||||
\begin{tabular}{>{\small}c | >{\centering\arraybackslash\small}m{1.95cm}}
|
\begin{tabular}{>{\small}c | >{\centering\arraybackslash\small}m{1.95cm}}
|
||||||
%\multicolumn{2}{c}{$\expct\pbox{\poly(\vct{X})}$}\\[1mm]
|
%\multicolumn{2}{c}{$\expct\pbox{\poly(\vct{X})}$}\\[1mm]
|
||||||
%\toprule
|
%\toprule
|
||||||
City & $\mathbb{E}[\poly(\vct{X})]$\\
|
Point & $\mathbb{E}[\poly(\vct{X})]$\\
|
||||||
\midrule%[0.05pt]
|
\midrule%[0.05pt]
|
||||||
Buffalo & $1.0 \cdot 0.9 = 0.9$\\[3mm]
|
$e_1$ & $A\cdot\probOf\pbox{A = 1}\inparen{X\cdot\probOf\pbox{X = 1} + X\cdot\probOf\pbox{X = 2}}$\\[2mm]%$1.0 \cdot 0.9 = 0.9$\\[3mm]
|
||||||
Chicago & $(0.5 \cdot 1.0) + $\newline $\hspace{0.2cm}(0.5 \cdot 1.0)$\newline $= 1.0$\\
|
$e_2$ & $(0.5 \cdot 1.0) + $\newline $\hspace{0.2cm}(0.5 \cdot 1.0)$\newline $= 1.0$\\
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
};
|
};
|
||||||
%label of rounded rectangle
|
%label of rounded rectangle
|
||||||
\node[below=0.2cm of rrect]{{\LARGE $\expct\pbox{\poly(\vct{X})}$}};
|
\node[below=0.2cm of rrect]{{\LARGE $\expct\pbox{\poly(\vct{X})}$}};
|
||||||
\end{tikzpicture}
|
\end{tikzpicture}
|
||||||
}
|
}
|
||||||
\caption{Intensional Query Evaluation Model ($\query = \project_{\text{City}}\inparen{Route\join_{\text{City}_1 = City}OnTime}$).}
|
\caption{Intensional Query Evaluation Model ($\query = \project_{\text{Point}}\inparen{T\join_{\text{Point} = \text{Point}_1}R}$).}
|
||||||
\label{fig:two-step}
|
\label{fig:two-step}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
|
|