Rearranged some figure/table.
parent
160e959c73
commit
a9aea5ecc3
@ -19,6 +19,39 @@ An $\raPlus$ query is a query expressed in positive relational algebra, i.e., us
}
$\query$, and result tuple $\tup$, compute the expected multiplicity of $\tup$: $\expct_{\rvworld\sim\bpd}\pbox{\query\inparen{\rvworld}\inparen{\tup}}$.
\end{Problem}

\begin{figure}[t!]
\begin{align*}
&\begin{aligned}[t]
&\polyqdt{\project_A(\query)}{\gentupset}{\tup} =\\
&~~\sum_{\tup': \project_A(\tup') = \tup} \polyqdt{\query}{\gentupset}{\tup'}
\end{aligned}
&
&\begin{aligned}[t]
&\polyqdt{\query_1 \union \query_2}{\gentupset}{\tup} =\\
&\qquad \polyqdt{\query_1}{\gentupset}{\tup} + \polyqdt{\query_2}{\gentupset}{\tup}\\
\end{aligned}\\
&\begin{aligned}
&\polyqdt{\select_\theta(\query)}{\gentupset}{\tup} =\\
&~~ \begin{cases}
\polyqdt{\query}{\gentupset}{\tup} & \text{if }\theta(\tup) \\
0 & \text{otherwise}.
\end{cases}
\end{aligned}
&
&\begin{aligned}
&\polyqdt{\query_1 \join \query_2}{\gentupset}{\tup} =\\
&\qquad\polyqdt{\query_1}{\gentupset}{\project_{\attr{\query_1}}{\tup}}\\
&\qquad\cdot\polyqdt{\query_2}{\gentupset}{\project_{\attr{\query_2}}{\tup}}
\end{aligned}\\
&&&\polyqdt{\rel}{\gentupset}{\tup} = X_\tup
\end{align*}%\\[-10mm]
\setlength{\abovecaptionskip}{-0.25cm}
\caption{Construction of the lineage (polynomial) for an $\raPlus$ query $\query$ over an arbitrary deterministic database $\gentupset$, where $\vct{X}$ consists of all $X_\tup$ over all $\rel$ in $\gentupset$ and $\tup$ in $\rel$. Here $\gentupset.\rel$ denotes the instance of relation $\rel$ in $\gentupset$. Note that, after we introduce the reduction to $1$-\abbrBIDB, the base case will be expressed differently.}
\label{fig:nxDBSemantics}
\vspace{-0.53cm}
\end{figure}

It is natural to compute the expected multiplicity of a result tuple, as this is the analog of computing the marginal probability of a tuple in a set \abbrPDB.
In this work, we assume that $c =\bigO{1}$, since this is what is typically seen in practice.
Allowing for unbounded $c$ is an interesting open problem.

@ -32,7 +65,8 @@ Specifically, in this work we ask if~\Cref{prob:expect-mult} can be solved in ti

Let $\qruntime{\query,\gentupset,\bound}$ (see~\Cref{sec:gen} for further details) denote the runtime for query $\query$, deterministic database $\gentupset$, and multiplicity bound $\bound$. This paper considers $\raPlus$ queries for which the order of operations is \emph{explicit}, as opposed to other query languages, e.g., Datalog or UCQ. Thus, since the order of operations affects runtime, we denote the optimized $\raPlus$ query picked by an arbitrary production system as $\optquery{\query} = \operatorname*{arg\,min}_{\query'\in\raPlus, \query'\equiv\query}\qruntime{\query', \gentupset, \bound}$. Then $\qruntime{\optquery{\query}, \gentupset,\bound}$ is the runtime for the optimized query.\footnote{Note that our work applies to any $\query \in\raPlus$, which implies that specific heuristics for choosing an optimized query can be abstracted away, i.e., our work does not consider heuristic techniques.}

\begin{table}[t!]
\begin{table*}[t!]
\centering
\begin{tabular}{|p{0.43\textwidth}|p{0.12\textwidth}|p{0.35\textwidth}|}
\hline
\textbf{Lower bound on $\timeOf{}^*(\qhard,\pdb)$} & \textbf{Num.} $\bpd$s
@ -45,7 +79,9 @@ $\Omega\inparen{\inparen{\qruntime{\optquery{\qhard}, \tupset, \bound}}^{c_0\cdo
\end{tabular}
\caption{Our lower bounds for a specific hard query $\qhard$ parameterized by $k$. For $\pdb = \inset{\worlds, \bpd}$, the bounds marked `Multiple' in the second column require the algorithm to handle multiple $\bpd$, i.e., multiple probability distributions (for a given $\tupset$). The last column states the hardness assumptions that imply the lower bounds in the first column ($\eps_0,C_0,c_0$ are constants that are independent of $k$).}
\label{tab:lbs}
\end{table}
\vspace{-0.73cm}
\end{table*}

\mypar{Our lower bound results}
Our question is whether it is always true that $\timeOf{}^*\inparen{\query, \pdb}\leq\qruntime{\optquery{\query}, \tupset, \bound}$. Unfortunately, this is not the case.
\Cref{tab:lbs} shows our results.

@ -64,26 +100,6 @@ Further, our approximation algorithm works for a more general notion of bag \abb

\subsection{Polynomial Equivalence}\label{sec:intro-poly-equiv}
A common encoding of probabilistic databases (e.g., in \cite{IL84a,Imielinski1989IncompleteII,Antova_fastand,DBLP:conf/vldb/AgrawalBSHNSW06} and many others) relies on annotating tuples with lineages or propositional formulas that describe the set of possible worlds in which the tuple appears. The bag semantics analog is a provenance/lineage polynomial (see~\Cref{fig:nxDBSemantics}) $\apolyqdt$~\cite{DBLP:conf/pods/GreenKT07}, a polynomial with non-zero integer coefficients and exponents, over variables $\vct{X}$ encoding input tuple multiplicities. To evaluate the lineage polynomial of a query result tuple $\tup_{out}$ in a possible world, substitute each variable $X_{\tup_{in}}$ with the multiplicity of its input tuple $\tup_{in}$ in that world; the resulting value is the multiplicity of $\tup_{out}$ in the query result for that world.
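
As a small illustration (a hypothetical instance that is not part of the paper's running example; the relations $R$, $S$ and the variable names below are assumptions chosen for exposition), suppose $R(A,B)$ contains the tuples $(a,b_1)$ and $(a,b_2)$, annotated with variables $X_1$ and $X_2$, and $S(B)$ contains the tuples $(b_1)$ and $(b_2)$, annotated with $Y_1$ and $Y_2$. Applying the rules of~\Cref{fig:nxDBSemantics} to $\query = \project_A(R \join S)$ and the result tuple $\tup = (a)$ gives
\begin{align*}
\polyqdt{\query}{\gentupset}{\tup} &= \polyqdt{R \join S}{\gentupset}{(a,b_1)} + \polyqdt{R \join S}{\gentupset}{(a,b_2)}\\
&= X_1 Y_1 + X_2 Y_2.
\end{align*}
In a possible world where the input multiplicities are $X_1 = 2$, $Y_1 = 1$, $X_2 = 1$, and $Y_2 = 3$, the polynomial evaluates to $2\cdot 1 + 1\cdot 3 = 5$, which is the multiplicity of $(a)$ in the query result for that world.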
\begin{figure}[b!]
\begin{align*}
\polyqdt{\project_A(\query)}{\gentupset}{\tup} =& \sum_{\tup': \project_A(\tup') = \tup} \polyqdt{\query}{\gentupset}{\tup'} &
\polyqdt{\query_1 \union \query_2}{\gentupset}{\tup} =& \polyqdt{\query_1}{\gentupset}{\tup} + \polyqdt{\query_2}{\gentupset}{\tup}\\
\polyqdt{\select_\theta(\query)}{\gentupset}{\tup} =& \begin{cases}
\polyqdt{\query}{\gentupset}{\tup} & \text{if }\theta(\tup) \\
0 & \text{otherwise}.
\end{cases} &
\begin{aligned}
\polyqdt{\query_1 \join \query_2}{\gentupset}{\tup} =\\ ~
\end{aligned}&
\begin{aligned}
&\polyqdt{\query_1}{\gentupset}{\project_{\attr{\query_1}}{\tup}} \\
&~~~\cdot\polyqdt{\query_2}{\gentupset}{\project_{\attr{\query_2}}{\tup}}
\end{aligned}\\
& & & \polyqdt{\rel}{\gentupset}{\tup} = X_\tup
\end{align*}\\[-10mm]
\caption{Construction of the lineage (polynomial) for an $\raPlus$ query $\query$ over an arbitrary deterministic database $\gentupset$, where $\vct{X}$ consists of all $X_\tup$ over all $\rel$ in $\gentupset$ and $\tup$ in $\rel$. Here $\gentupset.\rel$ denotes the instance of relation $\rel$ in $\gentupset$. Note that, after we introduce the reduction to $1$-\abbrBIDB, the base case will be expressed differently.}
\label{fig:nxDBSemantics}
\end{figure}

We drop $\query$, $\tupset$, and $\tup$ from $\apolyqdt$ when they are clear from the context or irrelevant to the discussion. We now specify the problem of computing the expectation of tuple multiplicity in the language of lineage polynomials:
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
|
|
BIN
main.synctex.gz
Binary file not shown.