%!TEX root=./main.tex
\subsection{Relationship to Deterministic Query Runtimes}\label{sec:gen}
In~\Cref{sec:intro}, we introduced the structure $T_{det}\inparen{\cdot}$ to analyze the runtime complexity of~\Cref{prob:expect-mult}.
To decouple our results from specific join algorithms, we first lower bound the cost of a join.

\begin{Definition}[Join Cost]
\label{def:join-cost}
Denote by $\jointime{R_1, \ldots, R_m}$ the runtime of an algorithm for computing the $m$-ary join $R_1 \bowtie \ldots \bowtie R_m$.
We require only that the algorithm enumerate its output, i.e., that $\jointime{R_1, \ldots, R_m} \geq |R_1 \bowtie \ldots \bowtie R_m|$. This definition of $\jointime{\cdot}$ is general enough to cover worst-case optimal join algorithms.
\end{Definition}

Worst-case optimal join algorithms~\cite{skew,ngo-survey} and query evaluation via factorized databases~\cite{factorized-db} (as well as work on FAQs~\cite{DBLP:conf/pods/KhamisNR16}) can be modeled as $\raPlus$ queries (though the query size is data dependent).
For these algorithms, $\jointime{R_1, \ldots, R_m}$ is linear in the {\em AGM bound}~\cite{AGM}.
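
As a concrete illustration (the relation names $R$, $S$, $T$ and the size parameter $N$ are assumptions made only for this example), consider the triangle join $R(A,B) \bowtie S(B,C) \bowtie T(A,C)$ with $|R| = |S| = |T| = N$.
Assigning weight $1/2$ to each relation yields a fractional edge cover, so the AGM bound gives
\begin{align*}
|R \bowtie S \bowtie T| \leq |R|^{1/2} \cdot |S|^{1/2} \cdot |T|^{1/2} = N^{3/2}.
\end{align*}
Under the linearity stated above, $\jointime{R, S, T}$ would then be $O(N^{3/2})$ for these algorithms, whereas a plan built from pairwise joins may materialize an intermediate result of size $\Theta(N^2)$.
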
Our cost model for general query evaluation follows from the join cost:

\noindent\resizebox{1\linewidth}{!}{
\begin{minipage}{1.0\linewidth}
\begin{align*}
\qruntimenoopt{R,\gentupset,\bound} & = |\gentupset.R| &
\qruntimenoopt{\sigma \query, \gentupset,\bound} & = \qruntimenoopt{\query,\gentupset,\bound} &
\qruntimenoopt{\pi \query, \gentupset,\bound} & = \qruntimenoopt{\query,\gentupset,\bound} + \abs{\query(\gentupset)}
\end{align*}\\[-15mm]
\begin{align*}
\qruntimenoopt{\query \cup \query', \gentupset,\bound} & = \qruntimenoopt{\query, \gentupset,\bound} +
\qruntimenoopt{\query', \gentupset,\bound} +
\abs{\query\inparen{\gentupset}}+\abs{\query'\inparen{\gentupset}} \\
\qruntimenoopt{\query_1 \bowtie \ldots \bowtie \query_m, \gentupset,\bound}
& = \qruntimenoopt{\query_1, \gentupset,\bound} + \ldots +
\qruntimenoopt{\query_m,\gentupset,\bound} +
\jointime{\query_1(\gentupset), \ldots, \query_m(\gentupset)}
\end{align*}
\end{minipage}
}\\
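
As a sketch of how these rules compose, consider the query $\pi_A(R \bowtie S)$ over two base relations $R$ and $S$ (this query and the attribute name $A$ are hypothetical, chosen only for illustration). Unrolling the projection rule, then the join rule, then the base relation rule gives
\begin{align*}
\qruntimenoopt{\pi_A(R \bowtie S), \gentupset,\bound}
& = \qruntimenoopt{R \bowtie S, \gentupset,\bound} + \abs{(R \bowtie S)(\gentupset)} \\
& = \qruntimenoopt{R, \gentupset,\bound} + \qruntimenoopt{S, \gentupset,\bound} + \jointime{R(\gentupset), S(\gentupset)} + \abs{(R \bowtie S)(\gentupset)} \\
& = |\gentupset.R| + |\gentupset.S| + \jointime{R(\gentupset), S(\gentupset)} + \abs{(R \bowtie S)(\gentupset)}.
\end{align*}
Since \Cref{def:join-cost} requires $\jointime{R(\gentupset), S(\gentupset)} \geq \abs{(R \bowtie S)(\gentupset)}$, the final term is dominated by the join cost, and the total is $O\inparen{|\gentupset.R| + |\gentupset.S| + \jointime{R(\gentupset), S(\gentupset)}}$.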

Under this model, an $\raPlus$ query $\query$ evaluated over database $\gentupset$ has runtime $O(\qruntimenoopt{\query,\gentupset, \bound})$.
We assume that full table scans are used for every base relation access. We can model index scans by treating an index scan query $\sigma_\theta(R)$ as a base relation.

\Cref{lem:circ-model-runtime} and \Cref{lem:tlc-is-the-same-as-det} show that for any $\raPlus$ query $\query$ and $\tupset$, there exists a circuit $\circuit^*$ such that $\timeOf{\abbrStepOne}(\query,\tupset,\circuit^*)$ and $|\circuit^*|$ are both $O(\qruntimenoopt{\optquery{\query}, \tupset,\bound})$. Recall that we assumed these two bounds when we moved from \Cref{prob:big-o-joint-steps} to \Cref{prob:intro-stmt}. Lastly, we can handle FAQs and factorized databases by allowing for optimization, i.e., $\optquery{\query}$.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: