%!TEX root=./main.tex
\subsection{Relationship to Deterministic Query Runtimes}\label{sec:gen}
In~\Cref{sec:intro}, we introduced the structure $T_{det}\inparen{\cdot}$ to analyze the runtime complexity of~\Cref{prob:expect-mult}.
To decouple our results from specific join algorithms, we first lower bound the cost of a join.
\begin{Definition}[Join Cost]
\label{def:join-cost}
Denote by $\jointime{R_1, \ldots, R_m}$ the runtime of an algorithm for computing the $m$-ary join $R_1 \bowtie \ldots \bowtie R_m$.
We require only that the algorithm enumerate its output, i.e., that $\jointime{R_1, \ldots, R_m} \geq |R_1 \bowtie \ldots \bowtie R_m|$. This definition of $\jointime{\cdot}$ is general enough to cover worst-case optimal join algorithms.
\end{Definition}
Worst-case optimal join algorithms~\cite{skew,ngo-survey} and query evaluation via factorized databases~\cite{factorized-db} (as well as work on FAQs~\cite{DBLP:conf/pods/KhamisNR16}) can be modeled as $\raPlus$ queries (though the query size is data dependent).
For these algorithms, $\jointime{R_1, \ldots, R_m}$ is linear in the {\em AGM bound}~\cite{AGM}.
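As an illustration (the relation names $R$, $S$, $T$ and the size parameter $N$ are ours, chosen only for this example), consider the triangle join $R(A,B) \bowtie S(B,C) \bowtie T(A,C)$ with $\abs{R}, \abs{S}, \abs{T} \leq N$. The AGM bound gives
\[
\abs{R \bowtie S \bowtie T} \leq \sqrt{\abs{R} \cdot \abs{S} \cdot \abs{T}} \leq N^{3/2},
\]
so for a worst-case optimal algorithm $\jointime{R, S, T}$ is $O(N^{3/2})$, whereas a plan built from pairwise joins can materialize an intermediate result of size $\Theta(N^2)$ on some inputs.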
Our cost model for general query evaluation follows from the join cost:
\noindent\resizebox{1\linewidth}{!}{
\begin{minipage}{1.0\linewidth}
\begin{align*}
\qruntimenoopt{R,\gentupset,\bound} & = |\gentupset.R| &
\qruntimenoopt{\sigma \query, \gentupset,\bound} & = \qruntimenoopt{\query,\gentupset,\bound} &
\qruntimenoopt{\pi \query, \gentupset,\bound} & = \qruntimenoopt{\query,\gentupset,\bound} + \abs{\query(\gentupset)}
\end{align*}\\[-15mm]
\begin{align*}
\qruntimenoopt{\query \cup \query', \gentupset,\bound} & = \qruntimenoopt{\query, \gentupset,\bound} +
\qruntimenoopt{\query', \gentupset,\bound} +
\abs{\query\inparen{\gentupset}}+\abs{\query'\inparen{\gentupset}} \\
\qruntimenoopt{\query_1 \bowtie \ldots \bowtie \query_m, \gentupset,\bound}
& = \qruntimenoopt{\query_1, \gentupset,\bound} + \ldots +
\qruntimenoopt{\query_m,\gentupset,\bound} +
\jointime{\query_1(\gentupset), \ldots, \query_m(\gentupset)}
\end{align*}
\end{minipage}
}\\
Under this model, an $\raPlus$ query $\query$ evaluated over database $\gentupset$ has runtime $O(\qruntimenoopt{\query,\gentupset, \bound})$.
We assume that full table scans are used for every base relation access. We can model index scans by treating an index scan query $\sigma_\theta(R)$ as a base relation.
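To see how the rules compose, consider as a small worked example the query $\pi\inparen{\sigma\inparen{R} \bowtie S}$ over two base relations $R$ and $S$ (hypothetical names, used only for illustration). Unfolding the definition one operator at a time gives
\begin{align*}
\qruntimenoopt{\pi\inparen{\sigma\inparen{R} \bowtie S}, \gentupset,\bound}
& = \qruntimenoopt{\sigma\inparen{R} \bowtie S, \gentupset,\bound} + \abs{\inparen{\sigma\inparen{R} \bowtie S}\inparen{\gentupset}} \\
& = \qruntimenoopt{\sigma\inparen{R}, \gentupset,\bound} + \qruntimenoopt{S, \gentupset,\bound} + \jointime{\sigma\inparen{R}\inparen{\gentupset}, S\inparen{\gentupset}} + \abs{\inparen{\sigma\inparen{R} \bowtie S}\inparen{\gentupset}} \\
& = |\gentupset.R| + |\gentupset.S| + \jointime{\sigma\inparen{R}\inparen{\gentupset}, S\inparen{\gentupset}} + \abs{\inparen{\sigma\inparen{R} \bowtie S}\inparen{\gentupset}}.
\end{align*}
By \Cref{def:join-cost}, the final term is at most the join cost, so in this example the total is $O\inparen{|\gentupset.R| + |\gentupset.S| + \jointime{\sigma\inparen{R}\inparen{\gentupset}, S\inparen{\gentupset}}}$: two table scans plus the cost of the join.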
\Cref{lem:circ-model-runtime} and \Cref{lem:tlc-is-the-same-as-det} show that for any $\raPlus$ query $\query$ and any $\tupset$, there exists a circuit $\circuit^*$ such that $\timeOf{\abbrStepOne}(\query,\tupset,\circuit^*)$ and $|\circuit^*|$ are both $O(\qruntimenoopt{\optquery{\query}, \tupset,\bound})$. Recall that we assumed these two bounds when we moved from \Cref{prob:big-o-joint-steps} to \Cref{prob:intro-stmt}. Lastly, we can handle FAQs and factorized databases by allowing for optimization, i.e., $\optquery{\query}$.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: