
% -*- root: ../main.tex -*-


\begin{figure}
\centering
\includegraphics[width=.90\linewidth]{figures/graph_freqtime_micro.pdf}
\bfcaption{Intermittent workloads hurt runtime in two ways: directly, through time spent sleeping, and indirectly, by inducing slower CPU speeds}
\label{fig:speed_time_delay}
\end{figure}

%\subsection{The cost and problems of complex speed micromanagement}
%\label{complexity_cost}

CPU speed control on phones involves a set of fairly intricate subsystems: the scheduler, the idle policy, the drivers, and the governor itself.
Under typical workloads, these subsystems adjust the speed constantly.
Despite this complexity, the system often makes bad choices, picking speeds that hurt both energy and performance.
This is because past CPU utilization, the metric on which all dynamic governors are built, is a poor guide to the ideal CPU speed for the present.
\fixme{because:  idle thread, slow rx ...}

\tinysection{Dynamic governors can hurt responsiveness}

% N.b. scheduling classes:  stop-dl-rt-cfs-idle.  DVFS only applies to cfs tasks (dl / rt tasks run at 100)
% N.b. schedutil, unlike others, estimates load per-task rather than per-core.  So handles task migration better.
% N.b. But schedutil still calculates based on "_recent_ load"

% schedutil and greedy:  zhou, p.3

The default governor policy, \schedutil, hurts responsiveness.
The \schedutil policy sets the CPU speed based on a rolling window of recent runqueue utilization.
On a phone, workloads typically do not saturate the CPUs; instead, their demand varies constantly.
Under history-driven dynamic policies such as \schedutil, this variation triggers constant speed changes~\cite{nuessle2019benchmarking}.
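
A minimal Python sketch of such a history-driven policy follows. It is our own simplification, not the kernel's \schedutil code. It assumes that utilization arrives as periodic samples normalized to maximum capacity and is averaged over a fixed window of hypothetical length, whereas the kernel relies on its own load-tracking signal. The speed is chosen with the $1.25\times$ headroom factor that mainline \schedutil applies and is rounded up to the lowest available frequency that meets the target. The frequency table (\texttt{FREQS\_MHZ}), the window length, and the class name are illustrative assumptions, not values from any particular phone.

\begin{verbatim}
from collections import deque

# Hypothetical frequency table in MHz; real phones expose their own lists.
FREQS_MHZ = [300, 600, 1000, 1400, 1800, 2400]
HEADROOM = 1.25  # schedutil-like margin: target ~= 1.25 * max_freq * util


class WindowedGovernor:
    """Toy history-driven governor: the next speed follows recent utilization.

    Assumption: each sample is the utilization of the last period,
    normalized to maximum capacity (0.0 = idle, 1.0 = busy at max speed),
    and the policy averages the most recent `window` samples.
    """

    def __init__(self, freqs=FREQS_MHZ, window=4):
        self.freqs = sorted(freqs)
        self.samples = deque(maxlen=window)  # old samples fall out automatically

    def update(self, util_sample):
        """Record one utilization sample and return the next frequency (MHz)."""
        self.samples.append(util_sample)
        util = sum(self.samples) / len(self.samples)
        target = HEADROOM * self.freqs[-1] * util
        # Pick the lowest available frequency that meets the target.
        for f in self.freqs:
            if f >= target:
                return f
        return self.freqs[-1]  # target exceeds the table: run at max


gov = WindowedGovernor()
print(gov.update(1.0), gov.update(0.0))  # prints: 2400 1800
\end{verbatim}

Because this governor only sees the past window, a single idle sample after a fully busy one halves the average and drops the speed from 2400 to 1800\,MHz, regardless of what the next burst of work will need.
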
Figure~\ref{fig:missed_opportunities} shows how ramp-up time already hurts performance.
Previous studies have further noted that intermittent workloads make this problem significantly worse~\cite{nuessle2019benchmarking}.
Figure~\ref{fig:speed_time_delay} illustrates this effect: we ran the same fixed workload with and without intermittent 5\,ms sleeps.
Without sleeps (top graph), the workload takes $\sim$7.1\,s to complete.
Adding 1000 sleeps of 5\,ms each (bottom graph) induces the governor to keep the speed much lower, hovering around 40\% of maximum throughout the run.
Of the additional 18.2\,s of runtime, 5\,s comes from the sleeps themselves ($1000 \times 5$\,ms), and $\sim$13.1\,s from running at a slower CPU speed.
We will show that real-world apps running under the default policy similarly spend significant time at unnecessarily low speeds.
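
The mechanism behind Figure~\ref{fig:speed_time_delay} can be illustrated qualitatively with a toy model. The Python sketch below is a hypothetical illustration under explicit assumptions. It is not the setup used for the figure, and its numbers are not comparable to the measured ones. Time advances in 1\,ms ticks. The governor averages frequency-invariant utilization over a 4-tick window and applies a \schedutil-like $1.25\times$ headroom. The frequency table, chunk size, and chunk count are made-up values. The sketch runs a fixed amount of work twice: once continuously, and once split into chunks separated by 5\,ms sleeps. It then attributes the extra time to sleeping and to slower execution.

\begin{verbatim}
from collections import deque

FREQS = [300, 600, 1000, 1400, 1800, 2400]  # hypothetical frequencies (MHz)
F_MAX = FREQS[-1]


def next_freq(samples):
    """Lowest frequency >= 1.25 * F_MAX * mean(recent utilization)."""
    target = 1.25 * F_MAX * sum(samples) / len(samples)
    return next((f for f in FREQS if f >= target), F_MAX)


def run(chunks, work_per_chunk, sleep_ms, window=4):
    """Simulate in 1 ms ticks and return (total_ms, sleeping_ms).

    Work is measured in MHz*ms, so a CPU at f MHz completes f units per tick.
    A tick that finishes a chunk early still counts as a whole ms.
    """
    samples = deque([0.0], maxlen=window)  # start from an idle history
    freq = next_freq(samples)
    total = slept = 0
    for _ in range(chunks):
        remaining = work_per_chunk
        while remaining > 0:  # busy phase at the governor's current speed
            done = min(freq, remaining)
            remaining -= done
            samples.append(done / F_MAX)  # frequency-invariant utilization
            freq = next_freq(samples)
            total += 1
        for _ in range(sleep_ms):  # idle phase contributes zero utilization
            samples.append(0.0)
            freq = next_freq(samples)
            total += 1
            slept += 1
    return total, slept


# Same total work: one continuous run vs. 1000 chunks with 5 ms sleeps.
base, _ = run(chunks=1, work_per_chunk=1000 * 4800, sleep_ms=0)
total, slept = run(chunks=1000, work_per_chunk=4800, sleep_ms=5)
print(f"continuous: {base} ms; intermittent: {total} ms")
print(f"  sleeping: {slept} ms; slower execution: {total - slept - base} ms")
\end{verbatim}

In this toy model, every 5\,ms sleep empties the 4-tick history. The governor therefore restarts each chunk from the lowest frequency, and the intermittent run spends far longer executing than the continuous one. The last printed term also absorbs the rounding of partial ticks.
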