% -*- root: ../main.tex -*-
%!TEX root=../main.tex
\paragraph{Governors}
Historically, systems have addressed the competing goals of energy and latency optimization by employing frequency scaling to change the speed at which the CPU runs.
On modern systems, a CPU typically consists of multiple cores, often of different types, that can run at different speeds (known as P-states) or be placed into idle states (known as C-states).
A policy, or `governor', sets the CPU's frequency (P-state) when there is pending computation, favoring performance at the expense of energy, or vice versa.
The governor runs in conjunction with other policies, in particular (i) the scheduler, which determines which tasks run on which CPU cores, and (ii) the idle policy, which places CPUs with no pending work into an idle C-state.
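As one concrete instance of such a policy, Linux's \schedutil governor derives its frequency request from the scheduler's utilization estimate; the kernel's cpufreq documentation gives the request as roughly $f = 1.25 \cdot f_{max} \cdot \mathit{util} / \mathit{max}$ for frequency-invariant utilization, so a core at 80\% utilization already requests $f_{max}$. The Python sketch below models only that formula plus rounding up to a supported P-state. The frequency table and the helper names (\texttt{FREQS\_KHZ}, \texttt{schedutil\_request}, \texttt{pick\_pstate}) are hypothetical, and the kernel's rate limiting, I/O-wait boosting, and utilization clamping are deliberately omitted.

```python
from bisect import bisect_left

# Hypothetical P-state table in kHz (illustrative, not taken from a real device).
FREQS_KHZ = [300_000, 576_000, 1_017_600, 1_497_600, 1_804_800]


def schedutil_request(util: int, max_capacity: int = 1024,
                      f_max: int = FREQS_KHZ[-1]) -> int:
    """Raw request f = 1.25 * f_max * util / max, in integer arithmetic.

    Assumes frequency-invariant utilization on a 0..max_capacity scale.
    """
    scaled_max = f_max + (f_max >> 2)  # 1.25 * f_max without floating point
    return scaled_max * util // max_capacity


def pick_pstate(request_khz: int, table=FREQS_KHZ) -> int:
    """Lowest supported frequency at or above the request, capped at the top P-state."""
    i = bisect_left(table, request_khz)
    return table[min(i, len(table) - 1)]
```

For example, \texttt{pick\_pstate(schedutil\_request(512))} (50\% utilization) yields 1{,}497{,}600\,kHz in this hypothetical table: the 1.25 headroom factor plus rounding up pushes a half-loaded core well above half of $f_{max}$.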
Hardware design on phones can constrain governor policy calculations.
For example, CPU speeds often cannot be set on individual cores but only on groups (clusters) of cores, a constraint partly linked to the asymmetric big.LITTLE CPU architecture, with two clusters of higher- and lower-performance cores~\cite{big-little}.
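The consequence of a shared frequency domain can be sketched as follows, under the simplifying assumption that a cluster must satisfy its most demanding core and therefore runs at the maximum of its cores' requests. The eight-core layout and the function name \texttt{cluster\_frequencies} are hypothetical and only illustrate the constraint, not any particular phone's firmware or kernel code.

```python
from typing import Dict, List


def cluster_frequencies(core_requests: Dict[int, int],
                        clusters: Dict[str, List[int]]) -> Dict[str, int]:
    """Resolve per-core frequency requests (kHz) into one frequency per cluster.

    All cores in a cluster share a clock, so the cluster runs fast enough for
    its most demanding core (maximum request). Cores without an entry (e.g.
    idle ones) add no demand; 0 means the cluster has no request at all and
    is left to the idle policy.
    """
    result: Dict[str, int] = {}
    for name, cores in clusters.items():
        requests = [core_requests[c] for c in cores if c in core_requests]
        result[name] = max(requests, default=0)
    return result
```

With clusters \texttt{\{"little": [0, 1, 2, 3], "big": [4, 5, 6, 7]\}} and requests \texttt{\{0: 300\_000, 1: 1\_200\_000, 4: 800\_000\}}, the little cluster runs at 1{,}200{,}000\,kHz, so core~0 runs four times faster than it asked for simply because it shares a clock with core~1.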
% idle paper: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=60fdaa6a74dec29a0538325b742bee4097247c6d#page=119
\begin{figure}
\centering
\includegraphics[width=.85\linewidth]{figures/graph_energy_varying_sleep.pdf}
\bfcaption{Total energy per CPU policy for a 30s workload (3 runs, 90\% confidence)}
\label{fig:idle_impact}
\end{figure}
\paragraph{Idling overrides any speed}
When a CPU's runqueue has no tasks, the idle policy bypasses the governor's speed selection and instead shuts down unneeded cores.
Figure~\ref{fig:idle_impact} illustrates this with a microbenchmark that continuously performs simple arithmetic computations (red circle), alternates computation and sleep in 15\,ms intervals (blue square), or continuously sleeps (green diamond).
The x-axis varies the fixed frequency to which the CPU is pinned, with the default \schedutil governor's behavior for comparison.
Total energy consumed is shown on the y-axis.
Energy consumed by the sleeping task is largely independent of the CPU frequency, apart from minor system interrupts.
Energy consumed by the remaining tasks tracks CPU speed, as expected, with a flatter curve for the partially sleeping workload.
In summary, regardless of the speed requested by the CPU governor, when there is no work the idle policy overrides that speed and shuts down the core, \emph{consuming negligible energy}.
We refer to the `speed' of a core in its idle state as $\fidle$.
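The qualitative point, that only busy time is charged at the pinned frequency, can be captured by a toy energy model over a fixed 30\,s window: $E(f) = T\,(d \cdot P_{active}(f) + (1-d) \cdot P_{idle})$, where $d$ is the busy fraction. Mapping the three microbenchmark workloads to $d = 1$, $d = 0.5$, and $d = 0$ assumes the 15\,ms compute and sleep phases are fixed in time. The power constants and the cubic form of $P_{active}$ (dynamic power $\propto C V^2 f$ with $V$ roughly proportional to $f$) are hypothetical illustrations, not measurements, and the model does not attempt to reproduce the measured curves.

```python
def active_power_w(freq_ghz: float, static_w: float = 0.05, k: float = 0.3) -> float:
    """Hypothetical active power: a static term plus a cubic dynamic term."""
    return static_w + k * freq_ghz ** 3


def window_energy_j(freq_ghz: float, busy_fraction: float,
                    duration_s: float = 30.0, idle_w: float = 0.01) -> float:
    """Energy over a fixed window for a CPU pinned at freq_ghz.

    Busy time is charged at the pinned frequency's active power; idle time is
    spent in a C-state whose (hypothetical) power ignores the requested frequency.
    """
    busy_j = busy_fraction * duration_s * active_power_w(freq_ghz)
    idle_j = (1.0 - busy_fraction) * duration_s * idle_w
    return busy_j + idle_j
```

In this model the always-sleeping workload (\texttt{busy\_fraction=0.0}) costs the same at every pinned frequency, while the always-busy workload grows with frequency, matching the shape of the argument above.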
\subsection{Related Work}
% general trade-offs
Many papers have studied the performance-energy trade-off of governors.
Yao et al.~\cite{492493} established an ideal framework but assumed prior knowledge of all workloads.
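Assuming the cited framework is the offline model commonly called YDS (after Yao, Demers, and Shenker), its reliance on prior knowledge is easy to see in code: every job's release time, deadline, and work must be known up front. The sketch below is a brute-force version of the critical-interval idea (repeatedly find the interval of highest work density, run the jobs confined to it at that speed, then remove the interval from the timeline). It assumes continuous speeds and jobs with release $<$ deadline, ignores discrete P-states, and the names \texttt{yds\_speeds} and \texttt{\_squeeze} are hypothetical.

```python
from typing import Dict, List, Tuple

Job = Tuple[float, float, float]  # (release time, deadline, work)


def _squeeze(t: float, a: float, b: float) -> float:
    """Map time t onto a timeline from which the interval [a, b] has been removed."""
    if t <= a:
        return t
    if t >= b:
        return t - (b - a)
    return a


def yds_speeds(jobs: List[Job]) -> Dict[int, float]:
    """Offline minimum-energy speed per job via repeated critical intervals."""
    remaining = {i: job for i, job in enumerate(jobs)}
    speeds: Dict[int, float] = {}
    while remaining:
        points = sorted({t for r, d, _ in remaining.values() for t in (r, d)})
        best = None  # (intensity, start, end, ids of jobs confined to [start, end])
        for a in points:
            for b in points:
                if b <= a:
                    continue
                inside = [i for i, (r, d, _) in remaining.items() if r >= a and d <= b]
                if not inside:
                    continue
                intensity = sum(remaining[i][2] for i in inside) / (b - a)
                if best is None or intensity > best[0]:
                    best = (intensity, a, b, inside)
        intensity, a, b, inside = best
        for i in inside:
            speeds[i] = intensity  # these jobs run at the critical interval's density
            del remaining[i]
        # Collapse the scheduled interval for the jobs still waiting.
        remaining = {i: (_squeeze(r, a, b), _squeeze(d, a, b), w)
                     for i, (r, d, w) in remaining.items()}
    return speeds
```

For instance, \texttt{yds\_speeds([(0, 2, 4), (0, 10, 4)])} runs the urgent first job at speed 2.0 and the relaxed second job at 0.5; an online governor, which cannot see the second job's deadline in advance, has no way to compute this directly.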
Dynamic systems, by contrast, must somehow gauge future work.
The common approach is to minimize energy usage subject to some performance constraint.
Systems differ in how they estimate the input to this constraint, namely the pending work.
The Polaris system~\cite{korkmaz2018workload} tunes CPU speed to pending workloads based on userspace information.
% but focuses on server-scale database uses.
It requires knowledge of the amount of pending work and the deadline target, information that is derivable for a specific class of workload, namely server-scale databases.
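Why such a system needs both quantities can be seen in a generic deadline-driven selection rule: given pending work $W$ (in cycles) and remaining slack $D$, the slowest sufficient frequency is the lowest available $f \ge W / D$. This is an illustrative sketch, not Polaris's actual algorithm; the function name \texttt{slowest\_feasible\_freq} and the example frequency table are hypothetical.

```python
from typing import Sequence


def slowest_feasible_freq(pending_cycles: float, now_s: float, deadline_s: float,
                          freqs_hz: Sequence[float]) -> float:
    """Lowest available frequency that finishes the pending work by the deadline.

    Falls back to the fastest frequency when the deadline has passed or cannot
    be met even at full speed.
    """
    slack_s = deadline_s - now_s
    if slack_s <= 0:
        return max(freqs_hz)
    needed_hz = pending_cycles / slack_s  # cycles per second required
    feasible = [f for f in sorted(freqs_hz) if f >= needed_hz]
    return feasible[0] if feasible else max(freqs_hz)
```

With a hypothetical table of 0.6, 1.0, and 1.8\,GHz and 100\,ms of slack, $9 \times 10^7$ pending cycles select 1.0\,GHz. A general-purpose OS governor sees neither the cycle count nor the deadline, which is why this kind of rule is tied to workloads, such as database transactions, that expose both.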
Instead of relying on knowledge of the current workload, Zhou et al.~\cite{9591359} employ machine learning to predict it, given a known QoS performance constraint.
Unsurprisingly, several studies have focused on phones given their energy constraints, generally treating user experience as the constraint to maintain.
Chen et al.~\cite{7372574, 8356047} gauge the workload of mobile games by tracking CPU-GPU interaction, and dynamically select among existing governors.
Li et al.~\cite{10.1145/3061639.3062239, 9153119} go further, predicting future work by categorizing game graphics scenes.
Broyde et al.~\cite{8226044} tune their system by scaling the number of non-idle CPUs together with CPU frequency.
The Maestro system~\cite{8410428}, like ours, recognizes that existing policies can overreact, resulting in CPU overperformance.
It reduces the resulting thermal-throttling inefficiencies by damping this overperformance.
Maestro also includes cloud latency, along with display quality, in its constraint metric.
A different approach, by Bui et al.~\cite{10.1145/2789168.2790103}, saves energy by running loads on phones' little CPUs.
Rao et al.~\cite{rao2017application} argue for going beyond a blind, general-purpose governor by tuning performance to particular apps.
While the most common way to measure the performance cost of energy reduction is frame rate, other metrics exist.
Zhisheng et al.~\cite{10.1145/2973750.2973780} constrain streaming, evaluating their system in terms of the resulting video quality.
Begem et al.~\cite{7314145} invert the general approach, maximizing performance subject to energy constraints on phones.
A system that may constrain computational resources needs to measure the resulting cost.
Meeting query latencies and screen-draw deadlines are common measures in previous studies.
To our knowledge, none of these uses our approach: observing that an approximately energy-minimal setting already suffices to maintain acceptable performance targets, barring specific identifiable cases.
%One, by Kwok et al. \cite{7091048} -- no; latency over minutes (i.e. processed for later consumption)