% -*- root: ../main.tex -*-
%!TEX root=../main.tex
\paragraph{Governors}
Historically, systems have addressed the competing goals of energy and latency optimization by employing frequency scaling to change the speed at which the CPU runs.
On modern systems, CPUs typically consist of multiple cores, often of different types, that can run at different speeds (known as P-states) or be placed into idle states (known as C-states).
A policy, or `governor', sets the CPU's frequency (P-state) when there is pending computation, optimizing performance at the expense of energy, or vice versa.
The governor runs in conjunction with other policies, in particular (i) the scheduler, which determines which tasks run on which CPU cores, and (ii) the idle policy, which places CPUs with no pending work into an (idle) C-state.
Hardware design on phones can constrain governor policy calculations.
For example, CPU speeds often cannot be set on individual cores but only on groups of CPUs, a constraint partly linked to the asymmetric big-little CPU architecture, with two clusters of higher- and lower-performance CPU cores~\cite{big-little}.
% idle paper: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=60fdaa6a74dec29a0538325b742bee4097247c6d#page=119
\begin{figure}
\centering
\includegraphics[width=.85\linewidth]{figures/graph_energy_varying_sleep.pdf}
\bfcaption{Total energy per CPU policy for a 30\,s workload (3 runs, 90\% confidence)}
\label{fig:idle_impact}
\end{figure}
\paragraph{Idling overrides any speed}
When a CPU's runqueue has no tasks, the idle policy bypasses the governor's speed selection and instead shuts down unneeded cores.
Figure~\ref{fig:idle_impact} illustrates this with a microbenchmark that continuously performs simple arithmetic computations (red circle), alternates computation and sleep in 15\,ms intervals (blue square), or continuously sleeps (green diamond).
The x-axis varies the fixed frequency to which the CPU is pinned, with the default \schedutil governor's behavior for comparison.
Total energy consumed is shown on the y-axis.
Energy consumed by the sleeping workload is largely independent of the CPU frequency, modulo minor system interrupts.
Energy consumed by the other two workloads tracks CPU speed, as expected, with a flatter curve for the partially sleeping workload.
In summary, no matter what speed the CPU governor requests, when there is no work, the idle policy overrides the speed and shuts down the core, \emph{consuming negligible energy}.
We refer to the `speed' of the core in its idle state as $\fidle$.
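As a back-of-the-envelope reading of these curves (the notation below is an illustrative assumption and is not fitted to the measurements), suppose a workload computes for a fraction $\delta$ of a run of length $T$ and sleeps otherwise, and let $P(f)$ denote the power drawn by a busy core at frequency $f$.
Then, approximately,
\[
  E(f) \approx T \left[ \delta \, P(f) + (1 - \delta) \, P(\fidle) \right] .
\]
With $P(\fidle)$ negligible, $\delta = 0$ (continuous sleep) gives an energy that does not depend on $f$, while $\delta = 1/2$ (alternating computation and sleep) grows with $f$ at roughly half the rate of $\delta = 1$ (continuous computation).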
\subsection{Related Work}
% general trade-offs
Many papers have studied the performance-energy trade-off of governors.
Yao et al.~\cite{492493} established an ideal framework, but assumed prior knowledge of all workloads.
Dynamic systems, by contrast, must somehow gauge future work.
The common approach is to minimize energy usage subject to some performance constraint.
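One illustrative way to write this formulation, using notation assumed here rather than taken from the cited works, is to let $W$ be the pending work (in CPU cycles), $D$ the deadline by which it must complete, and $E(f)$ the energy consumed when running at frequency $f$:
\[
  \min_{f} \; E(f) \quad \mbox{subject to} \quad \frac{W}{f} \le D .
\]
Under this simplified reading, the difficulty lies in estimating $W$ and $D$, since neither is directly visible to the governor.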
Approaches differ in how they calculate the constraint, namely the pending work.
The Polaris system~\cite{korkmaz2018workload} tunes CPU speed to pending workloads based on userspace information.
% but focuses on server-scale database uses.
It requires knowledge of the amount of pending work and the target deadline, information derivable from a specific type of workload: server-scale databases.
Instead of focusing on the current workload, Zhou et al.~\cite{9591359} employ machine learning to predict the upcoming workload for a known QoS performance constraint.

Unsurprisingly, several studies have focused on phones given their energy constraints, generally using user experience as the performance constraint.
The system proposed by Chen et al. \cite{7372574, 8356047} gauges workload on phone games by tracking CPU-GPU interaction and dynamically selects among existing governors.
Li et al. \cite{10.1145/3061639.3062239, 9153119} go further, predicting future work by categorizing game graphic scenes.
Broyde et al.~\cite{8226044} combine scaling the number of non-idle CPUs with scaling CPU frequency to tune their system.
The Maestro system \cite{8410428}, like ours, recognizes that existing policies can unduly overreact, resulting in CPU overperformance.
Their system focuses on reducing the thermal-throttling inefficiencies that this overperformance produces by damping it.
This system also includes cloud latency along with display quality in its constraint metric.
A different approach by Bui et al.~\cite{10.1145/2789168.2790103} saves energy by running loads on phones' little CPUs.
Rao et al.~\cite{rao2017application} acknowledge the need to go beyond a blind general-purpose governor and to tune performance to particular apps.

While the common way to measure the cost of energy reduction is to focus on frame rate, there are others.
Zhisheng et al. \cite{10.1145/2973750.2973780} constrain streaming, analyzing their system in terms of underlying video quality.
Begem et al.~\cite{7314145} take the opposite of the general approach and maximize performance subject to energy constraints on phones.
A system that potentially constrains computation resources needs to measure the cost.
Query latency and screen draws are common measurements used in previous studies.
None of these, to our knowledge, uses our approach of observing that an approximate energy-minimum setting already suffices to maintain acceptable performance targets, barring specific identifiable cases.
%One, by Kwok et al. \cite{7091048} -- no; latency over minutes (i.e. processed for later consumption)