% Merge branch 'master' of https://git.odin.cse.buffalo.edu/carlnues/paper-KeepItSimple (commit d96747fab9)
% -*- root: ../main.tex -*-
%!TEX root=../main.tex
CPU frequency scaling is a critical component of power management on modern mobile phones, as CPUs can represent a significant source of power consumption on the phone.
On Android, a configurable policy called the governor dictates the CPU's frequency, and thereby the trade-off between performance and energy savings.
Android's existing governors rely on recent CPU usage patterns to make this trade-off: the longer a CPU is active, the higher the frequency the governor selects.
Implicit in this design is the assumption that lower frequencies save energy.
In this paper, we demonstrate that this assumption is flawed: the lower range of frequency settings on a typical mobile CPU does not actually save energy.
We introduce \systemname, a governor that leverages this observation.
We show that this governor, in addition to being considerably simpler than Android's default governor, \schedutil, can both improve performance and reduce energy consumption on a typical Android phone.
\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_jank_allapps.pdf}
\bfcaption{Display framedrop for apps under different CPU policies (10 runs, 90\% confidence)}
\label{fig:jank_allapps}
\end{figure*}
\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_energy_allapps.pdf}
\bfcaption{Energy usage for apps under different CPU policies (10 runs, 90\% confidence)}
\label{fig:energy_allapps}
\end{figure*}
\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_time_per_freq_yt.pdf}
\bfcaption{Average time each CPU spends at each frequency under the default policy for YouTube (average of 10 runs, 90\% confidence)}
\label{fig:time_per_freq_yt}
\end{figure*}
\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_time_per_freq_spot.pdf}
\bfcaption{Average time each CPU spends at each frequency under the default policy for Spotify (average of 10 runs, 90\% confidence)}
\label{fig:time_per_freq_spot}
\end{figure*}
\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_nonidletime_yt.pdf}
\bfcaption{CPU non-idle time for YouTube under different CPU policies (10 runs, 90\% confidence)}
\label{fig:nonidle_yt}
\end{figure*}
\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_nonidletime_spot.pdf}
\bfcaption{CPU non-idle time for Spotify under different CPU policies (10 runs, 90\% confidence)}
\label{fig:nonidle_spot}
\end{figure*}
\begin{figure*}
\centering
\includegraphics[width=.75\linewidth]{figures/graph_idlejank_heavyload.pdf}
\bfcaption{Effect of additional background load on user experience under different CPU policies}
\label{fig:idlejank}
\end{figure*}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\paragraph{CPU Policies}
We evaluate six different CPU policies:
(i) the system default, \schedutil,
(ii) a truncated \schedutil, implemented by lower-bounding the CPU frequency at $\fenergy$ using the existing API discussed in \Cref{subsec:signal_perf_needs},
(iii) a fixed 70\% speed using the existing \texttt{userspace} governor,
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Screen Jank}

\Cref{fig:jank_allapps} shows frame drop rates for each app under each CPU policy.
These graphs address the performance aspect of claims (i) and (ii).
On all workloads, \systemname and truncated \schedutil offer nearly identical or notably better performance than the default \schedutil.
Under \systemname, the Facebook workload drops an additional $\sim$0.3\% of frames, or $\sim$0.2 frames per second (12.6 frames per minute) at a 60\,fps display rate.
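As an illustration of how a per-run frame drop rate can be derived from raw counters, the sketch below assumes that each run's totals are read from the \texttt{Total frames rendered} and \texttt{Janky frames} summary lines printed by \texttt{adb shell dumpsys gfxinfo} for the app's package. This is an assumption for illustration only (the summary format differs across Android releases), and the helper name \texttt{jank\_rate} is hypothetical.

```python
import re

# Summary lines printed by `adb shell dumpsys gfxinfo <package>`.
# The exact format is an assumption and may differ between Android releases.
TOTAL_RE = re.compile(r"Total frames rendered:\s*(\d+)")
JANKY_RE = re.compile(r"Janky frames:\s*(\d+)")  # does not match "Janky frames (legacy):"


def jank_rate(gfxinfo_output: str) -> float:
    """Return the fraction of rendered frames reported as janky (dropped)."""
    total_match = TOTAL_RE.search(gfxinfo_output)
    janky_match = JANKY_RE.search(gfxinfo_output)
    if total_match is None or janky_match is None:
        raise ValueError("gfxinfo summary lines not found")
    total = int(total_match.group(1))
    janky = int(janky_match.group(1))
    return janky / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical excerpt for illustration; not measured data.
    sample = "Total frames rendered: 1500\nJanky frames: 45 (3.00%)\n"
    print(f"{jank_rate(sample):.2%}")  # 3.00%
```

Under the same assumption, averaging this rate over the runs of one policy would yield one bar of a per-app frame drop comparison.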
\begin{figure}
\centering
\includegraphics[width=.8\linewidth]{figures/graph_missed_opportunities.pdf}
\bfcaption{A trace of \schedutil's CPU frequency selections (blue). The dotted red line shows the energy/latency-optimal frequency choice ($\fenergy$).}
\label{fig:missed_opportunities}
\end{figure}
Our fundamental insight, also observed by prior work~\cite{vogeleer2013energy, nuessle2019benchmarking}, is that there exists an energy-optimal frequency for each device (call it $\fenergy$).
We argue that
(i)~past CPU usage is not a meaningful signal for identifying the rare cases when speeds below $\fenergy$ are appropriate, and
(ii)~speeds above $\fenergy$ are useful only in specific situations, often known in advance by user-space.
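To make the definition of $\fenergy$ concrete: if $P(f)$ is the CPU's active power at frequency $f$, the energy needed to execute one cycle is $P(f)/f$, and $\fenergy$ is the frequency that minimizes this ratio. The sketch below assumes a per-frequency power table measured offline and ignores idle power and memory stalls; the function names and the table values in the example are hypothetical, not measurements from this paper.

```python
from typing import Dict, List


def energy_optimal_frequency(power_mw: Dict[int, float]) -> int:
    """Return the frequency (kHz) with the lowest energy per cycle.

    power_mw maps each available frequency in kHz to active CPU power in mW.
    Energy per cycle is proportional to power / frequency, so the frequency
    minimizing that ratio is f_energy. Idle power and stalls are ignored.
    """
    return min(power_mw, key=lambda f: power_mw[f] / f)


def dominated_frequencies(power_mw: Dict[int, float]) -> List[int]:
    """Frequencies strictly below f_energy: slower, and no cheaper per cycle."""
    f_energy = energy_optimal_frequency(power_mw)
    return sorted(f for f in power_mw if f < f_energy)


if __name__ == "__main__":
    # Hypothetical power table (kHz -> mW), for illustration only.
    table = {300_000: 90.0, 600_000: 120.0, 900_000: 160.0,
             1_200_000: 230.0, 1_500_000: 340.0}
    print(energy_optimal_frequency(table))  # 900000
    print(dominated_frequencies(table))     # [300000, 600000]
```

Every frequency returned by \texttt{dominated\_frequencies} is both slower than $\fenergy$ and at least as costly per cycle, which is the sense in which such speeds are rarely appropriate.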
\Cref{fig:missed_opportunities} illustrates the potential for improvement:
(i)~\schedutil has a ramp-up period (first grey box) during which the CPU operates at speeds that sacrifice both energy and performance, and
(ii)~\schedutil continues ramping up the frequency (second grey box), paying significant energy costs for often negligible visible benefits.
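The ramp-up period can be reproduced with a simplified model of \schedutil. The sketch assumes a PELT-style utilization average with a 32\,ms half-life, a target frequency of $1.25 \cdot f_{\max} \cdot u$ (the 25\% headroom \schedutil applies to utilization $u$), and rounding up to the next available frequency. It omits kernel details such as 1024\,$\mu$s accounting periods, rate limits, utilization clamping, and frequency invariance (which, if anything, would slow the ramp further). The function names and the frequency table are hypothetical.

```python
from typing import List

HALF_LIFE_MS = 32.0                    # PELT-style half-life (approximation)
DECAY = 0.5 ** (1.0 / HALF_LIFE_MS)    # per-ms decay y, so that y**32 == 0.5
HEADROOM = 1.25                        # schedutil's 25% headroom


def resolve(target_khz: float, freqs_khz: List[int]) -> int:
    """Lowest available frequency >= target; the maximum if none is."""
    for f in sorted(freqs_khz):
        if f >= target_khz:
            return f
    return max(freqs_khz)


def ramp_up_trace(freqs_khz: List[int], duration_ms: int) -> List[int]:
    """Frequency chosen after each ms while a task runs continuously from idle."""
    f_max = max(freqs_khz)
    util = 0.0  # utilization in [0, 1]; frequency invariance is ignored
    trace = []
    for _ in range(duration_ms):
        # Decay the old average and add one fully busy millisecond.
        util = util * DECAY + (1.0 - DECAY)
        trace.append(resolve(HEADROOM * f_max * util, freqs_khz))
    return trace


def ms_below(trace: List[int], f_energy_khz: int) -> int:
    """Number of milliseconds the trace spends strictly below f_energy."""
    return sum(1 for f in trace if f < f_energy_khz)


if __name__ == "__main__":
    freqs = [300_000, 600_000, 900_000, 1_200_000, 1_500_000]  # hypothetical
    print(ms_below(ramp_up_trace(freqs, 200), 900_000))  # 17 (f_energy assumed 900 MHz)
```

With this hypothetical table, a task that runs continuously from idle spends its first 17\,ms below $\fenergy$ in the model, because the utilization average needs time to grow; this is the kind of ramp-up the first grey box shows.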
We propose a series of changes to \schedutil, ultimately converging on a radical proposal: default the CPU's frequency to its $\fenergy$, switching to faster speeds based only on (already existing) signals from user-space.
Based on the simplicity of this approach, we call it the \systemname governor.
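A minimal sketch of the proposed policy follows, assuming a known $\fenergy$ and an abstract user-space boost request. The function name \texttt{next\_frequency} and the \texttt{boost\_khz} parameter are hypothetical stand-ins for whatever signal user-space already provides; recent CPU utilization is deliberately not an input.

```python
from typing import Optional, Sequence


def next_frequency(freqs_khz: Sequence[int], f_energy_khz: int,
                   boost_khz: Optional[int] = None) -> int:
    """Frequency choice for a policy that defaults to f_energy (sketch).

    Without a user-space boost request the CPU runs at f_energy. A boost
    request (hypothetical signal, in kHz) raises the speed to the lowest
    available frequency satisfying it. Past CPU utilization is deliberately
    not an input, and nothing below f_energy is ever selected.
    """
    if f_energy_khz not in freqs_khz:
        raise ValueError("f_energy must be one of the available frequencies")
    if boost_khz is None or boost_khz <= f_energy_khz:
        return f_energy_khz
    for f in sorted(freqs_khz):
        if f >= boost_khz:
            return f
    return max(freqs_khz)


if __name__ == "__main__":
    freqs = [300_000, 600_000, 900_000, 1_200_000, 1_500_000]  # hypothetical
    print(next_frequency(freqs, 900_000))                       # 900000
    print(next_frequency(freqs, 900_000, boost_khz=1_000_000))  # 1200000
```

The design choice is the absence of any feedback from CPU usage: the only path above $\fenergy$ is an explicit request, and no path leads below it.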
Through experiments, we show that \systemname simultaneously improves performance and energy usage.
For example, a typical 25s Facebook run with \systemname consumes 11\% less energy and causes 17\% fewer dropped UI frames than when run with default settings.
We also explore a less radical proposal: \schedutil limited to selecting frequencies at or above $\fenergy$.
We show that even with this minor change, significant gains are possible.
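Limiting \schedutil to frequencies at or above $\fenergy$ requires a lower bound on the frequency. On Linux, one existing mechanism for such a bound is the per-policy \texttt{scaling\_min\_freq} attribute (in kHz) of the cpufreq sysfs interface, which the cpufreq core uses to clamp the governor's requests. The sketch below assumes this mechanism is suitable; it is not confirmed to be the paper's implementation. The helper names are hypothetical, writing the attribute requires root, and the \texttt{root} parameter exists only so the sketch can be exercised against a temporary directory.

```python
from pathlib import Path
from typing import List

CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")


def available_frequencies(policy: str, root: Path = CPUFREQ_ROOT) -> List[int]:
    """Available frequencies (kHz) for a cpufreq policy such as "policy0".

    Not every cpufreq driver exposes scaling_available_frequencies.
    """
    text = (root / policy / "scaling_available_frequencies").read_text()
    return sorted(int(tok) for tok in text.split())


def set_frequency_floor(policy: str, floor_khz: int,
                        root: Path = CPUFREQ_ROOT) -> None:
    """Lower-bound the policy's frequency by writing scaling_min_freq.

    The governor keeps choosing frequencies, but the cpufreq core clamps
    its requests to [scaling_min_freq, scaling_max_freq]. Writing to the
    real sysfs tree requires root.
    """
    if floor_khz not in available_frequencies(policy, root):
        raise ValueError(f"{floor_khz} kHz is not an available frequency")
    (root / policy / "scaling_min_freq").write_text(f"{floor_khz}\n")
```

On a device, calling \texttt{set\_frequency\_floor} once per policy with that policy's $\fenergy$ would leave \schedutil running unchanged above the floor.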
%We run our experiments on Google Pixel 2 devices with Android AOSP, evaluating \systemname against the system default and several other policies, using microbenchmarks and popular apps.
%These are representative of common platforms and uses in the real world.

This paper is organized as follows:
(i) We review background and related work in \Cref{sec:related}.
\begin{figure}
\centering
\includegraphics[width=.85\linewidth]{figures/graph_energy_varying_sleep.pdf}
\bfcaption{Total energy per CPU policy for a 30s workload (3 runs, 90\% confidence)}
\label{fig:idle_impact}
\end{figure}
We note that we did not encounter any significant busy-waiting across all of the apps that we tested.
}.

By comparison, tasks blocked on IO or user input are removed from the runqueue, allowing the CPU to idle.
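The difference between a task stalled on memory, which stays on the runqueue, and a task blocked on IO or user input, which leaves it, can be captured in a toy accounting model. The interval categories and function names below are hypothetical simplifications, not the kernel's accounting.

```python
from dataclasses import dataclass
from typing import List

BUSY_KINDS = {"compute", "stall"}   # task remains on the runqueue
IDLE_KINDS = {"io_wait", "idle"}    # task left the runqueue; the CPU may idle


@dataclass
class Interval:
    kind: str           # one of BUSY_KINDS | IDLE_KINDS (hypothetical categories)
    duration_ms: float


def _total(intervals: List[Interval]) -> float:
    """Window length, after checking that every interval kind is known."""
    for i in intervals:
        if i.kind not in BUSY_KINDS | IDLE_KINDS:
            raise ValueError(f"unknown interval kind: {i.kind}")
    return sum(i.duration_ms for i in intervals)


def utilization(intervals: List[Interval]) -> float:
    """Fraction of the window with a runnable task (what the governor sees)."""
    total = _total(intervals)
    busy = sum(i.duration_ms for i in intervals if i.kind in BUSY_KINDS)
    return busy / total if total else 0.0


def useful_fraction(intervals: List[Interval]) -> float:
    """Fraction of the window spent actually executing instructions."""
    total = _total(intervals)
    useful = sum(i.duration_ms for i in intervals if i.kind == "compute")
    return useful / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical 10 ms window: 4 ms compute, 4 ms memory stall, 2 ms IO wait.
    window = [Interval("compute", 4.0), Interval("stall", 4.0),
              Interval("io_wait", 2.0)]
    print(utilization(window), useful_fraction(window))  # 0.8 0.4
```

In this hypothetical window, 80\% utilization corresponds to only 40\% useful computation, a gap that a utilization-driven governor cannot observe.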
%\todo{add an experiment that shows that tasks blocked on IO or user input or something similar have similar energy profiles.?}
We note that CPU utilization thus measures the time the CPU is not idle, which includes time spent stalled without doing useful work, rather than the time spent doing useful work, supporting our second claim:
\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_time_per_freq_fb.pdf}
\bfcaption{Time each CPU spends at each frequency for a 25s Facebook run under the default policy (average of 10 runs, 90\% confidence)}
\label{fig:time_per_freq_fb}
\end{figure*}
\subsection{Truncated \schedutil}
Frequencies strictly below $\fenergy$ (excepting $\fidle$) consume more energy per CPU cycle than $\fenergy$, and result in higher latencies.
In the absence of CPU stalls, spin-locks, and thermal throttling, frequencies in this range are strictly worse.
Based on this observation and two further insights, we now propose our first adjustment to the \schedutil governor.

First, recall that the only signal used by \schedutil is recent past CPU usage.
This signal conveys no information about CPU stalls, and so is not useful for deciding whether the CPU should be set to a frequency in this regime.

Second, we observe that workloads that trigger the relevant CPU behaviors are typically data-intensive and memory-bound, or parallel workloads with high contention.
Such workloads are often offloaded to more powerful cloud infrastructure; when they do run at the edge (e.g., for federated learning), it is typically while the device has a stable power source.
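Truncating \schedutil amounts to flooring its frequency choice at $\fenergy$. The sketch below assumes a simplified model of \schedutil's frequency rule (target $1.25 \cdot f_{\max} \cdot u$ for frequency-invariant utilization $u$, rounded up to an available frequency); the function names and the frequency table are hypothetical.

```python
from typing import Sequence

HEADROOM = 1.25  # schedutil's 25% headroom over utilization


def resolve(target_khz: float, freqs_khz: Sequence[int]) -> int:
    """Lowest available frequency >= target; the maximum if none is."""
    for f in sorted(freqs_khz):
        if f >= target_khz:
            return f
    return max(freqs_khz)


def schedutil_frequency(util: float, freqs_khz: Sequence[int]) -> int:
    """Simplified schedutil choice for utilization util in [0, 1]."""
    return resolve(HEADROOM * max(freqs_khz) * util, freqs_khz)


def truncated_schedutil(util: float, freqs_khz: Sequence[int],
                        f_energy_khz: int) -> int:
    """The same choice, but never below f_energy."""
    return max(schedutil_frequency(util, freqs_khz), f_energy_khz)


if __name__ == "__main__":
    freqs = [300_000, 600_000, 900_000, 1_200_000, 1_500_000]  # hypothetical
    for u in (0.1, 0.6):
        print(u, schedutil_frequency(u, freqs), truncated_schedutil(u, freqs, 900_000))
    # 0.1 300000 900000
    # 0.6 1200000 1200000
```

Utilization-driven choices above $\fenergy$ are unchanged; only the regime below it, which the two insights above argue is rarely useful, is removed.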
\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_oscill_cycles.pdf}
\bfcaption{Runtime and CPU cycle count for a fixed computation under different CPU policies (10 runs, 90\% confidence)}
\label{fig:cycles_time}
\end{figure*}
\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_nonidletime_fb.pdf}
\bfcaption{CPU non-idle time for Facebook under different CPU policies (10 runs, 90\% confidence)}
\label{fig:nonidle_fb}
\end{figure*}
\begin{figure}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_jank_perspeed_fb.pdf}
\bfcaption{Display framedrop for Facebook under different CPU policies (10 runs, 90\% confidence)}
\label{fig:screendrops_per_freq_fb}
\end{figure}
\paragraph{Case Studies}
We studied a range of apps, including Facebook, YouTube, and Spotify.
For each app, we developed a short scripted interaction to study its performance characteristics and energy usage.
We focus here primarily on the Facebook workload that we previously introduced in \Cref{sec:low-speed-in-practice}, and return to the other workloads to verify our findings in \Cref{sec:evaluation}.
Our first goal is to understand what user-perceivable value this energy overhead buys us.
Adaptive widgets in mobile apps present an effectively infinite source of work to mobile CPUs over finite windows of interaction.
}

By adapting to the available CPU power, Facebook signals an effectively infinite source of work to the governor, which responds by ramping the CPU up to full speed.
In this situation, speeds above $\fenergy$ may actually provide a perceptible benefit.
However, even at the CPU's maximum frequency, more work is created than the CPU can keep up with.
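A toy feedback loop illustrates this interaction. It is a hypothetical simplification, not a model fitted to Facebook: each millisecond the app offers \texttt{overcommit} times the work the CPU can retire at its current frequency, the governor sees frequency-invariant utilization and applies the 25\% headroom of a simplified \schedutil rule with no averaging, and unfinished work carries over as a backlog (in cycles, since 1\,kHz sustains one cycle per millisecond). All names and the frequency table are hypothetical.

```python
from typing import List, Sequence, Tuple

HEADROOM = 1.25  # schedutil's 25% headroom over utilization


def resolve(target_khz: float, freqs_khz: Sequence[int]) -> int:
    """Lowest available frequency >= target; the maximum if none is."""
    for f in sorted(freqs_khz):
        if f >= target_khz:
            return f
    return max(freqs_khz)


def simulate_adaptive_app(freqs_khz: Sequence[int], start_khz: int,
                          overcommit: float, steps: int) -> Tuple[List[int], float]:
    """Toy loop between an adaptive app and a schedutil-like governor.

    Each 1 ms step the app offers `overcommit` times the cycles the CPU can
    retire at its current frequency. Unfinished cycles carry over as
    backlog. Returns the frequency used in each step and the final backlog.
    """
    f_max = max(freqs_khz)
    freq = start_khz
    backlog = 0.0
    trace = []
    for _ in range(steps):
        trace.append(freq)
        capacity = float(freq)                   # cycles retired per ms
        work = backlog + overcommit * capacity   # pending cycles this step
        done = min(work, capacity)
        backlog = work - done
        busy = done / capacity                   # fraction of the step spent busy
        util = busy * freq / f_max               # frequency-invariant utilization
        freq = resolve(HEADROOM * f_max * util, freqs_khz)
    return trace, backlog


if __name__ == "__main__":
    freqs = [300_000, 600_000, 900_000, 1_200_000, 1_500_000]  # hypothetical
    trace, backlog = simulate_adaptive_app(freqs, 300_000, 1.5, 8)
    print(trace)    # [300000, 600000, 900000, 1200000, 1500000, 1500000, 1500000, 1500000]
    print(backlog)  # 4500000.0
```

With any \texttt{overcommit} above 1, the loop climbs to the maximum frequency within a few steps and the backlog keeps growing there, mirroring the observation that even full speed does not keep up.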
\begin{figure}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_u_fb.pdf}
\bfcaption{Energy consumed for a fixed set of iterations computed at different CPU speeds \fixme{fullrun set}}
\label{fig:u_micro_fb}
\end{figure}