Merge branch 'master' of https://git.odin.cse.buffalo.edu/carlnues/paper-KeepItSimple

2023-08-26 00:00:09 -04:00 · 2023-08-26 00:00:09 -04:00 · d96747fab9
parent bc606139fe 28b3b7ebee
commit d96747fab9
6 changed files with 32 additions and 33 deletions
--- a/sections/abstract.tex
+++ b/sections/abstract.tex
@ -1,10 +1,10 @@
 % -*- root: ../main.tex -*-
 %!TEX root=../main.tex

-CPU frequency scaling is a critical component of power management on modern mobile phones, as a CPUs can (if not managed properly) represent a significant source of power consumption on the phone.
-On Android, a configurable policy (called the governor) dictates the CPU's frequency, and how it trades off performance for energy savings.
-Android's existing governors rely on recent CPU usage patterns to make this trade-off: The longer a CPU is active, the faster the governor sets the CPU's frequency.
-In this paper, we demonstrate that this assumption is flawed: the lower range of frequency settings on a typical mobile CPU does not actually save energy.
+CPU frequency scaling is a critical component of power management on modern mobile phones, as CPUs can represent a significant source of power consumption on the phone.
+On Android, the governor, a configurable policy, dictates the CPU's frequency and the trade-off between performance and energy savings.
+Android's existing governors rely on recent CPU usage patterns to make this trade-off. The longer a CPU is active, the faster the governor sets the CPU's frequency.
+In this paper, we demonstrate that this assumption is flawed. The lower range of frequency settings on a typical mobile CPU does not actually save energy.
 We introduce \systemname, a governor that leverages this observation.
 We show that this governor, in addition to being considerably simpler than \schedutil, can both improve performance and reduce energy consumption on a typical Android phone.

--- a/sections/evaluation.tex
+++ b/sections/evaluation.tex
@ -3,49 +3,49 @@

 \begin{figure*}
 \centering
-\includegraphics[width=.90\linewidth]{figures/graph_jank_allapps.pdf}
+\includegraphics[width=.87\linewidth]{figures/graph_jank_allapps.pdf}
 \bfcaption{Display framedrop for apps under different CPU policies (10 runs, 90\% confidence)}
 \label{fig:jank_allapps}
 \end{figure*}

 \begin{figure*}
 \centering
-\includegraphics[width=.90\linewidth]{figures/graph_energy_allapps.pdf}
+\includegraphics[width=.87\linewidth]{figures/graph_energy_allapps.pdf}
 \bfcaption{Energy usage for apps under different CPU policies (10 runs, 90\% confidence)}
 \label{fig:energy_allapps}
 \end{figure*}

 \begin{figure*}
 \centering
-\includegraphics[width=.90\linewidth]{figures/graph_time_per_freq_yt.pdf}
+\includegraphics[width=.87\linewidth]{figures/graph_time_per_freq_yt.pdf}
 \bfcaption{Average time spent per CPU under the default policy for Youtube (Average of 10 runs, 90\% confidence)}
 \label{fig:time_per_freq_yt}
 \end{figure*}

 \begin{figure*}
 \centering
-\includegraphics[width=.90\linewidth]{figures/graph_time_per_freq_spot.pdf}
+\includegraphics[width=.87\linewidth]{figures/graph_time_per_freq_spot.pdf}
 \bfcaption{Average time spent per CPU under the default policy for Spotify (Average of 10 runs, 90\% confidence)}
 \label{fig:time_per_freq_spot}
 \end{figure*}

 \begin{figure*}
 \centering
-\includegraphics[width=.90\linewidth]{figures/graph_nonidletime_yt.pdf}
+\includegraphics[width=.87\linewidth]{figures/graph_nonidletime_yt.pdf}
 \bfcaption{CPU non-idle time for Youtube under different CPU policies (10 runs, 90\% confidence)}
 \label{fig:nonidle_yt}
 \end{figure*}

 \begin{figure*}
 \centering
-\includegraphics[width=.90\linewidth]{figures/graph_nonidletime_spot.pdf}
+\includegraphics[width=.87\linewidth]{figures/graph_nonidletime_spot.pdf}
 \bfcaption{CPU non-idle time for Spotify under different CPU policies (10 runs, 90\% confidence)}
 \label{fig:nonidle_spot}
 \end{figure*}

 \begin{figure*}
 \centering
-\includegraphics[width=.90\linewidth]{figures/graph_idlejank_heavyload.pdf}
+\includegraphics[width=.75\linewidth]{figures/graph_idlejank_heavyload.pdf}
 \bfcaption{The effect of additional background loads on user experience for given CPU policies}
 \label{fig:idlejank}
 \end{figure*}
@ -79,7 +79,7 @@ Information on screen performance including framedrops came from the Android \te

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \paragraph{CPU Policies}
-We evaluate six different CPU policies under different workloads:
+We evaluate six different CPU policies:
 (i) the system default, \schedutil,
 (ii) a truncated \schedutil implemented by lower-bounding the CPU using the existing API discussed in section \ref{subsec:signal_perf_needs},
 (iii) a fixed 70\% speed using the existing \texttt{userspace} governor,
@ -108,7 +108,7 @@ These workloads address evaluation claim (iii).
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Screen Jank}

-\Cref{fig:jank_allapps} shows frame drop rates for the four workloads.
+\Cref{fig:jank_allapps} shows frame drop rates.
 These graphs address the performance aspect of claims (i) and (ii).
 On all workloads, \systemname and truncated \schedutil offer nearly identical or notably better performance than regular \schedutil.
 The Facebook load under \systemname costs an additional .3\%, or $\sim$.2 frames per second (12.6 frames per minute) at a 60fps display rate.
--- a/sections/introduction.tex
+++ b/sections/introduction.tex
@ -3,7 +3,7 @@

 \begin{figure}
 \centering
-\includegraphics[width=.95\linewidth]{figures/graph_missed_opportunities.pdf}
+\includegraphics[width=.8\linewidth]{figures/graph_missed_opportunities.pdf}
 \bfcaption{A trace of \schedutil's CPU frequency selections given in blue.  The dotted red line shows an energy/latency optimal frequency choice ($\fenergy$).}
 \label{fig:missed_opportunities}
 \end{figure}
@ -34,20 +34,20 @@ We identify flaws in the premises, and propose a new, simpler governor that has

 Our fundamental insight, also observed by prior work~\cite{vogeleer2013energy, nuessle2019benchmarking}, is that there exists an energy-optimal frequency for each device (call it $\fenergy$).
 We argue that 
-(i)~past CPU usage is not a meaningful signal for identifying the rare cases when speeds below $\fenergy$ are appropriate, 
+(i)~past CPU usage is not a meaningful signal for identifying the rare cases when speeds below $\fenergy$ are appropriate, 
 (ii)~speeds above $\fenergy$ are useful only in specific situations, often known in advance by user-space.
 \Cref{fig:missed_opportunities} illustrates the potential for improvement; 
 (i)~\schedutil has a ramp-up period (first grey box) where the CPU is operating at speeds that sacrifice both energy and performance, and
 (ii)~\schedutil continues ramping up the frequency (second grey box) paying significant energy costs for often negligible visible benefits.

-We propose a series of changes to \schedutil, ultimately converging on a radical proposal: Default the CPU's frequency to its $\fenergy$, switching to faster speeds based only on (already existent) signals from user-space.
+We propose a series of changes to \schedutil, ultimately converging on a radical proposal: default the CPU's frequency to its $\fenergy$, switching to faster speeds based only on (already existent) signals from user-space.
 Based on the simplicity of this approach, we call it the \systemname governor.
-Through experiments, we show that \systemname simultaneously improves performance, as well as energy usage:
-For example, a typical 25s Facebook app interaction run with \systemname consumes 11\% less energy and causes 17\% fewer UI screendrops than when run with default settings.
-We also explore a less radical proposal: \schedutil limited to selecting frequencies at or above $\fenergy$; We show that even with this minor change, significant gains are possible.
+Through experiments, we show that \systemname simultaneously improves performance and energy usage.
+For example, a typical 25s Facebook run with \systemname consumes 11\% less energy and causes 17\% fewer UI screendrops than when run with default settings.
+We also explore a less radical proposal: \schedutil limited to selecting frequencies at or above $\fenergy$. We show that even with this minor change, significant gains are possible.

-We run our experiments on Google Pixel 2 devices with Android AOSP, evaluating \systemname against the system default and several other policies, using microbenchmarks and popular apps.
-These are representative of common platforms and uses in the real world.
+%We run our experiments on Google Pixel 2 devices with Android AOSP, evaluating \systemname against the system default and several other policies, using microbenchmarks and popular apps.
+%These are representative of common platforms and uses in the real world.

 This paper is organized as follows:
 (i) We review background and related work in \Cref{sec:related}.
--- a/sections/related.tex
+++ b/sections/related.tex
@ -14,7 +14,7 @@ For example, CPU speeds often cannot be set on individual cores but only on grou

 \begin{figure}
 \centering
-\includegraphics[width=.95\linewidth]{figures/graph_energy_varying_sleep.pdf}
+\includegraphics[width=.85\linewidth]{figures/graph_energy_varying_sleep.pdf}
 \bfcaption{Total energy per CPU policy for a 30s workload (3 runs, 90\% confidence)}
 \label{fig:idle_impact}
 \end{figure}
--- a/sections/unjustified.tex
+++ b/sections/unjustified.tex
@ -66,7 +66,7 @@ in memory-bound workloads (i.e., with frequent CPU stalls resulting from cache m
  We note that we did not encounter any significant busy-waiting across all of the apps that we tested.
 }.

-By comparison, tasks blocked on IO or user input are removed from the runqueue entirely, allowing the CPU to enter an idle state. 
+By comparison, tasks blocked on IO or user input are removed from the runqueue, allowing the CPU to idle. 
 %\todo{add an experiment that shows that tasks blocked on IO or user input or something similar have similar energy profiles.?}
 We note that CPU utilization is a measure of time spent idling or blocked on IO, and not the time the CPU spends running without doing useful work, supporting our second claim:

@ -93,7 +93,7 @@ All of the time spent in the areas marked Underperformance represents energy was

 \begin{figure*}
 \centering
-\includegraphics[width=.90\linewidth]{figures/graph_time_per_freq_fb.pdf}
+\includegraphics[width=.87\linewidth]{figures/graph_time_per_freq_fb.pdf}
 \bfcaption{Time per CPU at a given frequency for Facebook, 25s with default policy.  (Avg. of 10 runs, 90\% confidence)}
 \label{fig:time_per_freq_fb}
 \end{figure*}
@ -101,13 +101,12 @@ All of the time spent in the areas marked Underperformance represents energy was

 \subsection{Truncated \schedutil}

-To summarize, frequencies strictly below $\fenergy$ (excepting $\fidle$) consume more power per CPU cycle than $\fenergy$, and result in higher latencies.
+Frequencies strictly below $\fenergy$ (excepting $\fidle$) consume more power per CPU cycle than $\fenergy$, and result in higher latencies.
 In the absence of CPU stalls, spin-locks, and thermal throttling, frequencies in this range are strictly worse.
 Based on this observation and two further insights, we now propose our first adjustment to the \schedutil governor.

 First, recall that the only signal used by \schedutil is recent past CPU usage.
 This signal conveys no information about CPU stalls, and so is not useful for deciding whether the CPU should be set to a frequency in this regime.
-
 Second, we observe that workloads that trigger the relevant CPU behaviors are typically data-intensive and memory bound, or parallel workloads with high contention.
 Such workloads are often offloaded to more powerful cloud compute infrastructures; when run at the edge (e.g., for federated learning), it is typically when the device has a stable power source.

--- a/sections/wasted.tex
+++ b/sections/wasted.tex
@ -3,7 +3,7 @@

 \begin{figure*}
 \centering
-\includegraphics[width=.90\linewidth]{figures/graph_oscill_cycles.pdf}
+\includegraphics[width=.87\linewidth]{figures/graph_oscill_cycles.pdf}
 \bfcaption{Runtime and CPU cyclecount for a fixed compute under different CPU policies (10 runs, 90\% confidence)}
 \label{fig:cycles_time}
 \end{figure*}
@ -40,7 +40,7 @@ Armed with this knowledge, we realize the \systemname governor: a simple policy

 \begin{figure*}
 \centering
-\includegraphics[width=.90\linewidth]{figures/graph_nonidletime_fb.pdf}
+\includegraphics[width=.87\linewidth]{figures/graph_nonidletime_fb.pdf}
 \bfcaption{CPU non-idle time for Facebook under different CPU policies (10 runs, 90\% confidence)}
 \label{fig:nonidle_fb}
 \end{figure*}
@ -48,7 +48,7 @@ Armed with this knowledge, we realize the \systemname governor: a simple policy

 \begin{figure}
 \centering
-\includegraphics[width=.90\linewidth]{figures/graph_jank_perspeed_fb.pdf}
+\includegraphics[width=.87\linewidth]{figures/graph_jank_perspeed_fb.pdf}
 \bfcaption{Display framedrop for Facebook under different CPU policies (10 runs, 90\% confidence)}
 \label{fig:screendrops_per_freq_fb}
 \end{figure}
@ -56,7 +56,7 @@ Armed with this knowledge, we realize the \systemname governor: a simple policy

 \paragraph{Case Studies}
 We studied a range of apps, including Facebook, Youtube, and Spotify.  
-For each app, we developed a simple, short scripted interaction to study in terms of its performance characteristics and energy usage. 
+For each app, we developed a short scripted interaction to study its performance characteristics and energy usage. 
 We focus here primarily on the Facebook workload that we previously introduced in \Cref{sec:low-speed-in-practice}, and return to the other workloads to verify our findings in \Cref{sec:evaluation}.
 Our first goal is to understand what user-perceivable value this energy overhead obtains for us.

@ -122,14 +122,14 @@ Even running the CPU at full speed, the system has more work.
  Adaptive widgets in mobile apps present an effectively infinite source of work to mobile CPUs over finite windows of interaction.
 }

-By adapting itself to the available CPU power, Facebook signals an effectively infinite source of work to the governor, which responds by ramping the CPU up to full speed.
+By adapting to the available CPU power, Facebook signals an effectively infinite source of work to the governor, which responds by ramping the CPU up to full speed.
 In this situation, speeds above $\fenergy$ may actually provide a perceptible benefit.
 However, even at the CPU's maximum frequency, more work is created than the CPU can keep up with.

 \begin{figure}
 \centering
-\includegraphics[width=.95\linewidth]{figures/graph_u_fb.pdf}
-\bfcaption{Energy consumed for a fixed set of interations, given compute at different speeds \fixme{fullrun set}}
+\includegraphics[width=.87\linewidth]{figures/graph_u_fb.pdf}
+\bfcaption{Energy consumed for a fixed set of iterations, given compute at different speeds \fixme{fullrun set}}
 \label{fig:u_micro_fb}
 \end{figure}