sections/*: updates

This commit is contained in:
carlnues@buffalo.edu 2023-08-08 13:50:40 -04:00
parent 5cc7abb698
commit 5ba683936c
4 changed files with 67 additions and 58 deletions

View file

@ -4,11 +4,6 @@ The governors on phones trade off between performance maximization and energy mi
If performance is the only goal, the solution is clear: set the CPU to 100\%.
However, on phones, where latency is not always critical, this needlessly wastes energy.
Determining when to set the CPU speed to what is the issue.
%graph: 100% eats energy
We ran our experiments on Google Pixel 2 devices with Android AOSP, evaluating \systemname against the system default and several other policies, using microbenchmarks and popular apps.
These are representative of common platforms and uses in the real world.
\XXXnote{Setup description: Better at end of Intro? (to bolster our claim) (summary of s5.1 detailed description)}
\todo{Question}
\begin{figure}
\centering
@ -19,12 +14,6 @@ These are representative of common platforms and uses in the real world.
%\fixme{Add-in multiple threads, with varying 0-50-100 loads}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=.95\linewidth]{figures/graph_freqtime_micro.pdf}
\bfcaption{Intermittent workloads hurt runtime in 2 ways: directly, by sleeping, and indirectly, by inducing slower CPU speeds}
\label{fig:speed_time_delay}
\end{figure}
\subsection{Speed selection heuristics}
@ -33,7 +22,9 @@ These formulas all use some metric of previous CPU usage history to calculate fu
There are a number of implementations of this idea: Among previous governors used in Android, \texttt{ondemand} and \texttt{interactive} both use sampling to calculate the proportion of time the CPU is non-idle, and use this to set the CPU speed.\cite{ondemand-governor, interactive-governor}
% N.b. -- interactive is out-of-tree: see https://www.slideshare.net/opersys/scheduling-in-android-78020037
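To make the sampling approach concrete, the following sketch shows the shape of an \texttt{ondemand}-style speed calculation; the threshold and scaling here are illustrative assumptions, not the kernel's exact code.
\begin{verbatim}
/* Sketch of an ondemand-style sampling step (illustrative constants). */
#define UP_THRESHOLD 95            /* busy %, above which: jump to max */

unsigned int next_freq(unsigned int busy_pct, unsigned int f_max)
{
    if (busy_pct > UP_THRESHOLD)
        return f_max;              /* near-saturated: run flat out */
    return f_max * busy_pct / 100; /* otherwise scale speed with load */
}
\end{verbatim}
Each sampling period, \texttt{busy\_pct} would be derived from the kernel's idle-time accounting for that CPU.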
The current default CPU policy, \schedutil, bases its speed choice on the proportion of recent work on the runqueue, obtained from scheduler events.\cite{schedutil-governor}
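As described in its documentation, \schedutil maps utilization to frequency with a fixed headroom; with frequency-invariant utilization $u$ (out of a maximum $u_{\mathrm{max}}$), it selects approximately
\[
f_{\mathrm{next}} = 1.25 \cdot f_{\mathrm{max}} \cdot \frac{u}{u_{\mathrm{max}}},
\]
clamped to the nearest available frequency.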
\XXXnote{Add discussion re: p(id) controller / parabolic process / poor feeback?}
\todo{Question}
% util% (does?) correlate in 70-100 ("tradeoff") regime with minimizing missing deadlines
Ultimately, the goal of any governor is to run the CPU at the ideal speed for pending \textit{future work}.
The accuracy and efficiency of all reactive governors thus depends on the extent to which past performance is indicative of future needs.
@ -66,6 +57,13 @@ To summarize, not only does the kernel already hardcode when the CPU should
%The time of the speed boost does not always match the period when the user is waiting, and can be optimized.
%\fixme{show}
\begin{figure}
\centering
\includegraphics[width=.95\linewidth]{figures/graph_freqtime_micro.pdf}
\bfcaption{Intermittent workloads hurt runtime in 2 ways: directly, by sleeping, and indirectly, by inducing slower CPU speeds}
\label{fig:speed_time_delay}
\end{figure}
\subsection{The cost and problems of complex speed micromanagement}
\label{complexity_cost}
@ -98,7 +96,7 @@ This is because past CPU utilization -- the bedrock metric of all dynamic govern
\begin{figure}
\centering
\includegraphics[width=.95\linewidth]{figures/graph_jank_perspeed_fb.pdf}
\bfcaption{CPU speeds above $\sim$70\% do not yield significant decreases in framedrop proportion}
\label{fig:screendrops_per_freq_fb}
\end{figure}
@ -112,20 +110,15 @@ The default governor policy, \schedutil, hurts responsiveness.
The \schedutil policy sets the CPU speed based on a rolling window of recent runqueue utilization.
On a phone, workloads typically do not saturate the CPUs but vary constantly in demand.
With history-driven dynamic policies such as \schedutil, this triggers constantly changing speeds\cite{nuessle2019benchmarking}.
Figure \ref{fig:missed_opportunities} shows how ramp-up time already hurts performance: the CPU takes .16s to reach \fenergy, and .3s to hit maximum speed.
However, previous studies have noted that intermittent workloads make this problem significantly worse.\cite{nuessle2019benchmarking}
Figure \ref{fig:speed_time_delay} illustrates this: We ran the same fixed workload with and without intermittent 5ms sleeps.
With no sleep intervals, the top graph shows the workload takes $\sim$7.1s to complete.
%we previously noted the system takes about .14s to reach 100\% speed.
Adding 1000 5ms sleeps (the bottom graph) induces the governor to keep the speed much lower, hovering around 40\% of maximum throughout the run.
Of the additional 18.2s runtime, 5s stems from total sleeping, and $\sim$13.1s from running at a slower CPU speed.
We will show that real-world apps, when running the default policy, similarly spend significant time at unnecessarily low speeds.
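For reference, the intermittent microbenchmark takes roughly the following form; the work constants are placeholders, not our exact benchmark.
\begin{verbatim}
/* Sketch: fixed CPU work, optionally broken into 1000 chunks
 * separated by 5ms sleeps. Iteration counts are illustrative. */
#include <unistd.h>

static void burn_cpu(long iters)      /* one fixed unit of busy work */
{
    volatile long x = 0;
    for (long i = 0; i < iters; i++)
        x += i;
}

int main(void)
{
    for (int chunk = 0; chunk < 1000; chunk++) {
        burn_cpu(1000000);            /* same total work in both runs */
        usleep(5000);                 /* 5ms sleep; omitted in the
                                         no-sleep baseline */
    }
    return 0;
}
\end{verbatim}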
\tinysection{Runtime performance costs are partly due to switching costs}
We suspected this increased runtime might be due to overhead either in hardware, while the CPU transitions frequencies, or in software, during the \schedutil governor's complex calculations.
@ -137,21 +130,16 @@ We tracked runtime and actual work performed, in CPU cycles.
Figure \ref{fig:cycles_time} shows the results, broken out by big and little CPU type.
Relative to the midspeed setting, the runtimes of the low and high speed policies were longer and shorter, respectively, as expected.
The runtime of the oscillating policy was higher than that of the midspeed policy: very marginally so ($\sim$.3\%) for little CPUs, and 1.5\% for big CPUs.
The cyclecount of the oscillating policy was also $\sim$.3\% higher for little CPUs, suggesting that in this case the performance overhead is due entirely to the additional software computation of calling into the driver code.
% default: lower cyclecount than oscillate => NOT due to complexity of system
For big CPUs, computation overhead was .08\% higher for the oscillating than for the midspeed policy, an order of magnitude smaller than the 1.5\% runtime cost.
Additional hardware delays likely contribute to frequency switching overhead.
%Significantly, the runtimes of the oscillating policy and the fixed medium speed policies -- the red and green bars in the leftmost bargraph -- were nearly identical, suggesting there is essentially no hardware overhead penalty to changing CPU speeds.
%Additionally, as evidenced by extremelly similar heights of the bars in the second graphs, the work performed in CPU cycles under all policies was nearly identical, within .03\% of each other.
%Any computational overhead in the default policy is thus negligible by comparison.
The switching cost in both cases, while small, suggests speed changes should be minimized where possible.
However, Figure \ref{fig:speed_time_delay} shows that running an intermittent workload, with many frequency changes, produces a much larger change in runtime.
This is in large part due to the \schedutil policy simply picking slower speeds.
This suggests that improving runtime performance depends partly on minimizing frequency switching costs, but far more on simply picking better speeds.
% REMOVE? Help or hurt? (undercuts overhead argument; strengthens speed choice argument)
%... we observe that there is a small amount of energy overhead to frequent speed changes.
@ -194,23 +182,45 @@ The biggest key to improvement lies in making better ones.
\label{fig:time_per_freq_fb}
\end{figure*}
\begin{figure*}
\centering
\includegraphics[width=.95\linewidth]{figures/graph_nonidletime_fb.pdf}
%
\begin{subfigure}{0.45\textwidth}
\centering
%\includegraphics
\bfcaption{Little CPUs}
\end{subfigure}
\begin{subfigure}{0.45\textwidth}
\centering
%\includegraphics
\bfcaption{Big CPUs}
\end{subfigure}
\bfcaption{Given additional resources, apps will consume them: Increasing speed above 50--60\% does not significantly decrease CPU nonidle time}
\label{fig:nonidle_fb}
\end{figure*}
\tinysection{Speeds over \fenergy do not provide perceptible benefits}
In a dual to the above problem, dynamic governors also ramp CPU frequencies to unnecessarily high speeds.
Absent a specific, identified need -- such as shortening responsetime to a waiting user -- running the CPU above \fenergy does not offer practical return for the additional energy.
\XXXnote{Need to justify the 70 figure here *being* fenergy -- first use; derails discussion}
Indeed, Figure \ref{fig:nonidle_fb} shows that real-world apps, when given additional CPU resources, simply consume them.
We ran a scripted Facebook app interaction, scrolling for 30 seconds through the friends list and then the feed, under different CPU policies: the default \schedutil and a range of fixed speeds.
We measured the non-idle time of the CPUs through the Linux \texttt{sysfs} interface, bucketing by little and big CPU type.
We will later show that \fenergy can be reasonably approximated on our test device with a CPU speed of 70\%.
The graph shows that, as speed increased from 70\% to 100\%, the nonidle time of the CPUs did not decrease appreciably -- in contrast to microbenchmarks with deterministic workloads.
%Likely, the app is adjusting (**scrolling prefetch**)
Despite consuming the additional resources, apps show no appreciable practical benefit from CPU speeds above \fenergy.
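One plausible way to obtain this measurement, sketched below, is to sum each CPU's idle-state residency counters from the Linux \texttt{cpuidle} \texttt{sysfs} tree before and after a run; non-idle time is then elapsed wall time minus the change in idle residency. Paths follow the standard Linux layout; error handling is elided.
\begin{verbatim}
/* Sketch: total idle residency (usec) of one CPU via sysfs cpuidle. */
#include <stdio.h>

long long idle_usec(int cpu, int nstates)
{
    long long total = 0;
    for (int s = 0; s < nstates; s++) {
        char path[128];
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/time",
                 cpu, s);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                /* idle state absent on this CPU */
        long long t = 0;
        fscanf(f, "%lld", &t);
        fclose(f);
        total += t;
    }
    return total;
}
\end{verbatim}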
Figure \ref{fig:screendrops_per_freq_fb} illustrates this for the same experiment.
The output of phones is largely visual display maintenance; CPU policies should avoid damaging display quality.
Hence, for each run, we additionally tracked the effect on display output, measured as the proportion of framedrops, termed \textit{jank} on Android.
\XXXnote{Possibly combine Figures \ref{fig:nonidle_fb} and \ref{fig:screendrops_per_freq_fb}?}
\todo{Question}
%We first make the observation that, for our test device, a fixed speed of 70\% lies near the energy minimums of the runs in Figure \ref{fig:u_micro}.
Speeds of \fenergy (70\%) and above all produce drop rates of $\sim$2\% and below, lower than that of the default dynamic policy ($\sim$3\%).
However, increasing the speed above \fenergy does not improve the framedrop rate significantly.
As we will show in Section \ref{subsec:regimes_energy_perf}, the default policy nonetheless runs workloads for significant periods at these higher speeds, despite their offering no perceptible benefit.
\tinysection{Dynamic governors can cost energy}
@ -229,6 +239,7 @@ Recall that when there is no work, the idle policy turns the CPU off.
Hence running the CPU too slowly keeps the CPU on for longer, wasting power.
Conversely, at faster speeds, there is an energy-performance tradeoff.
As we will show, the \schedutil policy frequently picks speeds at both the low and high end, costing energy.
\fixme{weak claim? Punting proof to later alright?}
\XXXnote{Move proof / discussion re: figure \ref{fig:time_per_freq_fb} from sec 3.2-3.3 back to here?}
\todo{Question}
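A standard first-order model, not a measurement of our device, makes this tradeoff concrete: a job of $W$ cycles at frequency $f$ runs for $t = W/f$, while power is roughly a static floor plus a dynamic term growing superlinearly with frequency, so total energy is
\[
E(f) \;=\; t \cdot P(f) \;\approx\; \frac{W}{f}\bigl(P_{\mathrm{static}} + c\,f^{3}\bigr) \;=\; W\Bigl(\frac{P_{\mathrm{static}}}{f} + c\,f^{2}\Bigr).
\]
The static term penalizes running too slowly (the CPU stays on longer), the dynamic term penalizes running too fast, and the minimum sits at an interior speed, consistent with the existence of \fenergy.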

View file

@ -61,7 +61,7 @@ We ran 2 types of workloads: First, standalone native microbenchmarks in C perf
Second, we used the Android UI Automator testing framework to perform scripted simulated interactions with real-world apps.\cite{uiautomator}
A UI Automator testing app mimicked typical user interactions, such as scrolling through the Facebook friends list and feed.
We collected information on CPU speed and idle state from both the Linux \texttt{ftrace} framework and from \texttt{sysfs}, and on CPU cycles from the \texttt{perf\_event\_open} syscall.\cite{perf-event}
We also used \texttt{ftrace} to log testing parameter and state.
Information on screen performance including framedrops came from the Android \texttt{dumpsys gfxinfo} service.
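For reference, the cycle counting takes roughly the following form; this is a minimal sketch of the \texttt{perf\_event\_open} usage, with error handling elided.
\begin{verbatim}
/* Sketch: count CPU cycles around a workload via perf_event_open(2). */
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;

    /* this process, any CPU; the syscall has no glibc wrapper */
    int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    /* ... workload under test ... */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long cycles = 0;
    read(fd, &cycles, sizeof(cycles));
    return 0;
}
\end{verbatim}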

View file

@ -27,8 +27,6 @@ Launch screen on; idle & 130 \\
\label{fig:showcase}
\end{figure}
%\XXXnote{Mention original goals? (provide compute when needed; remove when not) Spells out what idle makes redundant}
%\todo{Question}
CPUs form the computational heart of computing systems, including phones.
They also consume considerable energy.
Phones must balance providing computational resources when needed against reducing them, to save energy, when not.
@ -48,7 +46,6 @@ It is thus critical, for efficient energy management, to manage CPU usage proper
\subsection{The role of CPU governors}
%CPUs, particularly on energy-constrained platforms, can typically run at different speed settings.
There have been a number of policies, called \textit{governors}, to determine at which speed to run the CPU.
Most popular recent and current Linux governors, such as \texttt{ondemand}, \texttt{interactive}, and \texttt{conservative}, and the current Android system default, \schedutil, use a proportion of recent CPU usage as a guide to set future speeds.
These governors run in conjunction with other policies, in particular (i) the scheduler -- which determines what tasks are run on what CPU cores and (ii) the idle policy -- which shuts down CPUs with no pending work.
@ -59,8 +56,8 @@ CPU speeds often cannot be set on individual cores but only on groups of CPUs --
\subsection{The problem with CPU governors on phones \fixme{OR A simpler governor}}
%\XXXnote{First 3 of 4 para's in this subsection focus on current problems}
%\todo{Question}
The default governor policy, despite the considerable sophistication involved in its implementation, is based on a flawed premise: That past utilization is a meaningful signal of the optimal CPU speed.
As we will show in this paper, this premise is based on a set of assumptions that are not applicable to modern mobile devices.
@ -77,12 +74,11 @@ To understand this, we present additional claims that we will later substantiate
\item A CPU frequency below \fenergy always wastes energy, except in very specific corner cases.
%thermal throttling or memory stalling
\item A CPU speed above \fenergy reduces useful latency in specific, identifiable situations, but in most other cases consumes energy for negligible benefit.
\item User apps, given additional CPU resources, will not show perceptible benefit.
\end{enumerate*}
%\fixme{SHOW 4 in Eval}
%\XXXnote{\#4 justification: Figure 7 good enough?}
%\ref{fig:screendrops_per_freq_fb}
% \todo{Question}
Figure \ref{fig:missed_opportunities} illustrates the core of these problems in practice.
We ran a short $\sim$.5s CPU-bound load on a previously idle phone using default settings, and tracked the effect on CPU speed (with 0 representing idle).
@ -99,8 +95,10 @@ This is depicted on the graph by the upper grey trapezoid.
In this paper, we present our governor, \systemname, which adopts a simpler heuristic based on common usage needs.
% which runs tasks at speeds that save energy compared to the system default -- speeds that, in practice, also prove sufficiently performant to maintain user experience.
It avoids the twin pitfalls of overly slow speeds, which not only hurt latency but also cost energy by increasing runtime, and overly fast settings, which cost energy without perceptible benefit.
\systemname leverages information from userspace, sometimes already furnished by the Android platform, to identify those common use cases that do warrant additional speed.
\fixme{implement}
We ran our experiments on Google Pixel 2 devices with Android AOSP, evaluating \systemname against the system default and several other policies, using microbenchmarks and popular apps.
These are representative of common platforms and uses in the real world.

View file

@ -99,7 +99,7 @@ To our knowledge, the Android platform has not taken advantage of this ability.
Establishing the exact energy-performance tradeoff requires precise use-case-dependent measurement.
\fixme{wordsmithing?}
In practice, we make a key observation that the speed necessary to maintain user experience hovers around \fenergy and rarely approaches \fperf.
Section \ref{subsec:regimes_energy_perf} shows that CPU speeds above \fenergy do not offer perceptible benefits, as measured by the primary user experience metric on phones, the framedrop rate.
Indeed, using the simple fixed speed of \fenergy offers better measured results than the default.
Absent infrequent, identifiable periods when conditions warrant additional performance -- such as when the user is waiting on a response from the phone --
\textit{running the CPUs at a preset fixed speed of \fenergy is sufficient to maintain user experience.}