sections/*: updates

carlnues@buffalo.edu 2023-05-02 12:58:21 -04:00
parent 798fe0d7e5
commit 82e732f8ad
3 changed files with 129 additions and 31 deletions

% -*- root: ../main.tex -*-
\todo{dynamic or reactive?}
The governors on phones strive to meet two main goals:
First, they must set the CPU speed higher when there is pending computation, optimizing performance at the expense of energy.
Second, they must set speed lower when computation needs decline or stop, to save energy.
If performance is the only goal, as is the case with data center servers, the solution is clear: set the CPU to 100\%.
However, on phones, where latency is not always critical, this would needlessly waste energy.
%\XXXnote{microbench: 100\% eats energy}
The dynamically reactive, variable-speed nature of governors avoids this problem and achieves the previously stated goals: Computation speed is increased when needed, and decreased when not.
The difficulty lies in determining when to set the CPU to which speed.
%graph: 100% eats energy
\begin{figure}
\centering
\fixme{Add-in multiple threads, with varying 0-50-100 loads}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=.70\linewidth]{figures/graph_freqtime_micro.pdf}
\bfcaption{CPU speed and runtime for a fixed workload, different delays}
\label{fig:speed_time_delay}
\end{figure}
\subsection{Dynamic, reactive governors: An already incomplete solution}
Linux on Android has tried to answer this problem with one of several dynamically reactive, variable-speed governors.
These governors all use some metric of recent CPU usage history to calculate the future CPU speed.
There are a number of implementations of this idea: Among previous governors used in Android, \texttt{ondemand} and \texttt{interactive} both use sampling to calculate the proportion of time the CPU spends non-idle, and use this to set the CPU speed.\cite{ondemand-governor, interactive-governor}
% N.b. -- interactive is out-of-tree: see https://www.slideshare.net/opersys/scheduling-in-android-78020037
The current default CPU policy, \schedutil, bases its speed on the proportion of recent work on the runqueue, obtained from scheduler events.\cite{schedutil-governor}
\fixme{add formulae? (weeds...)}
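As a rough sketch (the in-kernel calculations add low-pass filtering, rate limiting, and per-cluster tunables), \schedutil maps the scheduler's utilization estimate to a frequency as approximately
\[ f_{next} \approx 1.25 \cdot f_{max} \cdot \frac{util}{util_{max}} , \]
while \texttt{ondemand} jumps to $f_{max}$ once the sampled non-idle fraction exceeds its up-threshold, and otherwise requests a frequency roughly proportional to that fraction.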
Ultimately, the goal of any governor is to run the CPU at the ideal speed for pending \textit{future work}.
The accuracy and efficiency of all reactive governors thus depends on the extent to which past performance is indicative of future needs.
Existing exceptions that override the governor policy, at both the high and low end of speeds, begin to highlight the inadequacies of dynamic governors.
\tinysection{Idling overrides any speed}
%\fixme{ADD: discussion re: p-states and c-states -- we rely on the later to shut off cores but do not delve into tweaking it}
%An obvious question with running an app at a fixed speed is what happens when work finishes.
Running a CPU with no work wastes energy.
Reactive governors address this by slowing the CPU.
%This complex speed-selection system is not the only way, however.
%There is something better and simpler than merely slowing CPU speed, however -- after all, when there is no work, the optimal CPU speed is not just slower but zero.
However, CPU speed policy on Android already overlaps significantly with another Linux system policy: that of CPU idling.
When there is no work -- specifically, when a CPU runqueue has no tasks -- this idle policy bypasses the speed selection and instead shuts down unneeded cores.
%We will show that, in typical phone use cases, lowering the speed to save energy in the CPU governor is thus unnecessary.
Figure \ref{fig:idle_impact} illustrates this: We ran varying levels of work for each of several fixed CPU speeds, as well as for the system default policy, for 20s.
The energy consumed by a continuous do-nothing workload (no sleeping) tracks CPU speed, as expected.
The energy curve produced by a partial workload (compute interleaved with sleeping) begins to flatten, however.
The plot of a fully sleeping task for different CPU policies produces a nearly flat line.
No matter what the requested speed by the CPU governor, when there is no work, the idle policy overrides the speed and shuts down the core.
This makes sense, as the best speed when there is no work is 0.
We will show that reliance on idling can apply more broadly, under common reduced work conditions, to save even more energy.
Ultimately, this is because \textit{the best CPU speed often comes from idling and not from governor calculations}.
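The partial workloads above amount to fixed compute bursts interleaved with short sleeps; a minimal sketch of such a workload (burst sizes and sleep lengths are arbitrary illustrations, not our exact harness) is:
\begin{verbatim}
/* Sketch of a "partial" workload: fixed compute bursts interleaved
 * with short sleeps; all parameters are arbitrary illustrations. */
#include <stdint.h>
#include <time.h>

static void burn(uint64_t iters) {
    volatile uint64_t x = 0;          /* volatile: not optimized away */
    for (uint64_t i = 0; i < iters; i++)
        x += i;
}

int main(void) {
    struct timespec nap = { .tv_sec = 0,
                            .tv_nsec = 5 * 1000 * 1000 };  /* 5ms */
    for (int i = 0; i < 2000; i++) {  /* fixed total amount of work */
        burn(1000000);                /* compute burst */
        nanosleep(&nap, NULL);        /* idle gap the governor sees */
    }
    return 0;
}
\end{verbatim}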
\tinysection{Android hardcodes performance boosts}
At the other end of the scale, running the CPU too slowly hurts performance.
Dynamic governors only gradually raise the speed of a loaded system.
The top 2 graphs of Figure \ref{fig:speed_time_delay} illustrate this.
We ran a compute-bound workload with the default \schedutil policy.
It takes over .14s for the CPU to hit maximum; with repeated intermittent loads that are common on phones, this problem becomes much worse.
The Linux maintainers have recognized the need for better performance, and introduced an API for userspace processes to request a boost (or reduction) to the CPU speed otherwise calculated by the \schedutil governor.\cite{schedtune}
% (or request a reduction to speed)
The AOSP platform has used this feature in a limited capacity, to set the CPU to 100\% for a short period during app launches.
The timing of the speed boost does not always match the period when the user is waiting, and can be optimized.
\fixme{show}
%They have not incorporated this into the governor framework.
However, this again illustrates that the best CPU speed -- here, 100\% -- \textit{derives from userspace and not from the dynamic, reactive governor}.
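As an illustration (assuming an AOSP-style kernel with the out-of-tree \texttt{schedtune} cgroup controller mounted at \texttt{/dev/stune}; mount point, group names, and value ranges vary by device and kernel version), such a boost request reduces to a privileged write into the relevant group's boost knob:
\begin{verbatim}
/* Sketch: boost the "top-app" schedtune group during an app launch,
 * then drop back.  Assumes CONFIG_SCHED_TUNE and /dev/stune (AOSP-
 * style); the boost values are illustrative, and root is required. */
#include <stdio.h>

static int write_str(const char *path, const char *val) {
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    int rc = (fputs(val, f) >= 0) ? 0 : -1;
    fclose(f);
    return rc;
}

int main(void) {
    write_str("/dev/stune/top-app/schedtune.boost", "100"); /* launch */
    /* ... app launch completes ... */
    write_str("/dev/stune/top-app/schedtune.boost", "10");  /* steady */
    return 0;
}
\end{verbatim}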
\subsection{The cost and problems of complex speed micromanagement}
\label{fig:speed_time}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=.70\linewidth]{figures/graph_oscill_cycles.png}
Figure \ref{fig:speed_time_delay} shows how the combination of intermittent loads with lagging ramp-up speeds picked by the \schedutil policy increases runtime significantly.
We ran the same fixed workload with 2 different delay settings.
In both cases, the 2 righthand graphs are time zooms of the 2 lefthand graphs to show detail.
With no sleep intervals (the top 2 graphs) -- where the workload is run continuously -- we previously noted the system takes about .14s to reach 100\% speed.
Adding periodic 5ms sleeps (the bottom 2 graphs) not only increases runtime by the sleep intervals themselves, but each sleep also induces the governor to keep the speed much lower, hovering around 40\% of maximum throughout the run.
In section \ref{WHAT}, we discuss the regular occurrence of this slow runtime pattern in the Android system, and how our \systemname system improves it: Rather than relying on a reactive policy, we use hints from userspace to identify when to prioritize runtime and keep the CPU at 100\% when running -- and let the idle system shut the CPU off when not.
\fixme{distinguish from default}
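Traces like those in Figure \ref{fig:speed_time_delay} can be approximated from userspace by polling the generic cpufreq sysfs interface while the workload runs; a sketch (the CPU number, sampling period, and duration are arbitrary choices) is:
\begin{verbatim}
/* Sketch: sample cpu0's current frequency every 10ms for ~30s and
 * print a timestamped trace.  All parameters are arbitrary. */
#include <stdio.h>
#include <time.h>

int main(void) {
    const char *path =
        "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq";
    struct timespec period = { .tv_sec = 0,
                               .tv_nsec = 10 * 1000 * 1000 }; /* 10ms */
    for (int i = 0; i < 3000; i++) {
        FILE *f = fopen(path, "r");
        if (!f) return 1;
        long khz = 0;
        if (fscanf(f, "%ld", &khz) != 1) khz = -1;
        fclose(f);
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        printf("%ld.%03ld %ld\n", (long)now.tv_sec,
               now.tv_nsec / 1000000, khz);       /* seconds, kHz */
        nanosleep(&period, NULL);
    }
    return 0;
}
\end{verbatim}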
We originally suspected this increased runtime might be due to overhead either in hardware, while the CPU transitions frequencies, or in software, from the complex calculations in the \schedutil governor.
However, Figure \ref{fig:cycles_time} shows this is not the case.
The energy inefficiency of the default policy can stem from picking a speed that is either too high or too low.
First, when running a compute-bound task, the \schedutil governor will tend to ramp up speed to maximum -- indeed, that is what happens in Figure \ref{fig:u_micro}.
While this behavior is beneficial for interactive, compute-bound tasks when the user is waiting and runtime is the priority, it is less desirable for background tasks.
The governor cannot distinguish when the additional energy may be justified and will always blindly adjust to a high speed (yet, ironically, it still takes too long to do so when the boost is warranted).
Background services downloading updates well in advance of any need by the app or user fall into this category.
The Linux community has recognized this problem: the other primary reason they added the \texttt{schedtune} API to the \schedutil governor was to permit sidestepping an energy-wasteful speed picked by the governor.
To our knowledge, AOSP has not taken advantage of this ability.
Second, the default policy frequently picks speeds that are too high during direct app interactive periods.
Figure \ref{fig:u_micro} shows that the energy penalty ramps sharply for the highest speed -- particularly so when multiple CPUs are being used (lightest, dashed lines in the graph), as is typically the case with real world apps.
\todo{this add confusion to the story "the problem is bad choices"?}
%%%% \begin insert old s3
\begin{figure}
\centering
\includegraphics[width=.95\linewidth]{figures/optimize_goal_cpu_speed.pdf}
\bfcaption{How CPU speed should flow from the current CPU goal}
\label{fig:optimize_goal_cpu_speed}
\end{figure}
\subsection{What parameters governors should be considering}
The problem with dynamic reactive governors is that they ignore the most important criterion: \textit{the current primary goal of the CPU}.
Specifically, the system, and CPU, should already know what it is currently primarily trying to achieve: optimizing to save energy, to improve performance, or to prevent memory stalls.
This information, in turn, should drive the selection of CPU speed.
We show that this selection system offers better energy in the common case, and better performance where needed, than the system default.
The system can derive its goal information from 2 particular sources: userspace and system interactivity.
Previous studies have shown the utility of applications using knowledge of their own workloads to set CPU speeds manually.\cite{korkmaz2018workload}
The Android system already partly leverages platform knowledge of when to optimize app starts.
This can and should be expanded for more general usage.
Secondly, governors should consider the state of interactivity.
The default governor, when presented with a compute-intensive task, will quickly ramp speed to maximum.
Unless the user is actively waiting, this wastes energy.
Conversely, when the default governor becomes blocked on disk or net, it will lower CPU speed.
If the user is waiting, such as during an app coldstart, this hurts performance.
\fixme{dup earlier? examples}
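Read this way, Figure \ref{fig:optimize_goal_cpu_speed} is a small mapping from goal to speed; a sketch of the idea (the goal names and percentages below are illustrative placeholders, not our final design) is:
\begin{verbatim}
/* Sketch: goal-driven speed selection.  The goals and the speeds
 * chosen for each case are illustrative placeholders only. */
#include <stdio.h>

enum cpu_goal { GOAL_SAVE_ENERGY, GOAL_PERFORMANCE, GOAL_MEM_BOUND };

/* Requested speed as a percentage of the maximum. */
static int speed_for_goal(enum cpu_goal goal) {
    switch (goal) {
    case GOAL_PERFORMANCE: return 100; /* user waiting: run flat out  */
    case GOAL_MEM_BOUND:   return 40;  /* stalled on memory: go lower */
    case GOAL_SAVE_ENERGY:             /* common case                 */
    default:               return 60;  /* energy-optimal midspeed;
                                          idling covers no-work       */
    }
}

int main(void) {
    printf("energy %d%%, perf %d%%, mem %d%%\n",
           speed_for_goal(GOAL_SAVE_ENERGY),
           speed_for_goal(GOAL_PERFORMANCE),
           speed_for_goal(GOAL_MEM_BOUND));
    return 0;
}
\end{verbatim}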
\tinysection{Optimizing for energy}
This is the common case: Most of the time, the main app thread is blocking on user input, whether interactively with the screen on or while dozing with the screen off.
Here, the governor should aim to optimize (minimize) energy usage.
While there are typically background threads running, they are precomputing work for some future use, particularly pending screendraws.
In this case, completing them as quickly as possible is not the goal.
Rather, they only need to be ready before periodic needs, such as user response or screendraw deadlines.
The key observation is that the presence of background tasks does \textit{not change the goal of optimizing for energy}.
The CPUs spend the bulk of their time in idle -- that is, there are plenty of potential compute resources available.
Thus, there is typically no need to run CPUs anywhere close to full speed.
We will show that an identifiable midspeed setting achieves the goal of optimizing for energy, while still meeting background UI screendraw deadlines.
\fixme{also show this works for download / audiostream}
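One way to pin such a midspeed from userspace (assuming the kernel exposes the generic \texttt{userspace} cpufreq governor, which not all Android builds enable; the frequency below is an arbitrary example) is through the cpufreq sysfs knobs:
\begin{verbatim}
/* Sketch: pin cpu0 to a fixed mid frequency via the generic cpufreq
 * "userspace" governor.  Requires root and a kernel built with
 * CONFIG_CPU_FREQ_GOV_USERSPACE; the 1.1GHz value is arbitrary. */
#include <stdio.h>

static int write_str(const char *path, const char *val) {
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    int rc = (fputs(val, f) >= 0) ? 0 : -1;
    fclose(f);
    return rc;
}

int main(void) {
    const char *dir = "/sys/devices/system/cpu/cpu0/cpufreq/";
    char p[128];
    snprintf(p, sizeof p, "%sscaling_governor", dir);
    write_str(p, "userspace");     /* hand speed control to userspace */
    snprintf(p, sizeof p, "%sscaling_setspeed", dir);
    write_str(p, "1100000");       /* request ~1.1GHz (value in kHz) */
    return 0;
}
\end{verbatim}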
\tinysection{Optimizing for performance}
Phones also have periods when the user is waiting.
Here, the governor should optimize the CPU for performance.
App installs, app coldstarts -- after an installed app gets killed due to memory pressure -- and new browser tabs all fit this case.
Specifically, there is no reason to run the CPU at any less than 100\%, as the default policy often does.
Notably, the nature of the CPU load by itself is insufficient to determine when to optimize for performance.
A long-running compute-heavy background task, that would trigger an energy-wasteful speed ramp-up under the default policy, should not justify changing optimization goals.
Rather, the governor should also consider whether the user is actually waiting.
Happily, the bulk of these cases -- when the Android system is interactive but the foreground app is not yet ready to receive input -- are readily identifiable: Userspace, whether the platform or the app, knows when it needs to do a lot of work before it can present a foreground app ready to receive input.
We design our system to use this information and show that it offers better performance than the default case.
\tinysection{Optimizing to prevent CPU stalls}
Long-running, memory-bound periods -- scanning a hash table or sorting a sparse array -- present an in-between case.
As we later show, there is generally no reason to run the CPU below a particular energy-optimal speed.
Memory-bound periods are such a reason, however: The CPU is necessarily running, and thus cannot be put into idle, but is stalling on memory access.
An even lower speed than the common case, dictated by the new bottleneck of memory access rather than UI screendraws, can offer additional energy savings.
While we have not identified any use cases that fall into this category, we observe that it can happen, and design our system to accommodate this case.
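Such memory-bound phases can in principle be detected from userspace with hardware counters; a sketch using \texttt{perf\_event\_open} to estimate instructions per cycle (the 0.5 IPC threshold is an arbitrary illustration; real policies would calibrate per SoC) is:
\begin{verbatim}
/* Sketch: classify a phase as memory-bound when its instructions-
 * per-cycle ratio is low.  The 0.5 threshold is arbitrary. */
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int open_counter(uint64_t config) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = config;
    attr.disabled = 1;
    return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void) {
    int cyc = open_counter(PERF_COUNT_HW_CPU_CYCLES);
    int ins = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
    ioctl(cyc, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(ins, PERF_EVENT_IOC_ENABLE, 0);

    /* ... run the phase of interest here ... */

    uint64_t cycles = 0, instrs = 0;
    read(cyc, &cycles, sizeof cycles);
    read(ins, &instrs, sizeof instrs);
    double ipc = cycles ? (double)instrs / (double)cycles : 0.0;
    printf("IPC %.2f -> %s\n", ipc,
           ipc < 0.5 ? "likely memory-bound" : "compute-bound");
    return 0;
}
\end{verbatim}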
\todo{existing Linux knob?}
%%%% \end insert old s3
\subsection{A different approach}
Some system of changing CPU speed is necessary to achieve the base goals of furnishing performance when needed and conserving energy when not needed.

Launch screen on; idle & 130 \\
\label{fig:item_energy_cost}
\end{figure}
On modern systems, the CPUs that perform computation typically consist of multiple cores, often of different types, that run at different speeds -- known as P-states -- and can be turned off into idle states -- known as C-states.
The software policies that control what CPU cores run when and at what performance level must balance competing system design goals, particularly optimizing for energy versus for performance.
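Both mechanisms are visible from userspace through the generic Linux sysfs layout; a sketch that lists one core's available P-states and C-state names (some vendor kernels restrict or rearrange these files) is:
\begin{verbatim}
/* Sketch: print cpu0's available P-states (frequencies) and C-state
 * names from the generic Linux sysfs layout; vendor kernels may
 * restrict or lay these files out differently. */
#include <stdio.h>

static void print_file(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f) return;
    char buf[256];
    while (fgets(buf, sizeof buf, f))
        fputs(buf, stdout);
    fclose(f);
}

int main(void) {
    puts("P-states (kHz):");
    print_file("/sys/devices/system/cpu/cpu0/cpufreq/"
               "scaling_available_frequencies");
    puts("C-states:");
    for (int i = 0; i < 8; i++) {   /* most SoCs expose only a few */
        char p[96];
        snprintf(p, sizeof p,
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/name",
                 i);
        print_file(p);
    }
    return 0;
}
\end{verbatim}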
\subsection{Phone CPU management is energy critical}

Conversely, when the default governor becomes blocked on disk or net, it will lower CPU speed.
If the user is waiting, such as during an app coldstart, this hurts performance.
Governors can, and should, obtain information about both pending workload type and interactivity state from an easy source: userspace.
Previous studies have already shown the utility of applications using knowledge of their own workloads to set CPU speeds manually.\cite{korkmaz2018workload}
However, this has inexplicably not been incorporated into in-kernel governor design.