sections/*: updates
parent
798fe0d7e5
commit
82e732f8ad
|
@ -1,15 +1,13 @@
|
||||||
% -*- root: ../main.tex -*-
|
% -*- root: ../main.tex -*-
|
||||||
|
|
||||||
\todo{dynamic or reactive?}
|
The governors on phones strive to meet two main goals:
|
||||||
Generally, the governors on phones strive to meet two basic goals:
|
|
||||||
First, they must set the CPU speed higher when there is pending computation, optimizing performance at the expense of energy.
|
First, they must set the CPU speed higher when there is pending computation, optimizing performance at the expense of energy.
|
||||||
Second, they must set speed lower when computation needs decline or stop, sacrificing performance to save energy.
|
Second, they must set speed lower when computation needs decline or stop, to save energy.
|
||||||
If performance is the only goal, as is often the case with data center servers, the solution is clear: set the CPU to 100\%.
|
If performance is the only goal, as is the case with data center servers, the solution is clear: set the CPU to 100\%.
|
||||||
However, on phones, where latency is not always critical, this would needlessly waste energy.
|
However, on phones, where latency is not always critical, this would needlessly waste energy.
|
||||||
%\XXXnote{microbench: 100\% eats energy}
|
The issue is determining when to set the CPU to which speed.
|
||||||
The dynamically reactive, variable-speed nature of governors avoids this problem and achieves the previously stated goals: Computation speed is increased when needed, and decreased when not.
|
%graph: 100% eats energy
|
||||||
There are a number of implementations of this idea: Among previous governors in Android Linux, \texttt{ondemand} and \texttt{interactive} both use the recent proportion of CPU non-idle time to calculate CPU speed.\cite{ondemand-governor, interactive-governor}
|
|
||||||
The current default CPU policy, \schedutil, bases speed on the proportion of recent work on the runqueue.\cite{schedutil-governor}
|
|
||||||
|
|
||||||
\begin{figure}
|
\begin{figure}
|
||||||
\centering
|
\centering
|
||||||
|
@ -20,24 +18,61 @@ The current default CPU policy, \schedutil, bases speed from the proportion of r
|
||||||
\fixme{Add-in multiple threads, with varying 0-50-100 loads}
|
\fixme{Add-in multiple threads, with varying 0-50-100 loads}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\subsection{Idle overrides any speed}
|
\begin{figure}
|
||||||
|
\centering
|
||||||
|
\includegraphics[width=.70\linewidth]{figures/graph_freqtime_micro.pdf}
|
||||||
|
\bfcaption{CPU speed and runtime for a fixed workload, different delays}
|
||||||
|
\label{fig:speed_time_delay}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
\fixme{ADD: discussion re: p-states and c-states -- we rely on the latter to shut off cores but do not delve into tweaking it}
|
\subsection{Dynamic, reactive governors: An already incomplete solution}
|
||||||
|
|
||||||
|
Linux on Android has tried to answer this problem by using one of several dynamically reactive, variable-speed governors.
|
||||||
|
These formulas all use some metric of previous CPU usage history to calculate future CPU speed.
|
||||||
|
There are a number of implementations of this idea: Among previous governors used in Android, \texttt{ondemand} and \texttt{interactive} both use sampling to calculate the proportion of time the CPU is non-idle, and use this to set CPU speed.\cite{ondemand-governor, interactive-governor}
|
||||||
|
% N.b. -- interactive is out-of-tree: see https://www.slideshare.net/opersys/scheduling-in-android-78020037
|
||||||
|
The current default CPU policy, \schedutil, bases speed on the proportion of recent work on the runqueue, obtained from scheduler events.\cite{schedutil-governor}
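% Sketch only: the symbols below are our shorthand for the rule given in the Linux kernel's schedutil documentation, not measurements from this work.
As a rough sketch, assuming frequency-invariant utilization, \schedutil picks the next frequency $f_{\mathit{next}}$ from the CPU utilization $u$, the CPU capacity $u_{\max}$, and the maximum frequency $f_{\max}$ as
\begin{equation}
f_{\mathit{next}} = 1.25 \cdot f_{\max} \cdot \frac{u}{u_{\max}},
\end{equation}
where the factor $1.25$ leaves headroom, so the maximum frequency is requested once $u / u_{\max}$ reaches $0.8$.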
|
||||||
|
\fixme{add formulae? (weeds...)}
|
||||||
|
|
||||||
|
Ultimately, the goal of any governor is to run the CPU at the ideal speed for pending \textit{future work}.
|
||||||
|
The accuracy and efficiency of all reactive governors thus depend on the extent to which past performance is indicative of future needs.
|
||||||
|
Existing exceptions that override the governor policy, at both the high and low end of speeds, begin to highlight the inadequacies of dynamic governors.
|
||||||
|
|
||||||
|
\tinysection{Idling overrides any speed}
|
||||||
|
|
||||||
|
%\fixme{ADD: discussion re: p-states and c-states -- we rely on the latter to shut off cores but do not delve into tweaking it}
|
||||||
|
|
||||||
%An obvious question with running an app at a fixed speed is what happens when work finishes.
|
%An obvious question with running an app at a fixed speed is what happens when work finishes.
|
||||||
Running a CPU with no work wastes energy, and slowing the CPU saves energy.
|
Running a CPU with no work wastes energy.
|
||||||
|
Reactive governors address this by slowing the CPU.
|
||||||
%This complex speed-selection system is not the only way, however.
|
%This complex speed-selection system is not the only way, however.
|
||||||
There is something better and simpler than slowing the CPU, however -- after all, when there is no work, the optimal CPU speed is not just slower but zero.
|
%There is something better and simpler than merely slowing CPU speed, however -- after all, when there is no work, the optimal CPU speed is not just slower but zero.
|
||||||
CPU speed policy overlaps significantly with another existing Linux system policy: that of CPU idling.
|
However, CPU speed policy on Android already overlaps significantly with another Linux system policy: that of CPU idling.
|
||||||
When there is no work -- specifically, when a CPU runqueue has no tasks -- the Linux idle policy on Android shuts down unneeded cores.
|
When there is no work -- specifically, when a CPU runqueue has no tasks -- this idle policy bypasses the speed selection and instead shuts down unneeded cores.
|
||||||
Lowering the speed to save energy in the CPU governor is thus unnecessary.
|
%We will show that, in typical phone use cases, lowering the speed to save energy in the CPU governor is thus unnecessary.
|
||||||
|
|
||||||
Figure \ref{fig:idle_impact} illustrates this: We ran varying levels of work for each of several fixed CPU speeds, as well as for the system default policy, for 20s.
|
Figure \ref{fig:idle_impact} illustrates this: We ran varying levels of work for each of several fixed CPU speeds, as well as for the system default policy, for 20s.
|
||||||
The energy consumed by a continuous do-nothing workload (no sleeping) tracks CPU speed, as expected.
|
The energy consumed by a continuous do-nothing workload (no sleeping) tracks CPU speed, as expected.
|
||||||
The energy curve produced by a partial workload (compute interleaved with sleeping) begins to flatten, however.
|
The energy curve produced by a partial workload (compute interleaved with sleeping) begins to flatten, however.
|
||||||
The plot of a fully sleeping task for different CPU policies produces a nearly flat line.
|
The plot of a fully sleeping task for different CPU policies produces a nearly flat line.
|
||||||
No matter what speed the CPU governor requests, when there is no work, the idle policy overrides this and shuts down the core.
|
No matter what speed the CPU governor requests, when there is no work, the idle policy overrides the speed and shuts down the core.
|
||||||
This suggests that \textit{the best way to achieve the benefits of lowering CPU speed is by relying on idling.}
|
This makes sense, as the best speed when there is no work is 0.
|
||||||
|
We will show that reliance on idling can apply more broadly, under common reduced work conditions, to save even more energy.
|
||||||
|
Ultimately, this is because \textit{the best CPU speed often comes from idling and not from governor calculations}.
|
||||||
|
|
||||||
|
\tinysection{Android hardcodes performance boosts}
|
||||||
|
|
||||||
|
At the other end of the scale, running the CPU too slowly hurts performance.
|
||||||
|
Dynamic governors only gradually raise the speed of a loaded system.
|
||||||
|
The top 2 graphs of Figure \ref{fig:speed_time_delay} illustrate this.
|
||||||
|
We ran a compute-bound workload with the default \schedutil policy.
|
||||||
|
It takes over .14s for the CPU to reach maximum speed; with the repeated intermittent loads that are common on phones, this problem becomes much worse.
|
||||||
|
The Linux maintainers have recognized the need for better performance, and introduced an API for userspace processes to request a boost (or reduction) to the CPU speed otherwise calculated by the \schedutil governor.\cite{schedtune}
|
||||||
|
% (or request a reduction to speed)
|
||||||
|
The AOSP platform has used this feature in a limited capacity, to set the CPU to 100\% for a short period during app launches.
|
||||||
|
The time of the speed boost does not always match the period when the user is waiting, and can be optimized.
|
||||||
|
\fixme{show}
|
||||||
|
%They have not incorporated this into the governor framework.
|
||||||
|
However, this again illustrates that the best CPU speed -- here, 100\% -- \textit{derives from userspace and not from the dynamic, reactive governor}.
|
||||||
|
|
||||||
|
|
||||||
\subsection{The cost and problems of complex speed micromanagement}
|
\subsection{The cost and problems of complex speed micromanagement}
|
||||||
|
@ -54,13 +89,6 @@ Despite the complexity, the system often makes bad choices and picks speed that
|
||||||
\label{fig:speed_time}
|
\label{fig:speed_time}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\begin{figure}
|
|
||||||
\centering
|
|
||||||
\includegraphics[width=.70\linewidth]{figures/graph_freqtime_micro.pdf}
|
|
||||||
\bfcaption{CPU speed and runtime for a fixed workload, different delays}
|
|
||||||
\label{fig:speed_time_delay}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
\begin{figure}
|
\begin{figure}
|
||||||
\centering
|
\centering
|
||||||
\includegraphics[width=.70\linewidth]{figures/graph_oscill_cycles.png}
|
\includegraphics[width=.70\linewidth]{figures/graph_oscill_cycles.png}
|
||||||
|
@ -87,9 +115,10 @@ Previous studies have noted that intermittent workloads significantly harm smart
|
||||||
Figure \ref{fig:speed_time_delay} shows how the combination of intermittent loads with lagging ramp-up speeds picked by the \schedutil policy increases runtime significantly.
|
Figure \ref{fig:speed_time_delay} shows how the combination of intermittent loads with lagging ramp-up speeds picked by the \schedutil policy increases runtime significantly.
|
||||||
We ran the same fixed workload with 2 different delay settings.
|
We ran the same fixed workload with 2 different delay settings.
|
||||||
In both cases, the 2 righthand graphs are time zooms of the 2 left graphs to show detail.
|
In both cases, the 2 righthand graphs are time zooms of the 2 left graphs to show detail.
|
||||||
With no sleep intervals (the top 2 graphs) -- where the workload is run continuously -- the system takes about .14s to reach 100\% speed.
|
With no sleep intervals (the top 2 graphs) -- where the workload is run continuously -- we previously noted the system takes about .14s to reach 100\% speed.
|
||||||
Adding periodic 5ms sleeps (the bottom 2 graphs) not only increases runtime by the sleep intervals themselves, but also each sleep induces the governor to keep the speed much lower, hovering around 40\% of maximum throughout the run.
|
Adding periodic 5ms sleeps (the bottom 2 graphs) not only increases runtime by the sleep intervals themselves, but also each sleep induces the governor to keep the speed much lower, hovering around 40\% of maximum throughout the run.
|
||||||
In section \ref{WHAT}, we discuss the regular occurrence of this slow runtime pattern in the Android system, and how our \systemname system improves it: Rather than relying on a reactive policy, we use hints from userspace to identify when to prioritize runtime and keep the CPU at 100\% when running -- and let the idle system shut the CPU off when not.
|
In section \ref{WHAT}, we discuss the regular occurrence of this slow runtime pattern in the Android system, and how our \systemname system improves it: Rather than relying on a reactive policy, we use hints from userspace to identify when to prioritize runtime and keep the CPU at 100\% when running -- and let the idle system shut the CPU off when not.
|
||||||
|
\fixme{distinguish from default}
|
||||||
|
|
||||||
We originally suspected this increased runtime might be due to overhead in either hardware, while the CPU is transitioning frequencies, or software, during complex calculations in the \schedutil governor.
|
We originally suspected this increased runtime might be due to overhead in either hardware, while the CPU is transitioning frequencies, or software, during complex calculations in the \schedutil governor.
|
||||||
However, Figure \ref{fig:cycles_time} shows this is not the case.
|
However, Figure \ref{fig:cycles_time} shows this is not the case.
|
||||||
|
@ -131,9 +160,11 @@ Particularly, the system default policy consumes notably more energy than mid-sp
|
||||||
|
|
||||||
The energy inefficiency of the default policy can stem from picking a speed that is either too high or too low.
|
The energy inefficiency of the default policy can stem from picking a speed that is either too high or too low.
|
||||||
First, when running a compute-bound task, the \schedutil governor will tend to ramp up speed to maximum -- indeed, that is what happens in Figure \ref{fig:u_micro}.
|
First, when running a compute-bound task, the \schedutil governor will tend to ramp up speed to maximum -- indeed, that is what happens in Figure \ref{fig:u_micro}.
|
||||||
While this behavior is beneficial for interactive, compute-bound tasks when runtime is the priority, it is less desirable for background tasks.
|
While this behavior is beneficial for interactive, compute-bound tasks when the user is waiting and runtime is the priority, it is less desirable for background tasks.
|
||||||
The governor cannot distinguish when the additional energy may be justified and blindly adjusts to a high (yet ironically taking too long to adjust when warranted).
|
The governor cannot distinguish when the additional energy may be justified and will always blindly adjust to a high speed (yet ironically taking too long to adjust when warranted).
|
||||||
Background services downloading updates well in advance of need by the app and user fall into this category.
|
Background services downloading updates well in advance of need by the app and user fall into this category.
|
||||||
|
The Linux community has recognized this problem: the second primary reason they added the \texttt{schedtune} API to the \schedutil governor was to permit sidestepping of an energy-wasteful speed picked by the governor.
|
||||||
|
To our knowledge, AOSP has not taken advantage of this ability.
|
||||||
|
|
||||||
Second, the default policy frequently picks speeds that are too high during direct app interactive periods.
|
Second, the default policy frequently picks speeds that are too high during direct app interactive periods.
|
||||||
Figure \ref{fig:u_micro} shows that the energy penalty ramps sharply for the highest speed -- particularly so when multiple CPUs are being used (lightest, dashed lines in the graph), as is typically the case with real world apps.
|
Figure \ref{fig:u_micro} shows that the energy penalty ramps sharply for the highest speed -- particularly so when multiple CPUs are being used (lightest, dashed lines in the graph), as is typically the case with real world apps.
|
||||||
|
@ -164,6 +195,74 @@ Finally, we observe that there is a small amount of energy overhead to frequent
|
||||||
\todo{this add confusion to the story "the problem is bad choices"?}
|
\todo{this add confusion to the story "the problem is bad choices"?}
|
||||||
|
|
||||||
|
|
||||||
|
%%%% \begin insert old s3
|
||||||
|
|
||||||
|
|
||||||
|
\begin{figure}
|
||||||
|
\centering
|
||||||
|
\includegraphics[width=.95\linewidth]{figures/optimize_goal_cpu_speed.pdf}
|
||||||
|
\bfcaption{How CPU speed should flow from the current CPU goal}
|
||||||
|
\label{fig:optimize_goal_cpu_speed}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
|
\subsection{What parameters governors should be considering}
|
||||||
|
|
||||||
|
The problem with dynamic reactive governors is that they ignore the most important criterion, \textit{the current primary goal of the CPU}.
|
||||||
|
Specifically, the system, and CPU, should already know what it is currently primarily trying to achieve: optimizing to save energy, to improve performance, or to prevent memory stalls.
|
||||||
|
This information, in turn, should drive the selection of CPU speed.
|
||||||
|
We show that this selection system offers better energy in the common case, and better performance where needed, than the system default.
|
||||||
|
|
||||||
|
The system can derive its goal information from 2 particular sources: userspace and system interactivity.
|
||||||
|
Previous studies have shown the utility of applications using knowledge of their own workloads to set CPU speeds manually.\cite{korkmaz2018workload}
|
||||||
|
The Android system already partly leverages platform information about when to optimize app starts.
|
||||||
|
This can and should be expanded for more general usage.
|
||||||
|
|
||||||
|
Secondly, governors should consider the state of interactivity.
|
||||||
|
The default governor, when presented with a compute-intensive task, will quickly ramp speed to maximum.
|
||||||
|
Unless the user is actively waiting, this wastes energy.
|
||||||
|
Conversely, when the default governor becomes blocked on disk or net, it will lower CPU speed.
|
||||||
|
If the user is waiting, such as during an app coldstart, this hurts performance.
|
||||||
|
\fixme{dup earlier? examples}
|
||||||
|
|
||||||
|
\tinysection{Optimizing for energy}
|
||||||
|
|
||||||
|
This is the common case: Most of the time, the main app thread is blocking on user input, whether interactively with the screen on or while dozing with the screen off.
|
||||||
|
Here, the governor should aim to optimize (minimize) energy usage.
|
||||||
|
While there are typically background threads running, they are precomputing work for some future use, particularly pending screendraws.
|
||||||
|
In this case, completing them as quickly as possible is not the goal.
|
||||||
|
Rather, they only need to be ready before periodic needs, such as user response or screendraw deadlines.
|
||||||
|
The key observation is that the presence of background tasks does \textit{not change the goal of optimizing for energy}.
|
||||||
|
The CPUs spend the bulk of their time in idle -- that is, there are plenty of potential compute resources available.
|
||||||
|
Thus, there is typically no need to run CPUs anywhere close to full speed.
|
||||||
|
We will show that an identifiable midspeed setting achieves the goal of optimizing for energy, while still meeting background UI screendraw deadlines.
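% Illustrative textbook model, not derived from our measurements; $P_s$, $k$, and $W$ are hypothetical symbols introduced only for this sketch.
To see why such a midspeed can exist, consider a simplified model in which a core running at frequency $f$ draws static power $P_s$ plus dynamic power $k f^3$, and a task needs $W$ cycles.
Assuming the core idles at negligible power once the task completes, the energy spent is
\begin{equation}
E(f) = \left(P_s + k f^3\right) \frac{W}{f} = W \left(\frac{P_s}{f} + k f^2\right),
\end{equation}
which is minimized where $dE/df = 0$, that is, at $f^{*} = \left(P_s / 2k\right)^{1/3}$.
Below $f^{*}$ the static term dominates and running slower costs more energy; above it the dynamic term dominates.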
|
||||||
|
\fixme{also show this works for download / audiostream}
|
||||||
|
|
||||||
|
\tinysection{Optimizing for performance}
|
||||||
|
|
||||||
|
Phones also have periods when the user is waiting.
|
||||||
|
Here, the governor's goal should be to optimize the CPU for performance.
|
||||||
|
App installs, app coldstarts -- after an installed app gets killed due to memory pressure -- and new browser tabs all fit this case.
|
||||||
|
Specifically, there is no reason to run the CPU at any less than 100\%, as the default policy often does.
|
||||||
|
|
||||||
|
Notably, the nature of the CPU load by itself is insufficient to determine when to optimize for performance.
|
||||||
|
A long-running compute-heavy background task, that would trigger an energy-wasteful speed ramp-up under the default policy, should not justify changing optimization goals.
|
||||||
|
Rather, the governor should also consider whether the user is actually waiting.
|
||||||
|
Happily, the bulk of these cases -- when the Android system is interactive but the foreground is not ready to receive input -- are readily identifiable: Userspace, the platform or the app, knows when it needs to do a lot of work before it can present a foreground app ready to receive input.
|
||||||
|
We design our system to use this information and show that it offers better performance than the default case.
|
||||||
|
|
||||||
|
\tinysection{Optimizing to prevent CPU stalls}
|
||||||
|
|
||||||
|
Long-running, memory-bound periods -- scanning a hash table or sorting a sparse array -- present an in-between case.
|
||||||
|
As we later show, there is generally no reason to run the CPU below a particular energy-optimal speed.
|
||||||
|
Memory-bound periods are such a reason, however: The CPU is necessarily running, and thus cannot be put into idle, but is stalling on memory access.
|
||||||
|
An even lower speed than the common case, dictated by the new bottleneck of memory access rather than UI screendraws, can offer additional energy savings.
|
||||||
|
While we have not identified any use cases that fall into this category, we observe that it can happen, and design our system to accommodate this case.
|
||||||
|
\todo{existing Linux knob?}
|
||||||
|
|
||||||
|
|
||||||
|
%%%% \end insert old s3
|
||||||
|
|
||||||
|
|
||||||
\subsection{A different approach}
|
\subsection{A different approach}
|
||||||
|
|
||||||
Some system of changing CPU speed is necessary to achieve the base goals of furnishing performance when needed and conserving energy when not needed.
|
Some system of changing CPU speed is necessary to achieve the base goals of furnishing performance when needed and conserving energy when not needed.
|
||||||
|
|
|
@ -13,9 +13,8 @@ Launch screen on; idle & 130 \\
|
||||||
\label{fig:item_energy_cost}
|
\label{fig:item_energy_cost}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
On modern systems, the CPUs that perform computation typically consist of multiple cores, often of different types, that run at different speeds and can be turned on and off.
|
On modern systems, the CPUs that perform computation typically consist of multiple cores, often of different types, that can run at different speeds -- known as P-states -- and can be turned off into idle states -- known as C-states.
|
||||||
The software policies that control what CPU cores run when and at what performance level must balance competing system design goals, particularly optimizing for energy versus for performance.
|
The software policies that control what CPU cores run when and at what performance level must balance competing system design goals, particularly optimizing for energy versus for performance.
|
||||||
\fixme{mention idle states?}
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{Phone CPU management is energy critical}
|
\subsection{Phone CPU management is energy critical}
|
||||||
|
|
|
@ -12,7 +12,7 @@ Conversely, when the default governor becomes blocked on disk or net, it will lo
|
||||||
If the user is waiting, such as during an app coldstart, this hurts performance.
|
If the user is waiting, such as during an app coldstart, this hurts performance.
|
||||||
|
|
||||||
Governors can, and should, obtain information about both pending workload type and interactivity state from an easy source: userspace.
|
Governors can, and should, obtain information about both pending workload type and interactivity state from an easy source: userspace.
|
||||||
Previous studies have already shown the utility of applications using knowledge of their own workloads to set CPU speeds manually.\cite[korkmaz2018workload]
|
Previous studies have already shown the utility of applications using knowledge of their own workloads to set CPU speeds manually.\cite{korkmaz2018workload}
|
||||||
However, this has inexplicably not been incorporated into in-kernel governor design.
|
However, this has inexplicably not been incorporated into in-kernel governor design.
|
||||||
|
|
||||||
|
|
||||||
|
|