\label{decision_tree}
\end{figure*}

%\tinysection{(preliminary thoughts)}

General idea: share with the kernel what we are about to do.

%**VERIFY: PocketData microload (above): any better

\subsection{Decision Logic}

Basic decision tree: set the CPU speed to a default of 70\% in the general case, unless a hint overrides it: either memory-bound (below 50\%) or compute-bound (100\%).
Implement a time-out for the 100\% case (for stability and security); in practice, we have observed that these compute-bound periods are shorter than the timeout.
\fixme{BETTER:} The system ignores the hint if the device is in a non-interactive state (screen off).
We do not address running different CPU speeds simultaneously: on our device, the 4 big cores and the 4 little cores each form a cluster whose speed must be set as a block.
\fixme{verify 2x -- cores 0-3 are freq-locked}
Conflicts between memory-bound (50\%) and compute-bound (100\%) hints are left to future work.
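
The decision tree above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the governor's implementation: the enum \texttt{speed\_hint}, the function \texttt{choose\_speed\_pct}, and the timeout value \texttt{COMPUTE\_TIMEOUT\_MS} are hypothetical, since the text specifies neither the hint interface nor the timeout length. It maps a hint and the screen state to a target speed, expressed as a percentage of maximum frequency.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hint values an app might pass to the governor. */
enum speed_hint {
    HINT_NONE,     /* no hint: general case */
    HINT_MEMBOUND, /* memory-bound work: run slower */
    HINT_COMPUTE   /* compute-bound work: run at maximum */
};

/* Target speeds as a percentage of maximum frequency, from the text. */
#define SPEED_DEFAULT_PCT  70  /* energy-optimal on our test devices */
#define SPEED_MEMBOUND_PCT 50  /* text: "below 50"; 50 used as the cap */
#define SPEED_COMPUTE_PCT  100

/* Hypothetical hard timeout for the 100% case; no value is given. */
#define COMPUTE_TIMEOUT_MS 5000u

/*
 * Pick a target speed (percent of max) for a hinted task.
 * hint_age_ms: time elapsed since the hint was issued.
 */
unsigned choose_speed_pct(enum speed_hint hint, bool screen_on,
                          uint64_t hint_age_ms)
{
    /* Ignore all hints while the device is non-interactive (screen off). */
    if (!screen_on)
        return SPEED_DEFAULT_PCT;

    switch (hint) {
    case HINT_MEMBOUND:
        return SPEED_MEMBOUND_PCT;
    case HINT_COMPUTE:
        /* Hard timeout: fall back to the default once the hint expires. */
        return hint_age_ms < COMPUTE_TIMEOUT_MS ? SPEED_COMPUTE_PCT
                                                : SPEED_DEFAULT_PCT;
    case HINT_NONE:
    default:
        return SPEED_DEFAULT_PCT;
    }
}
```

Because the speed is set per cluster, a real implementation would have to reduce the per-task results to one setting per cluster; the sketch leaves that step out, in line with the conflict case being future work.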

\subsection{Implementation}

Modify the kernel with a new governor, plus a syscall that accepts a hint and applies a taint to the process (app).

Phones are not designed to run at continuous saturation; thermal throttling would soon kick in.\cite{WHAT}
\fixme{CONFIRM THIS}

\subsection{Stability and Security}

Yes, this API broadens the attack surface.
No, in practice it is no worse than before.
A misbehaving app can already spin and consume resources.\cite{maiti2015jouler}
We set a hard timeout on the 100\% setting; in practice, this is safer than the default.
\fixme{RATHER: Use screenstate} Ignore the hint if the device (not just the task) is in non-interactive mode (screen off).


We evaluate \systemname by measuring the performance of illustrative and representative workloads under it.
We compare against results obtained using the default system settings, as well as with other CPU speed settings.

We address the following questions:
\begin{itemize}
\item[0]{(Introduction) Is the overhead of the \schedutil governor due to hardware or software?}
\item[1]{Is there an energy-optimal speed for apps?}
\item[2]{Does running apps at an energy-optimal speed saturate the available CPUs?}
\item[3]{What is the cost, in frame drops, of running interactive apps at an energy-optimal speed?}
\item[3.1]{Is this cost acceptable (within 1\,fps of the default policy)?}
\item[4]{How much background CPU load does it take to cause unacceptable interactivity?}
\item[4.1]{How much CPU load does a representative background load impose?}
\item[5]{Are there other optimal speeds besides the energy- and performance-optimal ones? If so, what are they?}
\end{itemize}
\fixme{Do we try to quantify the performance-optimal case?}

\subsection{Evaluation platform}

Our evaluation system consists of \fixme{Discuss evaluation setup -- scripts, UIAutomator, ftrace etc.}

\subsection{Optimal speed for non-interactive periods}

\fixme{TODO:} Show that this is not uncommon: significant work happens when the phone is ``idle''.
Examples: system\_server, persistent, Facebook, Fluffychat

\subsection{Optimal speed for real-world apps}

We observed earlier in Section \ref{subsec:optimal_speed} that, for a given amount of computation, there exists an energy-optimal speed.
We now study real-world apps under different CPU policies.

We run scripts that simulate typical user interactions with the \facebook app under different CPU policies: the system default, various fixed speeds, and \systemname.

Figure \ref{fig:u_micro_fb} shows the results.
As before, a mid-speed CPU policy proves better than the system default policy.
\systemname also offers better energy efficiency.\todo{SHOW THIS}

\subsection{Cost of energy-optimal speeds in interactive apps}

While a simpler fixed-speed policy minimizes energy, it potentially comes at a cost in performance.
The output of phone apps is largely a visual display.

Because the apps are closed source, we cannot control the exact amount of computation they perform.
However, apps spend the vast majority of their time waiting for user input.\cite{ANY??}
While many background tasks are running, they come nowhere near saturating the available CPU resources.

%\todo{SHOW: per-core CPU idle\% graph}
Figure \ref{fig:cpuusage} shows the total CPU non-idle time (usage) for the big and little CPUs for the \facebook interactive script, run under different CPU policies.
The kernel scheduler apportions work such that the proportion of non-idle time for each of the 4-core big and little CPU clusters is the same across different policies and speeds.
The important observation is that, for all CPU policies, the CPUs still spend plenty of time idle.
In particular, the CPU usage of the default \schedutil governor is nearly the same as that of \systemname.
\fixme{SHOW THIS (add ours)}
Figure \ref{fig:speed_time} additionally shows that, under the default \schedutil policy, CPU speed rarely hits maximum.
\fixme{CONFIRM THIS}
\fixme{Add: cpu cycle-count graph}

Hence, adjusting CPU speed within reason does not appreciably affect user experience.
\fixme{weak; redo}
Figure \ref{fig:idlejank} shows the cost, in CPU usage and in screen jank, of running the \facebook interaction under different CPU policies and different background CPU loads.
The leftmost part of the graph, with the smallest circles (representing a normal interaction with no additional background load), shows that a fixed speed of 70\% or greater produces a measured screen jank rate essentially identical to that of the system default.
Even speeds above 40\% produce jank rates within 50\% of the system default's.
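
The jank comparisons above reduce to simple ratios. As a sketch, assuming that jank rate means the fraction of rendered frames that missed their deadline (for example, the janky-frame share reported by Android's \texttt{dumpsys gfxinfo}), ``within 50\% of the system default'' can be read as a policy's rate being at most 1.5 times the default's. The helper names \texttt{jank\_rate} and \texttt{within\_tolerance} are hypothetical.

```c
#include <stdbool.h>

/* Jank rate: fraction of rendered frames that missed their deadline. */
double jank_rate(unsigned janky_frames, unsigned total_frames)
{
    /* Guard against runs that rendered no frames at all. */
    return total_frames ? (double)janky_frames / total_frames : 0.0;
}

/*
 * True if a policy's jank rate is within `tolerance` of the default
 * policy's rate (e.g. tolerance 0.5 means "within 50%"), i.e.
 * policy_rate <= (1 + tolerance) * default_rate.
 */
bool within_tolerance(double policy_rate, double default_rate,
                      double tolerance)
{
    return policy_rate <= (1.0 + tolerance) * default_rate;
}
```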

Downloading a large file consumes approximately 50\% of a single core, far below the microloads we imposed.
\fixme{Confirm this}
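
Per-core CPU usage of the kind quoted above can be measured from the cumulative time counters in \texttt{/proc/stat}. The sketch below shows one way to do this and is not necessarily the method behind our figures; the names \texttt{cpu\_times}, \texttt{parse\_cpu\_line}, and \texttt{cpu\_usage} are hypothetical. It counts \texttt{idle} and \texttt{iowait} as idle time and \texttt{user}, \texttt{nice}, \texttt{system}, \texttt{irq}, \texttt{softirq}, and \texttt{steal} as non-idle time, then reports the non-idle fraction between two samples of the same \texttt{cpuN} line.

```c
#include <stdio.h>

/* Cumulative non-idle and idle time (in clock ticks) for one CPU. */
struct cpu_times {
    unsigned long long busy;
    unsigned long long idle;
};

/*
 * Parse one "cpu"/"cpuN" line of /proc/stat, e.g.
 * "cpu0 4705 356 584 3699 23 23 0 0 0 0".
 * Field order: user nice system idle iowait irq softirq steal; the
 * trailing guest fields are ignored (guest time is already in user/nice).
 * Returns 0 on success, -1 if the line is malformed.
 */
int parse_cpu_line(const char *line, struct cpu_times *out)
{
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
    /* %*s skips the "cpuN" label without counting it as a conversion. */
    int n = sscanf(line, "%*s %llu %llu %llu %llu %llu %llu %llu %llu",
                   &user, &nice, &system, &idle, &iowait, &irq, &softirq,
                   &steal);
    if (n != 8)
        return -1;
    out->busy = user + nice + system + irq + softirq + steal;
    out->idle = idle + iowait;
    return 0;
}

/* Non-idle fraction (0.0 to 1.0) between two samples of the same CPU. */
double cpu_usage(const struct cpu_times *before, const struct cpu_times *after)
{
    unsigned long long busy = after->busy - before->busy;
    unsigned long long idle = after->idle - before->idle;
    unsigned long long total = busy + idle;
    return total ? (double)busy / (double)total : 0.0;
}
```

Under this accounting, a background download occupying half of one core would show a value near 0.5 on that core's line.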

\fixme{ALSO.}
Test sleeping background energy.
system\_server, gms.persistent, facebook, and fluffychat
Sound app (no screen but interactive)

\subsection{When energy-optimal is not optimal}

\label{fig:u_micro}
\end{figure}

Systems must often balance the trade-off between optimizing for performance and optimizing for energy.
CPUs, particularly on energy-constrained platforms such as mobile devices, can typically run at a range of speed settings.
A number of policies, called \textit{governors}, have been developed for Linux and mobile platforms to determine the speed at which to run the CPU.
Most, including the current Android system default, \schedutil, use some proportion of recent CPU usage as a guide for setting future speed.
Despite the considerable sophistication of their implementations, these governors frequently make sub-optimal choices that waste energy yet yield negligible, if any, performance benefit.
Instead, our system, \systemname, avoids this problem by running tasks in the general case at an energy-optimal speed that, in practice, proves sufficiently performant.
We leverage information available in userspace to identify the common corner cases that warrant additional speed.

Generally, phone governors strive to meet two basic goals:
First, they must raise the CPU speed when there is pending computation, optimizing performance at the expense of energy.
Second, they must lower the speed when computation needs decline or stop, sacrificing performance to save energy.
Simply setting the CPU speed to maximum, as is often done with data center servers, would needlessly waste energy when latency is not critical.
The dynamically responsive, variable-speed nature of phone governors achieves both goals: speed is increased when computation is needed, and decreased when it is not.

%This complex speed-selection system is not the only way, however.
There is something both better and simpler than slowing the CPU.
CPU speed policy overlaps significantly with another existing Linux system policy: CPU idling.
When there is no work, the optimal CPU speed is not just slower but zero.
Indeed, system idling on Android phones already does just that: it turns off unneeded CPU cores.

Absent performance requirements (such as when the user is waiting on a compute-bound task), the design becomes simple: run the CPU at an energy-optimal speed, and let the idle subsystem turn off unneeded CPUs.
This design is ideal for non-interactive periods of the phone.
It is also, less obviously, suitable for running typical interactive apps.
The energy-optimal speed for actual computation (as distinguished from shutting down the CPU when there is no work) is not the slowest CPU speed; rather, it is typically around 70\% for several Nexus and Pixel devices we have tested.
We make the key observation, which we evaluate in Section \ref{sec:evaluation}, that this energy-optimal speed on phones is \textit{also fast enough} to run representative interactive apps with acceptable performance.
\fixme{Discuss performance-optimized case and userspace hint}
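
The notion of an energy-optimal speed can be made concrete with a small sketch. Running $W$ cycles at frequency $f$ takes $W/f$ seconds at active power $P(f)$, so the energy is $P(f)\,W/f$ and the best operating point minimizes $P(f)/f$ regardless of $W$. The C code below picks that point from a table of operating points; the names \texttt{opp} and \texttt{energy\_optimal\_opp} are hypothetical, it assumes idle power is negligible because the idle subsystem powers down unneeded cores, and any power values fed to it must come from measurement.

```c
#include <stddef.h>

/* One CPU operating point: frequency and measured active power. */
struct opp {
    unsigned freq_mhz; /* must be nonzero */
    double active_mw;
};

/*
 * Return the index of the operating point with the lowest energy per
 * cycle. Energy for W cycles is P(f) * W / f, so the minimum of
 * P(f) / f (mW / MHz = nJ per cycle) is optimal for any W.
 * Assumes n >= 1 and negligible idle power.
 */
size_t energy_optimal_opp(const struct opp *opps, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        double cur = opps[i].active_mw / opps[i].freq_mhz;
        double min = opps[best].active_mw / opps[best].freq_mhz;
        if (cur < min)
            best = i;
    }
    return best;
}
```

For instance, with an invented table of 300, 1000, 1400, and 2000\,MHz drawing 150, 300, 400, and 900\,mW, the energy per cycle is roughly 0.50, 0.30, 0.29, and 0.45\,nJ, and the function selects the 1400\,MHz point: static power penalizes the slowest setting, while superlinear power growth penalizes the fastest.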

%% NOT just overhead -- rather, default is just often making bad choices (speeds)
%These can be fairly sophisticated, involving complex software calculation and frequent speed adjustment.
% NO -- witness cloud centers just setting to Performance and taking the energy hit

\subsection{CPU management is energy critical}

Energy usage on embedded mobile systems is a critical metric.
While many hardware components consume energy, the biggest consumers are the CPU cores.
Table \ref{fig:item_energy_cost} shows the energy consumed by the system over a fixed time when run under different conditions.

\XXXnote{Other studies?}
Thus, the biggest potential energy savings stem from optimizing CPU usage.

A number of system policies, called \textit{governors}, have been developed for the CPU energy-performance tradeoff.
\fixme{ondemand, interactive}

\subsection{Current CPU management is complex}

% N.b. scheduling classes: stop-dl-rt-cfs-idle. DVFS only applies to cfs tasks (dl / rt tasks run at 100)
% N.b. schedutil, unlike others, estimates load per-task rather than per-core. So handles task migration better.
% N.b. But schedutil still calculates based on "_recent_ load"

\fixme{discuss various governors: ondemand, interactive}
\fixme{discuss complicated tradeoffs and aims in governors}
The current system default governor policy, \schedutil, sets the CPU speed in proportion to the fraction of time that the CPU has been non-idle.\cite{WHAT}
This has the benefit of adjusting speed to anticipated demand based on past usage.
However, it also requires considerable software computation.
On a phone, workloads typically do not saturate the CPUs but vary constantly in demand.

\fixme{phone: per-cluster}

Instead of this complexity, would a simpler approach suffice?
We show that, for typical interactive workloads, running them at a fixed energy-optimized speed not only saves energy versus the system default CPU policy but does so with minimal to no performance cost.
For non-interactive periods, when the screen is off, there is virtually no reason to run the device at anything but an energy-optimal setting.
Occasions when additional performance is needed -- essentially, CPU-bound workloads -- tend to be both relatively uncommon and clearly identifiable.
When performance is not needed -- there is no computation to be done, and the runqueue is empty -- another system policy, the \textit{idle policy}, already addresses this, as we discuss below.
We design a system, \systemname, to run the general case at a speed that saves energy at acceptable cost compared to the default system policy.
We leverage hints from userspace to identify the outlier cases where we want to prioritize performance, and run those with equivalent or better performance than the default.

\subsection{There is an energy-optimal speed}
\label{subsec:optimal_speed}

An obvious question with running an app at a fixed speed is what happens when work finishes.
Running a CPU with no work wastes energy.
The system default CPU policy, \schedutil, addresses this in part by tying CPU speed to the proportion of recent work on the runqueue.
\fixme{not quite true -- fix}
When no work is to be done -- specifically, when a CPU runqueue has no tasks to run -- the Linux system idle policy shuts down unneeded cores.
Lowering the speed in the CPU governor to save energy is thus unnecessary.
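
This argument can be stated as a short energy accounting under simplifying assumptions of our own: a core must execute $W$ cycles within a window of length $T$ at a fixed frequency $f$, drawing active power $P_a(f)$ while busy and idle power $P_i$ otherwise.
\[
E(f) = P_a(f)\,\frac{W}{f} + P_i \left(T - \frac{W}{f}\right), \qquad \frac{W}{f} \le T.
\]
When the idle policy powers down the core, $P_i \approx 0$, so
\[
E(f) \approx W \cdot \frac{P_a(f)}{f}, \qquad f^{*} = \arg\min_{f} \frac{P_a(f)}{f}.
\]
The optimal frequency $f^{*}$ depends only on the power curve, not on $W$ or $T$; and when there is no work ($W = 0$), $E = P_i T$ for every $f$, so lowering the speed saves nothing beyond what idling already provides.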
\item[0]{NEW: 2+ real-world apps (YouTube+)}
\item[1]{Figure \ref{fig:u_micro_fb} with \systemname}
\item[2]{NEW: Per-core stacked idle\% graph}
\item[3]{Figure \ref{fig:drops} with \systemname policy and with energy numbers for all policies}
\item[4]{Figure \ref{fig:idlejank} with \systemname policy, with energy numbers for all policies, and with download background load}
\item[5]{NEW: latency-energy graph for CPU-bound tasks (coldstart, install)}
\item[6]{NEW: memory-bound microbench graph}
\item[7]{NEW: study game app: how much does it saturate CPU?}
\item[8]{NEW: noninteractive period energy study}
\item[9]{NEW: sound app -- i.e. no screen but interactive}
\item[10]{NEW: microbenchmarks 1-3 for source of overhead: HW or SW}
\end{itemize}
%CYCLE COUNT: Show that the work done is, approximately, the same