intro, design, eval, notes: cleanup commit

master
carlnues@buffalo.edu 2022-10-31 13:25:19 -04:00
parent 25951cf2e1
commit 09b9522d95
4 changed files with 101 additions and 18 deletions


@ -14,6 +14,7 @@ Block on predictable GPS & TBD & TBD & TBD \\
\label{decision_tree}
\end{figure*}
%\tinysection{(preliminary thoughts)}
General idea: Share what we are about to do with the kernel.
@ -25,12 +26,17 @@ Match list: See Figure \ref{decision_tree}
%**VERIFY: PocketData microload (above): any better
\subsection{Decision Logic}
Basic decision tree: Set a default of 70\% for the general case, unless overridden by a hint: either memory-bound ($<$50\%) or compute-bound (100\%).
Implement a time-out for the 100\% case (stability / security) -- we have observed these periods are shorter than the timeout in practice.
\fixme{BETTER:} The system will ignore the hint if the device is in non-interactive state (screenoff).
We do not address different simultaneous CPU speeds: On our device, the speeds of the 4 big and 4 little core clusters must be set as a block.
\fixme{verify 2x -- cores 0-3 are freq-locked}
Conflicts: between memory (50) and compute (100): future work
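The decision logic above can be sketched as follows. This is a minimal illustration in Python, not the kernel implementation: the names \texttt{Hint}, \texttt{choose\_speed\_pct}, and \texttt{BOOST\_TIMEOUT\_S} are hypothetical, the timeout value is a placeholder, and the memory-bound case is assumed to use exactly 50\% although the notes only say ``below 50''.

```python
from enum import Enum
from typing import Optional


class Hint(Enum):
    """Userspace hints (hypothetical names for this sketch)."""
    MEM_BOUND = "mem"
    COMPUTE_BOUND = "compute"


DEFAULT_PCT = 70         # general-case speed from the notes
MEM_BOUND_PCT = 50       # assumption: notes say "<50"; 50 is used as the cap
COMPUTE_BOUND_PCT = 100  # compute-bound hint runs at full speed
BOOST_TIMEOUT_S = 5.0    # placeholder: the notes do not give a timeout value


def choose_speed_pct(hint: Optional[Hint], screen_on: bool,
                     boost_elapsed_s: float = 0.0) -> int:
    """Return the CPU speed, as a percentage, for the current state."""
    # Hints are ignored when the device is non-interactive (screen off).
    if not screen_on or hint is None:
        return DEFAULT_PCT
    if hint is Hint.MEM_BOUND:
        return MEM_BOUND_PCT
    # The 100% case is bounded by a hard timeout for stability / security.
    if hint is Hint.COMPUTE_BOUND and boost_elapsed_s < BOOST_TIMEOUT_S:
        return COMPUTE_BOUND_PCT
    return DEFAULT_PCT
```

Conflicting hints (memory versus compute) are deliberately not modelled, matching the future-work note above.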
\subsection{Implementation}
Mod kernel with new governor: syscall to accept hint and apply a taint to the proc (app).
@ -42,12 +48,14 @@ Likely, only some high-compute games or miners may need this.
Phones are not designed to be run at continuous saturation -- thermal limiting would kick in soon.\cite{WHAT}
\fixme{CONFIRM THIS}
\subsection{Stability and Security}
Yes, this API broadens attack surface.
No, it is not any worse than before in practice.
A misbehaving app can already spin and consume resources.\cite{maiti2015jouler}
We set a hard timeout on the 100\% case -- in practice this is safer than the default.
\fixme{RATHER: Use screenstate} Ignore hint if device (not just task) is in non-interactive mode (screenoff).


@ -34,6 +34,18 @@
We evaluate \systemname by comparing performance of illustrative and representative workloads on our system.
We compare against similar results obtained using default system settings, as well as with other CPU speed settings.
We ask the following questions:
\begin{itemize}
\item[0]{(Introduction) Is the (overhead) of the \schedutil governor due to HW or SW overhead?}
\item[1]{Is there an energy optimal speed for apps?}
\item[2]{Does running apps at an energy optimal speed saturate available CPUs?}
\item[3]{What is the cost, in frame drops, of running interactive apps at an energy optimal speed?}
\item[3.1]{Is this cost acceptable (within 1 fps of the default policy)?}
\item[4]{How much background CPU load does it take to cause unacceptable interactivity?}
\item[4.1]{How much CPU load does a representative background load impose?}
\item[5]{Are there other optimal speeds besides energy- and performance-optimal? What?}
\end{itemize}
\fixme{Do we try to quantify the performance optimal case?}
\subsection{Evaluation platform}
@ -44,12 +56,13 @@ The kernel can then use this information at its discretion to set an appropriate
Our evaluation system consists of parts to \fixme{Discuss evaluation setup -- scripts, UIAutomator, ftrace etc.}
Figure \ref{fig:u_micro_fb} shows results obtained.
As before, a mid-speed CPU policy proves better than the system default policy.
Additionally, \systemname also offers better energy performance.\todo{SHOW THIS}
\subsection{Optimal speed for non-interactive periods}
\subsection{}
\fixme{TODO:} Show this is not uncommon: that significant work happens when the phone is ``idle''.
Examples: system\_server, persistent, Facebook, Fluffychat
\subsection{Optimal speed for real-world apps}
We earlier observed in Section \ref{subsec:optimal_speed} that, for a given amount of compute, there exists an energy-optimal speed.
We now study real-world apps under different CPU policies.
@ -57,8 +70,12 @@ The question is whether our previous observation -- that there is an energy-opti
We run scripts to simulate typical user interactions on the \facebook app under different CPU policies: the system default, various fixed speeds, and under
\systemname.
Figure \ref{fig:u_micro_fb} shows results obtained.
As before, a mid-speed CPU policy proves better than the system default policy.
Additionally, \systemname also offers better energy performance.\todo{SHOW THIS}
\subsection{The cost of optimal speeds in interactive apps}
\subsection{Cost of energy-optimal speeds in interactive apps}
While a simpler fixed speed policy yields optimal energy, this potentially comes at a cost.
The output of phone apps is largely a visual display.
@ -67,13 +84,17 @@ Previous studies that have constrained system resources available for interactiv
As apps are closed source, we are unable to control the exact amount of compute.
However, apps spend the vast bulk of their time waiting for user input. \cite{ANY??}
While there are many background tasks running, they come nowhere near saturating available CPU resources.
Figure \ref{fig:cpuusage} shows the total CPU non-idle time (usage) for the big and little CPUs for the \facebook interactive script.
While the kernel balances the total load on each cluster, the important observation is that, for all CPU policies, there is plenty of time the CPUs spend idle.
Figure \ref{fig:speed_time} shows that, under the default \texttt{schedutil} policy, CPU speed rarely hits maximum.
\fixme{CONFIRM THIS}
%\todo{SHOW: per-core CPU idle\% graph}
Figure \ref{fig:cpuusage} shows the total CPU non-idle time (usage) for the big and little CPUs for the \facebook interactive script, run for different CPU policies.
The kernel scheduler apportions work such that the proportion of non-idle time for each of the big and little 4-core CPU clusters is the same across different policies and speeds.
The important observation is that, for all CPU policies, there is still plenty of time the CPUs spend idle.
In particular, the CPU usage of the default \schedutil governor is nearly that of \systemname.
\fixme{SHOW THIS (add ours)}
Figure \ref{fig:speed_time} additionally shows that, under the default \schedutil policy, CPU speed rarely hits maximum.
\fixme{CONFIRM THIS}
\fixme{Add: cpu cycle-count graph}
Hence, adjusting CPU speed within reason does not appreciably affect user experience.
\fixme{weak; redo}
Figure \ref{fig:idlejank} shows the cost, in CPU usage and in screen jank, of running the \facebook interaction under different CPU policies and under different background CPU loads.
The leftmost part of the graph, with the smallest circles (representing a normal interaction, with no additional background load), shows that a fixed speed of 70\% or greater produces a measured screen jank rate that is essentially identical to that of the system default.
Even speeds above 40\% produce rates within 50\% of the system default.
@ -95,6 +116,12 @@ In actual usage, a user would likely never encounter this level of background us
The CPU usage imposed by downloading a large file consumes approximately 50\% of a single core -- far below the microloads we imposed.
\fixme{Confirm this}
\fixme{ALSO.}
Test sleeping background energy.
system\_state, gms.persistent, facebook, and fluffychat
Sound app (no screen but interactive)
\todo{Test sleeping background energy}
\subsection{When energy-optimal is not optimal}


@ -38,6 +38,42 @@ Launch screen on; idle & 130 \\
\label{fig:u_micro}
\end{figure}
Systems must often balance the trade-off between optimizing for performance and energy.
CPUs, particularly on energy-constrained platforms such as mobile, can typically run at different speed settings.
A number of policies, called \textit{governors}, have been developed for Linux and mobile platforms to determine the speed at which to run the CPU.
Most, including the current Android system default, \schedutil, use some proportion of recent CPU usage as a guide for setting future speed.
These governors, despite the considerable sophistication involved in their implementation, frequently wind up making sub-optimal choices that waste energy yet yield negligible if any performance boost.
Instead, our system \systemname avoids this problem by running tasks in the general case at an energy-optimal speed that, in practice, proves sufficiently performant.
We leverage information from the platform in userspace to identify common corner cases that warrant additional speed.
Generally, the governors on phones strive to meet two basic goals:
First, they must set the CPU speed higher when there is pending computation, optimizing performance at the expense of energy.
Second, they must set speed lower when computation needs decline or stop, sacrificing performance to save energy.
Simply setting the CPU speed to maximum, as is often done with data center servers, would needlessly waste energy when latency is not critical.
The dynamically responsive, variable-speed nature of phone governors achieves the previously stated goals: Computation speed is increased when needed, and decreased when not.
%This complex speed-selection system is not the only way, however.
There is something both better and simpler than slowing the CPU.
CPU speed policy overlaps significantly with another existing Linux system policy: that of CPU idling.
When there is no work, the optimal CPU speed is not just slower but zero.
Indeed, system idling on Android phones already does just that: it turns off unneeded CPU cores.
Absent performance requirements (such as when the user is waiting on a compute-bound task), the design becomes simple: Run the CPU at an energy-optimal speed.
Let the idle subsystem turn off unneeded CPUs.
This design is ideal for non-interactive periods of the phone.
It is, less obviously, also suitable for running typical interactive apps.
The energy-optimal speed for actual computation -- as distinguished from shutting down the CPU when there is no work -- is not the slowest CPU speed; on the contrary, it is typically around 70\% on several Nexus and Pixel devices we have tested.
We make the key observation, which we evaluate in Section \ref{sec:evaluation}, that this energy-optimal speed on phones is \textit{also fast enough} to run representative interactive apps at acceptable performance.
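A simple model illustrates why the energy-optimal speed need not be the slowest one. It is a sketch under stated assumptions, not a measured result: suppose a task needs $W$ cycles, and the CPU draws a speed-independent power $P_s$ while awake plus a dynamic power $k f^3$ at speed $f$ (the symbols $W$, $P_s$, $k$ and the cubic form are introduced only for this illustration). The task takes time $W/f$, so
\[
E(f) = \left(P_s + k f^3\right)\frac{W}{f} = W\left(\frac{P_s}{f} + k f^2\right),
\qquad
\frac{dE}{df} = W\left(-\frac{P_s}{f^2} + 2 k f\right) = 0
\;\Rightarrow\;
f^{*} = \left(\frac{P_s}{2k}\right)^{1/3}.
\]
Below $f^{*}$ the task runs longer and the fixed power $P_s$ dominates; above it the dynamic term dominates. Under this model the minimum therefore lies at an intermediate speed whenever $P_s > 0$.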
\fixme{Discuss performance-optimized case and userspace hint}
%% NOT just overhead -- rather, default is just often making bad choices (speeds)
%These can be fairly sophisticated, involving complex software calculation and frequent speed adjustment.
% NO -- witness cloud centers just setting to Performance and taking the energy hit
\subsection{CPU management is energy critical}
Energy usage on embedded mobile systems is a critical metric.
While many hardware systems use energy, the biggest consumers are the CPU cores.
Table \ref{fig:item_energy_cost} shows the energy consumed by the system for a fixed time when run under different conditions.
@ -46,10 +82,16 @@ Saturating a single CPU, with a blank display, consumes over double the energy t
\XXXnote{Other studies?}
Thus the biggest potential energy savings stems from optimizing CPU usage.
There have been a number of system policies developed for the CPU energy-performance tradeoff, called \textit{governors}.
\fixme{ondemand, interactive}
\subsection{Current CPU management is complex}
% N.b. scheduling classes: stop-dl-rt-cfs-idle. DVFS only applies to cfs tasks (dl / rt tasks run at 100)
% N.b. schedutil, unlike others, estimates load per-task rather than per-core. So handles task migration better.
% N.b. But schedutil still calculates based on "_recent_ load"
\fixme{discuss various governors: ondemand, interactive}
\fixme{discuss complicated tradeoffs and aims in governors}
The current system default governor policy, \texttt{schedutil}, sets the CPU speed proportionate to the time fraction that the CPU has been non-idle.\cite{WHAT}
The current system default governor policy, \schedutil, sets the CPU speed proportionate to the time fraction that the CPU has been non-idle.\cite{WHAT}
This has the benefit of adjusting speed to anticipated demand based on past usage.
It also, however, requires considerable software computation.
On a phone, workloads typically do not saturate the CPUs but vary constantly in demand.
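As a rough illustration of the proportional mapping described above, the following Python sketch follows the formula given in the comments of the Linux \texttt{schedutil} source of this era, $\mathit{next\_freq} = 1.25 \cdot \mathit{max\_freq} \cdot \mathit{util} / \mathit{max}$. The function name and the example frequencies are assumptions for illustration only; the real governor also applies rate limiting and snaps the result to a supported frequency step, neither of which is modelled here.

```python
def schedutil_next_freq(util: float, max_cap: float, max_freq_khz: int,
                        headroom: float = 1.25) -> int:
    """Map recent utilization to a target frequency in kHz.

    util / max_cap is the recent utilization as a fraction of capacity.
    The 1.25 headroom factor means the maximum frequency is requested
    once utilization reaches 80% of capacity.
    """
    raw = headroom * max_freq_khz * util / max_cap
    # Never request more than the hardware maximum.
    return min(max_freq_khz, int(raw))
```

For example, with a hypothetical 2 GHz maximum, a half-utilized CPU would be asked to run at 1.25 GHz, and any utilization at or above 80\% would request the maximum.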
@ -60,12 +102,14 @@ These frequent speed changes introduce overhead, where the CPUs are adjusting to
\fixme{phone: per-cluster}
Instead of this complexity, would a simpler approach suffice?
We show that, for typical phone workloads, running them at a fixed energy-optimized speed not only saves on energy versus the system default CPU policy but does so with minimal to no performance cost.
We show that, for typical interactive workloads, running them at a fixed energy-optimized speed not only saves on energy versus the system default CPU policy but does so with minimal to no performance cost.
For non-interactive periods, when the screen is off, there is virtually no reason to run the device at anything but an energy-optimal setting.
Occasions when additional performance is needed -- essentially, CPU-bound workloads -- tend to be both relatively uncommon and clearly identifiable.
When performance is not needed -- there is no compute to be done, and the runqueue is empty -- another system policy, the \textit{idle policy}, already addresses this, as we discuss below.
We design a system, \systemname, to run the broad case at a speed that saves energy at acceptable cost compared to the default system policy.
We leverage hints from userspace to identify the outlier cases where we want to prioritize performance, and run those with equivalent or better performance than the default.
\subsection{There is an energy-optimal speed}
\label{subsec:optimal_speed}
@ -83,7 +127,8 @@ However, the variance is close enough that a single speed -- for our test platfo
An obvious question with running an app at a fixed speed is what happens when work finishes.
Running a CPU with no work wastes energy.
The system default CPU policy, \texttt{schedutil}, addresses this in part by tying CPU speed to the proportion of recent work on the runqueue.
The system default CPU policy, \schedutil, addresses this in part by tying CPU speed to the proportion of recent work on the runqueue.
\fixme{not quite true -- fix}
When no work is to be done -- specifically, when a CPU runqueue has no tasks to run -- the Linux system idle policy shuts down unneeded cores.
Lowering the speed to save energy in the CPU governor is thus unnecessary.


@ -5,11 +5,14 @@
\item[0]{NEW: 2+ real-world apps (YouTube+)}
\item[1]{Figure \ref{fig:u_micro_fb} with \systemname}
\item[2]{NEW: Per-core stacked idle\% graph}
\item[3]{Figure \ref{fig:drops} with \systemname policy and with energy numbers for all policies}
\item[4]{Figure \ref{fig:idlejank} with \systemname policy and with download background load}
\item[3]{Figure \ref{fig:idlejank} with \systemname policy and with energy numbers for all policies}
\item[4]{Figure \ref{fig:idlejank} with download background load}
\item[5]{NEW: latency-energy graph for CPU-bound tasks (coldstart, install)}
\item[6]{NEW: memory-bound microbench graph}
\item[7]{NEW: game app}
\item[7]{NEW: study game app: how much does it saturate CPU?}
\item[8]{NEW: noninteractive period energy study}
\item[9]{NEW: sound app -- i.e. no screen but interactive}
\item[10]{NEW: microbenchmarks 1-3 for source of overhead: HW or SW}
\end{itemize}
%CYCLE COUNT: Show that the work done is, approximately, the same