sections/*: updates

carlnues@buffalo.edu 2023-08-01 14:01:21 -04:00
parent 15f4397b27
commit a1f7ecb2d1
3 changed files with 96 additions and 40 deletions


@@ -1,11 +1,6 @@
% -*- root: ../main.tex -*-
The governors on phones trade off between performance maximization and energy minimization.
If performance is the only goal, the solution is clear: set the CPU to 100\%.
However, on phones, where latency is not always critical, this needlessly wastes energy.
The challenge lies in determining when to set the CPU speed, and to what value.
@@ -27,7 +22,7 @@ These are representative of common platforms and uses in the real world.
\begin{figure}
\centering
\includegraphics[width=.95\linewidth]{figures/graph_freqtime_micro.pdf}
\bfcaption{Intermittent workloads hurt runtime in two ways: directly, by sleeping, and indirectly, by inducing slower CPU speeds}
\label{fig:speed_time_delay}
\end{figure}
@@ -83,24 +78,27 @@ This is because past CPU utilization -- the bedrock metric of all dynamic govern
\begin{figure*}
\centering
%
\begin{subfigure}{0.45\textwidth}
\centering
\includegraphics[width=.95\linewidth]{figures/graph_oscill_cycles_little.pdf}
%\includegraphics
\bfcaption{Little CPUs}
\end{subfigure}
\begin{subfigure}{0.45\textwidth}
\centering
\includegraphics[width=.95\linewidth]{figures/graph_oscill_cycles_big.pdf}
%\includegraphics
\bfcaption{Big CPUs}
\end{subfigure}
%
\bfcaption{Changing CPU speeds imposes computation and runtime costs \fixme{ALSO ENERGY}}
\label{fig:cycles_time}
\end{figure*}
\begin{figure}
\centering
\includegraphics[width=.95\linewidth]{figures/graph_jank_perspeed_fb.pdf}
\bfcaption{The negligible benefits of higher CPU speeds, for 30s Facebook interaction \fixme{ALSO ENERGY}}
\label{fig:screendrops_per_freq_fb}
\end{figure}
@@ -115,6 +113,8 @@ The \schedutil policy sets the CPU speed based on a rolling window of recent run
On a phone, workloads typically do not saturate the CPUs but vary constantly in demand.
With history-driven dynamic policies such as \schedutil, this triggers constantly changing speeds\cite{nuessle2019benchmarking}.
Figure \ref{fig:missed_opportunities} shows how this hurts performance (in addition to costing energy): The CPU takes .16s to reach \fenergy, and .3s to hit maximum.
\XXXnote{Re: fpow not yet shown beneficial: Side issue here; thrust is poor response. Fix: Remove energy ref?}
\todo{Question}
Previous studies have additionally noted that intermittent workloads make this problem significantly worse\cite{nuessle2019benchmarking}.
Figure \ref{fig:speed_time_delay} illustrates this: We ran the same fixed workload with and without intermittent 5ms sleeps.
With no sleep intervals, the top graph shows the workload takes $\sim$7.1s to complete.
@@ -123,41 +123,76 @@ Adding 1000 5ms sleeps (the bottom graph) induces the governor to keep the speed
Of the additional 18.2s runtime, 5s stems from total sleeping, and $\sim$13.1s from running at a slower CPU speed.
We will show that real-world apps, when running the default policy, similarly spend significant time at unnecessarily low speeds.
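For reference, the kernel's cpufreq documentation describes the core heuristic of \schedutil as mapping the scheduler's utilization signal to a target frequency as roughly
\[
f_{next} = 1.25 \cdot f_{max} \cdot \frac{util}{util_{max}},
\]
where the 1.25 headroom factor is the documented default.
Because $util$ is a decaying time-average, intermittent sleeps pull it down -- and the selected speed with it.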
\tinysection{Performance costs are partly due to switching costs}
\fixme{No; update discussion}
We suspected this increased runtime might be due to overhead in either hardware, while the CPU is transitioning frequencies, or software, during complex calculations in the \schedutil governor.
To explore this, we ran a fixed workload, tracking runtime and work performed (measured in HW CPU cycles), under different CPU policies: The default, fixed low, medium, and high speeds, and a policy that rapidly oscillated between the low and high speeds.
We selected the fixed medium speed to be exactly the mean of the fixed low and high speeds.
In the absence of any overhead, the runtime of the fixed midspeed and oscillating speed policies should therefore be nearly identical.
We chose the oscillation interval (about 3ms) to mimic the rate of speed changes typically observed under the system default when running common real-world apps.
We tracked runtime and actual work performed, in CPU cycles.
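For concreteness, per-run cycle counts of this kind can be collected on Linux through the perf events interface; the following is a minimal illustrative sketch (not our exact harness), counting hardware cycles around a fixed loop:
\begin{verbatim}
/* Sketch: count HW CPU cycles for a code region via perf_event_open(2).
   Error handling trimmed for brevity. */
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    /* Count for this thread, on whatever CPU it runs. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    volatile uint64_t acc = 0;                    /* fixed workload */
    for (uint64_t i = 0; i < 100000000ULL; i++) acc += i;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    uint64_t cycles = 0;
    read(fd, &cycles, sizeof(cycles));
    printf("cycles: %llu\n", (unsigned long long)cycles);
    return 0;
}
\end{verbatim}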
Figure \ref{fig:cycles_time} shows the results, broken out by big and little CPU type.
Relative to the midspeed setting, the low and high speed policies ran slower and faster, respectively, as expected.
The runtime of the oscillating policy was higher than that of the midspeed policy: marginally ($\sim$.3\%) for little CPUs, and 1.5\% for big CPUs.
The cycle count of the oscillating policy was also $\sim$.3\% higher for little CPUs, suggesting that in this case the performance overhead is due entirely to the additional software computation of calling into the driver code.
% default: lower cyclecount than oscillate => NOT due to complexity of system
For big CPUs, computation overhead was .08\% higher for the oscillating than for the midspeed policy, an order of magnitude smaller than the runtime cost.
Additional hardware delays likely contribute to the remaining frequency-switching overhead.
%Significantly, the runtimes of the oscillating policy and the fixed medium speed policies -- the red and green bars in the leftmost bargraph -- were nearly identical, suggesting there is essentially no hardware overhead penalty to changing CPU speeds.
%Additionally, as evidenced by extremelly similar heights of the bars in the second graphs, the work performed in CPU cycles under all policies was nearly identical, within .03\% of each other.
%Any computational overhead in the default policy is thus negligible by comparison.
The switching cost in both cases, while small, suggests speed changes should be minimized where possible.
However, Figure \ref{fig:speed_time_delay} shows that running an intermittent workload, with many frequency changes, produces a far larger change in runtime.
This suggests that the poor performance of the \schedutil policy must lie less in frequency switching overhead than elsewhere: It makes bad CPU speed choices.
The biggest key to improvement lies in making better ones.
\fixme{a bit weak...}
% TODO: Add graph and / or discussion
%... we observe that there is a small amount of energy overhead to frequent speed changes.
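The oscillating policy itself is easy to reproduce from userspace; a minimal sketch, assuming the \texttt{userspace} cpufreq governor is active on the target policy and that the two frequencies (placeholders here) appear in \texttt{scaling\_available\_frequencies}:
\begin{verbatim}
/* Sketch: alternate a cpufreq policy between two speeds every ~3ms. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define SETSPEED "/sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed"
#define FREQ_LO  "500000"   /* kHz; placeholder values */
#define FREQ_HI  "2000000"

int main(void) {
    int fd = open(SETSPEED, O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }
    for (int i = 0; i < 2000; i++) {            /* ~12s of oscillation */
        (void)write(fd, FREQ_LO, sizeof FREQ_LO - 1);
        usleep(3000);                           /* ~3ms per step */
        (void)write(fd, FREQ_HI, sizeof FREQ_HI - 1);
        usleep(3000);
    }
    close(fd);
    return 0;
}
\end{verbatim}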
\begin{figure*}
\centering
\includegraphics[width=.95\linewidth]{figures/graph_u_fixedlen_multicore.pdf}
%
\begin{subfigure}{0.45\textwidth}
\centering
%\includegraphics
\bfcaption{Little CPUs}
\end{subfigure}
\begin{subfigure}{0.45\textwidth}
\centering
%\includegraphics
\bfcaption{Big CPUs}
\end{subfigure}
%
\bfcaption{Best energy usage comes from running the CPU at a midspeed, and avoiding low and high speeds}
\label{fig:u_micro}
\end{figure*}
%Test sleeping background energy.
\begin{figure*}
\centering
\includegraphics[width=.95\linewidth]{figures/graph_time_per_freq_fb.pdf}
%
\begin{subfigure}{0.45\textwidth}
\centering
%\includegraphics
\bfcaption{Little CPUs}
\end{subfigure}
\begin{subfigure}{0.45\textwidth}
\centering
%\includegraphics
\bfcaption{Big CPUs}
\end{subfigure}
\bfcaption{Apps spend significant non-idle time at energy-inefficient speeds, both too slow (underperforming) and too fast (overperforming)}
\label{fig:time_per_freq_fb}
\end{figure*}
\tinysection{Speeds over \fenergy provide negligible perceptible benefits}
@@ -167,13 +202,20 @@ Figure \ref{fig:screendrops_per_freq_fb} illustrates this.
We ran a scripted Facebook app interaction, scrolling for 30s through friends and then the feed, under different CPU policies: with the default \schedutil and with differing fixed speeds.
The output of phones is largely visual display maintenance; CPU policies should avoid damaging display quality.
Hence, for each run, we tracked the effect on display output, measured as the proportion of dropped frames (Android platform ``jank'').
\XXXnote{Can add in graph showing invariant CPU use versus speed. CPUs never saturated. Make sense?}
\todo{Question}
\XXXnote{Need to justify the 70 figure here *being* fenergy -- first use; derails discussion}
\todo{Question}
%We first make the observation that, for our test device, a fixed speed of 70\% lies near the energy minimums of the runs in Figure \ref{fig:u_micro}.
Speeds of \fenergy (70\%) and above all produce drop rates of $\sim$2\% or below, lower than that of the default dynamic policy ($\sim$3\%).
However, increasing the speed above \fenergy does not improve the framedrop rate significantly.
As we will show in section \ref{subsec:regimes_energy_perf}, the default policy nonetheless runs workloads for significant periods at these higher speeds -- despite their offering negligible benefit.
\tinysection{Dynamic governors can cost energy}
\XXXnote{To reframe wrt f0 < fpow -- we introduce this idea in next sections 3.1 and 3.2; derails?}
\todo{Question}
Previous works \cite{vogeleer2013energy, nuessle2019benchmarking} have suggested that, for a given workload, there is an energy-optimal speed.
Figure \ref{fig:u_micro} shows the cost in energy and runtime of running a fixed amount of compute per CPU, under different CPU policies: for the default CPU policy and for several fixed speed settings.
We also vary the number of loads from 1 to 4, each pinned to a separate CPU within a CPU cluster, and whether the loads run on big or little CPUs.
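The pinning itself uses standard Linux affinity calls; a minimal sketch (iteration count and CPU numbering are placeholders; CPUs 0--3 are assumed to form one cluster):
\begin{verbatim}
/* Sketch: run N fixed-size spinner loads, each pinned to its own CPU. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

#define NLOADS 4              /* varied from 1 to 4 in the experiment */
#define ITERS  500000000ULL   /* fixed compute per CPU (placeholder) */

static void *spin(void *arg) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET((int)(long)arg, &set);              /* pin to CPU i */
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    volatile uint64_t acc = 0;
    for (uint64_t i = 0; i < ITERS; i++) acc += i;
    return NULL;
}

int main(void) {
    pthread_t t[NLOADS];
    for (long i = 0; i < NLOADS; i++)
        pthread_create(&t[i], NULL, spin, (void *)i);
    for (int i = 0; i < NLOADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
\end{verbatim}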


@@ -3,7 +3,7 @@
\begin{figure}
\centering
\includegraphics[width=.95\linewidth]{figures/graph_missed_opportunities.pdf}
\bfcaption{Phone governors hurt both energy and performance by running the CPU at wasteful speeds and taking time to ramp up}
\label{fig:missed_opportunities}
\end{figure}
@@ -16,7 +16,7 @@ Launch screen on; idle & 130 \\
1 CPU saturated; screen off & 310 \\
2 CPUs saturated; screen off & 560 \\
\end{tabular}
\bfcaption{CPU usage dominates energy consumption}
\label{fig:item_energy_cost}
\end{figure}
@@ -27,11 +27,18 @@ Launch screen on; idle & 130 \\
\label{fig:showcase}
\end{figure}
%\XXXnote{Mention original goals? (provide compute when needed; remove when not) Spells out what idle makes redundant}
%\todo{Question}
CPUs form the computational heart of computing systems, including phones.
They also consume considerable energy.
Phones must balance providing computation resources when needed against reducing them, to save energy, when not.
Historically, systems have addressed these two competing goals by employing frequency scaling to change the speed at which the CPU runs.
They set the CPU speed higher when there is pending computation, optimizing performance at the expense of energy.
They set speed lower when computation needs decline or stop, to save energy.
On modern systems, CPUs typically consist of multiple cores, often of different types, that run at different speeds (known as P-states) or can be turned on and off into idle (known as C-states).
The software policies that control which CPU cores run, when, and at what performance level must balance competing system design goals, particularly optimizing for energy versus performance.
%\subsection{Phone CPU management is energy critical}
Phones, as embedded devices, must be particularly cognizant of energy.
The CPU cores are a major power consumer on phones.
Table \ref{fig:item_energy_cost} shows the energy consumed by the system for a fixed time when run under different conditions.
@@ -50,9 +57,12 @@ Hardware design on phones can constrain governor policy calculations.
CPU speeds often cannot be set on individual cores but only on groups of CPUs -- a constraint stemming from the asymmetric big-little CPU architecture, with 2 clusters of higher- and lower-performance CPU cores\cite{big-little}.
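This grouping is visible directly in sysfs: each cpufreq policy lists the cores sharing its clock domain. A small probe (illustrative; policy numbering varies by device):
\begin{verbatim}
/* Sketch: print which CPUs share each frequency domain. */
#include <stdio.h>

int main(void) {
    char path[128], buf[64];
    for (int p = 0; p < 8; p++) {   /* probe policy0..policy7 */
        snprintf(path, sizeof(path),
            "/sys/devices/system/cpu/cpufreq/policy%d/related_cpus", p);
        FILE *f = fopen(path, "r");
        if (!f) continue;
        if (fgets(buf, sizeof(buf), f))
            printf("policy%d: CPUs %s", p, buf);
        fclose(f);
    }
    return 0;
}
\end{verbatim}
On a typical big-little phone this prints two groups -- one policy covering the little cores and one the big cores -- confirming that a speed choice applies to a whole cluster at once.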
\subsection{The problem with CPU governors on phones \fixme{OR A simpler governor}}
\XXXnote{First 3 of 4 para's in this subsection focus on current problems}
\todo{Question}
The default governor policy, despite the considerable sophistication involved in its implementation, is based on a flawed premise: That past utilization is a meaningful signal of the optimal CPU speed.
As we will show in this paper, this premise is based on a set of assumptions that are not applicable to modern mobile devices.
%% SHOW THIS...
@@ -64,14 +74,17 @@ As we will show in this paper, this premise is based on a set of assumptions tha
To understand this, we present additional claims that we will later substantiate:
\begin{enumerate*}
\item For a device and a workload, there exists a CPU frequency that minimizes energy usage. We denote this \fenergy.
\item A CPU frequency below \fenergy always wastes energy, except in very specific corner cases; the power-model sketch after this list illustrates why.
%thermal throttling or memory stalling
\item A CPU speed above \fenergy reduces useful latency in specific, identifiable situations, but in most other cases consumes energy for negligible benefit.
\item User apps, given additional CPU resources, will use them to negligible benefit.
\end{enumerate*}
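To build intuition for the first three claims, consider a back-of-the-envelope power model (illustrative only; the constants $a$ and $P_s$ are device-specific, not measured here). Assume dynamic power scales roughly cubically with frequency, a common DVFS approximation, atop a constant static floor: $P(f) \approx a f^3 + P_s$. The energy to complete $W$ cycles is then
\[
E(f) = P(f)\cdot\frac{W}{f} = aWf^2 + \frac{P_s W}{f},
\qquad
\frac{dE}{df} = 0 \;\Rightarrow\; \fenergy = \left(\frac{P_s}{2a}\right)^{1/3}.
\]
Below \fenergy the static term dominates: the work takes longer, so the device burns its idle floor for more time. Above \fenergy the quadratic dynamic term dominates, so extra speed costs energy out of proportion to the time it saves.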
%\fixme{SHOW 4 in Eval}
\XXXnote{\#4 justification: Figure \ref{fig:screendrops_per_freq_fb} good enough?}
\todo{Question}
Figure \ref{fig:missed_opportunities} illustrates the core of these problems in practice.
We ran a short $\sim$.5s CPU-bound load on a previously idle phone using default settings, and tracked the effect on CPU speed.
The default governor notices the load and ramps up the speed until it hits the 100\% maximum.
The solid blue line represents actual CPU speed, with 0 representing idle.
@@ -84,8 +97,9 @@ Once the CPU speed rises above this speed, the system is now trading off between
As we discuss later, for most uses on phones, this added performance is unnecessary.
This is depicted on the graph by the upper grey trapezoid.
In this paper, we present our governor, \systemname, which adopts a simpler heuristic based on common usage needs.
% which runs tasks at speeds that save energy compared to the system default -- speeds that, in practice, also prove sufficiently performant to maintain user experience.
It avoids the twin pitfalls of overly slow speeds, which not only hurt latency but also cost energy by increasing runtime, and overly fast settings, which cost energy but yield little to no end benefit.
\systemname leverages information from userspace, sometimes already furnished by the Android platform, to identify those common use cases that do warrant additional speed.
\fixme{implement}


@@ -3,7 +3,7 @@
\begin{figure}
\centering
\includegraphics[width=.95\linewidth]{figures/optimize_goal_cpu_speed.pdf}
\bfcaption{How CPU speed should flow from the current CPU goal}
\label{fig:optimize_goal_cpu_speed}
\end{figure}