% -*- root: ../main.tex -*-
%!TEX root=../main.tex
\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_jank_allapps.pdf}
\bfcaption{Display framedrop for apps under different CPU policies (10 runs, 90\% confidence)}
\label{fig:jank_allapps}
\end{figure*}

\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_energy_allapps.pdf}
\bfcaption{Energy usage for apps under different CPU policies (10 runs, 90\% confidence)}
\label{fig:energy_allapps}
\end{figure*}

\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_time_per_freq_yt.pdf}
\bfcaption{Average time spent per CPU under the default policy for YouTube (average of 10 runs, 90\% confidence)}
\label{fig:time_per_freq_yt}
\end{figure*}

\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_time_per_freq_spot.pdf}
\bfcaption{Average time spent per CPU under the default policy for Spotify (average of 10 runs, 90\% confidence)}
\label{fig:time_per_freq_spot}
\end{figure*}

\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_nonidletime_yt.pdf}
\bfcaption{CPU non-idle time for YouTube under different CPU policies (10 runs, 90\% confidence)}
\label{fig:nonidle_yt}
\end{figure*}

\begin{figure*}
\centering
\includegraphics[width=.87\linewidth]{figures/graph_nonidletime_spot.pdf}
\bfcaption{CPU non-idle time for Spotify under different CPU policies (10 runs, 90\% confidence)}
\label{fig:nonidle_spot}
\end{figure*}

\begin{figure*}
\centering
\includegraphics[width=.75\linewidth]{figures/graph_idlejank_heavyload.pdf}
\bfcaption{The effect of additional background loads on user experience for given CPU policies}
\label{fig:idlejank}
\end{figure*}

We now evaluate the \systemname and truncated \schedutil governors. %, by comparing their performance on a range of representative workloads the default Android \schedutil governor.
Concretely, we evaluate the claims that, on normal workloads,
(i) truncated \schedutil achieves better performance than regular \schedutil without significantly increasing energy consumption, and
(ii) \systemname achieves lower energy consumption than \schedutil without significantly increasing screen jank.
We further conduct several experiments to confirm our observations from \Cref{sec:wasted}, namely that
(iii) the adaptive app pattern is not unique to Facebook, and
(iv) apps spend significant time below $\fenergy$.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\paragraph{Evaluation platform}

Our results were obtained using stock Google Pixel 2 devices with the Snapdragon 835 chipset~\cite{snapdragon-835}, 4 GB RAM, and 128 GB SSD storage, running Android AOSP 10.
Standalone microbenchmarks were implemented in C, while end-to-end macrobenchmarks used the Android UI Automator testing framework to perform scripted, simulated interactions with real-world apps~\cite{uiautomator}.
One of the phones was modified to obtain energy measurements using the Monsoon HVPM power meter~\cite{monsoon}.
Our evaluation system consists of a pair of shell scripts, one running on the phone and one on an external monitor.
The external script sleeps for 10\,s to ensure quiescence and prevent inter-trial artifacts, then initializes both the Monsoon meter and the on-phone script.
The on-phone script sleeps for 20\,s to ensure that the Monsoon meter is capturing data, sets the desired governor policy, and starts the experiment.
%When the experiment concludes, the on-phone script sleeps for a further 10s to ensure that the Monsoon meter captures the full trace, and notifies the external script that the experiment has concluded.
%The external script concludes by retrieving relevant artifacts from the phone, excluding data transfer from any energy or performance measurements.
We collected information on CPU speed and idle state from both the Linux \texttt{ftrace} framework and \texttt{sysfs}, and on CPU cycles from the \texttt{perf\_event\_open} syscall~\cite{perf-event}.
We also used \texttt{ftrace} to log test parameters and state.
Information on screen performance, including frame drops, came from the Android \texttt{dumpsys gfxinfo} service.

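As an illustration of how a frame drop rate can be derived from \texttt{dumpsys gfxinfo} output, the following Python sketch extracts the frame counters from a text dump. It is not our measurement script: the sample text and its numbers are made up, the helper name \texttt{frame\_drop\_rate} is hypothetical, and the sketch assumes the dump contains the \texttt{Total frames rendered} and \texttt{Janky frames} summary lines.

```python
import re

# Hypothetical excerpt in the style of `dumpsys gfxinfo <package>` output.
# The numbers are invented for illustration only.
SAMPLE = """\
Total frames rendered: 1800
Janky frames: 54 (3.00%)
"""


def frame_drop_rate(gfxinfo_text):
    """Return janky frames as a fraction of all rendered frames."""
    total = re.search(r"Total frames rendered:\s*(\d+)", gfxinfo_text)
    janky = re.search(r"Janky frames:\s*(\d+)", gfxinfo_text)
    if total is None or janky is None:
        raise ValueError("frame counters not found in gfxinfo text")
    total_frames = int(total.group(1))
    if total_frames == 0:
        # No frames rendered: report no drops rather than dividing by zero.
        return 0.0
    return int(janky.group(1)) / total_frames


if __name__ == "__main__":
    print(f"frame drop rate: {frame_drop_rate(SAMPLE):.2%}")
```

Computing the rate from the raw counters, rather than parsing the percentage in parentheses, avoids compounding rounding when results are averaged over several runs.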
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\paragraph{CPU Policies}
We evaluate six different CPU policies:
(i) the system default, \schedutil,
(ii) a truncated \schedutil, implemented by lower-bounding the CPU speed to 70\% using the existing API discussed in Section~\ref{subsec:signal_perf_needs},
(iii) a fixed 70\% speed using the existing \texttt{userspace} governor,
(iv) \systemname with speeds lower-bounded at 70\%,
(v) unmodified \systemname with a fixed default speed of 70\%, and
(vi) the \texttt{performance} governor.
We include (ii) and (iii) to compare the truncated \schedutil and common-case $\sim$70\% speed policies, as implemented under the existing API, with their equivalents implemented using \systemname.
Under default Linux, a requested CPU speed is implemented as the next-highest speed in the preset list of supported speeds, which \texttt{sysfs} exposes as \texttt{scaling\_available\_frequencies}.
Our system follows the same behavior.

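The rounding behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not kernel code: it interprets ``next-highest'' as the lowest supported speed that is at least the requested one, the frequency table \texttt{FREQS\_KHZ} is invented rather than read from a Pixel 2, and the function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical frequency table in kHz, in the style of the values listed in
# scaling_available_frequencies. These are not the Pixel 2's actual speeds.
FREQS_KHZ = [300000, 600000, 900000, 1200000, 1500000, 1800000]


def snap_to_supported(requested_khz, available_khz):
    """Return the lowest supported speed >= the request, capped at the maximum."""
    freqs = sorted(available_khz)
    i = bisect_left(freqs, requested_khz)
    return freqs[min(i, len(freqs) - 1)]


def speed_for_fraction(fraction, available_khz):
    """Translate a relative speed (e.g. 0.7 for 70%) into a supported speed."""
    return snap_to_supported(fraction * max(available_khz), available_khz)


if __name__ == "__main__":
    print(speed_for_fraction(0.7, FREQS_KHZ))
```

Under this interpretation, a nominal 70\% policy may run somewhat above 70\% of the maximum speed whenever 70\% falls between two supported speeds.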
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\paragraph{Workloads}
We consider four workloads, the first three involving individual apps: (i) Facebook, (ii) YouTube, and (iii) Spotify.
The fourth workload (iv) combines the Facebook and Spotify loads.
%These were designed to mimic common user phone interactions.
The \textbf{Facebook} workload was described in \Cref{sec:low-speed-in-practice}.
The \textbf{YouTube} workload starts the app and searches for a popular video by name.
It selects the first hit, starts the video, and waits for 30 seconds.
The specific video was chosen because it is served random motion video ads at the start at a predictably high rate.
The \textbf{Spotify} workload starts the app and searches for a common musical selection.
It starts the first suggestion and waits for 30 seconds while the audio plays with the app in the foreground.
Lastly, the \textbf{Combined} workload examines the system under commonplace additional stress.
It runs the original Facebook workload in the foreground while the Spotify app streams audio continuously in the background.
These workloads address evaluation claim (iii).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Screen Jank}

\Cref{fig:jank_allapps} shows frame drop rates.
These graphs address the performance aspect of claims (i) and (ii).
On all workloads, \systemname and truncated \schedutil offer nearly identical or notably better performance than regular \schedutil.
The Facebook workload under \systemname costs an additional 0.3\%, or $\sim$0.2 frames per second (12.6 frames per minute) at a 60\,fps display rate.
We argue that this does not noticeably affect user experience and is more than acceptable given the energy savings of more than 10\%.
The truncated \schedutil policies and the fixed 70\% speed policy similarly offer significant energy savings at little to no cost in frame drops.

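As a rough cross-check of the Facebook figures, assume a constant 60\,fps display rate and write $r$ for the additional frame drop rate (a symbol introduced here only for illustration). The additional dropped frames per second and per minute are then
\[
  60\,r \quad\text{and}\quad 60 \times 60\,r = 3600\,r .
\]
The quoted 12.6 frames per minute corresponds to
\[
  r = \frac{12.6}{3600} = 0.0035, \qquad 60\,r = 0.21 \ \mathrm{frames\ per\ second},
\]
which would be consistent with the 0.3\% and $\sim$0.2 frames per second above being rounded values of roughly 0.35\% and 0.21.
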
YouTube shows a clear performance win for \systemname, which produces 5.2\% fewer frame drops than the default.
The truncated \schedutil policy under \systemname and the fixed 70\% speed policy also offer notably improved frame drop rates, with 4.3\% and 3.6\% lower drop rates, respectively.
UI performance under \systemname for both the Spotify and the Combined workloads, like that for Facebook, costs an additional 0.3\% compared to the default, a cost we again argue is minimal and acceptable.
The other non-default policies for both Spotify and Combined also offer either essentially the same or somewhat better performance than the default: truncated \schedutil and fixed 70\% under the existing API both offer a $\sim$2.5\% lower frame drop rate for Spotify.
%Finally, we observe that even with the increased background load of the Combined workload,
In summary, \systemname, with a considerably simpler policy mechanism, offers essentially the same performance as \schedutil, measured in frame drops experienced by the user, on common app workloads.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Ramp-Up Times}

To attribute the improvement in performance, we compare the CPU frequencies selected by \schedutil with those selected by \systemname.
\Cref{fig:time_per_freq_fb,fig:time_per_freq_yt,fig:time_per_freq_spot} plot a CDF of the difference between these two selections.
This addresses evaluation claim (iv) from above.
We note that for a significant fraction of the workload (5\% for Facebook, 15\% for YouTube, 12\% for Spotify), the frequency selected by \schedutil is significantly (up to 50\%) lower.
This is \schedutil's ramp-up period, where it selects frequencies lower than $\fenergy$.
We attribute the relative performance of \systemname to eliminating this ramp-up period, during which \schedutil selects speeds below $\fenergy$.
Although each workload spends part of its time at a higher frequency under \schedutil than under \systemname, it spends more time ramping up to $\fenergy$ than at a higher speed.
In summary, the improved performance of both truncated \schedutil and \systemname can be attributed to eliminating \schedutil's ramp-up period.

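Claim (iv) amounts to a time-weighted fraction: the share of a run spent at frequencies below $\fenergy$. The Python sketch below shows one way such a fraction could be computed from per-frequency residency data. The sample values, the threshold, and the function name \texttt{fraction\_below} are all hypothetical and are not taken from our traces.

```python
def fraction_below(samples, f_energy_khz):
    """Return the share of total time spent strictly below f_energy_khz.

    samples is a sequence of (frequency_khz, duration_s) pairs, for example
    derived from frequency-change events in a trace.
    """
    samples = list(samples)
    total = sum(duration for _, duration in samples)
    if total == 0:
        return 0.0
    below = sum(duration for freq, duration in samples if freq < f_energy_khz)
    return below / total


if __name__ == "__main__":
    # Invented residency data: (frequency in kHz, seconds at that frequency).
    residency = [(600000, 1.5), (1200000, 3.0), (1800000, 5.5)]
    print(f"{fraction_below(residency, 1200000):.0%} of time below threshold")
```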
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Jank Under High Load}

%We next explore the level of additional load required to degrade the user experience.
For this experiment, we run the Facebook workload in the presence of background tasks.
These tasks generate additional load by performing simple arithmetic with periodically injected sleeps at varying intervals.
%We collect non-idle time through sysfs and framedrop rate through Android GFX as before.
%We pin one load-producing task to each of the 8 CPU cores.
\Cref{fig:idlejank} illustrates the effect of the added CPU load on the measured jank.
The x-axis shows the average load across all 8 CPU cores (based on the injected sleeps), and the y-axis shows the frame drop rate.
Note that a smaller sleep interval equates to a higher load.

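The relationship between sleep interval and load can be made concrete with a small duty-cycle sketch. The Python code below is only an illustration of the idea of alternating arithmetic with injected sleeps; it is not the load generator used in our experiments, and the function names and the busy and sleep durations are hypothetical.

```python
import time


def expected_load(busy_s, sleep_s):
    """Nominal CPU load of one task that works busy_s, then sleeps sleep_s."""
    return busy_s / (busy_s + sleep_s)


def run_background_load(busy_s, sleep_s, cycles):
    """Alternate simple arithmetic with injected sleeps for a number of cycles."""
    x = 0
    for _ in range(cycles):
        deadline = time.monotonic() + busy_s
        while time.monotonic() < deadline:
            x = (x * 31 + 7) % 1000003  # simple arithmetic to keep the core busy
        time.sleep(sleep_s)
    return x


if __name__ == "__main__":
    # Shrinking the sleep interval raises the nominal load.
    for sleep_s in (0.009, 0.003, 0.001):
        print(sleep_s, round(expected_load(0.007, sleep_s), 2))
    run_background_load(0.007, 0.003, cycles=10)
```

The nominal value ignores scheduling and wake-up overhead, so the load actually observed on a device may differ from it.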
The leftmost part of the graph, with the smallest circles (representing a normal interaction with no additional background load), shows that a fixed speed of 70\% or greater produces a measured frame drop rate that is essentially identical to that of the system default.
Up to a sustained load of about 70\% across \emph{all} CPU cores, the system is able to keep up with screen redraw events, with a significant effect on jank only at the two lowest CPU frequencies.
In actual usage, a user would likely never encounter this level of background load; it takes a significant, and unrealistic, additional workload to degrade the user experience.

A more representative case of high load is our fourth, Combined workload: browsing Facebook while Spotify plays music in the background.
As discussed above, \Cref{fig:jank_allapps} shows that the cost of the additional background load is quite small in terms of frame drops; two of the non-default policies even offer improvements.
In common settings, background load does not pose a threat to the performance of \systemname.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Energy Usage}

\Cref{fig:energy_allapps} shows energy usage for the four workloads, addressing the remaining (energy) aspect of evaluation claims (i) and (ii).
The Facebook workload under \systemname consumes significantly less energy (11.5\%) than the default.
Indeed, all of the non-default policies except \texttt{performance} also use less energy than \schedutil.

YouTube under \systemname also saves energy, albeit less: 1.6\% versus the default.
Spotify actually costs 2.3\% more.
Note that this is Spotify running interactively.
The use case of Spotify in the Combined workload, where it runs in the background, is likely far more common in real-world usage.
The energy consumed by the Combined workload is, unsurprisingly, significantly higher across the board than that of the individual app workloads.
Here, \systemname uses 5.6\% less energy than the default.
Once again, so do all of the non-default policies except \texttt{performance}.
Across common apps and common usage cases, \systemname offers notable energy savings compared to the default.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Idle Time}

We next revisit our finding from \Cref{sec:adaptiveApps} that typical apps increase their offered load as CPU capacity increases.
\Cref{fig:nonidle_fb,fig:nonidle_yt,fig:nonidle_spot} illustrate the fraction of time the CPU spends doing work in each workload as CPU frequency increases.
Recall that, if the amount of work stayed constant in a fixed-duration workload, the non-idle time would show an inverse-linear relationship with the CPU frequency.
As with Facebook, both YouTube and Spotify show a much flatter relationship, particularly on the big cores.

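To make the expected relationship explicit, suppose, as a simplifying assumption, that a workload requires a fixed number of CPU cycles $W$ over a fixed duration $T$ and that the CPU runs at a constant frequency $f$. The symbols $W$, $T$, and $u$ are introduced here only for illustration. The non-idle fraction would then be
\[
  u(f) = \frac{W}{f\,T},
\]
so doubling $f$ should halve $u$. A measured curve that is flatter than this suggests that $W$ itself grows with $f$, which is the adaptive behavior described above.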