edits (primarily eval)

2023-08-25 21:16:26 -04:00 · 2023-08-25 21:16:26 -04:00 · 7133a48c6f
parent 62207ec3ed
commit 7133a48c6f
3 changed files with 44 additions and 19 deletions
--- a/sections/evaluation.tex
+++ b/sections/evaluation.tex
@@ -53,13 +53,16 @@

 We now evaluate the \systemname and truncated \schedutil governors by comparing their performance on a range of representative workloads against that of the default Android \schedutil governor.
 Concretely, we evaluate the claims that on normal workloads:
-(i) truncated \schedutil achieves significantly better performance than regular \schedutil without significantly increasing energy consumption, and
-(ii) \systemname achieves significantly better energy consumption than \schedutil, without significantly increasing screen jank.
+(i) truncated \schedutil achieves better performance than regular \schedutil without significantly increasing energy consumption, and
+(ii) \systemname achieves better energy consumption than \schedutil, without significantly increasing screen jank.

 We further conduct several experiments to confirm our observations from \Cref{sec:wasted}, namely that:
 (iii) the adaptive app pattern is not unique to Facebook, and
 (iv) apps spend significant time below $\fenergy$.

+\todo{FIXME}
+\fixme{Change clustergraph labels:  Kiss 70 => Kiss}
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \paragraph{Evaluation platform}

@@ -78,31 +81,48 @@ We also used \texttt{ftrace} to log testing parameter and state.
 Information on screen performance including framedrops came from the Android \texttt{dumpsys gfxinfo} service.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\paragraph{CPU Policies}
+We evaluate six different CPU policies under different workloads:
+(i) the system default, \schedutil,
+(ii) a truncated \schedutil implemented by lower-bounding the CPU frequency through the existing API discussed in \Cref{subsec:signal_perf_needs},
+(iii) a fixed 70\% speed using the existing \texttt{userspace} governor,
+(iv) a truncated \schedutil implemented with \systemname,
+(v) unmodified \systemname, and
+(vi) the \texttt{performance} governor.
+We include (ii) and (iii) to compare the truncated \schedutil and fixed $\sim$70\% speed policies as implemented through the existing API against the corresponding policies implemented with \systemname.
+Under default Linux, a requested CPU speed is rounded up to the nearest frequency in the preset list of supported frequencies exposed in \texttt{sysfs} as \texttt{scaling\_available\_frequencies}.
+\systemname follows the same rounding behavior.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \paragraph{Workloads}
-We consider three separate workloads: (i) Facebook, (ii) YouTube, and (iii) Spotify.
+We consider four workloads, the first three of which each exercise a single app: (i) Facebook, (ii) YouTube, and (iii) Spotify.
+The fourth workload, (iv), combines the Facebook and Spotify workloads.
+All four are designed to mimic common user interactions with a phone.
 The \textbf{Facebook} workload was described in \Cref{sec:low-speed-in-practice}.
 The \textbf{YouTube} workload starts the app and searches for a popular video by name.
 It selects the first hit, starts the video, and waits for 30 seconds.
 The specific video was chosen because a random motion video ad plays at its start at a predictably high rate.
-The \textbf{Spotify} workload...
-\todo{Fill in details}.
+The \textbf{Spotify} workload starts the app and searches for a popular music selection.
+It starts playback of the first suggestion and waits for 30 seconds while the audio plays with the app in the foreground.
+Finally, the \textbf{Combined} workload places the system under additional stress.
+It runs the original Facebook workload in the foreground while the Spotify app streams audio continuously in the background.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Screen Jank}

-%\begin{figure}
-%\centering
-%\includegraphics[width=.95\linewidth]{figures/graph_jank_perspeed_yt.pdf}
-%\bfcaption{Display framedrop proportion for a :30 Youtube interaction under different CPU policies (10 runs, 90\% confidence)}
-%\label{fig:screendrops_per_freq_yt}
-%\end{figure}
+\Cref{fig:jank_allapps} shows frame drop rates for the four workloads.
+These graphs address the performance aspect of claims (i) and (ii).
+On all workloads, \systemname and truncated \schedutil offer performance nearly identical to, or notably better than, that of regular \schedutil.
+On the Facebook workload, \systemname drops an additional 0.3\% of frames, or $\sim$0.2 frames per second at 60\,fps.
+We argue this does not noticeably affect user experience and is more than acceptable given the energy savings of more than 10\%.
+Similarly, the truncated \schedutil policies and the fixed 70\% speed policy offer significant energy savings at little to no cost in frame drops.

-\Cref{fig:jank_allapps} show frame drop rates for the three workloads.
-\todo{discuss}
-
-These graphs confirm the performance aspect of claims (i) and (ii).
-On all workloads, \systemname and truncated \schedutil both outperform regular \schedutil.
-\systemname has a 10-25\% lower frame drop rate, varying by workload.
+On YouTube, \systemname shows a clear performance win over the default.
+The truncated \schedutil policies and the fixed 70\% speed policy also improve on the default.
+On both Spotify and Combined, \systemname, as on Facebook, drops an additional 0.3\% of frames relative to the default, a cost we again argue is minimal and acceptable.
+The other non-default policies offer essentially the same or somewhat better performance than the default on both Spotify and Combined.
+Notably, the added background load of Combined does not appreciably change the frame drop rate.
+In summary, \systemname, with a considerably simpler policy mechanism, offers essentially the same performance as \schedutil on common app workloads, as measured by user-visible frame drops.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Ramp-Up Times}
@@ -134,6 +154,10 @@ The leftmost part of the graph, with the smallest circles (representing a normal
 Up to a sustained load of about 70\% across \emph{all} CPU cores, the system is able to keep up with screen redraw events, with a significant effect on jank only at the lowest 2 CPU frequencies.
 In actual usage, a user would likely never encounter this level of background usage; it takes significant, and unrealistic, additional workload to degrade the user experience.

+A more representative high-load case is our fourth workload, Combined: browsing Facebook while Spotify plays music in the background.
+As discussed above, \Cref{fig:jank_allapps} shows that the additional background load costs little in frame drops, and two of the non-default policies offer improvements.
+In common settings, background load does not pose a threat to the performance of \systemname.
+

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Energy Usage}
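The CPU Policies paragraph in the evaluation diff above describes a rounding rule. A requested speed is implemented as the next-highest entry among the frequencies listed in scaling_available_frequencies. The Python sketch below illustrates that rule under a few stated assumptions:

- the sysfs file holds a space-separated list of frequencies in kHz;
- requests above the maximum are clamped to the maximum;
- the helper names and the example frequency table are hypothetical.

None of this is taken from the paper's implementation or from the Pixel 2's actual frequency table.

```python
from bisect import bisect_left
from pathlib import Path


def parse_available_frequencies(text: str) -> list[int]:
    """Parse a space-separated list of frequencies (kHz), as found in the
    cpufreq sysfs file scaling_available_frequencies, into ascending order."""
    return sorted(int(tok) for tok in text.split())


def read_available_frequencies(cpu: int = 0) -> list[int]:
    """Read the supported frequencies for one CPU from sysfs.

    Only works where the cpufreq driver exposes a frequency table;
    not used by the example below.
    """
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_available_frequencies")
    return parse_available_frequencies(path.read_text())


def snap_to_supported(target_khz: int, available_khz: list[int]) -> int:
    """Round a requested frequency up to the lowest supported frequency at or
    above it. Requests above the maximum are clamped to the maximum (assumption)."""
    if not available_khz:
        raise ValueError("no supported frequencies")
    idx = bisect_left(available_khz, target_khz)  # index of first entry >= target
    return available_khz[min(idx, len(available_khz) - 1)]


def fixed_fraction_speed(fraction: float, available_khz: list[int]) -> int:
    """Frequency actually used by a fixed-fraction policy, e.g. 0.7 for 70%."""
    target = round(fraction * available_khz[-1])
    return snap_to_supported(target, available_khz)


if __name__ == "__main__":
    # Hypothetical frequency table (kHz); NOT the Pixel 2's real table.
    freqs = parse_available_frequencies(
        "300000 600000 900000 1200000 1500000 1800000 2100000 2400000"
    )
    print(snap_to_supported(1_000_000, freqs))  # prints 1200000
    print(fixed_fraction_speed(0.7, freqs))     # prints 1800000
```

Rounding up rather than to the nearest entry means a fixed 70% request never runs below 70% of the maximum frequency. On the hypothetical table above, it lands at 75%.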
--- a/sections/introduction.tex
+++ b/sections/introduction.tex
@@ -22,7 +22,6 @@ Launch screen on; idle & 130 \\
 \end{figure}

 \todo{FIXME}
-\fixme{Ensure paper builds *without* draft mode for hyperlinks}

 CPUs consume considerable energy on mobile phones.
 As Table \ref{fig:item_energy_cost} shows, a single (big) CPU core on a Pixel 2, running at full speed with the screen off, consumes almost three times the energy of the display, and a second core running at full speed almost doubles that.
--- a/sections/wasted.tex
+++ b/sections/wasted.tex
@@ -129,6 +129,7 @@ However, even at the CPU's maximum frequency, more work is created than the CPU
 \end{figure}

 \Cref{fig:u_micro_fb} shows power consumption for the Facebook workload, padded with idle time to a fixed 40s period.
+\todo{FIXME}
 Operating the CPU at maximum frequency imposes an energy overhead of approximately $1$mAh compared to operating at $\fenergy \approx 70\%$ of its maximum.
 This represents about $\frac{1}{2700}$ of the typical Pixel 2's maximum battery capacity.

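The battery fraction quoted in the wasted.tex hunk above follows from a one-line calculation. This assumes the nominal 2700 mAh Pixel 2 capacity implied by the $\frac{1}{2700}$ figure in the text:

```latex
\[
  \frac{1\,\text{mAh}}{2700\,\text{mAh}} \approx 3.7 \times 10^{-4} \approx 0.037\%
\]
```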
@@ -139,12 +140,13 @@ If added performance is desirable in this use case and others like it, then the
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 \subsection{Signaling Performance Needs}
+\label{subsec:signal_perf_needs}

 The more interesting systems design question is how to select CPU speeds in the presence of adaptive applications, when the additional energy investment does not provide value.
 Specifically, adaptive apps (while in-use, e.g., scrolling through a list) create a functionally infinite source of work.
 The CPU usage profiles presented by an adaptive app and a user legitimately waiting on a CPU-bound task (e.g., cold-start) are identical, rendering them indistinguishable to \schedutil.

-Fortunately, the Linux maintainers have already recognized the need for better user-space signalling of performance needs.
+Fortunately, the Linux maintainers have already recognized the need for better user-space signaling of performance needs.
 In 2015, the Linux kernel added a virtual filesystem, mounted at \texttt{/dev/stune/}, that provides virtual file hooks:
 \begin{itemize}
 \item[]{\texttt{schedtune.boost}}