added eval section

carlnues@buffalo.edu 2022-10-17 13:07:13 -04:00
parent 120b4c619d
commit 755e1e7c01

sections/evaluation.tex Normal file

@ -0,0 +1,115 @@
% -*- root: ../main.tex -*-
\begin{figure}
\centering
\includegraphics[width=.45\textwidth]{figures/graph_u.png} %test123.pdf}
\bfcaption{Energy consumed by a fixed amount of compute at different CPU speeds}
\label{fig:u_micro}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=.45\textwidth]{figures/graph_u_fb.png}
\bfcaption{Energy consumed by a fixed set of interactions at different CPU speeds}
\label{fig:u_micro_fb}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=.45\textwidth]{figures/graph_drops.png}
\bfcaption{Screen drops for a given interactive workload, run with different CPU policies}
\label{fig:drops}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=.45\textwidth]{figures/graph_idlejank.png}
\bfcaption{The relation between available CPU resources and user experience, for given CPU policies}
\label{fig:idlejank}
\end{figure}
We evaluate \systemname by comparing performance of illustrative and representative workloads on our system.
We compare against results obtained using the default system settings, as well as other CPU speed settings.
\tinysection{Evaluation platform}
Our results were obtained using Google Pixel 2 devices running Android \fixme{WHAT} with \fixme{specify} CPU and RAM.
One of the phones was modified to obtain energy measurements using the Monsoon HVPM power meter~\cite{monsoon}.
We modified the Linux \fixme{VERSION} kernel with \fixme{governor implementation synopsis} so that it acts on a hint about pending system usage, passed from userspace via a syscall.
The kernel can then use this information at its discretion to set an appropriate CPU speed.
Our evaluation system consists of parts to \fixme{Discuss evaluation setup -- scripts, UIAutomator, ftrace etc.}
\tinysection{There is an energy-optimal speed}
Previous works~\cite{nuessle2019benchmarking, what} have suggested that, for a given workload, there is an energy-optimal speed.
This speed falls at some point between the CPU's minimum and maximum settings.
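
As rough intuition for why such a point can exist, consider a simplified model, which is an illustrative assumption rather than a measured result.
Suppose a workload of $W$ cycles runs at frequency $f$ and so takes time $W/f$, during which the device draws a speed-independent power $P_s$ plus a dynamic power approximated as $c f^3$.
The symbols $W$, $P_s$, and $c$ are hypothetical, and the cubic term is only a common first-order approximation.
The energy consumed is then
\begin{equation}
E(f) = \left(P_s + c f^3\right)\frac{W}{f} = W\left(\frac{P_s}{f} + c f^2\right).
\end{equation}
Setting $dE/df = 0$ gives
\begin{equation}
f^{*} = \left(\frac{P_s}{2c}\right)^{1/3}.
\end{equation}
Below $f^{*}$, the fixed cost of staying active for longer dominates; above it, the dynamic cost dominates.
Under this model, the optimum is a mid-range speed whenever $f^{*}$ lies within the CPU's supported frequency range.
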
Figure \ref{fig:u_micro} shows the results of a fixed amount of compute (\todo{discuss setup}) under different CPU policies.
In particular, the system default policy consumes notably more energy than a mid-speed setting.
Etc. etc.
Next, we studied real-world apps under different CPU policies.
The big question is whether our previous observation -- that there is an energy-optimal speed -- still holds.
We run scripts to simulate typical user interactions on the Facebook app under different CPU policies: the system default, various fixed speeds, and \systemname.
Figure \ref{fig:u_micro_fb} shows the results.
As before, a mid-speed CPU policy proves better than the system default policy.
\systemname also offers better energy performance than the system default.\todo{SHOW THIS}
\tinysection{The cost of optimal speeds in interactive apps}
While a simple fixed-speed policy can yield optimal energy, this potentially comes at a cost.
The output of phone apps is largely a visual display.
Previous studies that constrained the system resources available to interactive apps have evaluated the cost on the basis of screen display metrics, in particular what Android terms screen jank, the proportion of dropped frames~\cite{THIS, THAT}.
As apps are closed source, we are unable to control the exact amount of compute.
However, apps spend the vast bulk of their time waiting for user input~\cite{ANY??}.
While background tasks do run, they typically come nowhere near saturating CPU resources.\todo{SHOW: per-core CPU idle\% graph}
Hence, adjusting CPU speed within reason does not appreciably affect user experience.
Figure \ref{fig:drops} shows the cost, in both energy and screen drops, of each CPU policy.
Policies \fixme{WHAT} show \fixme{WHAT} savings in energy over the system default.
For these, even fixed speeds above 40\% produce drop rates (jank) within 50\% of the system default.
In practice, this averages to approximately 1 extra dropped frame per second, a figure we argue is acceptable.\todo{CITE OTHER STUDIES}
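
For scale, assuming a 60\,Hz display (an assumption; the refresh rate is not stated above), one extra dropped frame per second corresponds to
\begin{equation}
\frac{1}{60} \approx 1.7\%
\end{equation}
of the frames scheduled in that second.
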
The proportion of drops is typically better for a fixed speed over 70\% than for the system default policy.
\systemname likewise shows equivalent user experience -- while saving \fixme{WHAT} on energy.\todo{SHOW: with 1 our governor and 2 energy metrics}
The key observation is that, at the CPU usage level imposed by apps, there are still plenty of unused resources to ensure quality user experience with \systemname.
The question then arises -- at what level of resource constraint does experience begin to suffer?
To answer this, Figure \ref{fig:idlejank} shows the \facebook experiment with additional background tasks that consume CPU cycles.
Results show the CPU idle time and screen jank rate for each of the CPU policies and for each of several levels of background work.
The background tasks are do-nothing loads, run on each of the 8 CPU cores throughout the duration of the interactive experiment.
We can achieve acceptable cost (a jank rate below the blue bar on the graph) with CPU speeds above 40\% \fixme{verify} and background loads with sleep intervals of 20\,ms and above.
\systemname falls in this category.
\todo{SHOW: with download app in background}
To characterize the amount of background work this represents, Table \fixme{SHOW: DO THIS} shows the proportion of CPU usage these loads consume when run by themselves.\todo{Good idea? I think this helps}
In actual usage, a user would likely never encounter this level of background usage.
Downloading a large file consumes approximately 50\% of a single core, far below the microloads we imposed.\todo{SHOW: with download background task}
\tinysection{When energy-optimal is not optimal}
An energy-optimal policy is not always best.
In particular, when the user is waiting and computation is the major bottleneck, in most cases the system should prioritize latency.
Figure \fixme{WHAT} shows the latencies of two common such situations, app installation and coldstart, for different CPU policies.
\systemname identifies these situations and outperforms the default in both latency and energy.\todo{SHOW this}
\fixme{GRAPHS TODO:}
\begin{itemize}
\item[1]{Figure \ref{fig:u_micro_fb} with \systemname}
\item[2]{NEW: Per-core stacked idle\% graph}
\item[3]{Figure \ref{fig:drops} with \systemname policy and with energy numbers for all policies}
\item[4]{Figure \ref{fig:idlejank} with \systemname policy and with download background load}
\item[5]{NEW: latency-energy graph for CPU-bound tasks (coldstart, install)}
\end{itemize}
%CYCLE COUNT: Show that the work done is, approximately, the same