% -*- root: ../main.tex -*-
\fixme{EVAL TODO:}
\begin{itemize}
\item[0]{NEW: 2+ real-world apps (YouTube+)}
\item[1]{Figure \ref{fig:u_micro_fb} with \systemname}
\item[2]{NEW: Per-core stacked idle\% graph}
\item[3]{Figure \ref{fig:idlejank} with \systemname policy and with energy numbers for all policies}
\item[4]{Figure \ref{fig:idlejank} with download background load}
\item[5]{NEW: latency-energy graph for CPU-bound tasks (coldstart, install)}
\item[6]{NEW: memory-bound microbench graph}
\item[7]{NEW: study game app: how much does it saturate CPU?}
\item[8]{NEW: noninteractive period energy study}
\item[9]{NEW: sound app -- i.e. no screen but interactive}
\item[10]{NEW: microbenchmarks 1-3 for source of overhead: HW or SW}
\end{itemize}
%CYCLE COUNT: Show that the work done is, approximately, the same
Implementation Steps:
\begin{itemize}
\item[1]{Implement syscall interface}
\item[2]{Implement task state additions}
\item[3]{Implement CPU state additions}
\item[4]{Implement task-CPU message passing}
\item[5]{Modify the platform library to supply hints}
\end{itemize}
State items:
\begin{itemize}
\item{taint flag to track when a hint is available for a task (task $\Rightarrow$ CPU)}
\item{amount of anticipated compute until block (only needed first time)}
\item{type of block (likely can be inferred)}
\item{amount of anticipated compute post-block}
\item{need-by time (interactivity flag)}
\end{itemize}
\tinysection{RECENT CONFIRMATIONS / RESULTS}
TODO. \systemname
For now, our policy is simple: apply the speed from the hint to all 8 cores (both 4-core clusters).
We will likely eventually want to apply different policies to different clusters, based on (1) how many tasks the app spawns and (2) whether there are hint conflicts.
\tinysection{ADDITIONAL TODO}
\begin{itemize}
\item{Confirm behavior for other apps (interactive, static, animated)}
\item{Resolve conflicts among hints.
Example: We want to run a compute-bound task in the background, but the system blocks on another compute-bound foreground task (app launch).}
\item{Take an additional parameter: system energy (battery) level.
If low, consider lowering the 100\% settings to 70\%.
N.b. this is not really a from-userspace hint.}
\item{For the I/O blocking hint (SQLite tri-sync):
need to quantify the block and duration proportions for which this works.
Would a stairstep speed be better still?}
\item{Unresolved issues:}
\begin{itemize}
\item{(micro) What is the speed switching cost, quantified in both energy and time? (hardware)}
\item{(macro) What is the kernel doing in inefficient runs? Spinning?
Examples: PocketData loads; FB user interaction (guessing governor complexity and inefficiency; best to keep a simple speed: KISS)}
\end{itemize}
\end{itemize}
===
New experiments TODO (9-5-22):
\begin{itemize}
\item{Run phone at idle for 5 minutes, at varying wake states, with screen on and off; verify that fixed speed is indeed best there, too.
I.e., that the idle governor is shutting down cores most of the time, and that \emph{not} ramping up from 0 constantly for the ongoing system background tasks (system\_server process) is best.}
\item{Run memory-bound workloads on devices (Pixel 2, Nexus 6, and others)}
\item{Try to replicate issues with scrolly list}
\end{itemize}
\tinysection{Assorted thoughts -- relocate as needed}
(See cse501 ppt ideas, and meeting notes from late August / early September.)
General observation (already observed in previous papers): Energy for a fixed amount of work is U-shaped in CPU speed, so there is an energy-minimizing speed.
All other things being equal, set CPU to that speed.
Obviously, this is not optimal for performance -- 100\% is always best there.
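A rough sketch of why such a minimum can exist, under an assumed (not measured) power model.
Let a fixed job need $W$ cycles at CPU frequency $f$, and suppose power is $P(f) = P_s + c f^3$, where $P_s$ is speed-independent power drawn while the job runs and $c f^3$ is dynamic power with voltage scaling roughly linearly in frequency.
The symbols $W$, $P_s$, and $c$ are illustrative assumptions, not quantities from our measurements.
The job takes time $W/f$, so its energy is
\[
E(f) = P(f)\,\frac{W}{f} = W\left(\frac{P_s}{f} + c f^2\right).
\]
Setting the derivative to zero gives
\[
\frac{dE}{df} = W\left(-\frac{P_s}{f^2} + 2 c f\right) = 0
\quad\Longrightarrow\quad
f^{\ast} = \left(\frac{P_s}{2c}\right)^{1/3},
\]
and $d^2E/df^2 = W\left(2P_s/f^3 + 2c\right) > 0$, so $f^{\ast}$ is a minimum.
Below $f^{\ast}$ the job runs long enough that the $P_s/f$ term dominates; above it the $c f^2$ term dominates.
This assumes $P_s$ is paid only while the job runs (the cores idle at near-zero power afterwards); the real curve should come from the microbenchmarks.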
POLICY:
Generally: Prioritize minimizing energy.
Comment: Keep an eye on cost (frame drops).
May need to tweak speed.
Exception: If user is waiting on a compute-bound workload (bootup, app cold start...), set to 100\%.
Exception: For memory-bound workloads, where the CPU is necessarily running but stalled on memory rather than doing useful work, set the speed lower.
Side observation: Sometimes, the system is not doing (much?) useful work.
Example 1: Facebook: one extra dropped frame in exchange for notably less work is no great loss.
Example 2: The schedutil governor itself: it triggers frequent frequency scaling, with attendant overhead, both in computation and in CPU stalls while adjusting to new frequencies.
General observation: Use userspace for additional useful information
General observation: In interactive mode, this is a type of real-time (RT) system: it needs to hit periodic deadlines, viz., frame refreshes.
Not critical, but desirable.
KISS policy: Per-phone, not per-CPU.
Linux scheduler already distributes tasks among CPUs \XXXnote{Verify more}
Plus, on phones, can only set speed per-cluster, not per-core.
Upshot: Set policy for all 8 cores.
Side benefit: Simpler...