
% -*- root: ../main.tex -*-

\fixme{EVAL TODO:}
\begin{itemize}
\item[0]{NEW:  2+ real-world apps (YouTube+)}
\item[1]{Figure \ref{fig:u_micro_fb} with \systemname}
\item[2]{NEW:  Per-core stacked idle\% graph}
\item[3]{Figure \ref{fig:idlejank} with \systemname policy and with energy numbers for all policies}
\item[4]{Figure \ref{fig:idlejank} with download background load}
\item[5]{NEW:  latency-energy graph for CPU-bound tasks (coldstart, install)}
\item[6]{NEW:  memory-bound microbench graph}
\item[7]{NEW:  study game app:  how much does it saturate CPU?}
\item[8]{NEW:  noninteractive period energy study}
\item[9]{NEW:  sound app -- i.e. no screen but interactive}
\item[10]{NEW:  microbenchmarks 1-3 for source of overhead:  HW or SW}
\end{itemize}

%CYCLE COUNT:  Show that the work done is, approximately, the same

Implementation Steps:
\begin{itemize}
\item[1]{Implement syscall interface}
\item[2]{Implement task state additions}
\item[3]{Implement CPU state additions}
\item[4]{Implement task-CPU message passing}
\item[5]{Mod platform library to supply hints}
\end{itemize}

State items:
\begin{itemize}
\item{taint flag to track when a hint is available for a task $\Rightarrow$ CPU}
\item{amount of anticipated compute until block (only needed first time)}
\item{type of block (likely can be inferred)}
\item{amount of anticipated compute post-block}
\item{need-by time (interactivity flag)}
\end{itemize}
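
A minimal sketch of how the state items above might be grouped into one per-task hint record.
This is a Python model for discussion only, not the kernel data structure.
All names (\texttt{TaskHint}, \texttt{BlockType}, the field names), the block categories other than I/O, and the units are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BlockType(Enum):
    """Hypothetical categories for what the task blocks on."""
    UNKNOWN = 0  # not supplied; the notes suggest it can likely be inferred
    IO = 1
    LOCK = 2
    SLEEP = 3


@dataclass
class TaskHint:
    """Per-task hint record; one field per state item listed above."""
    tainted: bool = False                      # a hint is pending for task => CPU
    compute_until_block: Optional[int] = None  # anticipated work before blocking (first time only)
    block_type: BlockType = BlockType.UNKNOWN  # type of block
    compute_post_block: Optional[int] = None   # anticipated work after the block
    need_by_ms: Optional[float] = None         # need-by time; None means not interactive

    def is_interactive(self) -> bool:
        """Treat the presence of a need-by time as the interactivity flag."""
        return self.need_by_ms is not None

    def consume(self) -> bool:
        """Clear the taint flag; return True if a hint was pending."""
        was_pending = self.tainted
        self.tainted = False
        return was_pending
```

The CPU side would call \texttt{consume()} once per delivered hint, so a stale hint is never applied twice.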

\tinysection{RECENT CONFIRMATIONS / RESULTS}

TODO.  \systemname
For now, our policy is simple:  apply the hinted speed to all 8 cores (both 4-core clusters).
We will likely eventually want to apply different policies to different clusters, based on (1) how many tasks the app spawns and (2) whether there are hint conflicts.
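
The current policy can be sketched as a pure function that maps one hinted speed (a percentage of maximum) to an available frequency step on every cluster.
The frequency tables below are made-up placeholders, not measured values for any device, and the function name is hypothetical.
Rounding up to the next available step is an assumption; the notes do not specify a rounding rule.

```python
from typing import Dict, List


def apply_hint_to_all_clusters(hint_percent: int,
                               freq_tables_khz: Dict[str, List[int]]) -> Dict[str, int]:
    """Return one target frequency (kHz) per cluster for a single hinted speed."""
    if not 0 < hint_percent <= 100:
        raise ValueError("hint_percent must be in 1..100")
    targets = {}
    for cluster, table in freq_tables_khz.items():
        steps = sorted(table)
        wanted = steps[-1] * hint_percent / 100
        # Assumption: pick the lowest available step that is at least the wanted speed.
        targets[cluster] = next(f for f in steps if f >= wanted)
    return targets


# Placeholder tables (kHz), for illustration only.
EXAMPLE_TABLES = {
    "little": [300_000, 600_000, 900_000, 1_200_000, 1_500_000],
    "big": [400_000, 800_000, 1_200_000, 1_600_000, 2_000_000],
}
```

Keeping the per-cluster loop in one place should make it easy to swap in a per-cluster policy later.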


\tinysection{ADDITIONAL TODO}


\begin{itemize}
\item{Confirm behavior for other apps (interactive, static, animated).}
\item{Resolve conflicts among hints.
	Example:  We want to run a compute-bound task in the background, but the system blocks on another compute-bound foreground task (app launch).}
\item{Take an additional parameter:  system energy (battery) level.
	If low, consider replacing 100\% settings with 70\% settings.
	N.B. This is not really a hint from userspace.}
\item{For the I/O blocking hint (SQLite tri-sync):
	Need to quantify the block and duration proportions for which this works.
	Would a stairstep speed be better still?}
\item{Unresolved issues:
	\begin{itemize}
	\item{(micro) What is the speed switching cost?  Quantify in both energy and time (hardware).}
	\item{(macro) What is the kernel doing in inefficient runs?  Spinning?
	Examples:  PocketData loads; FB user interaction (guessing governor complexity and inefficiency -- best to keep a simple speed; KISS).}
	\end{itemize}}
\end{itemize}

===

New experiments TODO (9-5-22):

Run phone at idle for 5 minutes -- at varying wake states, with screen on and off -- verify that fixed speed is indeed best there, too.
I.e., that the idle governor is shutting down cores most of the time, and that \emph{not} ramping up from 0 constantly for the ongoing system background tasks (system\_server process) is best.

Run memory-bound workloads on devices (Pixel 2 and Nexus 6 and others)

Try to replicate the issues seen with the scrolling list


\tinysection{Assorted thoughts -- relocate as needed}

(see cse501 ppt ideas, and meeting notes from late August / early September)

General observation (already observed in previous papers):  For a fixed amount of work, energy as a function of CPU speed is U-shaped, so there is an energy-minimizing speed.
All other things being equal, set CPU to that speed.
Obviously, this is not optimal for performance -- 100\% is always best there.
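
As an illustration of why such a U-shaped minimum can arise, consider a simple textbook-style model (an assumption for intuition, not a fit to our measurements).
A fixed job of $W$ cycles runs at frequency $f$, static power is a constant $P_s$, and dynamic power grows as $k f^3$ (supply voltage scaling roughly with frequency).
The job takes $W/f$ seconds, so
\[
E(f) = \left(P_s + k f^3\right)\frac{W}{f} = W\left(\frac{P_s}{f} + k f^2\right).
\]
Setting the derivative to zero gives
\[
\frac{dE}{df} = W\left(-\frac{P_s}{f^2} + 2 k f\right) = 0
\quad\Longrightarrow\quad
f^{*} = \left(\frac{P_s}{2k}\right)^{1/3}.
\]
Below $f^{*}$ the static term dominates because the job takes too long; above it the dynamic term dominates.
The symbols $W$, $P_s$, $k$, and $f^{*}$ are introduced here for illustration only.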

POLICY:
Generally:  Prioritize minimizing energy.
Comment:  Keep an eye on cost (frame drops).
May need to tweak speed.
Exception:  If user is waiting on a compute-bound workload (bootup, app cold start...), set to 100\%.
Exception:  For memory-bound workloads, where the CPU is necessarily running but not doing anything useful (it is stalling on memory), set lower.
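
The policy above can be written down as a small decision function.
This is a sketch under stated assumptions:  the workload classes, the function name, and both speed parameters are hypothetical, and the energy-minimizing speed would have to be measured per device.
We also assume that ``set lower'' means no higher than the energy-minimizing speed.

```python
from enum import Enum


class Workload(Enum):
    DEFAULT = "default"               # no special information: minimize energy
    USER_WAITING_COMPUTE = "waiting"  # e.g. bootup, app cold start
    MEMORY_BOUND = "memory"           # CPU mostly stalled on memory


def choose_speed_percent(workload: Workload,
                         energy_min_percent: int,
                         memory_bound_percent: int) -> int:
    """Pick a CPU speed (percent of maximum) for the given workload class."""
    if memory_bound_percent > energy_min_percent:
        # Assumption: the memory-bound setting never exceeds the default speed.
        raise ValueError("memory_bound_percent must not exceed energy_min_percent")
    if workload is Workload.USER_WAITING_COMPUTE:
        return 100  # exception: the user is waiting on compute-bound work
    if workload is Workload.MEMORY_BOUND:
        return memory_bound_percent  # exception: extra speed is wasted on stalls
    return energy_min_percent  # general rule: prioritize minimizing energy
```

For example, with placeholder values of 70 and 50, a cold start maps to 100, a memory-bound phase to 50, and everything else to 70.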

Side observation:  Sometimes, the system is not doing (much?) useful work.
Example 1:  Facebook -- an extra dropped frame for notably less work -- no great loss
Example 2:  The schedutil governor itself:  it triggers frequent frequency scaling, with attendant overhead, both in terms of computation and in terms of CPU stalling to adjust to new frequencies

General observation:  Use userspace for additional useful information

General observation:  In interactive mode, this is a type of RT system -- need to hit periodic deadlines, viz., frame refreshes.
Not critical, but desirable.
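
The periodic-deadline view can be made concrete by counting frames whose render time exceeds one refresh period.
A minimal sketch, assuming a 60 Hz display by default (about 16.67 ms per frame) and a list of per-frame render durations in milliseconds; the function name is hypothetical.

```python
from typing import Iterable


def count_missed_deadlines(render_ms: Iterable[float], refresh_hz: float = 60.0) -> int:
    """Count frames whose render time exceeds one refresh period."""
    if refresh_hz <= 0:
        raise ValueError("refresh_hz must be positive")
    period_ms = 1000.0 / refresh_hz
    # A frame misses its deadline if it is not ready within one period.
    return sum(1 for duration in render_ms if duration > period_ms)
```

Since the deadlines are soft, this count is a cost to weigh against energy rather than a hard constraint.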

KISS policy:  Per-phone, not per-CPU.
Linux scheduler already distributes tasks among CPUs \XXXnote{Verify more}
Plus, on phones, can only set speed per-cluster, not per-core.
Upshot:  Set policy for all 8 cores.
Side benefit:  Simpler...