Replacing Microkernels

Oliver Kennedy 2022-03-31 00:57:07 -04:00
parent a2b40f0614
commit f5d6790f6b
Signed by: okennedy
GPG key ID: 3E5F9B3ABD3FDB60
3 changed files with 9 additions and 10 deletions


@ -108,7 +108,7 @@
\label{sec:approx-prov}
\input{sections/approx-prov}
\section{Task Isolation}
\section{Isolated Cell Execution}
\label{sec:isolation}
\input{sections/isolation}

View file

@ -18,10 +18,10 @@ We then begin to explore the challenges of working with provenance metadata that
As a case study for exploring incremental provenance, we adapt a workflow-style parallelism-capable scheduling engine for use with Jupyter notebooks.
In addition to adapting the scheduler itself, parallel execution requires a fundamental shift away from execution in a single monolithic kernel, and towards lighter-weight per-cell interpreters.
As a basis for our implementation, we leverage Vizier~\cite{brachmann:2019:sigmod:data, brachmann:2020:cidr:your}, a microkernel notebook.
As a basis for our implementation, we leverage Vizier~\cite{brachmann:2019:sigmod:data, brachmann:2020:cidr:your}, an isolated cell execution (ICE) notebook.
In lieu of an opaque, monolithic kernel, cells run in isolated interpreters and communicate only through dataflow (by reading and writing data artifacts).
This paper outlines the challenges of extending Vizier to support parallel execution through incremental provenance.
We then show generality by discussing the process for importing Jupyter notebooks into Vizier, including the use of static analysis to extract approximate provenance information.
We then show generality by discussing the process for importing Jupyter notebooks into an ICE notebook, including the use of static analysis to extract approximate provenance information.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}
@ -38,8 +38,7 @@ We constructed a dataflow graph as described in \Cref{sec:import}; as a proxy me
\Cref{fig:parallelismSurvey} presents the depth --- the maximum number of cells that must be executed serially --- in relation to the total number of python cells in the notebook.
The average notebook has over 16 cells but an average dependency depth of just under 4, so on average around 4 cells are able to run concurrently.
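As a rough illustration of the depth metric described above, the following sketch computes the longest chain of dependent cells in a dataflow graph and the resulting average parallelism (cell count divided by depth). The function name `dependency_depth` and the five-cell example graph are assumptions made for illustration; they are not taken from the survey or from Vizier.

```python
from functools import lru_cache


def dependency_depth(deps):
    """Return the length of the longest dependency chain.

    `deps` maps each cell id to the set of cell ids it reads from
    (a hypothetical encoding of the dataflow graph; assumed acyclic).
    """

    @lru_cache(maxsize=None)
    def depth(cell):
        # A cell with no inputs has depth 1; otherwise it sits one level
        # below its deepest input.
        return 1 + max((depth(d) for d in deps.get(cell, ())), default=0)

    return max((depth(c) for c in deps), default=0)


# Hypothetical notebook: c2 and c3 read from c1, c4 reads from both,
# and c5 is independent of everything else.
deps = {
    "c1": set(),
    "c2": {"c1"},
    "c3": {"c1"},
    "c4": {"c2", "c3"},
    "c5": set(),
}

depth = dependency_depth(deps)
average_parallelism = len(deps) / depth
```

Under this reading, a notebook with 16 cells and a depth of 4 would have an average parallelism of about 4, matching the proxy used in the text.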
We outline the key ideas of our central contribution, incremental provenance, in \Cref{sec:approx-prov}, and review Vizier's microkernel architecture in \Cref{sec:isolation}.
We outline the key ideas of our central contribution, incremental provenance, in \Cref{sec:approx-prov}, and review Vizier's ICE architecture in \Cref{sec:isolation}.
The remainder of the paper focuses on our remaining contributions:
% \begin{itemize}
% \item
@ -47,7 +46,7 @@ The remainder of the paper focuses on our remaining contributions:
% \item
(i) \textbf{An Incremental Provenance Scheduler}. In \Cref{sec:scheduler}, we present a scheduler for incremental notebook execution; we identify the challenges that arise due to provenance mispredictions, and discuss how to compensate for them.
% \item
(ii) \textbf{Jupyter Import}: \Cref{sec:import} discusses how we extract approximate provenance from python code, and how existing notebooks written for Jupyter (or comparable monolithic kernel architectures) can be translated to microkernel notebook architectures like Vizier.
(ii) \textbf{Jupyter Import}: \Cref{sec:import} discusses how we extract approximate provenance from python code, and how existing notebooks written for Jupyter (or comparable monolithic kernel architectures) can be translated to ICE notebook architectures like Vizier.
% \item
(iii) \textbf{Implementation in Vizier and Experiments}: We have implemented a preliminary prototype of the proposed scheduler in Vizier. \Cref{sec:experiments} presents our initial experiments with parallel evaluation of Jupyter notebooks using this implementation.
% \end{itemize}

View file

@ -1,8 +1,8 @@
%!TEX root=../main.tex
Recall that in a classical notebook, a cell is run by evaluating its code in the kernel, a single running python interpreter.
To facilitate parallel execution, as well as incremental updates, \systemname isolates cells by executing each in a fresh kernel.
We note that isolation is incompatible, at least directly, with classical computational notebooks:
To facilitate parallel execution, as well as incremental updates, an isolated cell execution (ICE) notebook isolates cells by executing each in a fresh kernel.
We note that ICE is incompatible, at least directly, with classical computational notebooks:
(i) Cells normally communicate through kernel state, precluding a cell executing in one kernel from accessing variables created in another;
(ii) Variables generated by one kernel may need to be accessed after the kernel has exited;
(iii) Isolation comes at an impractically high performance cost.
@ -11,7 +11,7 @@ We note that isolation is incompatible, at least directly, with classical comput
First, the runtime must be able to reconstruct the global interpreter state, or at least the necessary subset of it needed to run the cell.
We start with a simplified model where inter-cell communication is explicit --- we discuss converting Jupyter notebooks into this model in \Cref{sec:import}.
Concretely, for a variable defined in one cell (the writer) to be used in a subsequent cell (the reader): (i) the writer must explicitly export the variable into the global state, and (ii) the reader must explicitly import the variable from the global state.
\systemname provides setter and getter functions (respectively) on a global state variable for this purpose.
ICE notebooks like Vizier provide setter and getter functions (respectively) on a global state variable for this purpose.
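The explicit export/import model can be sketched as follows. This is a minimal illustration, not Vizier's actual API: the class `GlobalState`, its `set` and `get` methods, and the helper `run_cell` are hypothetical names, and a fresh Python namespace stands in for a fresh kernel.

```python
class GlobalState:
    """Hypothetical stand-in for the shared global state (not Vizier's API)."""

    def __init__(self):
        self._artifacts = {}

    def set(self, name, value):
        # Setter: the writer cell explicitly exports a variable.
        self._artifacts[name] = value

    def get(self, name):
        # Getter: the reader cell explicitly imports a variable.
        return self._artifacts[name]


def run_cell(source, state):
    """Run a cell in a fresh namespace, approximating a fresh kernel."""
    namespace = {"state": state}
    exec(source, namespace)
    return namespace


state = GlobalState()

writer = "x = 21\nstate.set('x', x)"
reader = "x = state.get('x')\ny = 2 * x\nstate.set('y', y)"

writer_ns = run_cell(writer, state)
reader_ns = run_cell(reader, state)
```

Because each cell starts from an empty namespace, the reader sees `x` only because the writer exported it and the reader imported it; nothing leaks through interpreter state.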
\tinysection{State Serialization}
When a state variable is exported, it is serialized by the python interpreter.
@ -20,7 +20,7 @@ The artifact is delivered to a central monitor process, and assigned to a name i
When a cell imports a symbol from the global state, it contacts the monitor to retrieve the artifact associated with the symbol.
The runtime deserializes the artifact and places it into the kernel-local state.
By default, state is serialized with python's native \texttt{pickle} library, although \systemname can be easily extended with codecs for specialized types that are either unsupported by \texttt{pickle}, or for which it is not efficient:
By default, Vizier serializes state through python's native \texttt{pickle} library, although \systemname can be easily extended with codecs for specialized types that are either unsupported by \texttt{pickle}, or for which it is not efficient:
(i) Python code (e.g., import statements, and function or class definitions) is exported as raw python code and imported with \texttt{eval}.
(ii) Pandas dataframes are exported in parquet format, and are exposed to subsequent cells by the monitor process through Apache Arrow direct access.
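A minimal sketch of this serialization scheme, assuming a simple codec registry that falls back to \texttt{pickle}, might look like the following. The names `CODECS`, `register_codec`, `export_value`, `import_value`, and `SourceCode` are hypothetical and are not Vizier's interface. The sketch uses `exec` rather than `eval` because function definitions are statements, and it omits the parquet and Arrow path to stay dependency-free.

```python
import pickle
from dataclasses import dataclass


@dataclass
class SourceCode:
    """Hypothetical wrapper marking a value as raw Python source."""
    text: str


# Hypothetical registry: codec name -> (handled type, encoder, decoder).
CODECS = {}


def register_codec(kind, typ, encode, decode):
    CODECS[kind] = (typ, encode, decode)


def export_value(value):
    """Serialize with a specialized codec if one matches, else pickle."""
    for kind, (typ, encode, _decode) in CODECS.items():
        if isinstance(value, typ):
            return kind, encode(value)
    return "pickle", pickle.dumps(value)


def import_value(kind, payload):
    """Invert export_value using the codec named by `kind`."""
    if kind == "pickle":
        return pickle.loads(payload)
    _typ, _encode, decode = CODECS[kind]
    return decode(payload)


# Export Python code as raw text instead of pickling it.
register_codec(
    "python-source",
    SourceCode,
    lambda value: value.text.encode("utf-8"),
    lambda payload: SourceCode(payload.decode("utf-8")),
)

# Ordinary values take the default pickle path.
kind, payload = export_value({"a": [1, 2, 3]})
restored = import_value(kind, payload)

# Source code takes the specialized path and is re-executed on import.
kind_src, payload_src = export_value(SourceCode("def double(n):\n    return 2 * n\n"))
namespace = {}
exec(import_value(kind_src, payload_src).text, namespace)
```

Keeping the codec name alongside the payload lets the importing side pick the right decoder without inspecting the bytes.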