Replacing Microkernels
This commit is contained in:
parent
a2b40f0614
commit
f5d6790f6b
2
main.tex
2
main.tex
|
@ -108,7 +108,7 @@
|
|||
\label{sec:approx-prov}
|
||||
\input{sections/approx-prov}
|
||||
|
||||
\section{Task Isolation}
|
||||
\section{Isolated Cell Execution}
|
||||
\label{sec:isolation}
|
||||
\input{sections/isolation}
|
||||
|
||||
|
|
|
@ -18,10 +18,10 @@ We then begin to explore the challenges of working with provenance metadata that
|
|||
|
||||
As a case study for exploring incremental provenance, we adapt a workflow-style parallelism-capable scheduling engine for use with Jupyter notebooks.
|
||||
In addition to adapting the scheduler itself, parallel execution requires a fundamental shift away from execution in a single monolithic kernel, and towards lighter-weight per-cell interpreters.
|
||||
As a basis for our implementation, we leverage Vizier~\cite{brachmann:2019:sigmod:data, brachmann:2020:cidr:your}, a microkernel notebook.
|
||||
As a basis for our implementation, we leverage Vizier~\cite{brachmann:2019:sigmod:data, brachmann:2020:cidr:your}, an isolated cell execution (ICE) notebook.
|
||||
In lieu of an opaque, monolithic kernel, cells run in isolated interpreters and communicate only through dataflow (by reading and writing data artifacts).
|
||||
This paper outlines the challenges of extending Vizier to support parallel execution through incremental provenance.
|
||||
We then show generality by discussing the process for importing Jupyter notebooks into Vizier, including the use of static analysis to extract approximate provenance information.
|
||||
We then show generality by discussing the process for importing Jupyter notebooks into an ICE, including the use of static analysis to extract approximate provenance information.
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\begin{figure}
|
||||
|
@ -38,8 +38,7 @@ We constructed a dataflow graph as described in \Cref{sec:import}; as a proxy me
|
|||
\Cref{fig:parallelismSurvey} presents the depth --- the maximum number of cells that must be executed serially --- in relation to the total number of python cells in the notebook.
|
||||
The average notebook has over 16 cells, but an average dependency depth of just under 4; on average, around 4 cells can run concurrently.
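As a rough illustration of the depth metric, the sketch below computes the longest dependency chain in a small dataflow graph. The function name `dependency_depth`, the dictionary layout, and the five-cell example are assumptions made for illustration; they are not taken from the paper's implementation.

```python
from functools import lru_cache


def dependency_depth(deps):
    """Length of the longest dependency chain in a notebook.

    `deps` maps each cell id to the set of cell ids it reads from
    (a hypothetical representation of the dataflow graph).
    """
    @lru_cache(maxsize=None)
    def depth(cell):
        # A cell with no inputs has depth 1; otherwise it sits one level
        # below its deepest input.
        return 1 + max((depth(d) for d in deps[cell]), default=0)

    return max((depth(c) for c in deps), default=0)


# Hypothetical five-cell notebook: c2 and c3 both read c1, c4 reads both,
# and c5 is independent of everything else.
example = {
    "c1": set(),
    "c2": {"c1"},
    "c3": {"c1"},
    "c4": {"c2", "c3"},
    "c5": set(),
}

depth = dependency_depth(example)
cells_per_level = len(example) / depth  # crude proxy for available parallelism
```

Dividing the cell count by the depth gives the same kind of estimate as the figures above: roughly 16 cells at a depth of about 4 leaves around 4 cells per level.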
|
||||
|
||||
|
||||
We outline the key ideas of our central contribution, incremental provenance, in \Cref{sec:approx-prov}, and review Vizier's microkernel architecture in \Cref{sec:isolation}.
|
||||
We outline the key ideas of our central contribution, incremental provenance, in \Cref{sec:approx-prov}, and review Vizier's ICE architecture in \Cref{sec:isolation}.
|
||||
The remainder of the paper focuses on our remaining contributions:
|
||||
% \begin{itemize}
|
||||
% \item
|
||||
|
@ -47,7 +46,7 @@ The remainder of the paper focuses on our remaining contributions:
|
|||
% \item
|
||||
(i) \textbf{An Incremental Provenance Scheduler}. In \Cref{sec:scheduler}, we present a scheduler for incremental notebook execution; we identify the challenges that arise due to provenance mispredictions, and discuss how to compensate for them.
|
||||
% \item
|
||||
(ii) \textbf{Jupyter Import}: \Cref{sec:import} discusses how we extract approximate provenance from python code, and how existing notebooks written for Jupyter (or comparable monolithic kernel architectures) can be translated to microkernel notebook architectures like Vizier.
|
||||
(ii) \textbf{Jupyter Import}: \Cref{sec:import} discusses how we extract approximate provenance from python code, and how existing notebooks written for Jupyter (or comparable monolithic kernel architectures) can be translated to ICE notebook architectures like Vizier.
|
||||
% \item
|
||||
(iii) \textbf{Implementation in Vizier and Experiments}: We have implemented a preliminary prototype of the proposed scheduler in Vizier. \Cref{sec:experiments} presents our initial experiments with parallel evaluation of Jupyter notebooks using this implementation.
|
||||
% \end{itemize}
|
||||
|
|
|
@ -1,8 +1,8 @@
|
|||
%!TEX root=../main.tex
|
||||
|
||||
Recall that in a classical notebook, a cell is run by evaluating its code in the kernel, a single running python interpreter.
|
||||
To facilitate parallel execution, as well as incremental updates, \systemname isolates cells by executing each in a fresh kernel.
|
||||
We note that isolation is incompatible, at least directly, with classical computational notebooks:
|
||||
To facilitate parallel execution, as well as incremental updates, an isolated cell execution notebook (ICE) isolates cells by executing each in a fresh kernel.
|
||||
We note that ICE is incompatible, at least directly, with classical computational notebooks:
|
||||
(i) Cells normally communicate through kernel state, precluding a cell executing in one kernel from accessing variables created in another;
|
||||
(ii) Variables generated by one kernel may need to be accessed after the kernel has exited;
|
||||
(iii) Isolation comes at an impractically high performance cost.
|
||||
|
@ -11,7 +11,7 @@ We note that isolation is incompatible, at least directly, with classical comput
|
|||
First, the runtime must be able to reconstruct the global interpreter state, or at least the necessary subset of it needed to run the cell.
|
||||
We start with a simplified model where inter-cell communication is explicit --- we discuss converting Jupyter notebooks into this model in \Cref{sec:import}.
|
||||
Concretely, for a variable defined in one cell (the writer) to be used in a subsequent cell (the reader): (i) the writer must explicitly export the variable into the global state, and (ii) the reader must explicitly import the variable from the global state.
|
||||
\systemname provides setter and getter functions (respectively) on a global state variable for this purpose.
|
||||
ICE notebooks like Vizier provide setter and getter functions (respectively) on a global state variable for this purpose.
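The explicit export/import model can be sketched as follows. This is a minimal illustration, assuming a hypothetical `GlobalState` class with `set` and `get` methods and a hypothetical `run_cell` helper; it is not Vizier's actual API. A fresh namespace per cell stands in for a fresh kernel.

```python
import copy


class GlobalState:
    """Hypothetical stand-in for the shared global state object."""

    def __init__(self, store):
        self._store = store

    def set(self, name, value):
        # Export: copy the value so later mutations in the writer stay local.
        self._store[name] = copy.deepcopy(value)

    def get(self, name):
        # Import: hand the reader its own private copy.
        return copy.deepcopy(self._store[name])


def run_cell(source, store):
    """Run a cell in a fresh namespace; only `state` is shared."""
    namespace = {"state": GlobalState(store)}
    exec(source, namespace)
    return namespace


store = {}

# Writer cell: defines x and explicitly exports it.
writer = run_cell("x = [1, 2, 3]\nstate.set('x', x)", store)

# Reader cell: explicitly imports x before using it.
reader = run_cell("x = state.get('x')\ntotal = sum(x)", store)

# A cell that skips the import cannot see the writer's variable.
try:
    run_cell("total = sum(x)", store)
    leaked = True
except NameError:
    leaked = False
```

The last cell fails with a `NameError`, which is the behavior the isolation model relies on: nothing crosses cell boundaries unless it is exported and then imported.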
|
||||
|
||||
\tinysection{State Serialization}
|
||||
When a state variable is exported, it is serialized by the python interpreter.
|
||||
|
@ -20,7 +20,7 @@ The artifact is delivered to a central monitor process, and assigned to a name i
|
|||
When a cell imports a symbol from the global state, it contacts the monitor to retrieve the artifact associated with the symbol.
|
||||
The runtime deserializes the artifact and places it into the kernel-local state.
|
||||
|
||||
By default, state is serialized with python's native \texttt{pickle} library, although \systemname can be easily extended with codecs for specialized types that are either unsupported by \texttt{pickle}, or for which it is not efficient:
|
||||
By default, Vizier serializes state through python's native \texttt{pickle} library, although \systemname can be easily extended with codecs for specialized types that are either unsupported by \texttt{pickle}, or for which it is not efficient:
|
||||
(i) Python code (e.g., import statements, and function or class definitions) is exported as raw python code and imported with \texttt{eval}.
|
||||
(ii) Pandas dataframes are exported in parquet format, and are exposed to subsequent cells by the monitor process through Apache Arrow direct access.
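A codec-based serializer along these lines might look like the sketch below. The names `SourceCode`, `CODECS`, `serialize`, and `deserialize` are assumptions made for illustration and do not reflect Vizier's actual interfaces. The sketch runs imported definitions with `exec` rather than `eval`, because Python's `eval` accepts only expressions while function and class definitions are statements. The parquet/Arrow path for dataframes is omitted.

```python
import pickle


class SourceCode:
    """Hypothetical wrapper marking a string as raw python source."""

    def __init__(self, text):
        self.text = text


# Each codec is an (encode, decode) pair keyed by name. pickle is the default;
# the "source" codec ships code as plain text instead of pickling it.
CODECS = {
    "pickle": (pickle.dumps, pickle.loads),
    "source": (
        lambda value: value.text.encode("utf-8"),
        lambda blob: SourceCode(blob.decode("utf-8")),
    ),
}


def choose_codec(value):
    """Pick a specialized codec when one applies, else fall back to pickle."""
    return "source" if isinstance(value, SourceCode) else "pickle"


def serialize(value):
    """Return (codec name, bytes); the name travels with the artifact."""
    name = choose_codec(value)
    encode, _ = CODECS[name]
    return name, encode(value)


def deserialize(name, blob):
    """Decode an artifact with the codec that produced it."""
    _, decode = CODECS[name]
    return decode(blob)


# Ordinary data round-trips through pickle.
data_codec, data_blob = serialize({"rows": 3})
restored = deserialize(data_codec, data_blob)

# A function definition is exported as source and re-executed by the reader.
code_codec, code_blob = serialize(SourceCode("def double(v):\n    return 2 * v\n"))
reader_namespace = {}
exec(deserialize(code_codec, code_blob).text, reader_namespace)
```

Storing the codec name next to the bytes lets the reader decode an artifact without knowing how the writer produced it.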
|
||||
|
||||
|
|