%!TEX root=../main.tex
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure*}[t]
\newcommand{\plotminusvspace}{-3mm}
\begin{subfigure}[b]{.3\textwidth}
\includegraphics[width=0.9\columnwidth,trim=0 0 0 0]{graphics/gantt_serial.pdf}
\vspace*{\plotminusvspace}
\caption{Serial Execution}
\label{fig:gantt:serial}
\end{subfigure}
\begin{subfigure}[b]{.3\textwidth}
\includegraphics[width=0.9\columnwidth,trim=0 0 0 0]{graphics/gantt_parallel.pdf}
\vspace*{\plotminusvspace}
\caption{Parallel Execution}
\label{fig:gantt:parallel}
\end{subfigure}
% \begin{subfigure}[b]{.24\textwidth}
% \includegraphics[width=\columnwidth]{graphics/gantt_serial.png}
% \vspace*{\plotminusvspace}
% \label{fig:gantt:serial}
% \caption{Scalability - Read}
% \end{subfigure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{subfigure}{0.3\linewidth}
\vspace*{-26mm}
\includegraphics[width=0.9\columnwidth,trim=0 0 0 0]{graphics/scalability-read.pdf}
\vspace*{\plotminusvspace}
\caption{Scalability - Read}\label{fig:scalability}
\end{subfigure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vspace*{-5mm}
\caption{Workload traces and read scalability for a synthetic reader/writer workload}
\label{fig:gantt}
\trimfigurespacing
\end{figure*}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
As a proof of concept, we implemented the static analysis approach from \Cref{sec:import} as a simple, provenance-aware parallel scheduler (\Cref{sec:scheduler}) within the Vizier notebook system~\cite{brachmann:2020:cidr:your}.
Parallelizing cell execution requires an ICE architecture, which comes at the cost of increased communication overhead relative to monolithic kernel notebooks.
% In this section, we assess that cost.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tinysection{Implementation}
The parallel scheduler was integrated into Vizier 1.2\footnote{\url{https://github.com/VizierDB/vizier-scala}}.
%--- our experiments lightly modify this version for Jupyter notebooks and the related \texttt{-X PARALLEL-PYTHON} experimental option.
We additionally added a pooling feature to mitigate Python's high startup cost (600ms to multiple seconds): the modified Vizier pre-launches a small pool of Python instances and keeps them running in the background.
%Our current implementation selects kernels from the pool arbitrarily.
In future work, we plan to let kernels cache artifacts and to prioritize kernels that have already loaded the artifacts a cell is expected to read. Note that this prototype does not yet implement the repair actions for missing dependencies.
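The pooling mechanism can be sketched as follows; the class, the \texttt{worker.py} script, and the method names are illustrative placeholders rather than Vizier's actual implementation:
\begin{verbatim}
# Illustrative kernel pool: pre-launch idle Python workers so that a
# cell does not pay the interpreter startup cost when it is scheduled.
import queue
import subprocess

class KernelPool:
    def __init__(self, size=4):
        self.idle = queue.Queue()
        for _ in range(size):
            self.idle.put(self._launch())

    def _launch(self):
        # worker.py (hypothetical) reads cell code on stdin, executes
        # it, and reports exported artifacts on stdout.
        return subprocess.Popen(["python", "-u", "worker.py"],
                                stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE)

    def acquire(self):
        # Hand out a pre-warmed kernel and immediately refill the pool.
        kernel = self.idle.get()
        self.idle.put(self._launch())
        return kernel
\end{verbatim}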
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tinysection{Experiments}
All experiments were run on Ubuntu 20.04 on a server with 2 x AMD Opteron 4238 CPUs (3.3GHz), 128GB of RAM, and 4 x 1TB 7.2k RPM HDDs in hardware RAID 5. As Vizier relies on Apache Spark, we prefix all notebooks under test with a single reader and a single writer cell to force initialization of, e.g., Spark's HDFS module. These cells are not included in timing results.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tinysection{Overview}
As a preliminary experiment, we ran a synthetic workload consisting of one writer cell that randomly generates and exports a 100k-row Pandas dataframe with two integer columns, and 10 reader cells that each read the dataset and perform a compute-intensive task: computing pairwise distances over a 10k-row subset of the source dataset.
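The writer and reader cells are roughly of the following form; \texttt{export\_dataset} and \texttt{read\_dataset} are placeholders standing in for Vizier's artifact API:
\begin{verbatim}
# Writer cell: generate and export a 100k-row dataframe
# with two integer columns.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": np.random.randint(0, 1000, size=100_000),
    "y": np.random.randint(0, 1000, size=100_000),
})
export_dataset("points", df)   # placeholder for the artifact export

# Reader cell (one of 10): load the dataset and compute pairwise
# distances over a 10k-row subset.
from scipy.spatial.distance import pdist

pts = read_dataset("points").head(10_000)[["x", "y"]].to_numpy()
dists = pdist(pts)             # compute-intensive step
\end{verbatim}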
\Cref{fig:gantt} shows execution traces for the workload in Vizier with its default (serial) scheduler and Vizier with its new (parallel) scheduler.
The experiment shows that parallel execution is $\sim 4$ times faster than serial execution, although each individual reader takes longer to finish in the parallel case.
% We observe several oppoortunities for potential improvement:
% First, the serial first access to the dataset is 2s more expensive than the remaining lookups as Vizier loads and prepares to host the dataset through the Arrow protocol. We expect that such startup costs can be mitigated, for example by having the Python kernel continue hosting the dataset itself while the monitor process is loading the data.
% We also note that this overhead grows to almost 10s in the parallel case. In addition to startup-costs,
This is possibly the result of contention on the dataset. % Even when executing cells in parallel execution, it may be beneficial to stagger cell starts.
Nonetheless, this preliminary result and the analysis shown in \Cref{fig:parallelismSurvey} demonstrate the potential for parallel execution of notebooks. % parallel execution even in its preliminary implementation already reduces runtime from 80 to 20 seconds for this test case.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tinysection{Scaling}
We next measure the ICE cost of reading a dataset as a function of its size, with both cold and hot caches.
\Cref{fig:scalability} shows the results of this experiment. Note that Vizier scales linearly with dataset size.
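A minimal sketch of such a measurement appears below; the Parquet-file exchange and the cold/hot approximation (first vs.\ repeated read in the same process) are assumptions that only stand in for Vizier's actual inter-process dataset transfer:
\begin{verbatim}
# Sketch: time reading exported datasets of increasing size,
# approximating the ICE read cost plotted in the figure.
import time
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

for rows in [10_000, 100_000, 1_000_000, 10_000_000]:
    df = pd.DataFrame({
        "x": np.random.randint(0, 1000, size=rows),
        "y": np.random.randint(0, 1000, size=rows)})
    pq.write_table(pa.Table.from_pandas(df), "artifact.parquet")

    start = time.time()            # first read ("cold")
    pq.read_table("artifact.parquet").to_pandas()
    cold = time.time() - start

    start = time.time()            # repeated read ("hot")
    pq.read_table("artifact.parquet").to_pandas()
    hot = time.time() - start

    print(rows, cold, hot)
\end{verbatim}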
% We specifically measure the cost of:
% (i) exporting a dataset,
% (ii) importing a 'cold' dataset, and
% (iii) importing a 'hot' dataset.
% Results are shown in XXXXX\BG{add}.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../main"
%%% End: