
%!TEX root=../main.tex

\newcommand{\ws}[1]{\mathcal{W}(#1)}
\newcommand{\rs}[1]{\mathcal{R}(#1)}
\newcommand{\nc}{\ensuremath{c}\xspace}
\newcommand{\nb}{N}

\newcommand{\dg}{G}
\newcommand{\dep}{D}
\newcommand{\dga}{\widetilde{G}}
\newcommand{\depa}{\widetilde{\dep}}
\newcommand{\cstate}{\mathcal{S}}
\newcommand{\spending}{\textsc{pending}\xspace}
\newcommand{\sdone}{\textsc{done}\xspace}


The semantics of a notebook is the serial execution of its cells in notebook order.
We refer to the set of variables imported or exported by each cell as the cell's read and write sets, respectively.
A \textit{correct} execution is thus defined in terms of view serializability~\cite{WV02}: a (parallel) schedule is correct iff every cell reads exactly the artifact versions that it would read in a serial execution. Note that blind writes are not an issue, because every write to an artifact creates a new (immutable) version; thus, cells that blindly write the same artifact do not conflict with each other. % the last version of an artifact written by the serial and non-serial schedule will also be the same.
We assume that cell execution is atomic and idempotent: we are allowed to freely interrupt the execution of a cell, or to restart it.
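To make the correctness criterion concrete, the following Python sketch (our illustration, not Vizier's implementation) identifies each artifact version by the index of the cell that wrote it, assuming cells are numbered from 0 in notebook order. The hypothetical function \texttt{serial\_versions} computes the version of each input that a cell reads under serial execution, and \texttt{is\_view\_equivalent} checks whether the versions observed under some parallel schedule coincide with them.

```python
from typing import Dict, List, Optional, Set

# Hypothetical encoding: a version of a variable is identified by the 0-based
# index of the cell that wrote it; None means the variable was never written.
Versions = Dict[str, Optional[int]]


def serial_versions(reads: List[Set[str]], writes: List[Set[str]]) -> List[Versions]:
    """Versions each cell reads when the notebook runs serially in notebook order."""
    latest: Dict[str, int] = {}  # variable -> most recent writer seen so far
    result: List[Versions] = []
    for i, (read_set, write_set) in enumerate(zip(reads, writes)):
        # A cell imports its inputs before it exports its outputs.
        result.append({var: latest.get(var) for var in read_set})
        for var in write_set:
            latest[var] = i  # every write creates a new immutable version
    return result


def is_view_equivalent(observed: List[Versions],
                       reads: List[Set[str]], writes: List[Set[str]]) -> bool:
    """A schedule is correct iff every cell read exactly the serial versions."""
    return observed == serial_versions(reads, writes)
```

For a notebook whose first cell writes \texttt{x}, whose second cell reads and rewrites \texttt{x}, and whose third cell reads \texttt{x}, a schedule in which the third cell reads the first cell's version of \texttt{x} is rejected.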

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tinysection{Naive Scheduling}
Let $\nb$ denote a notebook, a sequence of cells $[c_1, \ldots, c_n]$.
Assume, initially, that for each cell $c_i \in N$ we are given exact read and write sets ($\rs{c_i}$ and $\ws{c_i}$ respectively).
A notebook's data dependency graph $\dg = (\nb, \dep)$ connects cells through edges $(c_r, c_w, \ell) \in \dep$ labelled with symbols as follows:
\begin{multline*}
\dep = \{\; (c_r, c_w, \ell) \;|\; c_r,c_w \in N, \ell \in \mathcal R(c_r), \ell \in \mathcal W(c_w), w < r, \\
\not \exists c_{w'} \in N \text{ s.t. }w < w' < r, \ell \in \mathcal W(c_{w'}) \;\}
\end{multline*}
An edge labelled $\ell$ exists from any cell $c_r$ that reads symbol $\ell$ to the most recent preceding cell that writes symbol $\ell$.
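The definition of $\dep$ can be computed in a single pass over the notebook by tracking, for every symbol, the most recent cell that wrote it. The Python sketch below is illustrative only: it assumes cells are numbered from 0 in notebook order and encodes an edge $(c_r, c_w, \ell)$ as the tuple \texttt{(r, w, label)}; the function name \texttt{dependency\_graph} is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, str]  # (reader r, writer w, label)


def dependency_graph(reads: List[Set[str]], writes: List[Set[str]]) -> Set[Edge]:
    """Connect every read of a label to the most recent preceding write of it."""
    edges: Set[Edge] = set()
    latest: Dict[str, int] = {}  # label -> index of its most recent writer so far
    for r, (read_set, write_set) in enumerate(zip(reads, writes)):
        # Reads are resolved before this cell's own writes, which enforces w < r.
        for label in read_set:
            if label in latest:
                edges.add((r, latest[label], label))
        # Overwriting `latest` guarantees no intermediate writer w < w' < r.
        for label in write_set:
            latest[label] = r
    return edges
```

A read of a symbol that no earlier cell writes produces no edge in this sketch.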

Denote by $\cstate(c) \in \{ \spending, \sdone \}$ the state of a cell (i.e., \sdone after it has completed execution); a cell $c$ can be scheduled for execution once every cell that its outgoing edges lead to is \sdone:
$\forall (c, c_w, \ell) \in \dep : \cstate(c_w) = \sdone$.
When a cell $c_r$ imports variable $\ell$ from the global scope, where $(c_r, c_w, \ell) \in \dep$, it receives the version exported by cell $c_w$.
Trivially, any execution order that complies with this rule produces schedules that are view-equivalent to the notebook order and, thus, will produce the same result as a serial execution.
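A minimal event-driven sketch of this rule in Python is shown below; it is our illustration rather than Vizier's scheduler. It assumes the edge encoding \texttt{(r, w, label)} with 0-based cell indices, runs cells on a thread pool, and adds a hypothetical \texttt{RUNNING} state so that a cell is not submitted twice. Because every edge points to an earlier cell, the lowest-numbered \spending cell is always schedulable whenever no cell is running, so the loop cannot deadlock.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int, str]  # (reader r, writer w, label)


def run_naive(n: int, edges: Set[Edge], run_cell: Callable[[int], None],
              workers: int = 4) -> List[int]:
    """Run cells 0..n-1, starting a cell only once all cells it reads from are DONE.

    Returns the cells in the order in which they completed.
    """
    writers: Dict[int, Set[int]] = {c: set() for c in range(n)}
    for r, w, _ in edges:
        writers[r].add(w)
    state = {c: "PENDING" for c in range(n)}
    completed: List[int] = []
    running: Dict[Future, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(completed) < n:
            # Submit every pending cell whose outgoing edges all lead to DONE cells.
            for c in range(n):
                if state[c] == "PENDING" and all(state[w] == "DONE" for w in writers[c]):
                    state[c] = "RUNNING"
                    running[pool.submit(run_cell, c)] = c
            # Block until at least one running cell finishes.
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in finished:
                c = running.pop(fut)
                fut.result()  # propagate an exception raised by the cell
                state[c] = "DONE"
                completed.append(c)
    return completed
```

With the edges of the three-cell example plus an independent fourth cell, the fourth cell may run concurrently with the first, while the second and third wait for their writers.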

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tinysection{Runtime Refinement}
Recall that our static analysis produces an approximate dependency graph $\dga = (\nb, \depa)$, which may contain spurious edges and may miss true ones. We therefore refine $\dga$ at runtime. When a cell $\nc$ is executed, four types of changes to the dependency graph are possible. Below, we discuss each case and how we compensate for it to ensure scheduler \emph{correctness}.

\textbf{(i)} When a read does not materialize during $\nc$'s execution, we remove the corresponding edge from the dependency graph. Such a spurious read of a variable $\ell$ may delay $\nc$'s execution, because $\nc$ has to wait for the cell writing $\ell$ to finish, but it does not affect the correctness of the schedule.
\textbf{(ii)} A write of $\ell$ that does not materialize causes inbound edges labelled $\ell$ to be redirected to the most recent preceding cell that writes $\ell$. Cells that depend on $\nc$'s version of $\ell$ cannot have started yet, because they wait for $\nc$ to complete, so the schedule remains valid.
\textbf{(iii)} A missed read that materializes during $\nc$'s execution adds a new edge to the dependency graph.
If the edge leads to a cell $\nc'$ in the \spending state, the read operation may block until $\nc'$ has completed.
This situation is undesirable, because resources already allocated to the blocked cell $\nc$ remain tied up, which may lead to resource starvation.
\textbf{(iv)} A missed write of variable $\ell$ redirects to $\nc$ every edge labelled $\ell$ that runs from a cell after $\nc$ to a cell before $\nc$.
This is a correctness error only if one of these dependent cells has already started; in that case, the dependent cell must be aborted and rescheduled after $\nc$ completes.
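The four cases can be sketched in Python as follows. Rather than patching edges one by one, this illustrative version (not necessarily how Vizier implements it) replaces $\nc$'s approximate read and write sets with the observed ones and recomputes the graph from the definition of $\dep$. Cases (i) and (ii) are then handled by the recomputation itself, while the newly added edges identify the started readers that must be aborted (case iv) and the unfinished writers on which $\nc$ blocks (case iii). The 0-based cell indices, the tuple encoding of edges, the state strings, and the function names are all assumptions of the sketch.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, str]  # (reader r, writer w, label)


def dependency_graph(reads: List[Set[str]], writes: List[Set[str]]) -> Set[Edge]:
    """Link every read to the most recent preceding write of the same label."""
    edges: Set[Edge] = set()
    latest: Dict[str, int] = {}
    for i, (read_set, write_set) in enumerate(zip(reads, writes)):
        edges.update((i, latest[v], v) for v in read_set if v in latest)
        for v in write_set:
            latest[v] = i
    return edges


def refine(reads: List[Set[str]], writes: List[Set[str]], c: int,
           observed_reads: Set[str], observed_writes: Set[str],
           state: Dict[int, str]):
    """Substitute cell c's observed read/write sets and repair the graph.

    Returns the new read sets, write sets, and edges, plus the started cells that
    must be aborted (case iv) and the unfinished writers c must wait for (case iii).
    """
    old_edges = dependency_graph(reads, writes)
    reads, writes = list(reads), list(writes)  # do not mutate the caller's lists
    reads[c], writes[c] = set(observed_reads), set(observed_writes)
    new_edges = dependency_graph(reads, writes)
    added = new_edges - old_edges
    # Cases (i) and (ii): dropped reads vanish and dropped writes are redirected
    # to the previous writer by the recomputation; neither requires an abort.
    # Case (iv): a reader now depends on c's new version but has already started.
    to_abort = sorted({r for r, w, _ in added if w == c and state[r] != "PENDING"})
    # Case (iii): c now reads from a writer that has not finished yet.
    blocking = sorted({w for r, w, _ in added if r == c and state[w] != "DONE"})
    return reads, writes, new_edges, to_abort, blocking
```

Recomputing the whole graph costs a pass over the notebook per executed cell; an implementation could instead patch only the edges labelled with the affected symbols.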

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proposition}[Termination and Correctness]\label{theo:termination-and-corr}
For any notebook $\nb$ and approximated dependency graph $\dga$ for $\nb$, the execution of $\nb$ using the naive approach with refinement and compensation is guaranteed to terminate and to produce a correct schedule.
\end{proposition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% In this preliminary work, we focus on workloads where we can guarantee that we have an upper bound on the read and write sets (i.e., no false negatives).
% In this setting, approximation errors can lead to poor performance, but not correctness errors.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tinysection{Incremental Re-execution}
Vizier uses incremental re-execution to automatically refresh dependent cells when the user modifies a cell $\nc$, avoiding the re-execution of cells whose output will be the same in the modified notebook. To this end, the modified cell $\nc$ and all cells that depend on it, directly or indirectly, are put into the \spending state. We memorize each cell's actual dependencies from the previous execution and initially assume that the dependency graph is unchanged, except for the modified cell, whose provenance we statically approximate from scratch. While the modified cell and its dependents execute, we may observe changes to the read and write sets of a cell; we compensate for these using the repair actions described above.
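The marking step can be sketched as a graph traversal over the edges memorized from the previous execution. This Python version is only an illustration under our own assumptions (0-based cell indices, edges encoded as \texttt{(r, w, label)}, a hypothetical function \texttt{mark\_pending}); it omits the static re-approximation of the modified cell's provenance and the runtime repairs that follow.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, str]  # (reader r, writer w, label)


def mark_pending(n: int, edges: Set[Edge], modified: int) -> Dict[int, str]:
    """Return the state of every cell after `modified` is edited.

    The modified cell and every cell that transitively reads from it become
    PENDING; all other cells keep their DONE results from the previous run.
    """
    readers: Dict[int, Set[int]] = defaultdict(set)  # writer -> its direct readers
    for r, w, _ in edges:
        readers[w].add(r)
    pending = {modified}
    stack: List[int] = [modified]
    while stack:  # depth-first traversal along reversed dependency edges
        w = stack.pop()
        for r in readers[w]:
            if r not in pending:
                pending.add(r)
                stack.append(r)
    return {c: "PENDING" if c in pending else "DONE" for c in range(n)}
```

For edges $\{(1,0,x), (2,1,x), (3,0,y)\}$, editing cell 1 re-executes cells 1 and 2 only, while cells 0 and 3 keep their previous results.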

% When the user schedules a cell for partial re-execution (e.g., to retrieve new input data), we would like to avoid re-executing dependent cells that will produce identical outputs.
% The cell(s) scheduled for re-execution are moved to the \spending state.
% False positives from the approximate dependencies may be true positives in a different execution, so the exact dependencies are no longer valid.
% On the other hand, false negatives (cases (iii) and (iv) above) revealed during the past execution are also valuable.
% Accordingly, the dependency graph is updated according to the union of the approximate and exact dependencies.
% Any \sdone cells that now depend on a cell in the \spending state are recursively moved to the \spending state and the graph is updated as above.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../main"
%%% End: