%!TEX root=../main.tex
The baseline semantics of a notebook are a serial execution of its cells in notebook order.
We refer to the sets of variables imported and exported by each cell as the cell's read set and write set, respectively.
A correct execution is thus defined in terms of view serializability: an execution order is correct iff every artifact version each cell imports matches the version it would import under serial execution in notebook order.
We assume that cell execution is atomic and idempotent, so we may freely interrupt a cell's execution or restart it.

\tinysection{Naive Scheduling}
Let $N$ denote a notebook, a sequence of cells $[c_1, \ldots, c_n]$.
Assume, initially, that for each cell $c_i \in N$ we are given exact read and write sets ($\mathcal R(c_i)$ and $\mathcal W(c_i)$, respectively).
We define the notebook's data dependency graph $(N, D)$ through a set of labelled edges $(c_r, c_w, \ell) \in D$ as follows:
\begin{multline*}
D = \{\; (c_r, c_w, \ell) \;\mid\; c_r, c_w \in N,\ \ell \in \mathcal R(c_r),\ \ell \in \mathcal W(c_w),\ w < r, \\
\not\exists\, c_{w'} \in N \text{ s.t. } w < w' < r,\ \ell \in \mathcal W(c_{w'}) \;\}
\end{multline*}
An edge labelled $\ell$ exists from any cell $c_r$ that reads symbol $\ell$ to the most recent preceding cell that writes symbol $\ell$.
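The definition of $D$ can be computed in a single pass over the notebook by remembering, for each symbol, the latest cell that writes it. The following Python sketch is illustrative only: it assumes cells are indexed from 0 in notebook order (the text uses $c_1, \ldots, c_n$) and that $\mathcal R$ and $\mathcal W$ are given as lists of sets; the helper name \texttt{dependency\_edges} is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (reader index r, writer index w, label ell); cells are indexed from 0 in
# notebook order (an assumption: the text numbers them c_1 .. c_n).
Edge = Tuple[int, int, str]


def dependency_edges(reads: List[Set[str]], writes: List[Set[str]]) -> Set[Edge]:
    """Hypothetical helper computing D: each symbol ell read by cell r gets an
    edge to the most recent cell w < r that writes ell, if such a cell exists."""
    edges: Set[Edge] = set()
    last_writer: Dict[str, int] = {}  # symbol -> latest writer seen so far
    for r in range(len(reads)):
        # Resolve this cell's reads before recording its writes, so w < r.
        for ell in reads[r]:
            if ell in last_writer:
                edges.add((r, last_writer[ell], ell))
        for ell in writes[r]:
            last_writer[ell] = r
    return edges


# c_0: x = 1   c_1: y = x + 1   c_2: x = y * 2   c_3: print(x, y)
reads = [set(), {"x"}, {"y"}, {"x", "y"}]
writes = [{"x"}, {"y"}, {"x"}, set()]
print(sorted(dependency_edges(reads, writes)))
# [(1, 0, 'x'), (2, 1, 'y'), (3, 1, 'y'), (3, 2, 'x')]
```

Note that $c_3$ is linked to $c_2$ for $x$, not to $c_0$: only the nearest preceding writer receives an edge.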

Denote by $\mathcal S(c) \in \{ \text{PENDING}, \text{DONE} \}$ the state of a cell $c$ ($\mathcal S(c) = \text{DONE}$ once $c$ has completed execution); a cell $c$ can be scheduled for execution once every cell it depends on is \text{DONE}:
\[
\forall (c, c_w, \ell) \in D : \mathcal S(c_w) = \text{DONE}.
\]
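A minimal sketch of this readiness test, together with a scheduler loop that repeatedly starts every schedulable cell. The helper names and the simplification that each "wave" of ready cells completes before the next wave is chosen are assumptions made for illustration; the model itself places no such restriction on concurrency.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, str]  # (reader r, writer w, label ell), cells indexed from 0


def ready_cells(edges: Set[Edge], state: Dict[int, str]) -> List[int]:
    """PENDING cells all of whose edges (c, c_w, ell) lead to a DONE writer."""
    return sorted(
        c
        for c, s in state.items()
        if s == "PENDING"
        and all(state[w] == "DONE" for (r, w, _) in edges if r == c)
    )


def schedule_waves(n: int, edges: Set[Edge]) -> List[List[int]]:
    """Simulate the naive scheduler. Simplification (assumption): every cell in
    a wave runs concurrently and finishes before the next wave is chosen."""
    state = {c: "PENDING" for c in range(n)}
    waves: List[List[int]] = []
    while True:
        wave = ready_cells(edges, state)
        if not wave:  # D is acyclic (w < r), so this only happens once all are DONE
            break
        for c in wave:
            state[c] = "DONE"
        waves.append(wave)
    return waves


# c_0: a = load()   c_1: b = load()   c_2: f(a)   c_3: g(a, b)
edges = {(2, 0, "a"), (3, 0, "a"), (3, 1, "b")}
print(schedule_waves(4, edges))  # [[0, 1], [2, 3]]
```

Because every edge points to an earlier cell, the lowest-indexed \text{PENDING} cell is always schedulable, so the loop cannot stall before all cells are \text{DONE}.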
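When a cell $c_r$ imports variable $\ell$ from the global scope, where $(c_r, c_w, \ell) \in D$, it receives the version exported by cell $c_w$.
Since $c_w$ is by construction the most recent cell preceding $c_r$ in notebook order that writes $\ell$, every import receives exactly the version it would receive under serial execution; hence this execution model can only produce schedules that are view-equivalent to the notebook order.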
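To make the view-equivalence argument concrete, the sketch below records, for each import, which cell's version is received. It contrasts versioned imports resolved through $D$ with running the same cells against a single, unversioned global scope. The order $[c_0, c_2, c_1, c_3]$ respects every edge of $D$, yet without per-edge versions $c_1$ sees the wrong $x$. All names are hypothetical and cells are indexed from 0.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, str]             # (reader r, writer w, label ell)
ReadsFrom = Dict[Tuple[int, str], int]  # (reader, symbol) -> writer of the imported version


def reads_from(order: List[int], reads: List[Set[str]], writes: List[Set[str]]) -> ReadsFrom:
    """Versions imported when cells run one at a time in `order` against a
    single, unversioned global scope."""
    latest: Dict[str, int] = {}  # symbol -> cell that most recently exported it
    seen: ReadsFrom = {}
    for c in order:
        for ell in reads[c]:
            if ell in latest:
                seen[(c, ell)] = latest[ell]
        for ell in writes[c]:
            latest[ell] = c
    return seen


def reads_from_versioned(edges: Set[Edge]) -> ReadsFrom:
    """Under the model, c_r imports ell from c_w for each (c_r, c_w, ell) in D,
    regardless of the order in which cells actually ran."""
    return {(r, ell): w for (r, w, ell) in edges}


# c_0: x = 1   c_1: y = x   c_2: x = 2   c_3: print(x)
reads = [set(), {"x"}, set(), {"x"}]
writes = [{"x"}, {"y"}, {"x"}, set()]
D = {(1, 0, "x"), (3, 2, "x")}

serial = reads_from([0, 1, 2, 3], reads, writes)  # notebook order
print(reads_from_versioned(D) == serial)                   # True
print(reads_from([0, 2, 1, 3], reads, writes) == serial)   # False: c_1 sees c_2's x
```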
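
\tinysection{Runtime Refinement}
In practice, exact read and write sets are not available before a cell runs, so the scheduler starts from approximated sets.
The dependency graph is then refined during execution in four possible ways; we consider each in turn, along with its consequences for scheduler \emph{correctness}.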

\textbf{(i)} An approximated read that is never imported removes the corresponding edge from the dependency graph.
This cannot cause a correctness error.
\textbf{(ii)} An approximated write that is never exported redirects inbound edges with the corresponding label to the most recent preceding cell that exports the variable.
The dependent cells were waiting on this cell and so cannot have started yet; there is no possibility of a correctness error.
\textbf{(iii)} An imported variable not in the approximated read set adds a new edge to the dependency graph.
If the edge leads to a cell in the \text{PENDING} state, the import operation may block until the exporting cell has completed.
This is not a correctness error, but it is undesirable: any resources already allocated to the blocked cell remain tied up and may cause resource starvation.
\textbf{(iv)} An exported variable not in the approximated write set redirects to this cell the edges with the corresponding label from later cells for which it is now the most recent preceding writer.
This is a correctness error only if one of these dependent cells has already started, since it may have imported a stale version; if so, that cell must be aborted and rescheduled after the current cell completes.
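The four cases can be handled uniformly by recomputing $D$ from the refined read and write sets and comparing it with the previous graph. In the following sketch (hypothetical names, cells indexed from 0), an edge counts as redirected when its reader previously had an edge with the same label to a different writer; a redirected edge whose reader has already started is flagged for abort, which covers case (iv). In case (ii) the redirected readers cannot have started, so no abort results, and a case (iii) edge has no predecessor for its label, so it only leads to blocking. The \texttt{started} set of cells that have begun execution is an assumption, since the model above only distinguishes \text{PENDING} and \text{DONE}.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, str]  # (reader r, writer w, label ell), cells indexed from 0


def dependency_edges(reads: List[Set[str]], writes: List[Set[str]]) -> Set[Edge]:
    """D as defined above: link each read to the nearest preceding writer."""
    edges: Set[Edge] = set()
    last_writer: Dict[str, int] = {}
    for r in range(len(reads)):
        for ell in reads[r]:
            if ell in last_writer:
                edges.add((r, last_writer[ell], ell))
        for ell in writes[r]:
            last_writer[ell] = r
    return edges


def refine(old_edges: Set[Edge], reads: List[Set[str]], writes: List[Set[str]],
           started: Set[int]) -> Tuple[Set[Edge], Set[int]]:
    """Recompute D from refined read/write sets; return the new edges and the
    started cells that must be aborted and rescheduled."""
    new_edges = dependency_edges(reads, writes)
    # D holds at most one edge per (reader, label), so this map is well defined.
    old_writer = {(r, ell): w for (r, w, ell) in old_edges}
    to_abort: Set[int] = set()
    for (r, w, ell) in new_edges - old_edges:
        redirected = (r, ell) in old_writer  # same reader and label, different writer
        if redirected and r in started:
            to_abort.add(r)  # r may already have imported a stale version of ell
    return new_edges, to_abort


# Approximation: c_0 writes x; c_2 reads x. Cells c_0, c_1 and c_2 have started.
old = dependency_edges([set(), set(), {"x"}], [{"x"}, set(), set()])
# Case (iv): c_1 turns out to export x as well, so c_2's edge is redirected.
new, abort = refine(old, [set(), set(), {"x"}], [{"x"}, {"x"}, set()], started={0, 1, 2})
print(sorted(new), abort)  # [(2, 1, 'x')] {2}
```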

% In this preliminary work, we focus on workloads where we can guarantee that we have an upper bound on the read and write sets (i.e., no false negatives).
% In this setting, approximation errors can lead to poor performance, but not correctness errors.

\tinysection{Incremental Re-execution}
When the user schedules a cell for partial re-execution of the notebook (e.g., to retrieve new input data), we would like to avoid re-executing downstream cells that would produce identical outputs.
The cell(s) scheduled for re-execution are moved to the \text{PENDING} state.
The exact dependencies observed during the previous execution are no longer sufficient on their own: a false positive in the approximate dependencies (cases (i) and (ii) above) may be a true positive in a different execution.
On the other hand, false negatives (cases (iii) and (iv) above) revealed during the previous execution remain valuable, as they expose dependencies that the approximation misses.
Accordingly, the dependency graph is rebuilt from the union of the approximate and exact dependencies.
Any \text{DONE} cells that now depend on a cell in the \text{PENDING} state are recursively moved to the \text{PENDING} state, and the graph is updated as above.
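A sketch of the invalidation step, assuming the union is taken over edge sets (it could equally be taken over read and write sets before recomputing $D$). The function name \texttt{invalidate} and the example notebook are hypothetical; cells are indexed from 0.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int, str]  # (reader r, writer w, label ell), cells indexed from 0


def invalidate(approx: Set[Edge], observed: Set[Edge],
               state: Dict[int, str], rerun: Iterable[int]) -> Dict[int, str]:
    """Return the new cell states after the user re-runs the cells in `rerun`."""
    edges = approx | observed  # union of approximate and previously observed edges
    state = dict(state)        # leave the caller's mapping untouched
    for c in rerun:
        state[c] = "PENDING"
    # Propagate to a fixpoint: a DONE cell reading from a PENDING cell re-runs too.
    changed = True
    while changed:
        changed = False
        for (r, w, _) in edges:
            if state[w] == "PENDING" and state[r] == "DONE":
                state[r] = "PENDING"
                changed = True
    return state


# c_0 loads df; c_1 computes stats from df; c_2 was observed importing df in the
# previous run (a case (iii) false negative); c_3 reads stats; c_4 is independent.
approx = {(1, 0, "df"), (3, 1, "stats")}
observed = {(2, 0, "df")}
done = {c: "DONE" for c in range(5)}
after = invalidate(approx, observed, done, rerun=[0])
print(sorted(c for c, s in after.items() if s == "PENDING"))  # [0, 1, 2, 3]
```

Dropping the observed edges would leave $c_2$ \text{DONE} even though its output was computed from the old version of \texttt{df}, while the independent $c_4$ is never re-executed either way.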