Workflow provenance has been studied extensively (see, e.g., \cite{DC07} for a survey), but its reliance on explicit dependencies limits its utility in our setting. More closely related are provenance and static analysis techniques from the programming languages community~\cite{NN99}.
Pimentel et al.~\cite{pimentel-19-scmanpfs} provide an overview of research on provenance for scripting (programming) languages and identify both the need for and the challenges of fine-grained provenance in this context.
noWorkflow~\cite{pimentel-17-n} collects several types of provenance for Python scripts including environmental information, as well as static and dynamic data- and control-flow,
% \cite{DBLP:conf/tapp/PimentelBMF15} extends noWorkflow to Jupyter notebooks and is closely related to our work,
but, in contrast to our work, produces provenance only for analysis and debugging, not for scheduling.
Macke et al.~\cite{macke-21-fglsnin} combine static and dynamic dataflow analysis to track dataflow dependencies during cell execution and to warn users of ``unsafe'' interactions in which a cell reads an outdated version of a variable. By contrast, our approach automatically refreshes dependent cells.
Vamsa~\cite{namaki-20-v} also employs static dataflow analysis to analyze the provenance of Python ML pipelines. % , but additionally annotates variables with semantic tags (e.g., features and labels).
Dataflow notebooks~\cite{KP17a} extend Jupyter with immutable identifiers for cells and the capability to reference the results of a cell by its identifier.
Our approach additionally supports parallel execution of independent cells, which \cite{KP17a} mentions only as a possibility.
Nodebook~\cite{nodebook} is a plugin for Jupyter that checkpoints notebook state between cells to force in-order cell evaluation. Although closely related to our approach, it attempts neither parallel execution nor automatic re-execution of cells.
Chapman et al.~\cite{chapman-20-cqfgppp} capture fine-grained provenance at runtime for common classes of relational data transformations in Python preprocessing pipelines. In contrast, our approach relies on static analysis. % and is not limited to operations on relational datasets.