paper-ParallelPython-Short/sections/related.tex
Boris Glavic 977bd993a2 rel
2022-03-22 21:19:53 -05:00

Provenance for workflow systems has been studied extensively for several decades. However, workflow systems expect data dependencies to be specified explicitly as part of the workflow specification; thus, such provenance techniques are not applicable to our problem setting, as was also observed in
Pimentel et al.~\cite{pimentel-19-scmanpfs} provide an overview of research on provenance for scripting (programming) languages and identify a need for fine-grained provenance.\BG{What other takeaways?}
noWorkflow~\cite{pimentel-17-n, DBLP:conf/tapp/PimentelBMF15} is a tool for collecting several types of provenance for Python scripts, including environmental information (library dependencies and OS environments), static data-flow information, and dynamic (runtime) control- and data-flow information collected using profiling and instrumentation tools. In~\cite{DBLP:conf/tapp/PimentelBMF15}, noWorkflow was extended to support collecting provenance
\cite{macke-21-fglsnin} presents an approach that combines static analysis with
\cite{KP17a} introduces Dataflow notebooks, which extend Jupyter with immutable identifiers for cells and the capability to reference the results of a cell by its identifier. The purpose of this extension is to address the problem of implicit cell dependencies caused by shared Python interpreter state and out-of-order execution of cells in a notebook. If users are diligent in using these features, then Dataflow notebooks can, like our model, automatically refresh dependent cells. However, our model has the advantage that users do not need to change their code to use cell identifiers and cannot accidentally create hidden dependencies, since cell executions are isolated from each other. Another advantage of our approach is that it allows parallel execution of independent cells, which was only alluded to as a possibility in \cite{KP17a}.
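
As an illustrative sketch of what it means for two cells to be independent, one standard characterization is given by Bernstein's conditions. The notation is assumed here for illustration only and is not taken from \cite{KP17a}: let $R(c)$ and $W(c)$ denote the sets of variables read and written by a cell $c$. Two cells $c_i$ and $c_j$ may then be executed in parallel if
\[
  W(c_i) \cap R(c_j) = \emptyset, \qquad
  R(c_i) \cap W(c_j) = \emptyset, \qquad
  W(c_i) \cap W(c_j) = \emptyset .
\]
For instance, a cell with $W(c_1) = \{x\}$ and a cell with $R(c_2) = \{y\}$, $W(c_2) = \{z\}$ satisfy all three conditions as long as $c_1$ reads neither $y$ nor $z$, whereas a cell that reads $x$ violates the first condition and has to wait for $c_1$ to finish.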
\begin{itemize}
\item Provenance for Jupyter or Python
\item Dataflow analysis / program slicing
\item Textbook Transactions / Optimistic concurrency control
\end{itemize}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../main"
%%% End: