%!TEX root=../main.tex
In this section, we outline the conversion of monolithic kernel notebooks into ICE-compatible forms.
For this preliminary work, we make the simplifying assumption that all inter-cell communication occurs through the kernel's global scope (as opposed to, e.g., files).

Python's \texttt{ast} module provides a structured representation of the code: an \emph{abstract syntax tree} (AST).
Variable accesses appear in the AST as \texttt{Name} nodes (and member accesses as \texttt{Attribute} nodes), each conveniently annotated with the directionality of the reference: \texttt{Load}, \texttt{Store}, or \texttt{Del}.
Analogous to program slicing~\cite{DBLP:journals/tse/Weiser84}, we traverse the AST's statements in program order to build a \emph{fine-grained} data flow graph, in which each node is a cell/statement pair and each directed edge goes from a \texttt{Load} to the corresponding \texttt{Store}.
Control-flow constructs (e.g., if-then-else blocks and for loops) may necessitate multiple edges per \texttt{Load}, as a single \texttt{Load} may read from any of several \texttt{Store} operations.
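As a minimal sketch of this traversal (not the implementation used in this work), the following assumes straight-line, top-level code and tracks only plain variable names, which Python's \texttt{ast} module exposes as \texttt{Name} nodes.
The helper \texttt{fine\_grained\_edges} and its edge representation are hypothetical; control flow, imports, augmented assignments, and nested scopes are deliberately ignored.
\begin{minted}{python}
import ast

def fine_grained_edges(cells):
    """cells: list of source strings, one per notebook cell.

    Returns a set of (reader, writer, name) edges, where reader and
    writer are (cell index, statement index) pairs and the edge points
    from the statement that loads `name` to the one that last stored it.
    """
    last_store = {}  # name -> (cell index, statement index)
    edges = set()
    for c, source in enumerate(cells):
        for s, stmt in enumerate(ast.parse(source).body):
            names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
            # Resolve reads first, so that `x = x + 1` reads the previous x.
            for n in names:
                if isinstance(n.ctx, ast.Load) and n.id in last_store:
                    edges.add(((c, s), last_store[n.id], n.id))
            # Then record this statement's writes and deletions.
            for n in names:
                if isinstance(n.ctx, ast.Store):
                    last_store[n.id] = (c, s)
                elif isinstance(n.ctx, ast.Del):
                    last_store.pop(n.id, None)
    return edges
\end{minted}
For the two cells \texttt{x = 1; y = x + 1} (one statement per line) and \texttt{print(y)}, this sketch produces one edge within the first cell, labeled \texttt{x}, and one edge from the second cell back to the first, labeled \texttt{y}.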
%

Python's scoping logic presents additional complications.
First, function and class declarations may reference symbols (e.g., \texttt{import}s) from an enclosing scope, creating transitive dependencies.
When traversing a function or class declaration, we record such dependencies and include them when the symbol is \texttt{Load}ed.
Transitive dependency tracking is complicated by Python's use of mutable closures (see \Cref{fig:scoping}).
In the second code block, when \texttt{bar} is declared, it `captures' the scope of \texttt{foo}, in which \texttt{a = 2}; this binding takes precedence over the assignment in the global scope, even though the enclosing scope is not otherwise accessible.
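The bookkeeping for declarations can be sketched as follows, under strong simplifying assumptions: only top-level \texttt{def} statements are considered, every name stored inside a function is treated as local, and builtins are excluded.
The helpers \texttt{free\_names}, \texttt{declared\_dependencies}, and \texttt{expand} are hypothetical names introduced for illustration; the sketch covers the first code block of \Cref{fig:scoping} but does not model the closure capture shown in the second.
\begin{minted}{python}
import ast
import builtins

def free_names(func_def):
    """Approximate the names a function reads from enclosing scopes."""
    # Parameters are bound by the function itself.
    bound = {a.arg for a in ast.walk(func_def.args)
             if isinstance(a, ast.arg)}
    loads = set()
    for node in ast.walk(func_def):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loads.add(node.id)
            else:  # Store or Del: treated as a local binding
                bound.add(node.id)
    return loads - bound - set(dir(builtins))

def declared_dependencies(source):
    """Map each top-level function name to its free names."""
    return {stmt.name: free_names(stmt)
            for stmt in ast.parse(source).body
            if isinstance(stmt, ast.FunctionDef)}

def expand(loaded, deps):
    """Add the recorded dependencies of every loaded symbol, transitively."""
    result, frontier = set(loaded), list(loaded)
    while frontier:
        for dep in deps.get(frontier.pop(), ()):
            if dep not in result:
                result.add(dep)
                frontier.append(dep)
    return result
\end{minted}
With these definitions, a statement that loads \texttt{bar} from the first code block is treated as also depending on \texttt{foo} and, through it, on the global \texttt{a}.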
\begin{figure}
\begin{center}
\begin{subfigure}{0.45\columnwidth}
\begin{minted}{python}
def foo():
    print(a)
a = 1
foo() # Prints '1'
def bar():
    a = 2
    foo()
bar() # Prints '1'
\end{minted}
\end{subfigure}
\begin{subfigure}{0.5\columnwidth}
\begin{minted}{python}
def foo():
    def bar():
        print(a)
    a = 2
    return bar
bar = foo()
bar() # Prints '2'
a = 1
bar() # Prints '2'
\end{minted}
\end{subfigure}
\caption{Scope capture in Python happens at function definition, but captured scopes remain mutable.}
\label{fig:scoping}
\trimfigurespacing
\end{center}
\end{figure}

Second, the fine-grained data flow graph, as defined above, is reduced to a simplified \emph{coarse-grained} data flow graph by (i) merging the nodes for all statements in a cell, (ii) removing self-edges, and (iii) removing parallel edges with identical labels.
The coarse-grained data flow graph provides each cell's dependencies: the set of in-edges (resp., out-edges) is a guaranteed upper bound on the cell's write set (resp., read set).
As a final step, we inject explicit variable imports and exports for the read and write sets of each cell into the cell's code.
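The reduction to the coarse-grained graph can be illustrated with a short sketch.
It assumes a hypothetical edge representation \texttt{(reader, writer, name)}, in which \texttt{reader} and \texttt{writer} are (cell, statement) pairs and \texttt{name} is the edge label; the function names are likewise illustrative, and the injection of imports and exports is not shown.
\begin{minted}{python}
def coarse_grained(edges):
    """Collapse statement-level edges into cell-level edges."""
    coarse = set()  # a set drops parallel edges with identical labels
    for (reader_cell, _), (writer_cell, _), name in edges:
        if reader_cell != writer_cell:  # drop self-edges
            coarse.add((reader_cell, writer_cell, name))
    return coarse

def read_write_sets(coarse, cell):
    """Names on a cell's out-edges (reads) and in-edges (writes)."""
    reads = {name for r, w, name in coarse if r == cell}
    writes = {name for r, w, name in coarse if w == cell}
    return reads, writes
\end{minted}
Merging the statements of a cell amounts to discarding the statement index, and using a set of labeled triples removes parallel edges without a separate pass.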