paper-ParallelPython-Short/sections/import.tex

53 lines
2.8 KiB
TeX

%!TEX root=../main.tex
In this section, we outline the conversion of monolithic kernel notebooks into ICE-compatible forms.
For this preliminary work, we make a simplifying assumption that all inter-cell communication occurs through the kernel's global scope (e.g., as opposed to files).
Python's \texttt{ast} module provides a structured representation of the code: an \emph{abstract syntax tree} (AST).
Variable accesses are marked by instances of the \texttt{Attribute} object, conveniently annotated with the directionality of the reference: \texttt{Load}, \texttt{Store}, or \texttt{Delete}.
Analogous to Program Slicing~\cite{DBLP:journals/tse/Weiser84}, we traverse the AST's statements in-order to build a \emph{fine-grained} data flow graph, where each node is a cell/statement pair, and each directed edge goes from an attribute \texttt{Load} to the corresponding \texttt{Store}.
Control-flow constructs (e.g., if-then-else blocks and for loops) may necessitate multiple edges per \texttt{Load}, as it may read from multiple \texttt{Store} operations.
Python's sccping logic presents additional complications;
First, function and class declarations may reference attributes (e.g., \texttt{import}s) from an enclosing scope, creating transitive dependencies.
When traversing a function or class declaration, we record such dependencies and include them when the symbol is \texttt{Load}ed.
Transitive dependency tracking is complicated due to python's use of mutable closures (e.g., see \Cref{fig:scoping});
In the latter code block, when \texttt{bar} is declared, it `captures' the scope of \texttt{foo}, in which \texttt{a = 2}, and overrides assignment in the global scope, even though the enclosing scope is not otherwise accessible.
\begin{figure}
\centering
\begin{subfigure}{0.45\columnwidth}
\begin{minted}{python}
def foo(): print(a)
a = 1
foo() # Prints '1'
def bar():
a = 2
foo()
bar() # Prints '1'
\end{minted}
\end{subfigure}
\begin{subfigure}{0.5\columnwidth}
\begin{minted}{python}
def foo():
def bar(): print(a)
a = 2
return bar
bar = foo()
bar() # Prints '2'
a = 1
bar() # Prints '2'
\end{minted}
\end{subfigure}
\vspace*{-3mm}
\caption{Scope capture in python happens at function definition, but captured scopes remain mutable.}
\label{fig:scoping}
\trimfigurespacing
\end{figure}
Second, the fine-grained dataflow graph, as defined above, is reduced into a simplified \emph{coarse-grained} data flow graph by (i) merging nodes for the statements in a cell, (ii) removing self-edges, and (iii) removing parallel edges with identical labels.
The coarse-grained data flow graph provides the cell's dependencies: The set of in-edges (resp., out-edges) is a guaranteed upper bound on the cell's write set (read set).
As a final step, we inject explicit variable imports and exports for the read and write sets of each cell into the cell's code.