
%!TEX root=../main.tex

In this section, we outline the conversion of monolithic kernel notebooks into ICE-compatible forms.
For this preliminary work, we make the simplifying assumption that all inter-cell communication occurs through the kernel's global scope (as opposed to, e.g., files).

Python's \texttt{ast} module provides a structured representation of the code: an \emph{abstract syntax tree} (AST).
Variable accesses are marked by instances of the \texttt{Name} node (or \texttt{Attribute}, for member accesses such as \texttt{a.b}), conveniently annotated with the directionality of the reference: \texttt{Load}, \texttt{Store}, or \texttt{Del}.
Analogous to program slicing~\cite{DBLP:journals/tse/Weiser84}, we traverse the AST's statements in order to build a \emph{fine-grained} data flow graph, where each node is a cell/statement pair, and each directed edge goes from an attribute \texttt{Load} to the corresponding \texttt{Store}.
Control-flow constructs (e.g., if-then-else blocks and for loops) may necessitate multiple edges per \texttt{Load}, since a single \texttt{Load} may read from any of several \texttt{Store} operations.
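
As a minimal sketch of this traversal (not the implementation used in this work), the following Python fragment builds fine-grained edges for straight-line, top-level code only.
The helper names \texttt{name\_accesses} and \texttt{fine\_grained\_edges} are hypothetical, and the sketch assumes that plain variables appear as \texttt{ast.Name} nodes.
Control flow, augmented assignment, and declarations are deliberately ignored, so each \texttt{Load} is linked to at most one \texttt{Store}.

\begin{minted}{python}
import ast

def name_accesses(stmt):
    """Return (reads, writes): names the statement loads / stores or deletes."""
    reads, writes = set(), set()
    for node in ast.walk(stmt):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                reads.add(node.id)
            else:  # ast.Store or ast.Del
                writes.add(node.id)
    return reads, writes

def fine_grained_edges(cells):
    """cells: list of cell sources (hypothetical input format).

    Returns edges (reader, writer, name), where reader and writer are
    (cell index, statement index) pairs.
    """
    last_store = {}  # name -> node of the most recent Store
    edges = set()
    for c, source in enumerate(cells):
        for s, stmt in enumerate(ast.parse(source).body):
            reads, writes = name_accesses(stmt)
            for name in reads:  # reads see the state before this statement
                if name in last_store:
                    edges.add(((c, s), last_store[name], name))
            for name in writes:
                last_store[name] = (c, s)
    return edges

# Two cells: the second reads x and y, both stored in the first.
edges = fine_grained_edges(["x = 1\ny = x + 1", "z = x + y"])
\end{minted}

Names without a preceding \texttt{Store} (e.g., builtins) produce no edge in this sketch.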

Python's scoping logic presents additional complications.
First, function and class declarations may reference attributes (e.g., \texttt{import}s) from an enclosing scope, creating transitive dependencies.
When traversing a function or class declaration, we record such dependencies and include them when the symbol is \texttt{Load}ed.
Transitive dependency tracking is further complicated by Python's use of mutable closures (e.g., see \Cref{fig:scoping}).
In the second code block, when \texttt{bar} is declared, it `captures' the scope of \texttt{foo}, in which \texttt{a} is subsequently bound to \texttt{2}; this binding shadows the later assignment in the global scope, even though the enclosing scope is not otherwise accessible.
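
The recording of declaration dependencies can be illustrated with a short sketch, under simplifying assumptions that are ours rather than part of the system described here: only top-level \texttt{def} statements are considered, only positional parameters are treated as bound, and \texttt{global}, \texttt{nonlocal}, nested scopes, and the mutable-closure case of \Cref{fig:scoping} are not modeled.
All helper names (\texttt{loaded\_names}, \texttt{free\_names}, \texttt{expand}, \texttt{statement\_reads}) are hypothetical.

\begin{minted}{python}
import ast

def loaded_names(node):
    """Names read (ctx=Load) anywhere beneath an AST node."""
    return {n.id for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}

def free_names(func):
    """Names a function body loads but never binds locally (approximate)."""
    bound = {a.arg for a in func.args.args}  # positional parameters only
    for n in ast.walk(func):
        if isinstance(n, ast.Name) and not isinstance(n.ctx, ast.Load):
            bound.add(n.id)
    loaded = set()
    for stmt in func.body:
        loaded |= loaded_names(stmt)
    return loaded - bound

def expand(names, deps):
    """Close a set of names under the recorded declaration dependencies."""
    seen, todo = set(), list(names)
    while todo:
        name = todo.pop()
        if name not in seen:
            seen.add(name)
            todo.extend(deps.get(name, ()))
    return seen

def statement_reads(source):
    """Read set of each top-level statement, with transitive dependencies."""
    deps, reads = {}, []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.FunctionDef):
            deps[stmt.name] = free_names(stmt)  # recorded, not yet read
            reads.append(set())
        else:
            reads.append(expand(loaded_names(stmt), deps))
    return reads

source = """
import math
def area(r):
    return math.pi * r * r
def report(r):
    print(area(r))
report(2)
"""
reads = statement_reads(source)  # the call to report() also depends on math
\end{minted}

Deferring the dependencies until the symbol is \texttt{Load}ed mirrors the text above: declaring \texttt{report} reads nothing in this sketch, while calling it pulls in \texttt{area} and, transitively, \texttt{math}.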


\begin{figure}
  \begin{center}
  \begin{subfigure}{0.45\columnwidth}
\begin{minted}{python}
def foo():
  print(a)
a = 1
foo()  # Prints '1'
def bar():
  a = 2
  foo()
bar()  # Prints '1'
\end{minted}
  \end{subfigure}
  \begin{subfigure}{0.5\columnwidth}
\begin{minted}{python}
def foo():
  def bar():
    print(a)
  a = 2
  return bar
bar = foo()
bar()  # Prints '2'
a = 1
bar()  # Prints '2'
\end{minted}
  \end{subfigure}
  \caption{Scope capture in Python happens at function definition, but captured scopes remain mutable.}
  \label{fig:scoping}
  \trimfigurespacing
  \end{center}
\end{figure}

The fine-grained data flow graph, as defined above, is then reduced to a simplified \emph{coarse-grained} data flow graph by (i) merging the nodes for all statements in a cell, (ii) removing self-edges, and (iii) removing parallel edges with identical labels.
The coarse-grained data flow graph provides each cell's dependencies: the set of in-edges (resp., out-edges) is a guaranteed upper bound on the cell's write set (resp., read set).
As a final step, we inject explicit variable imports and exports for the read and write sets of each cell into the cell's code.
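
The reduction to the coarse-grained graph can be sketched as follows.
This is an illustration only: the edge representation (reader, writer, label), with readers and writers given as (cell, statement) pairs, and the function names \texttt{coarsen} and \texttt{read\_write\_sets} are assumptions made for this example.

\begin{minted}{python}
def coarsen(fine_edges):
    """Reduce ((cell, stmt), (cell, stmt), name) edges to (cell, cell, name)."""
    coarse = set()  # a set drops parallel edges with identical labels
    for (src_cell, _), (dst_cell, _), name in fine_edges:
        if src_cell != dst_cell:  # drop self-edges
            coarse.add((src_cell, dst_cell, name))
    return coarse

def read_write_sets(coarse, cell):
    """Edges run from reader to writer, so out-edges are reads."""
    reads = {name for src, _, name in coarse if src == cell}
    writes = {name for _, dst, name in coarse if dst == cell}
    return reads, writes

# Hypothetical fine-grained edges for two cells.
fine = {
    ((0, 1), (0, 0), "x"),  # within cell 0: becomes a self-edge
    ((1, 0), (0, 0), "x"),
    ((1, 1), (0, 0), "x"),  # parallel to the edge above after merging
    ((1, 0), (0, 1), "y"),
}
coarse = coarsen(fine)
reads, writes = read_write_sets(coarse, 1)
\end{minted}

In this example, cell 1 ends up with two out-edges (labeled \texttt{x} and \texttt{y}) and no in-edges, which are the sets that the injected imports and exports would be derived from.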