% Source: paper-ParallelPython-Short/sections/import.tex (commit 4c98ca952d "updates", Boris Glavic, 2022-04-01 21:00:20 -05:00)
%!TEX root=../main.tex
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}
\centering
\begin{subfigure}{0.4\columnwidth}
\centering
\begin{pythoncode}
def foo(): print(a)
a = 1
foo() # Prints '1'
def bar():
    a = 2
    foo()
bar() # Prints '1'
\end{pythoncode}
\end{subfigure}
\hspace{0.1\columnwidth}
\begin{subfigure}{0.4\columnwidth}
\centering
\begin{pythoncode}
def foo():
    def bar(): print(a)
    a = 2
    return bar
bar = foo()
bar() # Prints '2'
a = 1
bar() # Prints '2'
\end{pythoncode}
\end{subfigure}
\vspace*{-3mm}
\caption{Scope capture in Python happens at function definition, but captured scopes remain mutable.}
\label{fig:scoping}
\trimfigurespacing
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We now outline how we convert Jupyter notebooks, which run all cells in a single monolithic kernel, into ICE-compatible form.
In this preliminary work, we make the simplifying assumption that all inter-cell communication occurs through the kernel's global scope (rather than, e.g., through files).
Python's \texttt{ast} module provides a structured representation of the code: an \emph{abstract syntax tree} (AST).
Variable accesses appear as \texttt{Name} nodes (and attribute accesses such as \texttt{x.y} as \texttt{Attribute} nodes), each annotated with a context that gives the type of reference: \texttt{Load}, \texttt{Store}, or \texttt{Del}.
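As a minimal sketch (not our implementation), per-statement read and write sets can be collected with the standard \texttt{ast} module. The helper \texttt{accesses} is hypothetical; it treats \texttt{Del} as a write, ignores scoping, and misses bindings introduced by \texttt{def}, \texttt{class}, and \texttt{import}, which are not \texttt{Name} nodes. Note also that an attribute store such as \texttt{x.y = z} registers only as a read of \texttt{x}.
\begin{pythoncode}
from __future__ import annotations
import ast

def accesses(stmt: ast.stmt) -> tuple[set[str], set[str]]:
    """Return (reads, writes) of plain variable names in one
    statement; Del counts as a write. Scoping is ignored."""
    reads, writes = set(), set()
    for node in ast.walk(stmt):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                reads.add(node.id)
            else:  # ast.Store or ast.Del
                writes.add(node.id)
    return reads, writes

for stmt in ast.parse("b = a + 1\ndel c").body:
    print(type(stmt).__name__, accesses(stmt))
# Assign ({'a'}, {'b'})
# Delete (set(), {'c'})
\end{pythoncode}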
% Analogous to program slicing~\cite{DBLP:journals/tse/Weiser84},
We traverse the AST's statements sequentially to build a \emph{fine-grained} dataflow graph, in which each node is a cell/statement pair and each directed edge goes from a variable \texttt{Load} to the corresponding \texttt{Store}(s).
% Because of control-flow constructs (e.g., if-then-else blocks and for loops) we may have to use multiple edges per \texttt{Load}, as a Load may read from multiple \texttt{Store}(s).
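The sequential traversal can be sketched as follows, assuming straight-line top-level code: each \texttt{Load} is linked only to the most recent \texttt{Store} of the same name, whereas control flow may require several candidate \texttt{Store}s. The function \texttt{fine\_grained\_graph} and the (reader node, writer node, variable) edge encoding are illustrative assumptions.
\begin{pythoncode}
import ast

def fine_grained_graph(cells):
    """Return edges (reader, writer, var); a node is (cell, stmt).

    Assumes straight-line code: each Load links only to the most
    recent Store of that name (control flow may need several).
    """
    last_store = {}  # variable name -> node that last wrote it
    edges = set()
    for c, source in enumerate(cells):
        for s, stmt in enumerate(ast.parse(source).body):
            node = (c, s)
            names = [n for n in ast.walk(stmt)
                     if isinstance(n, ast.Name)]
            # Resolve reads before writes: `a = a + 1` reads old a.
            for n in names:
                if isinstance(n.ctx, ast.Load) and n.id in last_store:
                    edges.add((node, last_store[n.id], n.id))
            for n in names:
                if not isinstance(n.ctx, ast.Load):
                    last_store[n.id] = node
    return edges

cells = ["a = 1\nb = a + 1", "print(a, b)"]
for edge in sorted(fine_grained_graph(cells)):
    print(edge)
# ((0, 1), (0, 0), 'a')
# ((1, 0), (0, 0), 'a')
# ((1, 0), (0, 1), 'b')
\end{pythoncode}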
Python's scoping rules add a further complication:
function and class declarations may reference symbols (e.g., imported modules) from an enclosing scope, creating transitive dependencies.
When traversing a function or class declaration, we record such dependencies and include them when the symbol is \texttt{Load}ed.
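The transitive dependencies of a declaration might be approximated as below: the outer-scope names that a function or class body reads are recorded at the declaration, and a \texttt{Load} of the declared name is expanded to include them. This is a flow-insensitive sketch with hypothetical helper names (\texttt{def\_dependencies}, \texttt{expand}); it ignores \texttt{global}/\texttt{nonlocal} declarations and may report built-ins such as \texttt{print}, so it over-approximates.
\begin{pythoncode}
import ast

DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

def def_dependencies(source):
    """Map each top-level def/class name to the outer names it reads.

    Flow-insensitive over-approximation: ignores global/nonlocal
    and may report built-ins such as print.
    """
    deps = {}
    for stmt in ast.parse(source).body:
        if isinstance(stmt, DEFS):
            loads, local = set(), set()
            for n in ast.walk(stmt):
                if isinstance(n, ast.Name):
                    if isinstance(n.ctx, ast.Load):
                        loads.add(n.id)
                    else:
                        local.add(n.id)
                elif isinstance(n, ast.arg):  # parameters are local
                    local.add(n.arg)
            deps[stmt.name] = loads - local
    return deps

def expand(reads, deps):
    """Close a read set under the recorded declaration dependencies."""
    result, stack = set(), list(reads)
    while stack:
        name = stack.pop()
        if name not in result:
            result.add(name)
            stack.extend(deps.get(name, ()))
    return result

src = "import math\ndef area(r): return math.pi * r * r"
deps = def_dependencies(src)
print(deps)                            # {'area': {'math'}}
print(sorted(expand({"area"}, deps)))  # ['area', 'math']
\end{pythoncode}
A cell that calls \texttt{area} thus also depends on the cell that imported \texttt{math}, even though it never names \texttt{math} itself.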
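Transitive dependency tracking is further complicated by Python's mutable closures (see \Cref{fig:scoping}).
In the left-hand example, \texttt{foo} resolves \texttt{a} in the global scope when it is called, so it observes the assignment \texttt{a = 1} made after its definition, but not the local \texttt{a = 2} inside \texttt{bar}. In the right-hand example, \texttt{bar} `captures' the scope of \texttt{foo}, in which \texttt{a = 2}; this binding shadows the global \texttt{a}, so the later global assignment \texttt{a = 1} has no effect on \texttt{bar}, even though \texttt{foo}'s scope is not otherwise accessible once \texttt{foo} returns.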
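The right-hand example of \Cref{fig:scoping} can be inspected with standard CPython function attributes: \texttt{bar} reads \texttt{a} through a cell object captured from \texttt{foo}'s scope, not from the global scope. The second function, \texttt{counter}, is an illustrative addition showing that a captured scope stays mutable: a \texttt{nonlocal} write in one closure is visible to another closure of the same scope.
\begin{pythoncode}
def foo():
    def bar(): print(a)
    a = 2
    return bar

bar = foo()
a = 1
bar()                                    # prints 2, from foo's scope
print(bar.__code__.co_freevars)          # ('a',): a free variable
print(bar.__closure__[0].cell_contents)  # 2: held in a cell object

def counter():
    n = 0
    def get(): return n
    def inc():
        nonlocal n                       # rebinds n in counter's scope
        n += 1
    return get, inc

get, inc = counter()
inc()                                    # mutates the scope get captured
print(get())                             # 1
\end{pythoncode}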
Next, we reduce the fine-grained dataflow graph to a \emph{coarse-grained} dataflow graph by (i) merging the nodes of all statements in a cell, (ii) removing self-edges, and (iii) removing parallel edges with identical labels.
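The three reduction steps can be sketched over an assumed edge encoding (reader node, writer node, variable), where a node is a (cell, statement) pair; this encoding and the name \texttt{coarsen} are assumptions of the sketch. Keeping edges in a set removes parallel edges with identical labels for free.
\begin{pythoncode}
def coarsen(fine_edges):
    """Reduce edges ((cell, stmt), (cell, stmt), var) to
    cell-level edges (reader cell, writer cell, var)."""
    coarse = set()  # (iii) a set keeps one copy per label
    for (src, _), (dst, _), var in fine_edges:  # (i) merge
        if src != dst:  # (ii) drop self-edges
            coarse.add((src, dst, var))
    return coarse

fine = {((0, 1), (0, 0), "a"),   # within cell 0
        ((1, 0), (0, 0), "a"),
        ((1, 1), (0, 0), "a"),   # parallel to the previous edge
        ((1, 0), (0, 1), "b")}
print(sorted(coarsen(fine)))     # [(1, 0, 'a'), (1, 0, 'b')]
\end{pythoncode}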
The coarse-grained dataflow graph approximates each cell's dependencies: the labels on a cell's in-edges (resp., out-edges) typically form an upper bound on the variables it actually writes (resp., reads). % guaranteed upper bound on the cell's write set (read set).
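Under the same assumed encoding, with edges pointing from the reading cell to the writing cell, a cell's read set is the set of labels on its out-edges and its write set the set of labels on its in-edges. The helper \texttt{read\_write\_sets} below is a hypothetical illustration; by construction, a write set obtained this way contains only variables that some later cell reads.
\begin{pythoncode}
from collections import defaultdict

def read_write_sets(coarse_edges):
    """Split cell-level edges (reader, writer, var) into per-cell
    read sets (out-edge labels) and write sets (in-edge labels)."""
    reads, writes = defaultdict(set), defaultdict(set)
    for reader, writer, var in coarse_edges:
        reads[reader].add(var)   # out-edge of the reader
        writes[writer].add(var)  # in-edge of the writer
    return dict(reads), dict(writes)

edges = {(1, 0, "a"), (1, 0, "b"), (2, 1, "c")}
reads, writes = read_write_sets(edges)
print({c: sorted(v) for c, v in sorted(reads.items())})
# {1: ['a', 'b'], 2: ['c']}
print({c: sorted(v) for c, v in sorted(writes.items())})
# {0: ['a', 'b'], 1: ['c']}
\end{pythoncode}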
Missed dependencies are possible in principle but rare in typical Jupyter notebook code; when they do arise, our scheduler handles them. Finally, we inject explicit variable imports and exports (using Vizier's artifact API) for each cell's read and write sets into the cell's code.
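The injection step amounts to a source-to-source rewrite. In the sketch below, \texttt{import\_artifact} and \texttt{export\_artifact} are hypothetical placeholder names, not Vizier's actual artifact API, and \texttt{instrument} is likewise illustrative; the sketch only shows where the calls are placed relative to the cell's original code.
\begin{pythoncode}
def instrument(source, reads, writes,
               load_fn="import_artifact", save_fn="export_artifact"):
    """Wrap a cell's code with imports (before) and exports (after).

    load_fn/save_fn are HYPOTHETICAL placeholders standing in for
    Vizier's artifact API; they are not its real function names.
    """
    prologue = [f"{v} = {load_fn}({v!r})" for v in sorted(reads)]
    epilogue = [f"{save_fn}({v!r}, {v})" for v in sorted(writes)]
    return "\n".join(prologue + [source] + epilogue)

print(instrument("b = a + 1", reads={"a"}, writes={"b"}))
# a = import_artifact('a')
# b = a + 1
# export_artifact('b', b)
\end{pythoncode}
Because the exports are appended after the cell's code in this sketch, an exception raised partway through the cell skips them.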
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../main"
%%% End: