Formatting and space.
parent
66a9ae79c9
commit
87eaf3caea
|
@ -9,3 +9,4 @@ pdfa.xmpi
|
|||
main.bbl
|
||||
main.blg
|
||||
vizier.db
|
||||
/_minted-main
|
||||
|
|
Binary file not shown.
File diff suppressed because one or more lines are too long
Before Width: | Height: | Size: 58 KiB After Width: | Height: | Size: 55 KiB |
Binary file not shown.
Before Width: | Height: | Size: 25 KiB After Width: | Height: | Size: 26 KiB |
3
main.tex
|
@ -2,6 +2,7 @@
|
|||
%\documentclass{vldb}
|
||||
|
||||
\settopmatter{printacmref=false}
|
||||
\setcopyright{none}
|
||||
|
||||
\usepackage{hyperref}
|
||||
\usepackage[a-1b]{pdfx}
|
||||
|
@ -49,7 +50,7 @@
|
|||
|
||||
\newcommand{\systemname}{Workbook\xspace}
|
||||
|
||||
\newcommand{\TheTitle}{Coarse-Grained Dataflow Provenance}
|
||||
\newcommand{\TheTitle}{Incremental Provenance}
|
||||
|
||||
\pagestyle{plain}
|
||||
|
||||
|
|
|
@ -7,7 +7,7 @@ Parallelizing cell execution requires an ICE architecture, which comes at the co
|
|||
In this section, we assess that cost.
|
||||
|
||||
All experiments were run on a XX GHz, XX core Intel Xeon with XX GB RAM running XX Linux\OK{Boris, Nachiket, can you fill this in?}.
|
||||
The provenance aware scheduler was integrated into Vizier 1.2\footnote{https://github.com/VizierDB/vizier-scala} --- our experiments use a lightly modified version with support for importing Jupyter notebooks, and the related \texttt{-X PARALLEL-PYTHON} experimental option.
|
||||
The provenance aware scheduler was integrated into Vizier 1.2\footnote{\url{https://github.com/VizierDB/vizier-scala}} --- our experiments use a lightly modified version with support for importing Jupyter notebooks, and the related \texttt{-X PARALLEL-PYTHON} experimental option.
|
||||
As Vizier relies on Apache Spark, we prefix all notebooks under test with a single reader and writer cell to force initialization of, e.g., Spark's HDFS module. These cells are excluded from the timing results.
|
||||
|
||||
\begin{figure*}
|
||||
|
|
|
@ -20,9 +20,10 @@ For example, python's \texttt{import} statement simply declares imported modules
|
|||
Thus, references to the module's functions within a function or class definition create transitive dependencies.
|
||||
When the traversal visits a function or class declaration statement, we record
|
||||
|
||||
An additional complication arises from python's scope capture semantics.
|
||||
When a function (or class) is declared, it records a reference to all enclosing scopes. Consider the following example code:
|
||||
\begin{lstlisting}
|
||||
\begin{figure}
|
||||
\begin{center}
|
||||
\begin{subfigure}{0.45\columnwidth}
|
||||
\begin{minted}{python}
|
||||
def foo():
|
||||
print(a)
|
||||
a = 1
|
||||
|
@ -31,21 +32,29 @@ def bar():
|
|||
a = 2
|
||||
foo()
|
||||
bar() # Prints '1'
|
||||
\end{lstlisting}
|
||||
|
||||
\begin{lstlisting}
|
||||
\end{minted}
|
||||
\end{subfigure}
|
||||
\begin{subfigure}{0.5\columnwidth}
|
||||
\begin{minted}{python}
|
||||
def foo():
|
||||
a = 2
|
||||
def bar():
|
||||
print(a)
|
||||
a = 2
|
||||
return bar
|
||||
bar = foo()
|
||||
bar() # Prints '2'
|
||||
a = 1
|
||||
bar() # Prints '2'
|
||||
\end{lstlisting}
|
||||
\end{minted}
|
||||
\end{subfigure}
|
||||
\end{center}
|
||||
\caption{Scope capture in Python happens at function definition, but captured scopes remain mutable.}
|
||||
\label{fig:scoping}
|
||||
\end{figure}
|
||||
|
||||
In the latter instance, when \texttt{bar} is declared, it `captures' the scope of \texttt{foo}, in which \texttt{a = 2}, and overrides assignment in the global scope.
|
||||
An additional complication arises from Python's scope capture semantics.
|
||||
When a function (or class) is declared, it records a reference to all enclosing scopes. Consider the example code in \Cref{fig:scoping}.
|
||||
In the latter code block, when \texttt{bar} is declared, it `captures' the scope of \texttt{foo}, in which \texttt{a = 2}; this binding takes precedence over the assignment in the global scope.
|
||||
In the former code block, conversely, \texttt{bar}'s assignment to \texttt{a} happens in its own scope, and so the invocation of \texttt{foo} reads the instance of \texttt{a} in the global scope.
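The two behaviors described above, and the mutability of a captured scope, can be checked with a small self-contained sketch. All function and variable names here are hypothetical and chosen for illustration only; this is not code from the system itself.

```python
# Illustrative sketch only; all names here are hypothetical.

a = 1  # module-level (global) binding


def read_global():
    # No enclosing function scope defines `a`, so this resolves to the global.
    return a


def caller():
    a = 2  # local to caller; invisible to read_global
    return read_global()


def make_closure():
    a = 2  # lives in make_closure's scope, which the inner functions capture

    def read():
        return a

    def bump():
        nonlocal a  # rebinds the captured variable instead of creating a local
        a += 1

    return read, bump


read, bump = make_closure()
first = read()          # reads the captured `a`, not the global one
bump()                  # mutates the captured scope after definition
second = read()         # observes the mutation
from_caller = caller()  # caller's local `a` does not affect read_global
```

Here `first` and `second` differ even though `read` was defined only once, which is why a dependency analysis cannot treat a captured scope as a snapshot taken at definition time.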
|
||||
|
||||
\tinysection{Coarse-Grained Data Flow}
|
||||
|
|
|
@ -27,8 +27,8 @@ We then show generality by discussing the process for importing Jupyter notebook
|
|||
\begin{figure}
|
||||
% \includegraphics[width=\columnwidth]{graphics/depth_vs_cellcount.vega-lite.pdf}
|
||||
\includegraphics[width=0.8\columnwidth]{graphics/depth_vs_cellcount-averaged.vega-lite.pdf}
|
||||
\caption{Notebook size versus workflow depth in a collection of notebooks scraped from github~\cite{DBLP:journals/ese/PimentelMBF21}: On average, only one out of every 4 notebook cells must be run serially.}
|
||||
\label{fig:parallelismSurvey}
|
||||
\caption{Notebook size versus workflow depth in a collection of notebooks scraped from GitHub~\cite{DBLP:journals/ese/PimentelMBF21}: On average, only one out of every four notebook cells has serial dependencies.}
|
||||
\label{fig:parallelismSurvey}
|
||||
\end{figure}
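As a rough illustration of the quantity plotted in the figure, the sketch below computes a workflow depth for a notebook, assuming that depth means the length of the longest chain of dependent cells. That definition, the `workflow_depth` helper, and the `deps` input format are assumptions made for this example and may differ from how the survey was actually computed.

```python
from functools import lru_cache


def workflow_depth(deps):
    """Length of the longest dependency chain among cells.

    `deps` is a hypothetical input format: it maps each cell id to the
    set of cell ids whose outputs that cell reads. The graph is assumed
    to be acyclic.
    """

    @lru_cache(maxsize=None)
    def depth(cell):
        # A cell with no dependencies has depth 1.
        return 1 + max((depth(d) for d in deps.get(cell, ())), default=0)

    return max((depth(c) for c in deps), default=0)


# Four cells: 2 and 3 both read from 1, and 4 reads from 2 and 3.
example = {1: set(), 2: {1}, 3: {1}, 4: {2, 3}}
example_depth = workflow_depth(example)
example_cells = len(example)
```

In this example the notebook has four cells but a depth of three, since cells 2 and 3 do not depend on each other and could run concurrently.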
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
|
||||
|
|