Formatting and space.

master
Oliver Kennedy 2022-03-31 14:47:06 -04:00
parent 66a9ae79c9
commit 87eaf3caea
Signed by: okennedy
GPG Key ID: 3E5F9B3ABD3FDB60
8 changed files with 25 additions and 14 deletions

1
.gitignore vendored

@ -9,3 +9,4 @@ pdfa.xmpi
main.bbl
main.blg
vizier.db
/_minted-main

File diff suppressed because one or more lines are too long

Image diff: 58 KiB before, 55 KiB after.

Binary file not shown.

Image diff: 25 KiB before, 26 KiB after.

@ -2,6 +2,7 @@
%\documentclass{vldb}
\settopmatter{printacmref=false}
\setcopyright{none}
\usepackage{hyperref}
\usepackage[a-1b]{pdfx}
@ -49,7 +50,7 @@
\newcommand{\systemname}{Workbook\xspace}
\newcommand{\TheTitle}{Coarse-Grained Dataflow Provenance}
\newcommand{\TheTitle}{Incremental Provenance}
\pagestyle{plain}

@ -7,7 +7,7 @@ Parallelizing cell execution requires an ICE architecture, which comes at the co
In this section, we assess that cost.
All experiments were run on a XX GHz, XX core Intel Xeon with XX GB RAM running XX Linux\OK{Boris, Nachiket, can you fill this in?}.
The provenance-aware scheduler was integrated into Vizier 1.2\footnote{https://github.com/VizierDB/vizier-scala}; our experiments use a lightly modified version with support for importing Jupyter notebooks, and the related \texttt{-X PARALLEL-PYTHON} experimental option.
The provenance-aware scheduler was integrated into Vizier 1.2\footnote{\url{https://github.com/VizierDB/vizier-scala}}; our experiments use a lightly modified version with support for importing Jupyter notebooks, and the related \texttt{-X PARALLEL-PYTHON} experimental option.
As Vizier relies on Apache Spark, we prefix all notebooks under test with a single reader and writer cell to force initialization of components such as Spark's HDFS module. These cells are not included in the timing results.
\begin{figure*}

@ -20,9 +20,10 @@ For example, python's \texttt{import} statement simply declares imported modules
Thus, references to the module's functions within a function or class definition create transitive dependencies.
When the traversal visits a function or class declaration statement, we record
An additional complication arises from python's scope capture semantics.
When a function (or class) is declared, it records a reference to all enclosing scopes. Consider the following example code:
\begin{lstlisting}
\begin{figure}
\begin{center}
\begin{subfigure}{0.45\columnwidth}
\begin{minted}{python}
def foo():
    print(a)
a = 1
@ -31,21 +32,29 @@ def bar():
    a = 2
    foo()
bar() # Prints '1'
\end{lstlisting}
\begin{lstlisting}
\end{minted}
\end{subfigure}
\begin{subfigure}{0.5\columnwidth}
\begin{minted}{python}
def foo():
    a = 2
    def bar():
        print(a)
    a = 2
    return bar
bar = foo()
bar() # Prints '2'
a = 1
bar() # Prints '2'
\end{lstlisting}
\end{minted}
\end{subfigure}
\end{center}
\caption{Scope capture in Python happens at function definition, but captured scopes remain mutable.}
\label{fig:scoping}
\end{figure}
In the latter instance, when \texttt{bar} is declared, it `captures' the scope of \texttt{foo}, in which \texttt{a = 2}, and overrides assignment in the global scope.
An additional complication arises from Python's scope capture semantics.
When a function (or class) is declared, it records a reference to all enclosing scopes. Consider the example code in \Cref{fig:scoping}.
In the latter code block, when \texttt{bar} is declared, it `captures' the scope of \texttt{foo}, in which \texttt{a = 2}; this binding takes precedence over the assignment in the global scope.
In the former instance, conversely, \texttt{bar}'s assignment to \texttt{a} happens in its own scope, and so the invocation of \texttt{foo} reads the instance of \texttt{a} in the global scope.
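The two behaviors described above can be checked directly in a Python interpreter. The following is a minimal sketch, not part of the system itself; the names \texttt{make\_reader}, \texttt{reader}, \texttt{global\_reader}, and \texttt{caller} are hypothetical and chosen only for illustration. It also inspects the standard \texttt{\_\_code\_\_.co\_freevars} and \texttt{\_\_closure\_\_} attributes, which expose the names and cells that a function captured from an enclosing scope.

```python
# Illustrative sketch only: all function names here are hypothetical.

def make_reader():
    a = 2
    def reader():
        return a          # 'a' is a free variable bound in make_reader's scope
    a = 3                 # rebinding after the definition is still seen by reader
    return reader

def global_reader():
    return a              # no enclosing function scope: 'a' is looked up globally

def caller():
    a = 2                 # local to caller; invisible to global_reader
    return global_reader()

a = 1                     # global binding
reader = make_reader()

captured = reader()       # reads the captured (and later rebound) 'a'
from_global = caller()    # reads the global 'a', not caller's local

# Introspection: which names were captured, and what the cell currently holds.
free_vars = reader.__code__.co_freevars
cell_value = reader.__closure__[0].cell_contents

print(captured, from_global, free_vars, cell_value)
```

A static analysis that wants the same information without running the code could, under the same assumptions, derive it from the nesting structure of the function definitions; the introspection above is shown only because it makes the captured scope visible.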
\tinysection{Coarse-Grained Data Flow}

@ -27,8 +27,8 @@ We then show generality by discussing the process for importing Jupyter notebook
\begin{figure}
% \includegraphics[width=\columnwidth]{graphics/depth_vs_cellcount.vega-lite.pdf}
\includegraphics[width=0.8\columnwidth]{graphics/depth_vs_cellcount-averaged.vega-lite.pdf}
\caption{Notebook size versus workflow depth in a collection of notebooks scraped from github~\cite{DBLP:journals/ese/PimentelMBF21}: On average, only one out of every 4 notebook cells must be run serially.}
\label{fig:parallelismSurvey}
\caption{Notebook size versus workflow depth in a collection of notebooks scraped from GitHub~\cite{DBLP:journals/ese/PimentelMBF21}: on average, only one out of every four notebook cells has serial dependencies.}
\label{fig:parallelismSurvey}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%