Formatting and space.

master
Oliver Kennedy 2022-03-31 14:47:06 -04:00
parent 66a9ae79c9
commit 87eaf3caea
Signed by: okennedy
GPG Key ID: 3E5F9B3ABD3FDB60
8 changed files with 25 additions and 14 deletions

1
.gitignore vendored

@@ -9,3 +9,4 @@ pdfa.xmpi
 main.bbl
 main.blg
 vizier.db
+/_minted-main

File diff suppressed because one or more lines are too long

Before: 58 KiB. After: 55 KiB.

Binary file not shown.

Before: 25 KiB. After: 26 KiB.


@@ -2,6 +2,7 @@
 %\documentclass{vldb}
 \settopmatter{printacmref=false}
+\setcopyright{none}
 \usepackage{hyperref}
 \usepackage[a-1b]{pdfx}
@@ -49,7 +50,7 @@
 \newcommand{\systemname}{Workbook\xspace}
-\newcommand{\TheTitle}{Coarse-Grained Dataflow Provenance}
+\newcommand{\TheTitle}{Incremental Provenance}
 \pagestyle{plain}


@@ -7,7 +7,7 @@ Parallelizing cell execution requires an ICE architecture, which comes at the co
 In this section, we assess that cost.
 All experiments were run on a XX GHz, XX core Intel Xeon with XX GB RAM running XX Linux\OK{Boris, Nachiket, can you fill this in?}.
-The provenance aware scheduler was integrated into Vizier 1.2\footnote{https://github.com/VizierDB/vizier-scala} --- our experiments use a lightly modified version with support for importing Jupyter notebooks, and the related \texttt{-X PARALLEL-PYTHON} experimental option.
+The provenance aware scheduler was integrated into Vizier 1.2\footnote{\url{https://github.com/VizierDB/vizier-scala}} --- our experiments use a lightly modified version with support for importing Jupyter notebooks, and the related \texttt{-X PARALLEL-PYTHON} experimental option.
 As Vizier relies on Apache Spark, we prefix all notebooks under test with a single reader and writer cell to force initialization of e.g., Spark's HDFS module. These are not included in timing results.
 \begin{figure*}


@@ -20,9 +20,10 @@ For example, python's \texttt{import} statement simply declares imported modules
Thus, references to the module's functions within a function or class definition create transitive dependencies. Thus, references to the module's functions within a function or class definition create transitive dependencies.
When the traversal visits a function or class declaration statement, we record When the traversal visits a function or class declaration statement, we record
An additional complication arises from python's scope capture semantics. \begin{figure}
When a function (or class) is declared, it records a reference to all enclosing scopes. Consider the following example code: \begin{center}
\begin{lstlisting} \begin{subfigure}{0.45\columnwidth}
\begin{minted}{python}
def foo(): def foo():
print(a) print(a)
a = 1 a = 1
@@ -31,21 +32,29 @@ def bar():
a = 2 a = 2
foo() foo()
bar() # Prints '1' bar() # Prints '1'
\end{lstlisting} \end{minted}
\end{subfigure}
\begin{lstlisting} \begin{subfigure}{0.5\columnwidth}
\begin{minted}{python}
def foo(): def foo():
a = 2
def bar(): def bar():
print(a) print(a)
a = 2
return bar return bar
bar = foo() bar = foo()
bar() # Prints '2' bar() # Prints '2'
a = 1 a = 1
bar() # Prints '2' bar() # Prints '2'
\end{lstlisting} \end{minted}
\end{subfigure}
\end{center}
\label{fig:scoping}
\caption{Scope capture in python happens at function definition, but captured scopes remain mutable.}
\end{figure}
In the latter instance, when \texttt{bar} is declared, it `captures' the scope of \texttt{foo}, in which \texttt{a = 2}, and overrides assignment in the global scope. An additional complication arises from python's scope capture semantics.
When a function (or class) is declared, it records a reference to all enclosing scopes. Consider the following example code in \Cref{fig:scoping}.
In the latter code block, when \texttt{bar} is declared, it `captures' the scope of \texttt{foo}, in which \texttt{a = 2}, and overrides assignment in the global scope.
In the former instance, conversely, \texttt{bar}'s assignment to \texttt{a} happens in its own scope, and so the invocation of \texttt{foo} reads the instance of \texttt{a} in the global scope. In the former instance, conversely, \texttt{bar}'s assignment to \texttt{a} happens in its own scope, and so the invocation of \texttt{foo} reads the instance of \texttt{a} in the global scope.
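The point made in the figure caption, that a scope is captured at definition time but stays mutable afterwards, can be checked directly in Python. The following is a minimal sketch and not code from the paper: the names `make_reader`, `read`, and `write` are hypothetical and chosen only for illustration.

```python
def make_reader():
    """Return two closures that share make_reader's local scope."""
    a = 2  # lives in make_reader's scope, not in the global scope

    def read():
        # 'a' is a free variable: it is looked up in the captured
        # enclosing scope each time read() is called.
        return a

    def write(value):
        nonlocal a  # rebind the captured variable instead of creating a local
        a = value

    return read, write


a = 1  # a global with the same name; read() never consults it

read, write = make_reader()
first = read()   # value of the captured 'a' before mutation
write(3)         # mutate the captured scope after the closure was created
second = read()  # the closure observes the new value

# Introspection: the names a closure captures, and the current cell contents.
free_names = read.__code__.co_freevars
captured = read.__closure__[0].cell_contents

print(first, second, free_names, captured)
```

Under these assumptions, a dependency analysis would presumably need to treat every name listed in `co_freevars` (or its static equivalent) as a dependency of the function itself, since the value read depends on later writes to the captured scope rather than on the global of the same name.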
\tinysection{Coarse-Grained Data Flow} \tinysection{Coarse-Grained Data Flow}


@@ -27,8 +27,8 @@ We then show generality by discussing the process for importing Jupyter notebook
 \begin{figure}
 % \includegraphics[width=\columnwidth]{graphics/depth_vs_cellcount.vega-lite.pdf}
 \includegraphics[width=0.8\columnwidth]{graphics/depth_vs_cellcount-averaged.vega-lite.pdf}
-\caption{Notebook size versus workflow depth in a collection of notebooks scraped from github~\cite{DBLP:journals/ese/PimentelMBF21}: On average, only one out of every 4 notebook cells must be run serially.}
+\caption{Notebook size versus workflow depth in a collection of notebooks scraped from github~\cite{DBLP:journals/ese/PimentelMBF21}: On average, only one out of every 4 notebook cells has serial dependencies.}
 \label{fig:parallelismSurvey}
 \end{figure}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%