abstract

2022-03-31 21:42:28 -05:00 · 2022-03-31 21:42:28 -05:00 · 8b30a85a7e
parent e6de4169c1
commit 8b30a85a7e
1 changed files with 9 additions and 8 deletions
--- a/sections/abstract.tex
+++ b/sections/abstract.tex
@@ -1,12 +1,13 @@
 %!TEX root=../main.tex

-Computational notebooks (e.g., Jupyter or Apache Zeppelin) have become a popular choice for Data Exploration, Preparation, and ETL.
-Indeed, a single notebook may evolve through all three phases, with users exploring a dataset to identify problems, repairing the problems, and then deploying them to a system like Papermill for batch processing on related datasets.
-Notebooks more user-friendly for ETL than the classical state of the art, workflow systems, which require users to explicitly target batch processing and manually specify inputs and outputs.
-However, the notebook model suffers from poor reproducibility, do not automatically support incremental re-evaluation when inputs change, and must be executed in serial order --- all symptoms of its kernel-based evaluation strategy.
-In this paper, we propose a new a new ``workbook'' execution model that retains the usability of notebooks, and the provenance capabilities of workflow systems.
-We address key challenges in the workbook model, including information flow, static analysis, scheduling in the presence of ambiguous dependencies, and importing Jupyter notebooks into the workbook model.
-We also discuss the implementation of the workbook model within our existing notebook engine \textsc{Vizier}, and evaluate the resulting implementation.
+Computational notebooks (e.g., Jupyter or Apache Zeppelin) have become a popular choice for data exploration, preparation, and ETL.
+% Indeed, a single notebook may evolve through all three phases, with
+% Users typically first explore a dataset to identify problems, then repair the problems, finally deploy their pipeline to a system like Papermill for batch processing on related datasets.
+Notebooks are more user-friendly for ETL than classical workflow systems, because they provide immediate feedback for intermediate results and do not require the full computation to be specified upfront. % the user to specify including the inputs and outputs of each step.
+However, the notebook model suffers from poor reproducibility, does not support automatic incremental re-evaluation of code when inputs change, and does not allow for parallel execution of cells --- all symptoms of its kernel-based evaluation strategy.
+We propose a new \emph{``workbook''} execution model that combines the usability of notebooks with the provenance and parallel execution capabilities of workflow systems. This is made possible through a novel approach that refines a static approximation of provenance at runtime and a scheduler that dynamically adapts the execution order of cells based on data dependencies detected during refinement. Additionally, this enables translation of Jupyter notebooks into workbooks.
+% We address key challenges in the workbook model, including information flow, static analysis, scheduling in the presence of ambiguous dependencies, and importing Jupyter notebooks into the workbook model.
+We implement this model in our notebook engine \textsc{Vizier}, and evaluate the resulting implementation.


 % We show h
@@ -16,7 +17,7 @@ We also discuss the implementation of the workbook model within our existing not
 %  that is largely compat


-%  that retains the user-friendliness of python 
+%  that retains the user-friendliness of python