break urls

master
Boris Glavic 2019-12-16 21:06:58 -06:00
parent 9c92b54fcd
commit b35782ac75
3 changed files with 28 additions and 36 deletions


@@ -47,7 +47,6 @@
\documentclass{sig-alternate}
\usepackage{cleveref}
\usepackage{listings}
\usepackage{todonotes}
\usepackage{xspace}
@@ -75,6 +74,12 @@
\newcommand{\trimfigurespacing}{\vspace*{-5mm}}
\newcommand{\hide}{}
\usepackage{url}
\def\UrlBreaks{\do\/\do-}
\usepackage{breakurl}
\usepackage[bookmarks=false,breaklinks]{hyperref}
\usepackage{cleveref}
\begin{document}
% Copyright
@@ -106,7 +111,7 @@
% --- End of Author Metadata ---
% \title{What needs to be REPLaced in notebooks}
\title{Your notebook is not crumby enough, REPLace it.}
\title{Your notebook is not crumby enough, REPLace it}
%
% You need the command \numberofauthors to handle the 'placement
% and alignment' of the authors beneath the title.


@@ -1,26 +1,26 @@
% Optional fields: author, title, howpublished, month, year, note
@MISC{vanderplass:2017:reproducibility,
howpublished = {https://twitter.com/jakevdp/status/935178916490223616},
howpublished = {\url{https://twitter.com/jakevdp/status/935178916490223616}},
title = {Idea: Jupyter notebooks could have a "reproducibility mode"},
author = {Jake VanderPlas}
}
% Optional fields: author, title, howpublished, month, year, note
@MISC{zelnicki:2017:nodebook,
howpublished = {https://multithreaded.stitchfix.com/blog/2017/07/26/nodebook/},
howpublished = {\url{https://multithreaded.stitchfix.com/blog/2017/07/26/nodebook/}},
author = {Kevin Zielnicki},
title = {Nodebook}
}
% Optional fields: author, title, howpublished, month, year, note
@MISC{jobevers:2018:jupyterOrderOfExec,
howpublished = {https://github.com/jupyter/notebook/issues/3229},
howpublished = {\url{https://github.com/jupyter/notebook/issues/3229}},
author = {Job Evers-Meltzer},
title = {Enforce a top-down order of execution}
}
@MISC{nyt:wrangling,
howpublished = {http://nyti.ms/1Aqif2X},
howpublished = {\url{http://nyti.ms/1Aqif2X}},
author = {S. Lohr},
title = {For big-data scientists, janitor work is key hurdle to insights.},
year = {2014}


@@ -1,35 +1,22 @@
% -*- root: ../paper.tex -*-
Notebook and spreadsheet systems are currently the de-facto standard for data collection, preparation, and analysis.
However, these systems have been criticized for their lack of
reproducibility, versioning, and support for sharing.
%
These shortcomings are particularly detrimental for
data curation where data scientists iteratively
build workflows to clean up and integrate data as a prerequisite for
analysis.
% \hide{ JF: here, there is a disconnect, since in the prev parag we
% talk about spreadsheets too. Also, we get into details that may not
% be clear for readers without giving some background first-- I
% suggest we remove the sentence below. %
% A key reason for these shortcomings is an impedance mismatch between
% the notebook user interface (as a sequence of steps) and the
% underlying implementation of most notebooks (as a library of code
% snippets).}
%
We present Vizier, an open-source tool that helps analysts to
build and refine data pipelines. Vizier combines the flexibility
of notebooks with the easy-to-use data manipulation
interface of spreadsheets.
%a publicly available, open-source workflow-based notebook system aimed at helping analysts to iteratively build and refine data pipelines.
%We highlight two features of Vizier: A spreadsheet interface for
%simultaneous exploration and direct manipulation of data, and caveats,
%an advanced approach for tracking potential data errors.
Combined with advanced provenance tracking for both data
and computational steps this enables reproducibility, versioning, and
streamlined data exploration.
% caveats
Unique to Vizier is that it exposes potential issues with data, no matter whether they already exist in the input or are introduced by the operations of a notebook. We refer to such potential errors as \emph{data caveats}. Caveats are propagated alongside data using principled techniques from uncertain data management. Vizier provides extensive user interface support for caveats, e.g., exposing them as summaries in a dedicated error view and highlighting cells with caveats in spreadsheets.
Notebook and spreadsheet systems are currently the de-facto standard for data
collection, preparation, and analysis. However, these systems have been
criticized for their lack of reproducibility, versioning, and support for
sharing. These shortcomings are particularly detrimental for data curation where
data scientists iteratively build workflows to clean up and integrate data as a
prerequisite for analysis. We present Vizier, an open-source tool that helps
analysts to build and refine data pipelines. Vizier combines the flexibility of
notebooks with the easy-to-use data manipulation interface of spreadsheets.
Combined with advanced provenance tracking for both data and computational steps
this enables reproducibility, versioning, and streamlined data exploration.
Unique to Vizier is that it exposes potential issues with data, no matter
whether they already exist in the input or are introduced by the operations of a
notebook. We refer to such potential errors as \emph{data caveats}. Caveats are
propagated alongside data using principled techniques from uncertain data
management. Vizier provides extensive user interface support for caveats, e.g.,
exposing them as summaries in a dedicated error view and highlighting cells with
caveats in spreadsheets.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../paper"