paper-2019-CIDR/sections/abstract.tex
2019-12-16 21:06:58 -06:00

% -*- root: ../paper.tex -*-
Notebook and spreadsheet systems are currently the de facto standard for data
collection, preparation, and analysis. However, these systems have been
criticized for their lack of reproducibility, versioning, and support for
sharing. These shortcomings are particularly detrimental for data curation, where
data scientists iteratively build workflows to clean up and integrate data as a
prerequisite for analysis. We present Vizier, an open-source tool that helps
analysts to build and refine data pipelines. Vizier combines the flexibility of
notebooks with the easy-to-use data manipulation interface of spreadsheets.
Together with advanced provenance tracking for both data and computational steps,
this design enables reproducibility, versioning, and streamlined data exploration.
Unique to Vizier is that it exposes potential issues with data, whether they
already exist in the input or are introduced by the operations of a
notebook. We refer to such potential errors as \emph{data caveats}. Caveats are
propagated alongside data using principled techniques from uncertain data
management. Vizier provides extensive user interface support for caveats, e.g.,
exposing them as summaries in a dedicated error view and highlighting cells with
caveats in spreadsheets.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../paper"
%%% End: