% -*- root: ../paper.tex -*-
Notebook and spreadsheet systems are currently the de facto standard for data
collection, preparation, and analysis. However, these systems have been
criticized for their lack of reproducibility, versioning, and support for
sharing. These shortcomings are particularly detrimental for data curation, where
data scientists iteratively build workflows to clean up and integrate data as a
prerequisite for analysis. We present Vizier, an open-source tool that helps
analysts build and refine data pipelines. Vizier combines the flexibility of
notebooks with the easy-to-use data manipulation interface of spreadsheets.
Combined with advanced provenance tracking for both data and computational steps,
this enables reproducibility, versioning, and streamlined data exploration.
Uniquely, Vizier exposes potential issues with data, whether they
already exist in the input or are introduced by the operations of a
notebook. We refer to such potential errors as \emph{data caveats}. Caveats are
propagated alongside data using principled techniques from uncertain data
management. Vizier provides extensive user interface support for caveats, e.g.,
exposing them as summaries in a dedicated error view and highlighting cells with
caveats in spreadsheets.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../paper"
%%% End: