%!TEX root = ../main.tex
\begin{abstract}
The database community has developed a plethora of tools and techniques for supporting data curation and analysis, including declarative query languages, data cleaning approaches, entity resolution and data fusion algorithms, schema matching and mapping, and many more. While usability has recently been observed to be a problem with databases~\cite{JC07}, there is currently no consensus on the best way to expose these powerful tools to an analyst in a fashion that aids exploratory data curation and analysis. Thus, analysts continue to rely on tools such as spreadsheets and notebook-style programming environments~\cite{Chan1996119} (e.g., IPython notebook) for their data curation needs and cannot benefit from the contributions made by the database community. In this work we argue that both spreadsheets and notebooks have their advantages and disadvantages, and that a user-friendly curation tool should expose automated data curation techniques through an interface that is an extended hybrid of the spreadsheet and notebook UIs. To support exploratory data curation and analysis, additional functionality is needed that is found in neither spreadsheets nor notebooks: support for changing past decisions on the fly and having these changes propagate through an analysis workflow, the ability to combine small idiosyncratic manual curation steps to form a larger, more readable computation, and support for higher-level, automated data curation operations.
We also discuss the technical challenges of supporting such a hybrid UI over large-scale datasets and present our vision of \sysname, a system that exposes data curation operations through such an interface, and its declarative spreadsheet language \langname.
\end{abstract}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../main"
%%% End: