6 page corrected version

master
Boris Glavic 2016-05-28 17:16:36 -05:00
parent e02536b5a2
commit ae1ca2fe83
6 changed files with 22 additions and 18 deletions


@@ -211,7 +211,7 @@
Doi = {10.1007/3-540-45587-6_12},
Isbn = {978-3-540-45587-5},
Pages = {173--191},
-Publisher = {Springer Berlin Heidelberg},
+Publisher = {Springer},
Title = {Practical Aspects of Declarative Languages},
Url = {http://dx.doi.org/10.1007/3-540-45587-6_12},
Year = {2002},
@@ -301,10 +301,10 @@
Author = {B. Arab and D. Gawlick and V. Krishnaswamy and V. Radhakrishnan and B. Glavic},
Date-Added = {2016-04-24 20:17:27 +0000},
Date-Modified = {2016-04-24 20:27:03 +0000},
-Institution = {Illinois Institute of Technology},
-Number = {IIT/CS-DB-2016-01},
+Institution = {IIT},
Title = {Formal Foundations of Reenactment and Transaction Provenance},
Year = {2016}}
+@Comment Number = {IIT/CS-DB-2016-01},
@techreport{AG14a,
Author = {B. Arab and D. Gawlick and V. Krishnaswamy and V. Radhakrishnan and B. Glavic},


@@ -1,7 +1,9 @@
We present our vision for Visier, a data curation system which exposes powerful curation operations through a UI that is a hybrid between the spreadsheet and notebook interface paradigms. In this work we focus on the user interface as well as present the initial design of a language \langname that can serve as the underlying computational model for operations in the system.
\noindent\textbf{Acknowledgements: } \textit{
-This work was supported in part by gifts from Oracle and NSF Grant CNS-1229185. Juliana Freire is partially supported by Defense Advanced Research Projects Agency (DARPA) MEMEX program award FA8750-14-2-023.
+This work was supported in part by gifts from Oracle and NSF Grant CNS-1229185. Juliana Freire is partially supported by % Defense Advanced Research Projects Agency (
+DARPA MEMEX % program
+award FA8750-14-2-023.
Opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of Oracle or DARPA.}
%%% Local Variables:
%%% mode: latex


@@ -1,7 +1,7 @@
%!TEX root = ../main.tex
As the user makes edits in the spreadsheet interface, the corresponding actions are recorded in the notebook as a \langname script.
-Although these scripts do encode the evaluation logic that generates the spreadsheet being displayed, they also serve as an audit trail, tool for reverting or altering older edits, and vector for generalizing the same curation process to new data.
+Although these scripts do encode the evaluation logic that generates the spreadsheet being displayed, they also serve as an audit trail, tool for reverting or altering older edits, and template for generalizing the same curation process to new data.
As such, \langname is subject to a different set of optimization goals than most programming languages.
Rather than optimizing for performance or resource usage as in a normal optimizer, \langname needs an optimizer that prioritizes both \textit{readability} and \textit{generality}.


@@ -3,7 +3,7 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}
\centering
-\includegraphics[width=0.8\columnwidth]{graphics/vizir-ui-new-two-columns}
+\includegraphics[width=0.75\columnwidth]{graphics/vizir-ui-new-two-columns}
\caption{An example of \sysname's UI}
\label{fig:hybridinterface}
\vspace*{-3mm}
@@ -21,7 +21,7 @@ Notebook interfaces like Jupyter's use an analogy of pages in a notebook that co
Each page in a \sysname notebook can be thought of as a block of SQL DML/DDL code that imperatively manipulates a single relation, which is displayed as a table or visualization. Pages are evaluated in sequential order. Code defining later pages may reference preceding pages as if they were views, and edits to a page may result in cascading changes to pages that depend on it.
We refer to this SQL-based language as the \sysname user action language (\langname).
-In spite of its imperative flavor, operators in \langname form a monad that can be compiled down to a generalized form of relational algebra~\cite{AG16,AG14a}.
+In spite of its imperative flavor, operators in \langname form a monad that can be compiled down to a generalized form of relational algebra~\cite{AG16}.
%This same imperative flavor also carries several benefits: (1) Singleton operations are easy to express as explicif updates, (2) Positional semantics are clearly defined by context, (3) It is easier to translate a sequence of user interactions with a spreadsheet into a sequence of imperative operations.
%Due to space constraints, we will only sketch the language through examples in this paper; a full description is left to future work.
@@ -45,14 +45,16 @@ Figure~\ref{fig:program} shows an example \langname script that loads a CSV file
The script defines a sequence of declarative transformations on the data imported by the \texttt{LOAD} operation in the first line.
The entire script can be rewritten into a SQL query:
\begin{lstlisting}[morekeywords={LOAD}]
-SELECT *, total = CASE WHEN ID = 90 THEN 1020
-ELSE price*(1-discount) END
+SELECT *, CASE WHEN ID = 90 THEN 1020
+ELSE price*(1-discount)
+END AS total
FROM LOAD(lineitem.csv)
UNION ALL
-SELECT name = 'table', price = 10,
-discount = 0.05, total = 9.5
+SELECT 'table' AS name, 10 AS price,
+0.05 AS discount, 9.5 AS total
\end{lstlisting}
-Imperative-flavored declarative syntax has been repeatedly found to be more user-friendly than classic declarative syntax~\cite{Olston:2008:PLN:1376616.1376726,Sowell:2009aa}. Here however, it also serves to highlight the compositional nature of interactive views: each user action that changes the view's schema or contents is reflected in the script by a new statement appended to its end. Thus, we aim for --- in principle at least --- a bi-directional mapping between user actions and statements in \langname.
+Imperative-flavored declarative syntax has been repeatedly found to be more user-friendly than classic declarative syntax~\cite{Olston:2008:PLN:1376616.1376726}. % ,Sowell:2009aa}.
+Here however, it also serves to highlight the compositional nature of interactive views: each user action that changes the view's schema or contents is reflected in the script by a new statement appended to its end. Thus, we aim for --- in principle at least --- a bi-directional mapping between user actions and statements in \langname.
In addition to enabling singletons and being easy to integrate with spreadsheets, the imperative flavor of \langname also enables a form of backtracking and branching. As illustrated in Figure~\ref{fig:hybridinterface}, users can quickly try out hypothetical changes by checkpointing program state and applying a variant sequence of edits. \sysname will support a comprehensive suite of branching and merging capabilities for both data~\cite{NA16} and workflows~\cite{SV08}.
@@ -79,7 +81,7 @@ page := LOAD 'file' | page ; s
% JF: this is a bit confusing -- are we assuming that the pages are linear and each page depends on the previous page? what about branches?
-As a user edits tables and visualizations directly, these edits are reflected in the page where the table resides and is propagated to subsequent pages that depend on it. The user's edits, whether applied via the spreadsheet or notebook UI, are recorded as a form of workflow provenance~\cite{SV08,CF12a,AD11c,DC07}. Our goal is not to reproduce the full interface of a spreadsheet, but rather to replicate as many of the
+As a user edits tables and visualizations directly, these edits are reflected in the page where the table resides and are propagated to subsequent pages that depend on it. The user's edits, whether applied via the spreadsheet or notebook UI, are recorded as a form of workflow provenance~\cite{SV08,CF12a,AD11c,DC07}. Our goal is not to reproduce the full interface of a spreadsheet, but rather to replicate as many of the
flexible data and schema manipulation features as possible within a more structured framework. \sysname's UI allows users to:\\
%%\hidecomment{
%%As a user edits tables and visualizations directly, these edits are reflected in the page where the table resides and they are also propagated to subsequent pages. The user's edits, whether applied via the spreadsheet or notebook UI, are recorded as a form of workflow provenance~\cite{SV08,CF12a,AD11c,DC07}. Note that our goal is not to reproduce the full interface of a spreadsheet, but rather to replicate many of the
@@ -93,7 +95,7 @@ flexible data and schema manipulation features as possible within a more structu
\inlineitem{Sort data} A dropdown menu allows users to sort data according to values in one or more columns.\\
\inlineitem{Filter data} A dropdown menu allows users to filter out rows according to a formula defined over the row.\\
%\end{compactitem}
-Many of these operations (e.g., paste, typecast) require the user to define a target, normally specified as rectangular area selected by clicking and dragging with the cursor; We also propose to support declarative regions, as discussed below.
+Many of these operations (e.g., paste, typecast) require the user to define a target, normally specified as a rectangular area selected by clicking and dragging with the cursor; We also propose to support declarative regions, as discussed below.
\subsection{Spreadsheet to Notebook and Back}
To create a seamless interface between the spreadsheet and notebook UIs, we need to map operational semantics and effects between the two interaction models. We now sketch solutions to several of the resulting challenges.


@@ -4,7 +4,7 @@ In spite of the availability of powerful automated curation, cleaning, and analy
Key among these is the simplicity with which users can define exceptions to bulk set-at-a-time operations in both a spreadsheet and a notebook setting.
In this paper, we examine the spreadsheet and notebook interface models, and explore how lessons from both can be incorporated into relational database interfaces.
We present a new user interface for data curation and a tool implementing this interface called \sysname.
-\sysname will combine UI elements from both spreadsheets and notebooks and will support functionality not commonly found in either spreadsheets or notebooks, including automated curation operators~\cite{Yang:2015:LOA:2824032.2824055}, deployment of curation workflows over large datasets~\cite{Kandel:2011:WIV:1978942.1979444}, declarative queries~\cite{AG14a,Olston:2008:PLN:1376616.1376726}, and support for exploratory curation tasks~\cite{SV08}.
+\sysname will combine UI elements from both spreadsheets and notebooks and will support functionality not commonly found in either spreadsheets or notebooks, including automated curation operators~\cite{Yang:2015:LOA:2824032.2824055}, deployment of curation workflows over large datasets~\cite{Kandel:2011:WIV:1978942.1979444}, declarative queries~\cite{AG16,Olston:2008:PLN:1376616.1376726}, and support for exploratory curation tasks~\cite{SV08}.
This hybrid UI enables powerful relational queries, while still being flexible enough to permit easy data manipulation, summarization, and visualization.
@@ -41,7 +41,7 @@ For example,
(2) repairs for data errors may be easy to define for individual cases, but far harder to define in a general case;
(3) complex data transformations that need to be generalized would still be easier to define for individual test cases than in bulk.
By making it easy for users to break the rules, even if only temporarily, spreadsheets empower users to explore data, evaluate options, and better understand the effects of their curation efforts.
-Such violations, or \textit{singleton} operations, are not handled gracefully by existing relational DBMSes.
+Such exceptions, or \textit{singleton} operations, are not handled gracefully by existing relational DBMSes.
However spreadsheets also have several drawbacks compared to a DBMS:\\
% Furthermore, they can sometimes help users to repeatedly apply such computations to ``more'' data.
% For example, by using a spreadsheet's copy/fill paste feature, a single formula can be mapped over a many cells.
@@ -137,7 +137,7 @@ As the user edits the view, the user's actions are seamlessly transformed into a
This program serves as a form of history, allowing the user to revisit and revise earlier edits, even out of order. Furthermore, the program defines a workflow, highly specialized to a specific dataset.
Even this is sufficient to provide classical benefits of workflow provenance such as auditability and explainability for derived data.
Once an interactive view is developed for one dataset, it can more readily be adapted to new data or to react to changes in its inputs.
-Recasting the user's actions programmatically allows us to leverage existing work on algebraic equivalences~\cite{Liu:2009:SAD:1546683.1547431} and program rewriting~\cite{AG14a,AG16} to first obtain multiple interpretations of sequences of user actions, and then to extrapolate more general expressions of the user's intent~\cite{Deutch:2016aa,Zloof:1975:QE:1499949.1500034}.
+Recasting the user's actions programmatically allows us to leverage existing work on algebraic equivalences~\cite{Liu:2009:SAD:1546683.1547431} and program rewriting~\cite{AG16} to first obtain multiple interpretations of sequences of user actions, and then to extrapolate more general expressions of the user's intent~\cite{Deutch:2016aa,Zloof:1975:QE:1499949.1500034}.
% JF: This should not be a new parag. -- flow into: An important challenge we consider is...
%\tinysection{Overview} In this paper, we outline the technical challenges of implementing interactive views and sketch our proposed solutions.
An important challenge is controlling unexpected side effects arising from these edits.


@@ -18,7 +18,7 @@ DataSpread~\cite{Bendre:2015:DUD:2824032.2824121} extends spreadsheets with rela
The idea of generalizing singleton operations is based on Query by Example~\cite{Zloof:1975:QE:1499949.1500034} and Query by Explanation~\cite{Deutch:2016aa}. As individual operations are grouped together, the system can learn to describe what the user is attempting to accomplish. We plan to draw heavily on work in this area to develop \sysname's generalization engine.
% ~\cite{SV08,CF12a,AD11c,DC07}
As the basis for the notebook-style interface and script provenance, we leverage work on scientific workflows~\cite{SV08,CF12a,DC07}, and we borrow ideas from
-reenactment~\cite{AG14a,AG16} as the basis for \langname scripts.
+reenactment~\cite{AG16} as the basis for \langname scripts.
%%% Local Variables:
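
Note on the `lstlisting` change in this commit: the earlier form aliases columns as `total = CASE ...`, while the revised form uses `expr AS alias`, which is the standard SQL spelling. The following is a minimal Python sketch, not the authors' implementation, that checks the `AS` form runs on SQLite and yields the intended column names. Assumptions: an in-memory table `lineitem` with a hypothetical schema `(ID, name, price, discount)` and made-up rows stands in for `LOAD(lineitem.csv)`; the function name `run_rewritten_query` is illustrative; and a `NULL AS ID` column is added to the second branch so that both sides of `UNION ALL` have the same number of columns under that assumed schema.

```python
import sqlite3


def run_rewritten_query(rows):
    """Run the AS-alias form of the query over hypothetical lineitem rows.

    Returns (column_names, result_rows).
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema standing in for LOAD(lineitem.csv).
    conn.execute(
        "CREATE TABLE lineitem (ID INTEGER, name TEXT, price REAL, discount REAL)"
    )
    conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?)", rows)

    query = """
        SELECT *, CASE WHEN ID = 90 THEN 1020
                       ELSE price * (1 - discount)
                  END AS total
        FROM lineitem
        UNION ALL
        -- NULL AS ID is an assumption added to match the column count.
        SELECT NULL AS ID, 'table' AS name, 10 AS price,
               0.05 AS discount, 9.5 AS total
    """
    cursor = conn.execute(query)
    # Column names of a compound SELECT come from its first branch.
    columns = [description[0] for description in cursor.description]
    result = cursor.fetchall()
    conn.close()
    return columns, result


if __name__ == "__main__":
    sample = [(90, "chair", 100.0, 0.1), (91, "desk", 200.0, 0.25)]
    names, data = run_rewritten_query(sample)
    print(names)
    for row in data:
        print(row)
```

In this sketch the row with `ID = 90` receives the hard-coded total, every other loaded row receives `price * (1 - discount)`, and the inserted `'table'` row is appended by the `UNION ALL` branch.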