Merge branch 'master' of gitlab.odin.cse.buffalo.edu:Vizier/2016-HILDA-Interactive

Oliver Kennedy 2016-04-25 22:32:16 -04:00
commit 4e84856656
2 changed files with 12 additions and 8 deletions


@ -56,8 +56,12 @@ In addition to enabling singletons and being easy to integrate with spreadsheets
\subsection{The Spreadsheet UI}
% JF: this is a bit confusing -- are we assuming that the pages are linear and each page depends on the previous page? what about branches?
As a user edits tables and visualizations directly, these edits are reflected in the page where the table resides and they are also propagated to subsequent pages that depend on it. The user's edits, whether applied via the spreadsheet or notebook UI, are recorded as a form of workflow provenance~\cite{SV08,CF12a,AD11c,DC07}. Note that our goal is not to reproduce the full interface of a spreadsheet, but rather to replicate as many of the
flexible data and schema manipulation features as possible within a more structured framework. Concretely, \sysname's UI allows users to:\\
\hidecomment{
As a user edits tables and visualizations directly, these edits are reflected in the page where the table resides and they are also propagated to subsequent pages. The user's edits, whether applied via the spreadsheet or notebook UI, are recorded as a form of workflow provenance~\cite{SV08,CF12a,AD11c,DC07}. Note that our goal is not to reproduce the full interface of a spreadsheet, but rather to replicate many of the
flexible data and schema manipulation features within a more structured framework. Concretely, \sysname's UI allows users to:\\
flexible data and schema manipulation features within a more structured framework. Concretely, \sysname's UI allows users to:\\}
%\begin{compactitem}
\inlineitem{Overwrite arbitrary cells with constants, formulas, or regular expressions} Users may click on any cell in the output to overwrite its contents (as in a spreadsheet). \\
% JF rm to save space -- it sounded redundant: with a constant value or a new formula


@ -1,21 +1,21 @@
%!TEX root = ../main.tex
The fundamental unit of data in \langname is a \textit{cell}, a 3-tuple: $C = \tuple{id, f, v}$, consisting of a globally unique identifier $id$, a formula expression $f$, and a value $v$. The identifier of a cell is assigned to it when it is first accessed and is immutable --- even if the cell is moved to a different position in the spreadsheet.
By combining storing both a formula $f$ and its result $v$, a cell stores provenance similar to the provenance in a provenance-aware data management system, where each record is associated with metadata describing how it was computed.
By storing both a formula $f$ and its result $v$, a cell maintains data provenance akin to a provenance-aware data management system, where each record is associated with metadata describing how it was computed.
Here, this metadata serves two purposes.
First, as noted above, we need to be able to reliably materialize the formula backing each cell so that it may be edited. We need to ensure that each operator defines precise semantics for how it affects formulas.
Second, and perhaps more importantly, we track both values and the formula used to derive them as a way to define operational semantics that minimize user surprise. As we will discuss shortly, one specific update to a spreadsheet may have many secondary, incidental effects on formulas and/or values in a spreadsheet. By tracking both, we can better understand these effects and ensure that the complexities and unexpected side-effects of an operation are minimized.
First, as noted above, we need to be able to reliably materialize the formula backing each cell so that it can be edited. We need to ensure that each operator defines precise semantics for how it affects formulas.
Second, and perhaps more importantly, we track both values and the formula used to derive them as a way to define operational semantics that minimize user surprise. As we discuss shortly, one specific update to a spreadsheet may have many secondary, incidental effects on formulas and/or values in a spreadsheet. By tracking both, we can better understand these effects and ensure that the complexities and unexpected side-effects of an operation are minimized.
\tinysection{Coordinate System} Cells are arranged into a 2-dimensional grid of rows and columns indexed by a coordinate system, a function $s : \mathbb N \times \mathbb N \rightarrow id$ that maps positions in the grid to the cell occupying that position. The function $s$ need not be complete, but must be one-to-one; A cell may only appear in one position in the spreadsheet.
\tinysection{Coordinate System} Cells are arranged into a 2-dimensional grid of rows and columns indexed by a coordinate system, a function $s : \mathbb N \times \mathbb N \rightarrow id$ that maps positions in the grid to the cell occupying that position. The function $s$ need not be complete, but must be one-to-one: a cell may only appear in one position in the spreadsheet.
\tinysection{Formulas} A formula is a primitive-valued expression that may include references to the values of other cells, identified by the cell's global id or by absolute coordinates (explicit and absolute references, respectively). A formula evaluated in the context of a cell may also specify coordinate references as being relative to the cell (relative references). Columns are usually denoted by letters, Rows by numbers,
A \textit{state} is the 2-tuple $\tuple{ C, s }$, consisting of a set of cells $C = \{C_i\}$ and a coordinate system.
\tinysection{Formulas} A formula is a primitive-valued expression that may include references to the values of other cells, identified by the cell's global id or by absolute coordinates (explicit and absolute references, respectively). A formula evaluated in the context of a cell may also specify coordinate references as being relative to the cell (relative references). Columns are usually denoted by letters and rows by numbers.
A \textit{state} is a 2-tuple $\tuple{ C, s }$ consisting of a set of cells $C = \{C_i\}$ and a coordinate system.
We say that a formula $f$ evaluates to a value $v$ in the context of a given state ($f \mapsto_{\tuple{C,s}} v$) if, after replacing all references (coordinate references using $s$ and $C$, and explicit references using $C$), the formula reduces to $v$~\footnote{Similar operational semantics were previously proposed by Krishnamurthi and Ramakrishnan~\cite{Erwig2002}.}.
%
We say that a state $\tuple{C, s}$ is \textit{valid}\footnote{Note that this definition does not preclude direct or indirect circular references as long as the computations defined by the cell formulas have a fixpoint. However, such a fixpoint computation may be hard for a user to understand, so we disallow circular references for now.} if each cell's formula evaluates to the cell's value:
$\forall \tuple{id_i, f_i, v_i} \in C\;:\; f_i \mapsto_{\tuple{C,s}} v_i$
User \textit{actions} in \langname, transform a state $\tuple{C_1, s_1}$ into a new state $\tuple{C_2, s_2}$. %
User \textit{actions} in \langname transform a state $\tuple{C_1, s_1}$ into a new state $\tuple{C_2, s_2}$. %
We call the semantics of an action \textit{correct} if they guarantee that whenever the input state of the action is valid, its output state is also valid.
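% Illustrative example, not part of the original text: the identifiers $c_1$, $c_2$ and the (column, row) argument order of $s$ are assumptions made only for this sketch.
As a small illustration of these definitions, consider a hypothetical state with two cells, assuming for the sake of the example that $s$ takes its arguments in (column, row) order:
\[
C_1 = \{\tuple{c_1, 3, 3},\; \tuple{c_2, c_1 + 1, 4}\}, \qquad s_1(1,1) = c_1, \quad s_1(1,2) = c_2 .
\]
This state is valid, since $3 \mapsto_{\tuple{C_1,s_1}} 3$ and $c_1 + 1 \mapsto_{\tuple{C_1,s_1}} 4$.
Now suppose an action overwrites the formula of $c_1$ with the constant $5$.
An output that changes only that cell,
\[
C_2' = \{\tuple{c_1, 5, 5},\; \tuple{c_2, c_1 + 1, 4}\},
\]
is not valid, because $c_1 + 1 \mapsto_{\tuple{C_2',s_1}} 6 \neq 4$.
Under the definition above, a correct semantics for this action would instead have to produce
\[
C_2 = \{\tuple{c_1, 5, 5},\; \tuple{c_2, c_1 + 1, 6}\}, \qquad s_2 = s_1 ,
\]
in which the dependent value of $c_2$ has been updated as a secondary effect of the edit.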
%We also focus on two classes of action: (1) \textit{data actions} that change only the spreadsheet's cells (i.e., for which $s_1 = s_2$), and (2) \textit{structural actions} that alter the spreadsheet's coordinate system and only modify the spreadsheet's cells to the extent necessary to preserve validity under the new coordinate system.