%!TEX root = ../main.tex
Interactive views in \sysname are backed by an imperative-flavored language: the \sysname user action language (\langname). Although they appear imperative, the operators of \langname form a monad that can be compiled down to a slightly generalized form of relational algebra. We now give an overview of \langname, its data model, its general properties, and how it connects to user actions in \sysname's UI. Recall that our goal is not to reconstruct the full spreadsheet interface, but rather to define a form of relational algebra that naturally admits singleton operations and positional semantics, enabling it to mirror actions on the frontend.

\subsection{Data Model}
The fundamental unit of data in \langname is a \textit{cell}, a 3-tuple $C_i = \tuple{id_i, f_i, v_i}$ consisting of a globally unique identifier $id_i$, a formula expression $f_i$, and a value $v_i$. Note that we maintain both a value and the formula used to derive it for each cell, a property we will exploit below when discussing the interface. Cells are also arranged into a 2-dimensional grid of rows and columns indexed by a coordinate system, a function $s : \mathbb N \times \mathbb N \rightarrow id$ that maps positions in the grid to the cell occupying that position. The function $s$ need not be complete, but must be one-to-one: a cell may appear in only one position in the spreadsheet. A formula is a primitive-valued expression that may include references to the values of other cells, identified either by the cell's global id or by absolute coordinates (explicit and absolute references, respectively). A formula evaluated in the context of a cell may also specify coordinate references as being relative to the cell (relative references). Columns are usually denoted by letters and rows by numbers. A \textit{state} is the 2-tuple $\tuple{C, s}$, consisting of a set of cells $C = \{C_i\}$ and a coordinate system $s$.
We say that a formula $f$ evaluates to a value $v$ in the context of a given state ($f \mapsto_{\tuple{C,s}} v$) if, after replacing all references with the values of the cells they denote (coordinate references using $s$ and $C$, and explicit references using $C$), the formula reduces to $v$.\footnote{Similar operational semantics were previously proposed by Krishnamurthi and Ramakrishnan~\cite{Erwig2002}.}
%
We say that a state $\tuple{C, s}$ is \textit{valid} if each cell's formula evaluates to the cell's value:
$$\forall \tuple{id_i, f_i, v_i} \in C\;:\; f_i \mapsto_{\tuple{C,s}} v_i$$

User \textit{actions} in \langname transform a state $\tuple{C_1, s_1}$ into a new state $\tuple{C_2, s_2}$.
%
We call the semantics of an action \textit{correct} if, whenever the input to the action is valid, the output is also valid.
%
We focus on two classes of action: (1) \textit{data actions} that change only the spreadsheet's cells (i.e., for which $s_1 = s_2$), and (2) \textit{structural actions} that alter the spreadsheet's coordinate system and modify the spreadsheet's cells only to the extent necessary to preserve validity under the new coordinate system.
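
To make these definitions concrete, consider the cell at C2 in Figure~\ref{fig:rearrange:initial}. The following is an illustrative sketch rather than part of the formal definition: we assume that positions are written as (row, column) pairs with columns A, B, and C numbered 1, 2, and 3, and the identifiers $id_{\mathrm{B2}}$, $id_{\mathrm{C1}}$, and $id_{\mathrm{C2}}$ are hypothetical names for the cells at those positions. Under these assumptions, $s(2,2) = id_{\mathrm{B2}}$, $s(1,3) = id_{\mathrm{C1}}$, and the cell at C2 is
$$\tuple{id_{\mathrm{C2}},\; \texttt{=B2+C1},\; 14}.$$
Replacing each coordinate reference with the value of the cell that $s$ maps it to gives
$$\texttt{=B2+C1} \;\mapsto_{\tuple{C,s}}\; 4 + 10 = 14,$$
which agrees with the cell's stored value, so this cell satisfies the validity condition. A data action that sets the value at B2 to 5 leaves $s$ unchanged, but to be correct it must also update the values at C2, C3, and C4 to 15, 23, and 32, respectively.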
\begin{figure*}
\centering
\begin{subfigure}{0.3\textwidth}
\centering
\begin{tabular}{>{\tiny}rc|c|c}
 & \tiny A & \tiny B & \tiny C \\
1& Alice & 10 & \texttt{=B1} (10)\\ \hline
2& Bob & 4 & \texttt{=B2+C1} (14)\\ \hline
3& Carol & 8 & \texttt{=B3+C2} (22)\\ \hline
4& Dave & 9 & \texttt{=B4+C3} (31)
\end{tabular}
\caption{Initial State}
\label{fig:rearrange:initial}
\end{subfigure}
%
\begin{subfigure}{0.3\textwidth}
\centering
\begin{tabular}{>{\tiny}rc|c|c}
 & \tiny A & \tiny B & \tiny C \\
1& Alice & 10 & \texttt{=B1} (10)\\ \hline
2& Carol & 8 & \texttt{=B2+C3} (22)\\ \hline
3& Bob & 4 & \texttt{=B3+C1} (14)\\ \hline
4& Dave & 9 & \texttt{=B4+C2} (31)
\end{tabular}
\caption{After swapping rows 2 and 3}
\label{fig:rearrange:manual}
\end{subfigure}
%
\begin{subfigure}{0.3\textwidth}
\centering
\begin{tabular}{>{\tiny}rc|c|c}
 & \tiny A & \tiny B & \tiny C \\
1& Alice & 10 & \texttt{=B1} (10)\\ \hline
2& Dave & 9 & \texttt{=B2+C1} (19)\\ \hline
3& Carol & 8 & \texttt{=B3+C2} (27)\\ \hline
4& Bob & 4 & \texttt{=B4+C3} (31)
\end{tabular}
\caption{After sorting on column B}
\label{fig:rearrange:sort}
\end{subfigure}
\caption{Examples of both swapping rows and sorting rows in commercial spreadsheet systems.}
\end{figure*}

\subsection{Unsurprising Inconsistencies}
User actions on a spreadsheet have not only direct, intended effects, but may also have indirect, \textit{incidental} effects. Examples include changing a formula (dependent formulas are recomputed), repositioning a row (formulas depending on the row are modified), or sorting (formulas are recomputed based on the new, sorted coordinate system). In modern commercial spreadsheet systems, the semantics of indirect effects at first appear to be inconsistent. Take, for example, two mechanisms for rearranging rows. Consider the table given in Figure~\ref{fig:rearrange:initial}, which shows a list of players (column A), scores (column B), and a cumulative total score (column C).
%
A user might manually drag row 3 to a position between rows 1 and 2, effecting a swap of rows 2 and 3. Microsoft Excel, Apple's Numbers, and Google's Sheets\footnote{These and other behaviors described were evaluated on Excel for Mac version 15.20, Numbers version 3.6.1, and Google Sheets as of April 2016.} all have identical behavior, each resulting in the table shown in Figure~\ref{fig:rearrange:manual}. Note that formulas in column C have been rewritten to ensure that each cell retains its original value under the transposed coordinate system. In other words, the user's \texttt{MOVE} action treats formula references as being explicit references.
%
Conversely, a user might sort the rows of the table in descending order on column B. The resulting table is identical in all three systems, and is shown in Figure~\ref{fig:rearrange:sort}. Here, the formulas in column C are changed only in appearance; each continues to reference the cells immediately to the left and above. However, the values of the cells have changed as a result. In other words, the user's \texttt{SORT} action treats formula references as being relative references.

At a high level, both actions are structural, as they only transform the coordinate system by rearranging the coordinate mapping; the effects on cells (the $C$ part of a state) are only incidental consequences of the new coordinate scheme. For each dependent cell whose references change coordinates, the action must also change either the cell's formula or the cell's value to be correct.
The distinction between the two example actions is significant, because it underlies a fundamental tradeoff in minimizing the ``surprising'' incidental effects~\cite{saltzer2009principles} of a change in coordinates: for \texttt{MOVE}, cell formulas are \textit{translated} into the new coordinate system to ensure that each cell's value stays the same, while for \texttt{SORT}, cell formulas are \textit{re-evaluated} in the new coordinate system, changing the values to ensure that the formulas stay the same. We leave the optimization of this tradeoff to future work, but observe that virtually all structural actions we explored (drag cell, rearrange rows, filter, etc.) favor minimizing changes in values.
%
%
%
%
%
%
%
%
%
%
%A \langname script is a monad over \textit{data frames}, or ordered lists of uniform-width tuples of primitive typed values, or cells. For clarity of presentation, we will first discuss \langname considering only real-valued primitives, before defining the more complex type-system actually used.
%Each attribute position, or column of a data frame has a globally unique identifier (a column id) and an optional human-readable name. Each tuple, or row of a data frame also has a globally unique identifier (a row id). Cells are thus uniquely identified by a pair of row and column ids.
%
%\subsection{Anatomy of a \langname script}
%
%\begin{figure}
%\begin{verbatim}
%LOAD 'input.csv'
%ADD COLUMN total;
%UPDATE total = price * (1 - discount)
%INSERT ROW x AT LINE 9;
%UPDATE name = 'table', price = 10, discount = 0.05,
%       total = price * (1-discount) WHERE id = x;
%\end{verbatim}
%\caption{An example \langname program}
%\label{fig:program}
%\end{figure}
%
%An example \langname script is shown in Figure~\ref{fig:program}. Scripts begin with a \texttt{LOAD} statement that initializes the frame, declaring a set of columns and populating the frame with data drawn from either a CSV file or a frame defined by a previous page.
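
The tradeoff can be illustrated on the cell holding Bob's cumulative total, which starts at C2 as $\tuple{id, \texttt{=B2+C1}, 14}$ in Figure~\ref{fig:rearrange:initial}. The notation here is an illustrative sketch rather than the formal semantics of \langname: we assume a row permutation $\pi$ (a symbol introduced only for this example) such that the cell at row $r$ before the action sits at row $\pi(r)$ afterward. For the \texttt{MOVE} of Figure~\ref{fig:rearrange:manual}, $\pi(2) = 3$ and $\pi(3) = 2$. Each reference to row $r$ is translated to row $\pi(r)$, and the value is preserved:
$$\tuple{id,\; \texttt{=B2+C1},\; 14} \;\longrightarrow\; \tuple{id,\; \texttt{=B3+C1},\; 14}.$$
For the \texttt{SORT} of Figure~\ref{fig:rearrange:sort}, $\pi(2) = 4$ and $\pi(4) = 2$. The references keep their offsets relative to the cell (the cell to its left, and the cell above it), and the formula is re-evaluated at the new position, where the cell above now holds 27:
$$\tuple{id,\; \texttt{=B2+C1},\; 14} \;\longrightarrow\; \tuple{id,\; \texttt{=B4+C3},\; 31}, \qquad 4 + 27 = 31.$$
In the first case the formula text changes and the value does not; in the second, the relative structure of the formula is kept and the value changes.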
%
%
%
%
%A \textit{selector} in \langname identifies a rectangular region of cells and consists of a column selector and a row selector. A column selector identifies a set of columns by their ids. Row selectors operate according to one of two semantics: by row id, or by some predicate over the row's attributes. We refer to these semantics as universal and qualitative, respectively.
%
%\subsection{\langname}
%
%\newcommand{\vizcommand}[2]{\noindent\texttt{#1}\\{#2}}
%
%\vizcommand{LOAD \{frame | file\}}{
%The load operation initializes a data frame and is the first line of any \langname script. The frame is initialized either as a copy of an existing frame identified by name, or by importing a CSV file.
%}
%
%\vizcommand{UPDATE \{formula\} WHERE \{selector\}}{
%The update operation modifies values in a rectangular region defined by a selector according to the specified formula.
%}
%
%
%
%Ways to define to a collection
%\begin{itemize}
%\item Set of IDs
%\item Range of Positions
%\item Property of the Data
%\item Everything
%\end{itemize}
%
%
%Context:
%\begin{itemize}
%\item Sort order
%\item Unique Identifiers
%\item
%\end{itemize}
%
%Operations:
%\begin{itemize}
%\item UPDATE ... WHERE ...
%\item DELETE WHERE ...
%\item SORT BY ... / ARRANGE AS ...
%\item INSERT ROW[S] ... / INSERT ... / LOAD ...
%\item ADD COLUMN ... / DROP COLUMN ... / ALTER COLUMN ...
%\item GROUP BY ... (a'la pivot tables)
%\end{itemize}