%!TEX root = ../main.tex
Interactive views in \sysname are backed by an imperative-flavored language: the \sysname user action language (\langname). Although they appear imperative, operators in \langname form a monad that can be compiled down to a slightly generalized form of relational algebra. We now give an overview of \langname, its general properties, and how it connects to user actions in \sysname's UI. Recall that our goal is not to reconstruct the full spreadsheet interface, but rather to define a form of relational algebra that naturally admits singleton operations and positional semantics, enabling it to mirror actions on the frontend.
\subsection{Data Model}
A \langname script is a monad over \textit{data frames}: ordered lists of uniform-width tuples of primitive-typed values, or \textit{cells}. For clarity of presentation, we first discuss \langname considering only real-valued primitives, before defining the more complex type system actually used.
Each attribute position, or column, of a data frame has a globally unique identifier (a column id) and an optional human-readable name. Each tuple, or row, of a data frame also has a globally unique identifier (a row id). Cells are thus uniquely identified by a pair of row and column ids.
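As a minimal sketch of this data model, assuming the real-valued simplification above, the following Python fragment represents a data frame as an ordered list of rows with explicit ids. The names \texttt{Frame}, \texttt{Column}, \texttt{Row}, and \texttt{cell} are hypothetical and not part of \sysname; a simple counter stands in for a source of globally unique identifiers.
\begin{verbatim}
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional

_next_id = count(1)  # assumption: stand-in for a global id source

@dataclass
class Column:
    name: Optional[str] = None  # optional human-readable name
    col_id: int = field(default_factory=lambda: next(_next_id))

@dataclass
class Row:
    values: Dict[int, float]    # column id -> cell value
    row_id: int = field(default_factory=lambda: next(_next_id))

@dataclass
class Frame:
    columns: List[Column] = field(default_factory=list)  # ordered
    rows: List[Row] = field(default_factory=list)        # ordered

    def cell(self, row_id: int, col_id: int) -> float:
        """Look up a cell by its (row id, column id) pair."""
        for row in self.rows:
            if row.row_id == row_id:
                return row.values[col_id]
        raise KeyError((row_id, col_id))

price = Column("price")
frame = Frame(columns=[price])
frame.rows.append(Row({price.col_id: 10.0}))
first = frame.rows[0]
print(frame.cell(first.row_id, price.col_id))
\end{verbatim}
Keeping ids separate from list positions is the point of the sketch: reordering \texttt{frame.rows} changes positions but leaves every (row id, column id) pair valid.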
\subsection{Anatomy of a \langname script}
\begin{figure}
\begin{verbatim}
LOAD 'input.csv';
ADD COLUMN total;
UPDATE total = price * (1 - discount);
INSERT ROW x AT LINE 9;
UPDATE name = 'table', price = 10, discount = 0.05,
total = price * (1-discount) WHERE id = x;
\end{verbatim}
\caption{An example \langname program}
\label{fig:program}
\end{figure}
An example \langname script is shown in Figure~\ref{fig:program}. Scripts begin with a \texttt{LOAD} statement that initializes the frame, declaring a set of columns and populating the frame with data drawn from either a CSV file or a frame defined by a previous page.
A \textit{selector} in \langname identifies a rectangular region of cells and consists of a column selector and a row selector. A column selector identifies a set of columns by their ids. Row selectors operate according to one of two semantics: by row id, or by some predicate over the row's attributes. We refer to these semantics as universal and qualitative, respectively.
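The two row-selector semantics can be sketched as follows. This is an illustrative Python fragment under assumed representations, not \langname's implementation: rows are (row id, values) pairs, column ids are strings, and the function name \texttt{select} is hypothetical. A row selector is either a set of row ids or a predicate over the row's attributes.
\begin{verbatim}
from typing import Callable, Dict, List, Set, Tuple, Union

Row = Dict[str, float]  # column id -> value
RowSelector = Union[Set[int], Callable[[Row], bool]]

def select(rows: List[Tuple[int, Row]],
           col_ids: Set[str],
           row_sel: RowSelector) -> List[Tuple[int, str]]:
    """Return the (row id, column id) pairs of the selected cells."""
    def matches(rid: int, row: Row) -> bool:
        if callable(row_sel):
            return row_sel(row)   # by predicate over attributes
        return rid in row_sel     # by row id

    return [(rid, cid)
            for rid, row in rows if matches(rid, row)
            for cid in row if cid in col_ids]

rows = [(101, {"price": 10.0, "discount": 0.05}),
        (102, {"price": 4.0, "discount": 0.5})]
by_id = select(rows, {"price"}, {102})
by_pred = select(rows, {"price", "discount"},
                 lambda r: r["price"] > 5)
\end{verbatim}
Either way the result is a rectangular region: the cross product of the chosen rows and the chosen columns.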
\subsection{\langname}
\newcommand{\vizcommand}[2]{\noindent\texttt{#1}\\{#2}}
\vizcommand{LOAD \{frame | file\}}{
The load operation initializes a data frame and is the first line of any \langname script. The frame is initialized either as a copy of an existing frame identified by name, or by importing a CSV file.
}
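A minimal sketch of the CSV case of \texttt{LOAD}, in Python. It assumes that the first CSV line is a header naming the columns and that all values are real-valued; neither assumption comes from \langname itself. The function name \texttt{load\_csv} and the dictionary layout are hypothetical. Copying an existing frame is not shown.
\begin{verbatim}
import csv
import io
from itertools import count

def load_csv(text, ids=None):
    """Build a frame from CSV text: one column per header name,
    one row per record, each with a fresh id."""
    ids = ids if ids is not None else count(1)
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # (column id, human-readable name), in file order
    columns = [(next(ids), name) for name in header]
    rows = []
    for record in reader:
        values = {cid: float(v)
                  for (cid, _), v in zip(columns, record)}
        rows.append((next(ids), values))  # (row id, values)
    return {"columns": columns, "rows": rows}

frame = load_csv("price,discount\n10,0.05\n4,0.5\n")
names = [name for _, name in frame["columns"]]
\end{verbatim}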
\vizcommand{UPDATE \{formula\} WHERE \{selector\}}{
The update operation modifies values in a rectangular region defined by a selector according to the specified formula.
}
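The effect of an update on a selected region can be sketched in Python as below. This is an assumption-laden illustration, not \langname's semantics: it handles a single target column, models the formula and the row selector as plain functions, and uses the hypothetical name \texttt{update}.
\begin{verbatim}
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]  # column id -> value

def update(rows: List[Tuple[int, Row]],
           target: str,
           formula: Callable[[Row], float],
           where: Callable[[int, Row], bool] = lambda rid, row: True,
           ) -> None:
    """Overwrite column `target` in each selected row."""
    for rid, row in rows:
        if where(rid, row):
            # Evaluate on the row's current values, then assign.
            row[target] = formula(row)

rows = [(1, {"price": 10.0, "discount": 0.05, "total": 0.0}),
        (2, {"price": 4.0, "discount": 0.5, "total": 0.0})]
# UPDATE total = price * (1 - discount)
update(rows, "total", lambda r: r["price"] * (1 - r["discount"]))
# UPDATE price = 5 WHERE id = 2
update(rows, "price", lambda r: 5.0,
       where=lambda rid, r: rid == 2)
\end{verbatim}
Omitting \texttt{where} selects every row, which corresponds to the first \texttt{UPDATE} in Figure~\ref{fig:program}.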
Ways to define a collection:
\begin{itemize}
\item Set of IDs
\item Range of Positions
\item Property of the Data
\item Everything
\end{itemize}
Context:
\begin{itemize}
\item Sort order
\item Unique Identifiers
\end{itemize}
Operations:
\begin{itemize}
\item UPDATE ... WHERE ...
\item DELETE WHERE ...
\item SORT BY ... / ARRANGE AS ...
\item INSERT ROW[S] ... / INSERT ... / LOAD ...
\item ADD COLUMN ... / DROP COLUMN ... / ALTER COLUMN ...
\item GROUP BY ... (\`a la pivot tables)
\end{itemize}