bew
parent
84efbd03b9
commit
6bb54dc0cf
2
main.tex
2
main.tex
|
@ -2,6 +2,8 @@
|
|||
|
||||
\usepackage{cleveref}
|
||||
\usepackage{todonotes}
|
||||
\usepackage{amsmath}
|
||||
\usepackage{xspace}
|
||||
|
||||
\input{macros}
|
||||
|
||||
|
|
|
@ -22,9 +22,12 @@
|
|||
\newcommand{\rangeOf}[2]{(#1\texttt{:}#2)}
|
||||
\newcommand{\range}{\rangeOf{\columnRange}{\rowRange}}
|
||||
\newcommand{\cellRef}[2]{(#1\texttt{:}#2)}
|
||||
\newcommand{\evalOf}[2]{\left\llbracket\; #2 \;\right\rrbracket_{#1}}
|
||||
\newcommand{\patternOf}[2]{\left\llbracket\; #1 \;\right\rrbracket_{#2}}
|
||||
\newcommand{\evalOf}[2]{\left\llbracket\; #2 \;\right\rrbracket_{#1}}
|
||||
\newcommand{\patternOf}[2]{\left\llbracket\; #1 \;\right\rrbracket_{#2}}
|
||||
\newcommand{\depsOf}[1]{\textbf{deps}\left(#1\right)}
|
||||
\newcommand{\tdepsOf}[1]{\ensuremath{\textbf{deps}^{\ast}\left(#1\right)}\xspace}
|
||||
\newcommand{\DG}[1]{\ensuremath{G_{#1}}\xspace}
|
||||
\newcommand{\TDG}[1]{\ensuremath{G_{#1}^*}\xspace}
|
||||
|
||||
Let $\columnDomain$ and $\rowDomain$ denote the domains of column and row labels, respectively, and let $\exprDomain$ and $\valueDomain$ denote the domains of expressions and values, respectively; we define $\exprDomain$ in greater detail below.
|
||||
We define a \emph{spreadsheet} $\spreadsheet : (\columnDomain \times \rowDomain) \rightarrow \exprDomain$ as a mapping from \emph{cells} ($(\column, \row) \in (\columnDomain \times \rowDomain)$) to expressions.
|
||||
|
@ -37,12 +40,13 @@ The expression $\expr$ may be evaluated in the context of a spreadsheet ($\evalO
|
|||
$$\evalOf{\spreadsheet}{\cellRef{\column}{\row}} \equiv \evalOf{\spreadsheet}{\spreadsheet(\column, \row)}$$
|
||||
Cyclic references evaluate to a distinguished error value in $\valueDomain$.
|
||||
|
||||
We define the dependencies of an expression ($\depsOf{\expr}$) to be the set of cells referenced by $\expr$.
|
||||
Expression dependencies induce a graph $\tuple{V, E}$ over the spreadsheet, where each cell is a node (i.e., $V = \columnDomain \times \rowDomain$), and each dependency is a (directed) edge:
|
||||
$$E = \bigcup_{\cell \in \columnDomain \times \rowDomain}
|
||||
We define the dependencies of an expression ($\depsOf{\expr}$) to be the set of cells referenced by $\expr$.
|
||||
Expression dependencies induce a graph $\DG{\spreadsheet} = \tuple{V, E}$ over the spreadsheet, where each cell is a node (i.e., $V = \columnDomain \times \rowDomain$) and each dependency is a (directed) edge:
|
||||
$$E = \bigcup_{\cell \in \columnDomain \times \rowDomain}
|
||||
\{\;\cell \rightarrow \cell'\;|\;\cell' \in \depsOf{\spreadsheet(\cell)}\;\} $$
|
||||
We use $\TDG{\spreadsheet}$ to denote the graph $\tuple{V,E^*}$ where $E^*$ is the transitive closure of $E$, i.e., $\TDG{\spreadsheet}$ stores both direct and indirect dependencies among the cells in the spreadsheet.
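The transitive dependencies of a single cell can be sketched as a breadth-first traversal of the direct-dependency edges. This is only an illustration: the dictionary `deps` (mapping each cell to the set of cells its expression references) and the function name `transitive_deps` are assumptions introduced here, not part of the paper's system.

```python
from collections import deque


def transitive_deps(deps, cell):
    """Return every cell reachable from `cell` by following dependency edges.

    `deps` is a hypothetical stand-in for deps(S(cell)): a dict that maps a
    cell, written as a (column, row) tuple, to the set of cells it references.
    """
    seen = set()
    frontier = deque(deps.get(cell, ()))
    while frontier:
        current = frontier.popleft()
        if current in seen:
            continue  # already expanded; this also keeps cycles from looping
        seen.add(current)
        frontier.extend(deps.get(current, ()))
    return seen


# A1 references B1, and B1 references C1.
deps = {("A", 1): {("B", 1)}, ("B", 1): {("C", 1)}, ("C", 1): set()}

# A1 and B1 reference each other, so each is among its own dependencies.
cyclic = {("A", 1): {("B", 1)}, ("B", 1): {("A", 1)}}
```

Under this sketch, a cell participates in a cycle exactly when it appears in its own transitive dependency set, which matches the case in which evaluation yields the distinguished error value.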
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\subsection{Compressed Updates}
|
||||
We adopt a common assumption in relational data: that $\columnDomain$ is small and $\rowDomain$ is large for a typical spreadsheet.
|
||||
Accordingly, we define $\rowDomain$ for a spreadsheet with $N$ rows to be the range $[1,N]$, with rows identified by their position in the spreadsheet.
|
||||
|
@ -55,8 +59,8 @@ To compactly encode more general sets of rows, we define a \emph{row set} data s
|
|||
Observe that given two row sets $\rowRange$ and $\rowRange'$ satisfying the above properties, we can compute their intersection $\rowRange \cap \rowRange'$, union $\rowRange \cup \rowRange'$, and difference $\rowRange - \rowRange'$ in $O(|\rowRange|+|\rowRange'|)$ time, returning a row set that respects the same properties.
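The linear-time bound can be illustrated for intersection with a two-pointer merge. The sketch below assumes that a row set is stored as a list of inclusive `(lo, hi)` ranges that are sorted and pairwise disjoint. That representation and the name `intersect` are assumptions made for illustration, since the exact properties are defined earlier in the text.

```python
def intersect(a, b):
    """Intersect two row sets in O(len(a) + len(b)) time.

    Each row set is assumed to be a sorted list of disjoint, inclusive
    (lo, hi) ranges; the result respects the same properties.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))  # the two current ranges overlap on [lo, hi]
        # Advance whichever range ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


overlap = intersect([(1, 5), (10, 20)], [(4, 12), (18, 30)])
```

Each loop iteration advances one of the two indices, so the number of iterations is bounded by the total number of ranges. Union and difference admit the same style of merge.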
|
||||
|
||||
As previously noted, a spreadsheet user may apply a single formula to a range of cells in a single interaction.
|
||||
Typically, such formulas are defined by an expression \emph{pattern} $\pattern \in \patternDomain$, a more general form of an expression that may also include \emph{offset references}.
|
||||
An offset reference $\cellRef{\column}{\rowOffset}$ is defined by a column $\column \in \columnDomain$ and an integer row offset $\rowOffset \in \mathbb Z$.
|
||||
Typically, such formulas are defined by an expression \emph{pattern} $\pattern \in \patternDomain$, a more general form of an expression that may also include \emph{offset references}.
|
||||
An offset reference $\cellRef{\column}{\rowOffset}$ is defined by a column $\column \in \columnDomain$ and an integer row offset $\rowOffset \in \mathbb Z$.
|
||||
A pattern may be expanded in the context of a row ($\patternOf{\pattern}{\row}$); it expands to an expression by replacing every offset reference with an explicit cell reference at the corresponding offset:
|
||||
$$\patternOf{\cellRef{\column}{\rowOffset}}{\row} = \cellRef{\column}{\row + \rowOffset}$$
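As a worked instance of the rule above, consider a hypothetical column label $A$ (an assumption for illustration) and the offset reference $\cellRef{A}{-1}$, which refers to the row immediately above. Expanding it in the context of row $5$ gives:

```latex
$$\patternOf{\cellRef{A}{-1}}{5} = \cellRef{A}{5 + (-1)} = \cellRef{A}{4}$$
```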
|
||||
|
||||
|
@ -68,14 +72,17 @@ $$\spreadsheet \equiv \comprehension{
|
|||
(\column, \row) \in \range,
|
||||
\tuple{\range, \pattern}\in \encodedSpreadsheet
|
||||
}$$
|
||||
Informally, the expression at a cell $\cell$ is defined by identifying the range in $\encodedSpreadsheet$ that contains $\cell$, and expanding the pattern in the context of $\cell$'s row. We require that the set of ranges in $\encodedSpreadsheet$ be disjoint;
|
||||
Informally, the expression at a cell $\cell$ is defined by identifying the range in $\encodedSpreadsheet$ that contains $\cell$, and expanding the pattern in the context of $\cell$'s row. We require that the set of ranges in $\encodedSpreadsheet$ be disjoint.
|
||||
If the set of ranges is not complete, the expression for cells not covered by a range is defined to be the literal null.
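The lookup described informally above can be sketched as follows. The sketch makes several simplifying assumptions that are not from the paper: a range covers a single column and one contiguous, inclusive span of rows; a pattern is modeled only as its list of offset references, ignoring operators and literals; and the literal null is represented by `None`. The names `expand` and `expr_at` are hypothetical.

```python
def expand(pattern, row):
    """Expand a pattern in the context of `row`.

    Each offset reference (column, offset) becomes the explicit cell
    reference (column, row + offset).
    """
    return [(column, row + offset) for column, offset in pattern]


def expr_at(encoded, column, row):
    """Return the expanded expression at (column, row), or None if uncovered.

    `encoded` is a list of ((column, lo, hi), pattern) pairs whose ranges
    are assumed to be disjoint, so at most one pair can match.
    """
    for (range_column, lo, hi), pattern in encoded:
        if column == range_column and lo <= row <= hi:
            return expand(pattern, row)
    return None  # cells outside every range hold the literal null


# Column C, rows 2..100: same-row A and previous-row B.
encoded = [(("C", 2, 100), [("A", 0), ("B", -1)])]
```

Because the ranges are disjoint, the order in which they are scanned does not affect the result. A linear scan is used here only for brevity.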
|
||||
|
||||
|
||||
|
||||
\subsection{Update Index}
|
||||
Evaluating a cell in a spreadsheet requires evaluating transitive dependencies;
|
||||
Evaluating a cell in a spreadsheet requires evaluating its transitive dependencies.
|
||||
The spreadsheet may thus be viewed as a graph, with one node for each cell and one edge for each dependency.
|
||||
The update index maintains
|
||||
|
||||
The update index maintains
|
||||
|
||||
%%% Local Variables:
|
||||
%%% mode: latex
|
||||
%%% TeX-master: "../main"
|
||||
%%% End:
|
||||
|
|
|
@ -2,13 +2,17 @@
|
|||
\section{System Overview}
|
||||
\label{sec:overview}
|
||||
|
||||
\BG{define the spreadsheet model first?}
|
||||
|
||||
\newcommand{\errorval}{\ensuremath{\bot}\xspace}
|
||||
|
||||
A spreadsheet is a regular grid of cells, which are defined by formulas.
|
||||
A cell's formula may be a literal value, or an expression defining a computation that may be based on the value of other cells.
|
||||
The value of a cell is the result of evaluating the cell's formula.
|
||||
This may require obtaining the value of cells on which the formula depends; we refer to such cells as \emph{upstream} cells.
|
||||
When a cell is modified, the values of downstream (i.e., dependent) cells are updated accordingly;
|
||||
This may require obtaining the value of cells on which the formula depends; we refer to such cells as \emph{direct prerequisite} cells. If cell
|
||||
When a cell is modified, the values of \emph{dependent} cells, i.e., cells whose formulas use the modified cell's value, have to be updated.
|
||||
That is, in contrast to a relational table, which can be updated by a sequence of imperative operations, the formulas of a spreadsheet are evaluated (conceptually) at the same time.
|
||||
A cycle in the dependency graph (i.e., a cell being upstream of itself) is an error, and any cells participating in the cycle evaluate to a special error value.
|
||||
A cycle in the dependency graph (i.e., a cell being upstream of itself) is an error, and any cells participating in the cycle evaluate to a special error value $\errorval$.
|
||||
|
||||
In contrast to classical spreadsheets, where each cell is a completely independent entity, we adopt the Relational spreadsheet model~\cite{DBLP:conf/cidr/BakkeB11}, which focuses on so-called `tidy data,' where each row is one record, and each column represents a distinct (strongly typed) variable.
|
||||
This approach incentivizes usage patterns that streamline data caching and make it easier to implement on-disk: Critically, columns and type information are available in a static context even before data is loaded, while the need for dynamic data access via caching is limited to a one-dimensional index on records.
|
||||
|
@ -47,7 +51,7 @@ The layer also provides push access to cell values through notifications that fi
|
|||
|
||||
This layer also acts as a visibility filter over the dataset.
|
||||
The user interface explicitly maintains a subset of cells that are ``active'' (i.e., in or near the viewable area).
|
||||
The data update layer extends this subset based on the transitive closure of the active cells with cells that are upstream of active cells.
|
||||
The data update layer extends this subset to its transitive closure, adding every cell that is (directly or indirectly) upstream of an active cell.
|
||||
Only active cells are maintained.
|
||||
|
||||
We discuss the data update layer in greater depth in \Cref{sec:data}.
|
||||
|
@ -65,7 +69,7 @@ To identify rows, we considered two approaches: (i) identifying rows by unique i
|
|||
|
||||
Assigning each row a unique identifier poses several scalability challenges.
|
||||
First, this mapping makes caching more challenging, as row identifiers must be persisted in the original dataset.
|
||||
Moreover, unique identifiers can not be used to partition the source data into rows of consistent size.
|
||||
Moreover, unique identifiers cannot be used to partition the source data into sets of rows of consistent size.
|
||||
Finally, unique identifiers preclude rule-based updates, as we describe in \Cref{sec:data}.
|
||||
|
||||
Positional references can compactly encode contiguous ranges of rows.
|
||||
|
@ -105,4 +109,8 @@ Errors in a transformation to an earlier reference frame indicate inserted rows,
|
|||
|
||||
\begin{example}
|
||||
Some example of reference frames in practice. Insert, delete, move, etc...
|
||||
\end{example}
|
||||
\end{example}
|
||||
%%% Local Variables:
|
||||
%%% mode: latex
|
||||
%%% TeX-master: "../main"
|
||||
%%% End:
|
||||
|
|