Finished brain dump of system
parent
772ba3aaa7
commit
ab74420bca
2
main.tex
2
main.tex
|
@ -165,7 +165,7 @@
|
|||
\input{sections/overview}
|
||||
\input{sections/formalism}
|
||||
\input{sections/system}
|
||||
\input{sections/data}
|
||||
% \input{sections/data}
|
||||
\input{sections/relwork}
|
||||
|
||||
\input{sections/conclusions}
|
||||
|
|
|
@ -59,56 +59,6 @@ We discuss the data update layer in greater depth in \Cref{sec:data}.
|
|||
|
||||
\todo{Discuss relevant aspects of the UI}
|
||||
|
||||
\subsection{Reference Frames}
|
||||
\label{sec:cellidentity}
|
||||
|
||||
A reference to a cell (e.g., in a formula) is given as the intersection of a specific row and column.
|
||||
Because Overlay adopts the relational spreadsheet model, the set of columns is available in a static context, making it easy to assign unique identifiers (e.g., column names).
|
||||
To identify rows, we considered two approaches: (i) identifying rows by unique identifiers, and (ii) identifying rows by their position.
|
||||
|
||||
Assigning each row a unique identifier poses several scalability challenges.
|
||||
First, this mapping makes caching more challenging, as row identifiers must be persisted in the original dataset.
|
||||
Moreover, unique identifiers cannot be used to partition the source data into sets of rows of consistent size.
|
||||
Finally, unique identifiers preclude rule-based updates, as we describe in \Cref{sec:data}.
|
||||
|
||||
Positional references can compactly encode contiguous ranges of rows.
|
||||
However, whenever a row is inserted or deleted, every reference to a row after the point of the update must itself be updated.
|
||||
One alternative is fractional indexing~\cite{DBLP:journals/jidm/HausteinHM010,DBLP:conf/sigmod/ONeilOPCSW04}, which preserves row order by assigning each new row an identifier at the midpoint between those of its predecessor and successor.
|
||||
An analogous approach uses tree structures with counts to accelerate positional access to rows~\cite{DBLP:conf/icde/BendreVZCP18}.
|
||||
Both of these approaches avoid the overhead of reference updates, but impose a logarithmic cost on point lookups of rows by their position.
|
||||
|
||||
Overlay adopts a positional style of reference, but augments it with a construct that we call a reference frame.
|
||||
Concretely, a row is identified by a 2-tuple $\tuple{i, \mathcal F}$, where $i$ is an integer position, and $\mathcal F$ is a reference frame.
|
||||
A reference frame is a function mapping positions to specific rows $\mathcal F : \mathbb Z \rightarrow \mathcal R$ (where $\mathcal R$ denotes the set of all rows).
|
||||
In other words, a row $\tuple{i, \mathcal F}$ denotes the row $\mathcal F(i)$.
|
||||
We observe that simple row insertions, deletions, and moves each apply a translation to a portion of the reference frame's domain.
|
||||
For example, given an initial reference frame $\mathcal F$, an insertion of three rows at position 5 defines a new reference frame $\mathcal F'$ as follows:
|
||||
$$\mathcal F'(x) = \begin{cases}
|
||||
\mathcal F(x) & \textbf{if } x < 5\\
|
||||
\text{[new row } x - 5\text{]} & \textbf{if } 5 \leq x \leq 7\\
|
||||
\mathcal F(x - 3) & \textbf{otherwise}
|
||||
\end{cases}$$
|
||||
Observe that row positions defined with respect to $\mathcal F$ may be transformed in constant time into positions with respect to $\mathcal F'$.
|
||||
That is, we can define a reference frame transformation $T$:\tabularnewline
|
||||
$$T(x) = \begin{cases}
|
||||
x & \textbf{if } x < 5\\
|
||||
x+3 & \textbf{otherwise}
|
||||
\end{cases}$$
|
||||
For portions of the domain of $T$ that are defined, the function may be inverted:
|
||||
$$T^{-1}(x) = \begin{cases}
|
||||
x & \textbf{if } x < 5\\
|
||||
x-3 & \textbf{if } x > 7\\
|
||||
error & \textbf{otherwise}
|
||||
\end{cases}$$
|
||||
Thus $\mathcal F'(T(x)) = \mathcal F(x)$ and, wherever $T^{-1}$ is defined, $\mathcal F(T^{-1}(x)) = \mathcal F'(x)$. Similar translations exist for deletions and row moves.
|
||||
|
||||
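Let $\mathcal F'$ be the reference frame reached from $\mathcal F$ through a history of transformations $T_n, \ldots, T_1$ (with $T_n$ applied first), so that $\mathcal F = \mathcal F' \circ T_1 \circ \ldots \circ T_n$ wherever the composition is defined, where $\circ$ denotes function composition.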
|
||||
Given a history of transformations, any row $\tuple{i, \mathcal F}$ can be transformed into a later reference frame $\tuple{T_1(\ldots(T_n(i))), \mathcal F'}$, or into an earlier one by applying the inverse transformations.
|
||||
Errors in a transformation to an earlier reference frame indicate inserted rows, while errors moving forward through reference frames indicate deleted rows.
|
||||
|
||||
\begin{example}
|
||||
Some example of reference frames in practice. Insert, delete, move, etc...
|
||||
\end{example}
|
||||
%%% Local Variables:
|
||||
%%% mode: latex
|
||||
%%% TeX-master: "../main"
|
||||
|
|
|
@ -140,12 +140,65 @@ To support these efficiently, we maintain a backward index that relates cell ran
|
|||
Analogous to $\textbf{getDeps}$, which infers the cells immediately upstream of a range of cells, we can infer the cells downstream of any cell or set of cells, with one caveat.
|
||||
When the cell identified by an absolute reference in a pattern is modified, all cells using the pattern are invalidated, so we track the set of ranges over which any given pattern is defined.
|
||||
|
||||
\paragraph{Column Insertions and Deletions}
|
||||
\paragraph{Column Insertions, Deletions, and Moves}
|
||||
|
||||
At the index layer, columns are referenced by unique identifier; ordering is imposed only at the presentation layer.
|
||||
Column insertion (resp., deletion) simply requires inserting an entry into (resp., removing an entry from) the forward and backward indexes.
|
||||
Column reordering requires no actions at the index level.
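The following is a minimal sketch of the bookkeeping described above, not the system's actual implementation. The class \texttt{ColumnIndex}, its fields, and the use of plain dictionaries for the forward and backward indexes are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List

ColumnId = Hashable


class ColumnIndex:
    """Hypothetical index layer: columns are keyed by unique identifier."""

    def __init__(self) -> None:
        # Assumed shape: each index maps a column id to its entries.
        self.forward: Dict[ColumnId, list] = {}
        self.backward: Dict[ColumnId, list] = {}
        # Ordering lives only in the presentation layer.
        self.display_order: List[ColumnId] = []

    def insert_column(self, col: ColumnId, position: int) -> None:
        # Insertion adds one entry to each index.
        self.forward[col] = []
        self.backward[col] = []
        self.display_order.insert(position, col)

    def delete_column(self, col: ColumnId) -> None:
        # Deletion removes one entry from each index.
        del self.forward[col]
        del self.backward[col]
        self.display_order.remove(col)

    def move_column(self, col: ColumnId, position: int) -> None:
        # Reordering touches only the presentation order, never the indexes.
        self.display_order.remove(col)
        self.display_order.insert(position, col)
```

Because the indexes are keyed by identifier rather than position, \texttt{move\_column} leaves \texttt{forward} and \texttt{backward} untouched.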
|
||||
|
||||
\paragraph{Row Insertions, Deletions, and Moves}
|
||||
|
||||
Rows are identified by their position.
|
||||
When a row is inserted, deleted, or moved, references to the affected rows (and rows following them) change and must be updated.
|
||||
This update can be expensive, as it may require defining an entirely new set of patterns.
|
||||
|
||||
One alternative is fractional indexing~\cite{DBLP:journals/jidm/HausteinHM010,DBLP:conf/sigmod/ONeilOPCSW04}, where a new identifier can be allocated in between any two rows.
|
||||
An analogous approach uses tree structures with counts to accelerate positional access to rows~\cite{DBLP:conf/icde/BendreVZCP18}.
|
||||
Both of these approaches avoid the overhead of reference updates, but impose a logarithmic cost to look up individual rows by their position.
|
||||
|
||||
Instead, we adopt a lazy approach by associating every pattern with a construct that we call a reference frame: a function $\mathcal F : \mathbb Z \rightarrow \mathcal R$ mapping positions to specific rows (where $\mathcal R$ denotes the set of all rows).
|
||||
In other words, the pair $\tuple{\row, \mathcal F}$ denotes the row $\mathcal F(\row)$.
|
||||
We observe that simple row insertions, deletions, and moves each apply a translation to a portion of the reference frame's domain.
|
||||
For example, given an initial reference frame $\mathcal F$, an insertion of three rows at position 5 defines a new reference frame $\mathcal F'$ as follows:
|
||||
$$\mathcal F'(\row) = \begin{cases}
|
||||
\mathcal F(\row) & \textbf{if } \row < 5\\
|
||||
\text{[new row } \row - 5\text{]} & \textbf{if } 5 \leq \row \leq 7\\
|
||||
\mathcal F(\row - 3) & \textbf{otherwise}
|
||||
\end{cases}$$
|
||||
Observe that row positions defined with respect to $\mathcal F$ may be transformed in constant time into positions with respect to $\mathcal F'$.
|
||||
That is, we can define a reference frame transformation $T$:\tabularnewline
|
||||
$$T(\row) = \begin{cases}
|
||||
\row & \textbf{if } \row < 5\\
|
||||
\row+3 & \textbf{otherwise}
|
||||
\end{cases}$$
|
||||
For portions of the domain of $T$ that are defined, the function may be inverted:
|
||||
$$T^{-1}(\row) = \begin{cases}
|
||||
\row & \textbf{if } \row < 5\\
|
||||
\row-3 & \textbf{if } \row > 7\\
|
||||
error & \textbf{otherwise}
|
||||
\end{cases}$$
|
||||
Thus $\mathcal F'(T(\row)) = \mathcal F(\row)$ and, wherever $T^{-1}$ is defined, $\mathcal F(T^{-1}(\row)) = \mathcal F'(\row)$. Similar translations exist for deletions and row moves.
|
||||
|
||||
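Let $\mathcal F'$ be the reference frame reached from $\mathcal F$ through a history of transformations $T_n, \ldots, T_1$ (with $T_n$ applied first), so that $\mathcal F = \mathcal F' \circ T_1 \circ \ldots \circ T_n$ wherever the composition is defined, where $\circ$ denotes function composition.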
|
||||
Given a history of transformations, any row $\tuple{\row, \mathcal F}$ can be transformed into a later reference frame $\tuple{T_1(\ldots(T_n(\row))), \mathcal F'}$, or into an earlier one by applying the inverse transformations.
|
||||
Errors in a transformation to an earlier reference frame indicate inserted rows, while errors moving forward through reference frames indicate deleted rows.
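As an illustration of the transformations above, the following sketch encodes an insertion and a deletion as position translations and replays a history of them in either direction. It is a sketch under stated assumptions rather than the system's implementation: the names \texttt{Insert}, \texttt{Delete}, \texttt{to\_later\_frame}, and \texttt{to\_earlier\_frame} are hypothetical, the history is assumed to be a list ordered oldest first, and the "error" case of $T^{-1}$ is represented by \texttt{None}.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Insert:
    """Hypothetical transformation: `count` rows inserted at position `at`."""
    at: int
    count: int

    def forward(self, row: int) -> Optional[int]:
        # T: rows before the insertion keep their position, the rest shift.
        return row if row < self.at else row + self.count

    def backward(self, row: int) -> Optional[int]:
        # T^{-1}: undefined (None) for the newly inserted rows.
        if row < self.at:
            return row
        if row >= self.at + self.count:
            return row - self.count
        return None


@dataclass(frozen=True)
class Delete:
    """Hypothetical transformation: `count` rows deleted starting at `at`."""
    at: int
    count: int

    def forward(self, row: int) -> Optional[int]:
        # Undefined (None) for rows that no longer exist in the later frame.
        if row < self.at:
            return row
        if row >= self.at + self.count:
            return row - self.count
        return None

    def backward(self, row: int) -> Optional[int]:
        return row if row < self.at else row + self.count


Step = Union[Insert, Delete]


def to_later_frame(row: int, history: List[Step]) -> Optional[int]:
    """Replay the history oldest first; None means the row was deleted."""
    for step in history:
        row = step.forward(row)
        if row is None:
            return None
    return row


def to_earlier_frame(row: int, history: List[Step]) -> Optional[int]:
    """Undo the history newest first; None means the row was inserted later."""
    for step in reversed(history):
        row = step.backward(row)
        if row is None:
            return None
    return row
```

Each step is a constant-time translation, so converting a position costs time proportional to the number of transformations between the two frames, independent of the number of rows.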
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\subsection{Execution Layer}
|
||||
|
||||
The execution layer is responsible for providing efficient access to cell values (i.e., the results of evaluating cells).
|
||||
As in \cite{DBLP:conf/sigmod/BendreWMCP19}, this can be accomplished by (i) deriving a topological sort over the cells of the spreadsheet in dependency order, and (ii) materializing cells in this order.
|
||||
|
||||
However, materializing the full spreadsheet becomes impractical for a sufficiently large dataset.
|
||||
Instead, the execution layer maintains an \emph{active region} that includes all of the rows that are on a client's screen, a small surrounding buffer, and all of their upstream dependencies (\Cref{alg:upstream}).
|
||||
Only cells in the active region are materialized.
|
||||
When the user's view changes, a new set of cells (typically in the surrounding buffer) is computed and added to the active region.
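The active-region strategy can be sketched as follows. This is a simplified illustration, not the algorithm of \Cref{alg:upstream}: the callback \texttt{get\_deps} is a hypothetical stand-in for $\textbf{getDeps}$ that returns the immediate upstream cells of a single cell, cells are assumed to be hashable values, and the dependency graph is assumed to be acyclic.

```python
from typing import Callable, Hashable, Iterable, List, Set

Cell = Hashable
GetDeps = Callable[[Cell], Iterable[Cell]]


def active_region(visible: Iterable[Cell], get_deps: GetDeps) -> Set[Cell]:
    """Visible (and buffered) cells plus all transitive upstream dependencies."""
    region: Set[Cell] = set()
    stack = list(visible)
    while stack:
        cell = stack.pop()
        if cell in region:
            continue
        region.add(cell)
        stack.extend(get_deps(cell))
    return region


def materialization_order(region: Set[Cell], get_deps: GetDeps) -> List[Cell]:
    """Topological order over the region: dependencies precede dependents."""
    order: List[Cell] = []
    seen: Set[Cell] = set()
    for root in region:
        stack = [(root, False)]
        while stack:
            cell, expanded = stack.pop()
            if expanded:
                # All upstream cells of `cell` have already been emitted.
                order.append(cell)
                continue
            if cell in seen:
                continue
            seen.add(cell)
            stack.append((cell, True))
            stack.extend((d, False) for d in get_deps(cell) if d not in seen)
    return order
```

Only the cells returned by \texttt{active\_region} are visited, so cells outside the view and its upstream closure are never materialized.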
|
||||
|
||||
We observe that recursive patterns (as discussed above) create situations where an active region may scale to the full size of the dataset.
|
||||
Although it is beyond the scope of this work, we note that any such form of recursion may be expressed as a window function over the base dataset, and is likely well suited to evaluation in a batch-processing system.
|
||||
|
||||
TODO:
|
||||
short: just note that there is no implicit ordering, so these just involve updating the unordered maps encoding column values.
|
||||
|
||||
\paragraph{Row Insertions/Deletions}
|
||||
|
||||
TODO: probably just migrate the reference frame text here.
|
||||
|
||||
|
|