%!TEX root=../main.tex
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Related Work}
\label{sec:related-work}

Although spreadsheets present a convenient interface to data, they do not scale to large datasets.
A common approach to scaling spreadsheets (the ``virtual'' approach) layers a spreadsheet-style, direct-manipulation interface over an existing database or workflow system~\cite{DBLP:conf/cidr/BakkeB11,DBLP:conf/icde/LiuJ09,freire:2016:hilda:exception,DBLP:conf/sigmod/JagadishCEJLNY07,DBLP:conf/chi/KandelPHH11}.
The resulting systems bear varying levels of resemblance to existing spreadsheets, usually introducing concepts from relational databases like explicit tables, attributes, and records.
%
Wrangler~\cite{DBLP:conf/chi/KandelPHH11} is an ETL workflow development tool with an interface inspired by spreadsheets.
Users open a small sample of a dataset in Wrangler and use spreadsheet-style operations to indicate desired changes to the dataset.
%
Vizier~\cite{brachmann:2019:sigmod:data, kennedy:2022:ieee-deb:right, kumari:2021:cidr:datasense, brachmann:2020:cidr:your} is a computational notebook system that allows users to define workflow stages through a spreadsheet-style interface.
%
Other approaches more directly mimic relational databases:
The Spreadsheet Algebra~\cite{DBLP:conf/sigmod/JagadishCEJLNY07,DBLP:conf/icde/LiuJ09} allows users to specify any SPJGA query purely through spreadsheet-style user interactions.
Related Worksheets~\cite{DBLP:conf/cidr/BakkeB11,DBLP:conf/chi/BakkeKM11} reimagines the spreadsheet interface with record structure and inline display of foreign-key references.

A second approach (the ``materialized'' approach) instead redesigns the spreadsheet engine using database concepts; an example is DataSpread~\cite{DBLP:conf/icde/BendreVZCP18, DBLP:conf/sigmod/RahmanMBZKP20, DBLP:conf/sigmod/BendreWMCP19}.
A key challenge for this approach is that classical database techniques, which exploit structure shared across a dataset, are not directly applicable to spreadsheets.
Bendre et al.~\cite{DBLP:conf/icde/BendreVZCP18} explore data structures that can leverage partial structure, for example, when a range of cells is structured as a relational table.
Bendre et al.~\cite{DBLP:conf/sigmod/BendreWMCP19} explore strategies for quickly invalidating cells and computing dependencies by leveraging a (lossy) compressed dependency graph that can efficiently bound the set of cells downstream of a given cell.
Tang et al.~\cite{tang-23-efcsfg} introduce a different, lossless type of compressed dependency graph that instead exploits repeating patterns in formulas.
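For intuition, consider a hypothetical column of formulas (an illustrative example of pattern-based compression in general, not drawn from the cited work) in which every cell repeats the same formula up to a row offset:
\[
  C_i = A_i + B_i, \qquad i = 1, \ldots, n.
\]
An uncompressed dependency graph stores one edge per cell reference, for a total of $2n$ edges $A_i \to C_i$ and $B_i \to C_i$.
Because all $n$ formulas share one pattern, the same dependencies can be recorded without loss using just two range-level edges, $A_{1:n} \to C_{1:n}$ and $B_{1:n} \to C_{1:n}$ (writing $A_{1:n}$ for the range of cells $A_1, \ldots, A_n$), each annotated with the rule that row $i$ of the target depends on row $i$ of the source.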
Their technique is analogous to our own approach, but focuses on the dependency graph rather than on the expressions themselves, which limits opportunities for optimization.

In summary, DataSpread introduced multiple efficient algorithms for storing, accessing, and updating spreadsheets.
The virtual approach is often less efficient, but has the advantage of supporting lightweight versioning and provenance.
Crucially, it also enables replaying a user's updates, originally applied to one dataset, on a new dataset (e.g., to re-apply curation work on an updated version of the data).
Our overlay approach has the potential to retain these benefits while enabling performance competitive with DataSpread.
% Furthermore, overlays with reference frames allow more efficient insertion and deletion for rows and columns as this only affects reference frames, but not the formulas of cells.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../main"
%%% reftex-default-bibliography: ("../main.bib")
%%% End: