%!TEX root=../main.tex
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Related Work}
\label{sec:related-work}
Although spreadsheets present a convenient interface to data, they do not scale to large datasets.
A common approach to scaling spreadsheets (the ``virtual'' approach) layers a spreadsheet-style direct-manipulation interface over an existing database or workflow system~\cite{DBLP:conf/cidr/BakkeB11,DBLP:conf/icde/LiuJ09,freire:2016:hilda:exception,DBLP:conf/sigmod/JagadishCEJLNY07,DBLP:conf/chi/KandelPHH11}.
The resulting systems bear varying levels of resemblance to existing spreadsheets, usually introducing concepts from relational databases like explicit tables, attributes, and records.
%
Wrangler~\cite{DBLP:conf/chi/KandelPHH11} is an ETL workflow development tool with an interface inspired by spreadsheets.
Users open a small sample of a dataset in Wrangler and use spreadsheet-style operations to indicate desired changes to the dataset.
%
Vizier~\cite{brachmann:2019:sigmod:data, kennedy:2022:ieee-deb:right, kumari:2021:cidr:datasense, brachmann:2020:cidr:your} is a computational notebook system that allows users to define workflow stages through a spreadsheet-style interface.
%
Other approaches more directly mimic relational databases:
The Spreadsheet Algebra~\cite{DBLP:conf/sigmod/JagadishCEJLNY07,DBLP:conf/icde/LiuJ09} allows users to specify any SPJGA-query purely through spreadsheet-style user interactions.
Related Worksheets~\cite{DBLP:conf/cidr/BakkeB11,DBLP:conf/chi/BakkeKM11} re-imagines the spreadsheet interface with record structure and inline display of foreign-key references.
A second approach (the ``materialized'' approach) instead redesigns the spreadsheet engine using database concepts;
an example is DataSpread~\cite{DBLP:conf/icde/BendreVZCP18, DBLP:conf/sigmod/RahmanMBZKP20, DBLP:conf/sigmod/BendreWMCP19}.
A key challenge is that classical database techniques, which exploit common structures in a dataset, are not directly applicable.
\cite{DBLP:conf/icde/BendreVZCP18} explores data structures that can leverage partial structure; for example, when a range of cells is structured as a relational table.
\cite{DBLP:conf/sigmod/BendreWMCP19} explores strategies for quickly invalidating cells and computing dependencies by leveraging a (lossy) compressed dependency graph that can efficiently bound the set of cells downstream of a given cell.
\cite{tang-23-efcsfg} introduces a lossless compressed dependency graph that instead exploits repeating patterns in formulas.
This is analogous to our own approach, but focuses on the dependency graph rather than expressions, limiting opportunities for optimization.
In summary, DataSpread introduced multiple efficient algorithms for storing, accessing, and updating spreadsheets.
The virtual approach is often less efficient but has the advantage of supporting lightweight versioning and provenance.
Crucially, it also enables replaying a user's updates, originally applied to one dataset, on a new dataset (e.g., to re-apply curation work on an updated version of the data).
Our overlay approach has the potential to retain these benefits while enabling performance competitive with DataSpread.
% Furthermore, overlays with reference frames allow more efficient insertion and deletion for rows and columns as this only affects reference frames, but not the formulas of cells.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../main"
%%% reftex-default-bibliography: ("../main.bib")
%%% End: