Merge branch 'main' of git.odin.cse.buffalo.edu:VizierDB/paper-Vizier-SpreadsheetOverlay

main
Oliver Kennedy 2023-06-28 10:57:17 -04:00
commit de9a5d26b3
Signed by: okennedy
GPG Key ID: 3E5F9B3ABD3FDB60
3 changed files with 32 additions and 2 deletions

Binary file not shown.


@ -19,7 +19,17 @@
%% values in them; it is your responsibility as an author to replace
%% the commands and values with those provided to you when you
%% complete the rights form.
\setcopyright{none}
\copyrightyear{2023}
\acmYear{2023}
\setcopyright{acmlicensed}\acmConference[HILDA '23]{Workshop on
Human-In-the-Loop Data Analytics}{June 18, 2023}{Seattle, WA, USA}
\acmBooktitle{Workshop on Human-In-the-Loop Data Analytics (HILDA '23),
June 18, 2023, Seattle, WA, USA}
\acmPrice{15.00}
\acmDOI{10.1145/3597465.3605220}
\acmISBN{979-8-4007-0216-7/23/06}
%\setcopyright{none}
% \copyrightyear{2018}
% \acmYear{2018}
% \acmDOI{XXXXXXX.XXXXXXX}
@ -125,7 +135,7 @@
% Spreadsheets provide a convenient, friendly direct manipulation interface to datasets.
Efforts to scale spreadsheets either follow a `virtual' strategy that imposes a spreadsheet interface over an existing database engine or a `materialized' strategy based on re-engineering the spreadsheet engine.
Because database engines are not optimized for spreadsheet access patterns, the materialized approach has better performance.
However, the virtual approach offers several advantages that can not be easily replicated in the materialized approach, including the ability to re-apply user interactions to an updated dataset.
However, the virtual approach offers several advantages that can not be easily replicated in the materialized approach, including the ability to re-apply user interactions to an updated dataset.
We propose a hybrid approach, where patterns of user updates are indexed (as in the materialized approach) and overlaid on an existing dataset (as in the virtual approach).
We introduce the overlay update model, and outline strategies for efficiently accessing an overlay spreadsheet.
A key feature of our approach is storing updates generated by bulk operations (e.g., copy/paste) as ``patterns'' that can be leveraged to reduce execution costs.

20
reviews.org Normal file

@ -0,0 +1,20 @@
* Reviewer 1
The paper presents an overlay approach to scale spreadsheets, where patterns of user updates are indexed and overlaid on an existing dataset. The presented ideas are interesting, and the results are quite promising.
Experiments: Materialized strategies are more efficient than virtual ones, but the experiments show that DataSpread is inefficient compared to Vizier. Please clarify the results and discuss them in more detail.
* Reviewer 2
The Overlay Spreadsheets approach addresses the very important problem of scaling spreadsheet interfaces. It combines ideas from virtual and materialized spreadsheet approaches to improve execution time. The paper is well-motivated and relevant to the HILDA community, and I think it would lead to a lot of interesting discussions on the pros and cons of virtual vs materialized approaches.
Thoughts for improvement:
While the paper is well-organized, it should be checked for typos.
Prior to presenting the formal data model in section 2, it might help to present a motivating example containing a workflow/queries to show concrete scenarios highlighting the pros and cons of each approach, and showing how the overlay approach addresses these shortcomings. This would build intuition before formalization, and make it easier to follow.
* Reviewer 3
This paper presents a technique to scale spreadsheets to large-scale data by using a hybrid approach that combines the performance benefits of the materialized approach and the ability of the virtual overlay approach to reapply user interactions to updated datasets. The paper describes the technique for this hybrid approach and demonstrates that the implementation reduces execution cost compared to a material
STRENGTH
A strong technique based on a strong observation that bulk updates in a spreadsheet rely on expression patterns, which
Contains a small benchmark section, covering 4 cases, which shows that the technique introduced is faster than the materialized-based technique baseline.
(MINOR) WEAKNESSES
The intro of the paper speaks as if works like Wrangler support general spreadsheet features, even though that work leverages declarative transforms to support certain operations specific to data cleaning tasks (but not all spreadsheet operations). Similarly, it's unclear from the paper whether the implementation in Vizier supports general free-form spreadsheets or only "spreadsheet mode for data frames".