main.tex
|
@ -8,6 +8,7 @@
|
|||
\usepackage{listings}
|
||||
\usepackage{algorithm}
|
||||
\usepackage[noend]{algpseudocode}
|
||||
\usepackage{subcaption}
|
||||
|
||||
\newcommand{\trimfigurespacing}{\vspace*{-5mm}}
|
||||
|
||||
|
@ -182,8 +183,8 @@ the materialized approach has better performance.
|
|||
% \input{sections/formalism}
|
||||
\input{sections/system}
|
||||
% \input{sections/data}
|
||||
\input{sections/relwork}
|
||||
\input{sections/experiments}
|
||||
\input{sections/relwork}
|
||||
\input{sections/conclusions}
|
||||
|
||||
|
||||
|
|
|
@ -1,31 +0,0 @@
|
|||
[info] DataspreadBenchmarkVizierSpec
|
||||
[info] DataspreadBenchmarkVizierSpec should
|
||||
[info] Perform Benchamrks consistent with those done with VizierDB
|
||||
[test] @0: Init Spreadsheet: 21.147718712 s
|
||||
[test] @0: Monitoring Overhead: 0.204571017 s
|
||||
[test] @0: Init formulas: 5.258463694 s
|
||||
[test] @0: Update one: 0.031132039 s
|
||||
[test] @0: Update all: [not run]
|
||||
[info] + Time Results @ 0
|
||||
[test] @60: Init Spreadsheet: 21.075780951 s
|
||||
[test] @60: Monitoring Overhead: 0.121315652 s
|
||||
[test] @60: Init formulas: 7.208747642 s
|
||||
[test] @60: Update one: 0.020606538 s
|
||||
[test] @60: Update all: [not run]
|
||||
[info] + Time Results @ 60
|
||||
[test] @600: Init Spreadsheet: 21.059860767 s
|
||||
[test] @600: Monitoring Overhead: 0.120807536 s
|
||||
[test] @600: Init formulas: 24.264395221 s
|
||||
[test] @600: Update one: 0.007306683 s
|
||||
[test] @600: Update all: [not run]
|
||||
[info] + Time Results @ 600
|
||||
[test] @6000: Init Spreadsheet: 21.056667272 s
|
||||
[test] @6000: Monitoring Overhead: 0.11134842 s
|
||||
[test] @6000: Init formulas: 245.847328681 s
|
||||
[test] @6000: Update one: 0.033850361 s
|
||||
[test] @6000: Update all: [not run]
|
||||
[info] + Time Results @ 6000
|
||||
[info] o Time Results @ 60000 [not run - taking longer than 30 minutes]
|
||||
[info] Total for specification DataspreadBenchmarkVizierSpec
|
||||
[info] Finished in 6 minutes 16 seconds, 997 ms
|
||||
[info] 4 examples, 6 expectations, 0 failure, 0 error
|
|
@ -1,32 +0,0 @@
|
|||
[info] DataspreadBenchmarkVizierSpec
|
||||
[info] DataspreadBenchmarkVizierSpec should
|
||||
[info] Perform Benchamrks consistent with those done with VizierDB
|
||||
[info] + Warm up the cache
|
||||
[test] @0: Init Spreadsheet: 21.144650182 s
|
||||
[test] @0: Monitoring Overhead: 0.204167816 s
|
||||
[test] @0: Init formulas: 5.258831865 s
|
||||
[test] @0: Update one: 0.033579433 s
|
||||
[test] @0: Update all: [not run]
|
||||
[info] + Time Results @ 0
|
||||
[test] @60: Init Spreadsheet: 21.046036608 s
|
||||
[test] @60: Monitoring Overhead: 0.116584287 s
|
||||
[test] @60: Init formulas: 7.469271179 s
|
||||
[test] @60: Update one: 0.013094508 s
|
||||
[test] @60: Update all: [not run]
|
||||
[info] + Time Results @ 60
|
||||
[test] @600: Init Spreadsheet: 21.068284745 s
|
||||
[test] @600: Monitoring Overhead: 0.11571002 s
|
||||
[test] @600: Init formulas: 21.723715773 s
|
||||
[test] @600: Update one: 0.007254895 s
|
||||
[test] @600: Update all: [not run]
|
||||
[info] + Time Results @ 600
|
||||
[test] @6000: Init Spreadsheet: 21.072645619 s
|
||||
[test] @6000: Monitoring Overhead: 0.11134726 s
|
||||
[test] @6000: Init formulas: 244.810593821 s
|
||||
[test] @6000: Update one: 0.021708022 s
|
||||
[test] @6000: Update all: [not run]
|
||||
[info] + Time Results @ 6000
|
||||
[info] o Time Results @ 60000 [not run - taking longer than 30 minutes]
|
||||
[info] Total for specification DataspreadBenchmarkVizierSpec
|
||||
[info] Finished in 6 minutes 15 seconds, 731 ms
|
||||
[info] 5 examples, 7 expectations, 0 failure, 0 error
|
|
@ -1,32 +0,0 @@
|
|||
[info] DataspreadBenchmarkVizierSpec
|
||||
[info] DataspreadBenchmarkVizierSpec should
|
||||
[info] Perform Benchamrks consistent with those done with VizierDB
|
||||
[test] @0: Init Spreadsheet: 21.159097527 s
|
||||
[test] @0: Monitoring Overhead: 0.231722052 s
|
||||
[test] @0: Init formulas: 5.259485356 s
|
||||
[test] @0: Update one: 0.019440175 s
|
||||
[test] @0: Update all: [not run]
|
||||
[info] + Time Results @ 0
|
||||
[test] @60: Init Spreadsheet: 21.077188513 s
|
||||
[test] @60: Monitoring Overhead: 0.119935001 s
|
||||
[test] @60: Init formulas: 8.167940197 s
|
||||
[test] @60: Update one: 0.024402629 s
|
||||
[test] @60: Update all: [not run]
|
||||
[info] + Time Results @ 60
|
||||
[test] @600: Init Spreadsheet: 0.030223373 s
|
||||
[test] @600: Monitoring Overhead: 0.113077619 s
|
||||
[test] @600: Init formulas: 32.570919298 s
|
||||
[test] @600: Update one: 0.008316406 s
|
||||
[test] @600: Update all: [not run]
|
||||
[info] + Time Results @ 600
|
||||
[test] @6000: Init Spreadsheet: 0.024342633 s
|
||||
[test] @6000: Monitoring Overhead: 0.111124168 s
|
||||
[test] @6000: Init formulas: 193.063436155 s
|
||||
[test] @6000: Update one: 0.018608992 s
|
||||
[test] @6000: Update all: [not run]
|
||||
[info] + Time Results @ 6000
|
||||
[info] o Time Results @ 60000 [not run - taking longer than 30 minutes]
|
||||
[info] Total for specification DataspreadBenchmarkVizierSpec
|
||||
[info] Finished in 4 minutes 51 seconds, 142 ms
|
||||
[info] 4 examples, 6 expectations, 0 failure, 0 error
|
||||
|
|
@ -1,56 +0,0 @@
|
|||
SpreadsheetBenchmark
|
||||
|
||||
@60/false: Init Spreadsheet: 0.502968873 s
|
||||
@60/false: Monitoring Overhead: 0.006877202 s
|
||||
@60/false: Init Formulas: 0.288059989 s
|
||||
@60/false: Update one: 0.008191735 s
|
||||
@60/false: Update all: 0.066918199 s
|
||||
+ Time Results @ 60
|
||||
@60/true: Init Spreadsheet: 0.006720101 s
|
||||
@60/true: Monitoring Overhead: 0.005177446 s
|
||||
@60/true: Init Formulas: 0.224219809 s
|
||||
@60/true: Update one: 0.008588053 s
|
||||
@60/true: Update all: 0.059844046 s
|
||||
+ Time Results @ 60
|
||||
@600/false: Init Spreadsheet: 0.010373324 s
|
||||
@600/false: Monitoring Overhead: 0.008801294 s
|
||||
@600/false: Init Formulas: 0.207266834 s
|
||||
@600/false: Update one: 0.007608617 s
|
||||
@600/false: Update all: 0.076818458 s
|
||||
+ Time Results @ 600
|
||||
@600/true: Init Spreadsheet: 0.014502952 s
|
||||
@600/true: Monitoring Overhead: 0.005923629 s
|
||||
@600/true: Init Formulas: 0.191094545 s
|
||||
@600/true: Update one: 0.007352669 s
|
||||
@600/true: Update all: 0.057713751 s
|
||||
+ Time Results @ 600
|
||||
@6000/false: Init Spreadsheet: 0.29348608 s
|
||||
@6000/false: Monitoring Overhead: 0.010335879 s
|
||||
@6000/false: Init Formulas: 0.225178793 s
|
||||
@6000/false: Update one: 0.008125263 s
|
||||
@6000/false: Update all: 0.061337873 s
|
||||
+ Time Results @ 6000
|
||||
@6000/true: Init Spreadsheet: 0.005700485 s
|
||||
@6000/true: Monitoring Overhead: 0.00465223 s
|
||||
@6000/true: Init Formulas: 0.205236902 s
|
||||
@6000/true: Update one: 0.007241557 s
|
||||
@6000/true: Update all: 0.053460177 s
|
||||
+ Time Results @ 6000
|
||||
@60000/false: Init Spreadsheet: 4.830022963 s
|
||||
@60000/false: Monitoring Overhead: 0.004278716 s
|
||||
@60000/false: Init Formulas: 0.160613413 s
|
||||
@60000/false: Update one: 0.006726616 s
|
||||
@60000/false: Update all: 0.057563017 s
|
||||
+ Time Results @ 60000
|
||||
@60000/true: Init Spreadsheet: 0.007197506 s
|
||||
@60000/true: Monitoring Overhead: 0.004691892 s
|
||||
@60000/true: Init Formulas: 0.230308324 s
|
||||
@60000/true: Update one: 0.007654771 s
|
||||
@60000/true: Update all: 0.053793202 s
|
||||
+ Time Results @ 60000
|
||||
|
||||
|
||||
Total for specification SpreadsheetBenchmark
|
||||
Finished in 15 seconds, 208 ms
|
||||
8 examples, 9 expectations, 0 failure, 0 error
|
||||
|
|
@ -1,56 +0,0 @@
|
|||
SpreadsheetBenchmark
|
||||
|
||||
+ Warm up the cache
|
||||
@60/false: Init Spreadsheet: 26.140277045 s
|
||||
@60/false: Monitoring Overhead: 0.014927718 s
|
||||
@60/false: Init Formulas: 0.378240991 s
|
||||
@60/false: Update one: 0.007560233 s
|
||||
@60/false: Update all: 0.087895794 s
|
||||
+ Time Results @ 60
|
||||
@60/true: Init Spreadsheet: 0.007541636 s
|
||||
@60/true: Monitoring Overhead: 0.00600814 s
|
||||
@60/true: Init Formulas: 0.309725093 s
|
||||
@60/true: Update one: 0.00739941 s
|
||||
@60/true: Update all: 0.080354155 s
|
||||
+ Time Results @ 60
|
||||
@600/false: Init Spreadsheet: 0.00719922 s
|
||||
@600/false: Monitoring Overhead: 0.005914293 s
|
||||
@600/false: Init Formulas: 0.409290477 s
|
||||
@600/false: Update one: 0.008420892 s
|
||||
@600/false: Update all: 0.144015039 s
|
||||
+ Time Results @ 600
|
||||
@600/true: Init Spreadsheet: 0.007104852 s
|
||||
@600/true: Monitoring Overhead: 0.005998643 s
|
||||
@600/true: Init Formulas: 0.284384031 s
|
||||
@600/true: Update one: 0.008116714 s
|
||||
@600/true: Update all: 0.077044715 s
|
||||
+ Time Results @ 600
|
||||
@6000/false: Init Spreadsheet: 0.39814007 s
|
||||
@6000/false: Monitoring Overhead: 0.005437453 s
|
||||
@6000/false: Init Formulas: 4.291371661 s
|
||||
@6000/false: Update one: 0.010826873 s
|
||||
@6000/false: Update all: 0.697241208 s
|
||||
+ Time Results @ 6000
|
||||
@6000/true: Init Spreadsheet: 0.006777436 s
|
||||
@6000/true: Monitoring Overhead: 0.005843616 s
|
||||
@6000/true: Init Formulas: 0.304945226 s
|
||||
@6000/true: Update one: 0.007598998 s
|
||||
@6000/true: Update all: 0.076540439 s
|
||||
+ Time Results @ 6000
|
||||
@60000/false: Init Spreadsheet: 0.42241651 s
|
||||
@60000/false: Monitoring Overhead: 0.005534486 s
|
||||
@60000/false: Init Formulas: 47.899451675 s
|
||||
@60000/false: Update one: 0.03709275 s
|
||||
@60000/false: Update all: 29.091698828 s
|
||||
+ Time Results @ 60000
|
||||
@60000/true: Init Spreadsheet: 0.006794719 s
|
||||
@60000/true: Monitoring Overhead: 0.005888871 s
|
||||
@60000/true: Init Formulas: 0.450570698 s
|
||||
@60000/true: Update one: 0.007473072 s
|
||||
@60000/true: Update all: 0.078396281 s
|
||||
+ Time Results @ 60000
|
||||
|
||||
|
||||
Total for specification SpreadsheetBenchmark
|
||||
Finished in 1 minute 58 seconds, 970 ms
|
||||
9 examples, 10 expectations, 0 failure, 0 error
|
|
@ -1,56 +0,0 @@
|
|||
SpreadsheetBenchmark
|
||||
|
||||
@60/false: Init Spreadsheet: 0.622319637 s
|
||||
@60/false: Monitoring Overhead: 0.008267427 s
|
||||
@60/false: Init Formulas: 0.379130661 s
|
||||
@60/false: Update one: 0.008588219 s
|
||||
@60/false: Update all: 0.096520684 s
|
||||
+ Time Results @ 60
|
||||
@60/true: Init Spreadsheet: 0.007797491 s
|
||||
@60/true: Monitoring Overhead: 0.006393537 s
|
||||
@60/true: Init Formulas: 0.461880471 s
|
||||
@60/true: Update one: 0.013939607 s
|
||||
@60/true: Update all: 0.11355628 s
|
||||
+ Time Results @ 60
|
||||
@600/false: Init Spreadsheet: 0.008746324 s
|
||||
@600/false: Monitoring Overhead: 0.006425039 s
|
||||
@600/false: Init Formulas: 0.551072941 s
|
||||
@600/false: Update one: 0.01089435 s
|
||||
@600/false: Update all: 0.158270664 s
|
||||
+ Time Results @ 600
|
||||
@600/true: Init Spreadsheet: 0.007240454 s
|
||||
@600/true: Monitoring Overhead: 0.006301443 s
|
||||
@600/true: Init Formulas: 0.339588744 s
|
||||
@600/true: Update one: 0.009767286 s
|
||||
@600/true: Update all: 0.086571365 s
|
||||
+ Time Results @ 600
|
||||
@6000/false: Init Spreadsheet: 0.443006796 s
|
||||
@6000/false: Monitoring Overhead: 0.007127946 s
|
||||
@6000/false: Init Formulas: 2.475917445 s
|
||||
@6000/false: Update one: 0.01008147 s
|
||||
@6000/false: Update all: 0.858393776 s
|
||||
+ Time Results @ 6000
|
||||
@6000/true: Init Spreadsheet: 0.007298637 s
|
||||
@6000/true: Monitoring Overhead: 0.005954614 s
|
||||
@6000/true: Init Formulas: 0.342004032 s
|
||||
@6000/true: Update one: 0.008022405 s
|
||||
@6000/true: Update all: 0.08392664 s
|
||||
+ Time Results @ 6000
|
||||
@60000/false: Init Spreadsheet: 5.584728068 s
|
||||
@60000/false: Monitoring Overhead: 0.006156743 s
|
||||
@60000/false: Init Formulas: 42.167866315 s
|
||||
@60000/false: Update one: 0.0327693 s
|
||||
@60000/false: Update all: 17.918641827 s
|
||||
+ Time Results @ 60000
|
||||
@60000/true: Init Spreadsheet: 0.00672735 s
|
||||
@60000/true: Monitoring Overhead: 0.005856171 s
|
||||
@60000/true: Init Formulas: 0.339442519 s
|
||||
@60000/true: Update one: 0.007649768 s
|
||||
@60000/true: Update all: 0.076880671 s
|
||||
+ Time Results @ 60000
|
||||
|
||||
|
||||
Total for specification SpreadsheetBenchmark
|
||||
Finished in 1 minute 21 seconds, 988 ms
|
||||
8 examples, 9 expectations, 0 failure, 0 error
|
||||
|
|
@ -9,6 +9,7 @@ def read_dataspread(testbed, experiment):
|
|||
if match is None:
|
||||
return []
|
||||
else:
|
||||
print(line)
|
||||
return [(
|
||||
testbed,
|
||||
"dataspread",
|
||||
|
@ -68,12 +69,12 @@ def read_vizier(testbed, experiment):
|
|||
data = [
|
||||
record
|
||||
for ds in [
|
||||
read_vizier("desktop", "varystart"),
|
||||
read_dataspread("desktop", "varystart"),
|
||||
read_vizier("desktop", "varysize"),
|
||||
read_dataspread("desktop", "varysize"),
|
||||
read_vizier("desktop", "varystartandsize"),
|
||||
read_dataspread("desktop", "varystartandsize"),
|
||||
read_vizier("laptop", "varystart"),
|
||||
read_dataspread("laptop", "varystart"),
|
||||
read_vizier("laptop", "varysize"),
|
||||
read_dataspread("laptop", "varysize"),
|
||||
read_vizier("laptop", "varystartandsize"),
|
||||
read_dataspread("laptop", "varystartandsize"),
|
||||
]
|
||||
for record in ds
|
||||
]
|
||||
|
@ -88,9 +89,9 @@ experiment_xlabels = {
|
|||
}
|
||||
|
||||
system_labels = {
|
||||
"vizier" : "Vizier",
|
||||
"vizier-batch" : "Vizier (Simulated Batching)",
|
||||
"dataspread" : "DataSpread"
|
||||
"vizier" : ("Vizier", "v-"),
|
||||
"vizier-batch" : ("Vizier (Simulated Batching)", "^-"),
|
||||
"dataspread" : ("DataSpread", 'o-')
|
||||
}
|
||||
|
||||
|
||||
|
@ -149,10 +150,12 @@ def plot_one(testbed, stage, experiment):
|
|||
and record[5] == experiment
|
||||
], key=lambda x: x[0])
|
||||
|
||||
label, marker = system_labels[system]
|
||||
ax.plot(
|
||||
[pt[0] for pt in points],
|
||||
[pt[1] for pt in points],
|
||||
label=system_labels[system]
|
||||
marker,
|
||||
label=label,
|
||||
)
|
||||
ax.legend()
|
||||
stage = stage.replace(" ", "_")
|
||||
|
@ -162,17 +165,17 @@ def plot_one(testbed, stage, experiment):
|
|||
|
||||
|
||||
|
||||
plot_one("desktop", "init spreadsheet", "varystart")
|
||||
plot_one("desktop", "init formulas", "varystart")
|
||||
plot_one("desktop", "init", "varystart")
|
||||
plot_one("desktop", "update one", "varystart")
|
||||
# plot_one("laptop", "init spreadsheet", "varystart")
|
||||
# plot_one("laptop", "init formulas", "varystart")
|
||||
plot_one("laptop", "init", "varystart")
|
||||
plot_one("laptop", "update one", "varystart")
|
||||
|
||||
plot_one("desktop", "init spreadsheet", "varysize")
|
||||
plot_one("desktop", "init formulas", "varysize")
|
||||
plot_one("desktop", "init", "varysize")
|
||||
plot_one("desktop", "update one", "varysize")
|
||||
# plot_one("laptop", "init spreadsheet", "varysize")
|
||||
# plot_one("laptop", "init formulas", "varysize")
|
||||
plot_one("laptop", "init", "varysize")
|
||||
plot_one("laptop", "update one", "varysize")
|
||||
|
||||
plot_one("desktop", "init spreadsheet", "varystartandsize")
|
||||
plot_one("desktop", "init formulas", "varystartandsize")
|
||||
plot_one("desktop", "init", "varystartandsize")
|
||||
plot_one("desktop", "update one", "varystartandsize")
|
||||
# plot_one("laptop", "init spreadsheet", "varystartandsize")
|
||||
# plot_one("laptop", "init formulas", "varystartandsize")
|
||||
plot_one("laptop", "init", "varystartandsize")
|
||||
plot_one("laptop", "update one", "varystartandsize")
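The benchmark logs consumed by this script contain timing lines such as `[test] @600: Init formulas: 24.264395221 s` and `@60/false: Update one: 0.008191735 s`. The sketch below shows one way such lines could be parsed. It is not the script's actual reader: the names `LINE`, `Timing`, and `parse_line` are hypothetical, and the pattern is inferred only from the log excerpts in this change.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern inferred from the log excerpts:
#   optional "[test] " prefix, "@<size>", optional "/true" or "/false",
#   then "<stage>: <seconds> s".
LINE = re.compile(
    r"^(?:\[test\]\s+)?@(\d+)(?:/(true|false))?:\s+(.+?):\s+([0-9.]+) s$"
)


class Timing(NamedTuple):
    size: int             # the number following "@"
    flag: Optional[str]   # "true"/"false" when present, else None
    stage: str            # lower-cased stage name, e.g. "init formulas"
    seconds: float        # measured time in seconds


def parse_line(line: str) -> Optional[Timing]:
    """Parse one timing line; return None for anything else."""
    m = LINE.match(line.strip())
    if m is None:
        # Covers "[info] ..." lines and stages reported as "[not run]".
        return None
    size, flag, stage, seconds = m.groups()
    return Timing(int(size), flag, stage.lower(), float(seconds))
```

Lines that report `[not run]` carry no number and are skipped, so a stage that never ran yields no data point.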
|
||||
|
|
|
@ -1,4 +1,30 @@
|
|||
%!TEX root=../main.tex
|
||||
|
||||
\begin{figure*}
|
||||
\centering
|
||||
\subcaptionbox{Scale Data, View First}{
|
||||
\includegraphics[width=0.28\textwidth]{results/laptop-init-varysize.pdf}
|
||||
}
|
||||
\subcaptionbox{Fix Data, Move View}{
|
||||
\includegraphics[width=0.28\textwidth]{results/laptop-init-varystart.pdf}
|
||||
}
|
||||
\subcaptionbox{Scale Data, View Last}{
|
||||
\includegraphics[width=0.28\textwidth]{results/laptop-init-varystartandsize.pdf}
|
||||
}
|
||||
\subcaptionbox{Scale Data, View First}{
|
||||
\includegraphics[width=0.28\textwidth]{results/laptop-update_one-varysize.pdf}
|
||||
}
|
||||
\subcaptionbox{Fix Data, Move View}{
|
||||
\includegraphics[width=0.28\textwidth]{results/laptop-update_one-varystart.pdf}
|
||||
}
|
||||
\subcaptionbox{Scale Data, View Last}{
|
||||
\includegraphics[width=0.28\textwidth]{results/laptop-update_one-varystartandsize.pdf}
|
||||
}
|
||||
\caption{System Initialization costs (a-c) and cost to update one cell (d-f)}
|
||||
\label{fig:experiments}
|
||||
\trimfigurespacing
|
||||
\end{figure*}
|
||||
|
||||
\section{Experiments}
|
||||
\label{sec:experiments}
|
||||
|
||||
|
@ -6,11 +32,14 @@ In this section we explore the performance of the overlay approach.
|
|||
Concretely, we are interested in two questions:
|
||||
(i) How does data size affect the performance of each system?
|
||||
(ii) How does dependency chain length affect the performance of each system?
|
||||
Experiments were run on a 10-core 1.7 GHz Intel i7-12700H running Linux (Kernel 6.0), with 64G of DDR-3200 RAM, and a 2TB 970 EVO NVME solid state drive.
|
||||
% Desktop
|
||||
% Experiments were run on a 10-core 1.7 GHz Intel i7-12700H running Linux (Kernel 6.0), with 64G of DDR-3200 RAM, and a 2TB 970 EVO NVME solid state drive.
|
||||
% Laptop
|
||||
Experiments were run on an 8-core 2.3 GHz Intel i7-11800H running Linux (Kernel 5.19), with 32 GB of DDR4-3200 RAM and a 2 TB 970 EVO NVMe solid-state drive.
|
||||
We compare three systems:
|
||||
(i) \textbf{dataspread}: Dataspread version 0.5~\cite{bendre-15-d}, the most recent version of time of submission;
|
||||
(ii) \textbf{vizier}: Our prototype implementation of overlay spreadsheets; and
|
||||
(iii) \textbf{vizier-batch}: Our prototype implementation with simulated hybrid batch processing.
|
||||
(i) \textbf{DataSpread}: DataSpread version 0.5~\cite{bendre-15-d}, the most recent version at the time of submission;
|
||||
(ii) \textbf{Vizier}: Our prototype implementation of overlay spreadsheets; and
|
||||
(iii) \textbf{Vizier (Simulated Batching)}: Our prototype with simulated hybrid batch processing (see Setup, below).
|
||||
All experiments were performed with a warm cache.
|
||||
|
||||
\partitle{Setup}
|
||||
|
@ -28,32 +57,27 @@ We measure (i) the cost of initialization and (ii) the cost of a single update.
|
|||
Time is measured until quiescence.
|
||||
To emulate batch processing, we replace the formula for $\texttt{sum\_change}[i-1]$ (where $i$ is the first visible row) with a formula that computes the analogous aggregate query.
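As a rough illustration of why this substitution helps, the Python sketch below assumes the running sum is defined as `sum_change[i] = sum_change[i-1] + change[i]`, an assumption since the formula is not restated here. The function `visible_running_sums` and its parameters are hypothetical and are not part of either system. It counts how many per-cell formula evaluations are needed to fill the visible rows, with and without seeding the cell just above the viewport from a single aggregate.

```python
from typing import Dict, List, Tuple


def visible_running_sums(
    change: List[int],
    first_visible: int,
    rows_visible: int,
    batch: bool = False,
) -> Tuple[List[int], int]:
    """Return (values of the visible running-sum cells, per-cell evaluations)."""
    cache: Dict[int, int] = {}
    evaluations = 0

    if batch and first_visible > 0:
        # Emulated batching: a single aggregate over the prefix stands in
        # for the formula of sum_change[first_visible - 1].
        cache[first_visible - 1] = sum(change[:first_visible])

    last = min(first_visible + rows_visible, len(change))
    for target in range(first_visible, last):
        # Walk up the dependency chain to the nearest already-computed cell.
        start = target
        while start >= 0 and start not in cache:
            start -= 1
        running = cache[start] if start >= 0 else 0
        # Evaluate each missing cell on the chain, one formula at a time.
        for i in range(start + 1, target + 1):
            running += change[i]
            cache[i] = running
            evaluations += 1

    return [cache[i] for i in range(first_visible, last)], evaluations
```

With 1,000 rows of ones and a 10-row viewport starting at row 900, the unbatched call evaluates 910 cell formulas, while the batched call evaluates 10 plus one aggregate. Both return the same visible values.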
|
||||
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\partitle{Moving Viewport}
|
||||
% \begin{figure}
|
||||
% \includegraphics[width=0.7\columnwidth]{results/desktop-update_one.png}
|
||||
% \vspace*{-4mm}
|
||||
% \caption{Performance based on viewable range.}
|
||||
% \label{fig:perf-scale-visible}
|
||||
% \trimfigurespacing
|
||||
% \end{figure}
|
||||
\Cref{fig:experiments}(b,e) show initialization and update costs with a fixed dataset size of approximately 600,000 rows and a variable viewport position.
|
||||
Due to the running sum, the longest visible dependency chain grows as the visible region moves further into the dataset.
|
||||
Costs for Vizier and DataSpread grow significantly with the length of the dependency chain, while batch processing can compute the updated sum significantly faster.
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\partitle{Scaling Data}
|
||||
|
||||
\begin{figure}
|
||||
\includegraphics[width=0.7\columnwidth]{results/desktop-init_formulas.png}
|
||||
\vspace*{-4mm}
|
||||
\caption{Performance as data size scales.}
|
||||
\label{fig:perf-scale-size}
|
||||
\trimfigurespacing
|
||||
\end{figure}
|
||||
\Cref{fig:perf-scale-size} shows performance as the size of the dataset grows.
|
||||
\Cref{fig:experiments}(a,d) show the initialization and update costs when the viewport is on the first cell. Vizier needs to compute only the formulas of visible cells, and so is significantly faster.
|
||||
\Cref{fig:experiments}(c,f) show these costs when the viewport is on the last cell; as before, the costs for Vizier grow with the length of the longest visible dependency chain, supporting the value of batching.
|
||||
|
||||
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\partitle{Viewport}
|
||||
\begin{figure}
|
||||
\includegraphics[width=0.7\columnwidth]{results/desktop-update_one.png}
|
||||
\vspace*{-4mm}
|
||||
\caption{Performance based on viewable range.}
|
||||
\label{fig:perf-scale-visible}
|
||||
\trimfigurespacing
|
||||
\end{figure}
|
||||
|
||||
\Cref{fig:perf-scale-size} shows performance as the viewable area moves lower.
|
||||
|
||||
|
|
|
@ -4,32 +4,30 @@
|
|||
\label{sec:related-work}
|
||||
|
||||
Although spreadsheets present a convenient, direct-manipulation interface to data, they lack the scalability to manage large data.
|
||||
A common approach to scaling spreadsheets --- what we term the ``virtual'' approach --- is to reformulate the interface to an existing database or workflow system using spreadsheet-style direct manipulation metaphors~\cite{DBLP:conf/cidr/BakkeB11,DBLP:conf/icde/LiuJ09,freire:2016:hilda:exception,DBLP:conf/sigmod/JagadishCEJLNY07,DBLP:conf/chi/KandelPHH11}.
|
||||
A common approach to scaling spreadsheets (the ``virtual'' approach) reformulates the interface to an existing database or workflow system using spreadsheet-style direct manipulation metaphors~\cite{DBLP:conf/cidr/BakkeB11,DBLP:conf/icde/LiuJ09,freire:2016:hilda:exception,DBLP:conf/sigmod/JagadishCEJLNY07,DBLP:conf/chi/KandelPHH11}.
|
||||
The resulting systems bear varying levels of resemblance to existing spreadsheets, usually introducing concepts from relational databases like explicit tables, attributes, and records.
|
||||
|
||||
Vizier~\cite{brachmann:2019:sigmod:data, kennedy:2022:ieee-deb:right, kumari:2021:cidr:datasense, brachmann:2020:cidr:your} is a computational notebook system that automatically versions notebooks as they are edited by users.
|
||||
In Vizier, any dataset used in a computational notebook can be accessed and edited through a spreadsheet interface; the resulting edits are integrated into the workflow.
|
||||
|
||||
%
|
||||
Wrangler~\cite{DBLP:conf/chi/KandelPHH11} is an ETL workflow development tool with an interface inspired by spreadsheets.
|
||||
Users open a small sample of a dataset in Wrangler and use spreadsheet-style direct manipulations to indicate a desired change to the dataset.
|
||||
Wrangler, in turn, proposes ETL workflow steps that can achieve the user's desired effect on the target cell, as well as the remainder of the dataset.
|
||||
|
||||
Other approaches more directly mimic relational databases through spreadsheet-style interfaces.
|
||||
Users open a small sample of a dataset in Wrangler and use spreadsheet-style direct manipulations to indicate desired changes to the dataset.
|
||||
%
|
||||
Vizier~\cite{brachmann:2019:sigmod:data, kennedy:2022:ieee-deb:right, kumari:2021:cidr:datasense, brachmann:2020:cidr:your} is a computational notebook system that allows users to define workflow stages through a spreadsheet-style interface.
|
||||
%
|
||||
Other approaches more directly mimic relational databases:
|
||||
The Spreadsheet Algebra~\cite{DBLP:conf/sigmod/JagadishCEJLNY07,DBLP:conf/icde/LiuJ09} allows users to specify any SPJGA-query purely through spreadsheet-style user interactions.
|
||||
Related Worksheets~\cite{DBLP:conf/cidr/BakkeB11,DBLP:conf/chi/BakkeKM11} re-imagines the classical spreadsheet-style interface by introducing relational structure, as well as nested display of foreign-key dependencies.
|
||||
Related Worksheets~\cite{DBLP:conf/cidr/BakkeB11,DBLP:conf/chi/BakkeKM11} re-imagines the classical spreadsheet-style interface with record structure and inlined display of foreign-key references.
|
||||
|
||||
A second class of approach --- what we term the ``materialized'' approach --- instead redesigns the spreadsheet engine itself through database concepts;
|
||||
The primary example in this space is DataSpread~\cite{DBLP:conf/icde/BendreVZCP18, DBLP:conf/sigmod/RahmanMBZKP20, DBLP:conf/sigmod/BendreWMCP19}.
|
||||
A key challenge that the materialized approach faces is that classical database techniques, which exploit common structures in a dataset, are not directly applicable.
|
||||
A second approach (the ``materialized'' approach) instead redesigns the spreadsheet engine itself through database concepts.
|
||||
An example is DataSpread~\cite{DBLP:conf/icde/BendreVZCP18, DBLP:conf/sigmod/RahmanMBZKP20, DBLP:conf/sigmod/BendreWMCP19}.
|
||||
A key challenge is that classical database techniques, which exploit common structures in a dataset, are not directly applicable.
|
||||
\cite{DBLP:conf/icde/BendreVZCP18} explores data structures that can leverage partial structure; for example, when a range of cells is structured as a relational table.
|
||||
\cite{DBLP:conf/sigmod/BendreWMCP19} explores strategies for quickly invalidating cells and computing dependencies, by leveraging a (lossy) compressed dependency graph that can efficiently bound a cell's downstream.
|
||||
\cite{tang-23-efcsfg} introduces a different type of compressed dependency graph which is lossless, instead exploiting repeating patterns in formulas.
|
||||
This is analogous to our own approach, but focuses on the dependency graph;
|
||||
As we demonstrate here, applying a similar approach to expressions as well creates multiple optimization opportunities.
|
||||
As we demonstrate, expression patterns create more optimization opportunities.
|
||||
|
||||
In summary, several efficient algorithms for storing, accessing, and updating spreadsheets have been developed and adapted in the context of DataSpread.
|
||||
The approach developed for Vizier is often less efficient, but has the advantage of supporting light-weight versioning and tracking the provenance of the evolution of a dataset (and the computational notebook containing it) under spreadsheet operations.
|
||||
Importantly, this approach enables replaying a user's updates that were originally applied to a dataset $D_{old}$ when $D_{old}$ is replaced with an updated dataset $D_{new}$ (e.g., the user may have downloaded a new version of an open dataset and wants to keep the manual fixes they have applied to the original version of the dataset).
|
||||
The virtual approach is often less efficient, but has the advantage of supporting light-weight versioning and provenance tracking.
|
||||
Crucially, this approach also enables replaying a user's updates, originally applied to one dataset, on a new dataset (e.g., to re-apply curation work on an updated version of the data).
|
||||
The overlay approach we present in this work has the potential to retain these benefits while enabling performance competitive with, or exceeding, that of DataSpread.
|
||||
Furthermore, overlays with reference frames enable more efficient support for insertion and deletion of rows and columns, as these operations affect only the reference frames and not the formulas of cells.
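To make the last point concrete, the following Python sketch gives one possible reading of a reference frame. It is purely illustrative and does not describe the Vizier implementation: the class `ReferenceFrame`, its methods, and the idea of keying formulas by a stable row identifier are all assumptions made for this example.

```python
from typing import Dict, Iterable, List


class ReferenceFrame:
    """Hypothetical frame mapping on-screen row positions to stable row ids."""

    def __init__(self, row_ids: Iterable[int]) -> None:
        self.order: List[int] = list(row_ids)

    def insert_row(self, position: int, row_id: int) -> None:
        # Only the position-to-id mapping changes.
        self.order.insert(position, row_id)

    def delete_row(self, position: int) -> int:
        # Returns the id of the removed row.
        return self.order.pop(position)

    def row_id_at(self, position: int) -> int:
        return self.order[position]

    def position_of(self, row_id: int) -> int:
        return self.order.index(row_id)


# Formulas are stored against stable row ids (an assumption of this sketch),
# so they never mention on-screen positions.
formulas: Dict[int, str] = {11: "sum_change = previous(sum_change) + change"}

frame = ReferenceFrame([10, 11, 12])
frame.insert_row(1, 99)  # a new row appears above row id 11
```

After the insertion, row id 11 is displayed one position lower, but the `formulas` dictionary is untouched. Under this reading, the cost of an insertion or deletion is confined to the frame.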
|
||||
|
||||
|
|