paper-Vizier-SpreadsheetOve.../sections/experiments.tex

%!TEX root=../main.tex

\begin{figure}
  \centering
  \subcaptionbox{Scale Data, View First}{
    \includegraphics[width=0.47\columnwidth]{results/laptop-init-varysize.pdf}
  }
  \subcaptionbox{Fix Data, Move View}{
    \includegraphics[width=0.47\columnwidth]{results/laptop-init-varystart.pdf}
  }\\[1mm]
  \subcaptionbox{Scale Data, View Last}{
    \includegraphics[width=0.47\columnwidth]{results/laptop-init-varystartandsize.pdf}
  }
  \subcaptionbox{Scale Data, View First}{
    \includegraphics[width=0.47\columnwidth]{results/laptop-update_one-varysize.pdf}
  }\\[-2mm]
  % \subcaptionbox{Fix Data, Move View}{
  %   \includegraphics[width=0.28\textwidth]{results/laptop-update_one-varystart.pdf}
  % }
  % \subcaptionbox{Scale Data, View Last}{
  %   \includegraphics[width=0.28\textwidth]{results/laptop-update_one-varystartandsize.pdf}
  % }
  \caption{Time to initialize the spreadsheet (a-b) and cost to update one cell (c-d)}
  \label{fig:experiments}
  \trimfigurespacing
\end{figure}

\section{Experiments}
\label{sec:experiments}

In this section we explore the performance of the overlay approach.
Concretely, we are interested in two questions:
(i) How does data size affect the performance of each system?
(ii) How does dependency chain length affect the performance of each system?
% Desktop
% Experiments were run on a 10-core 1.7 GHz Intel i7-12700H running Linux (Kernel 6.0), with 64G of DDR-3200 RAM, and a 2TB 970 EVO NVME solid state drive.
% Laptop
Experiments were run on an 8-core 2.3 GHz Intel i7-11800H running Linux (Kernel 5.19), with 32G of DDR4-3200 RAM, and a 2TB 970 EVO NVME solid state drive.
We compare three systems:
(i) \textbf{DataSpread}: Dataspread version 0.5~\cite{bendre-15-d};
(ii) \textbf{Vizier}: Our prototype implementation of overlay spreadsheets; and
(iii) \textbf{Vizier (Simulated Batching)}: Simulated hybrid batch processing (see Setup, below).
All experiments were performed with a warm cache.

\partitle{Setup}
We address our questions through a microbenchmark modeled after TPC-H query 1~\cite{tpc-h}: The spreadsheet is defined by the TPC-H \texttt{lineitem} dataset with $\texttt{N}$ rows and four additional columns defined by the patterns:\\[-2mm]
{\footnotesize
\begin{verbatim}
  base_price[1-N] = ext_price[+0]
  disc_price[1-N] = base_price[+0] * (1 - discount[+0])
  charge[1-N]     = disc_price[+0] * (1 + tax[+0])
  sum_charge[1]   = charge[1]
  sum_charge[2-N] = charge[+0] + sum_charge[-1]
\end{verbatim}
}
\noindent The \texttt{sum\_charge} column is a running total, creating a dependency chain that grows linearly with row index.
As the user scrolls down the page (under normal usage), the runtime to compute visible cells grows linearly.
Each system loads the spreadsheet with a viewable area of 50 rows and updates a single cell.
We measure (i) the cost of initialization and (ii) the cost of a single update.
Time is measured until quiescence.
To emulate batch processing, we replace the formula for the $\texttt{sum\_change}[i-1]$ (where $i$ is the first visible row) with a formula that computes the analogous aggregate query.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\partitle{Moving View}
% \begin{figure}
%   \includegraphics[width=0.7\columnwidth]{results/desktop-update_one.png}
%   \vspace*{-4mm}
%   \caption{Performance based on viewable range.}
%   \label{fig:perf-scale-visible}
%   \trimfigurespacing
% \end{figure}
\Cref{fig:experiments}(a,c) shows costs for a fixed dataset size of approximately 600,000 rows, varying the viewable rows.
Due to the running sum, later rows require more computation.
Costs for Vizier and Dataspread grow significantly with the length of the dependency chain, while batch processing can compute the updated sum significantly faster.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\partitle{Scaling Data}
\Cref{fig:experiments}(b,d) shows costs when varying data size, with the view fixed on the first cell.
Because dependencies in the visible area are of constant size, Vizier is faster.
% \Cref{fig:experiments}(c,f) show these costs when the viewport is on the last cell; as before, the costs for Vizier grow with the length of the longest visible dependency chain, supporting the value of batching.