88 lines
4.3 KiB
TeX
88 lines
4.3 KiB
TeX
%!TEX root=../main.tex
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\subcaptionbox{Scale Data, View First}{
|
|
\includegraphics[width=0.47\columnwidth]{results/laptop-init-varysize.pdf}
|
|
}
|
|
\subcaptionbox{Fix Data, Move View}{
|
|
\includegraphics[width=0.47\columnwidth]{results/laptop-init-varystart.pdf}
|
|
}\\[1mm]
|
|
\subcaptionbox{Scale Data, View Last}{
|
|
\includegraphics[width=0.47\columnwidth]{results/laptop-init-varystartandsize.pdf}
|
|
}
|
|
\subcaptionbox{Scale Data, View First}{
|
|
\includegraphics[width=0.47\columnwidth]{results/laptop-update_one-varysize.pdf}
|
|
}\\[-2mm]
|
|
% \subcaptionbox{Fix Data, Move View}{
|
|
% \includegraphics[width=0.28\textwidth]{results/laptop-update_one-varystart.pdf}
|
|
% }
|
|
% \subcaptionbox{Scale Data, View Last}{
|
|
% \includegraphics[width=0.28\textwidth]{results/laptop-update_one-varystartandsize.pdf}
|
|
% }
|
|
\caption{Time to initialize the spreadsheet (a-b) and cost to update one cell (c-d)}
|
|
\label{fig:experiments}
|
|
\trimfigurespacing
|
|
\end{figure}
|
|
|
|
\section{Experiments}
|
|
\label{sec:experiments}
|
|
|
|
In this section we explore the performance of the overlay approach.
|
|
Concretely, we are interested in two questions:
|
|
(i) How does data size affect the performance of each system?
|
|
(ii) How does dependency chain length affect the performance of each system?
|
|
% Desktop
|
|
% Experiments were run on a 10-core 1.7 GHz Intel i7-12700H running Linux (Kernel 6.0), with 64G of DDR-3200 RAM, and a 2TB 970 EVO NVME solid state drive.
|
|
% Laptop
|
|
Experiments were run on an 8-core 2.3 GHz Intel i7-11800H running Linux (Kernel 5.19), with 32G of DDR4-3200 RAM, and a 2TB 970 EVO NVME solid state drive.
|
|
We compare three systems:
|
|
(i) \textbf{DataSpread}: Dataspread version 0.5~\cite{bendre-15-d};
|
|
(ii) \textbf{Vizier}: Our prototype implementation of overlay spreadsheets; and
|
|
(iii) \textbf{Vizier (Simulated Batching)}: Simulated hybrid batch processing (see Setup, below).
|
|
All experiments were performed with a warm cache.
|
|
|
|
\partitle{Setup}
|
|
We address our questions through a microbenchmark modeled after TPC-H query 1~\cite{tpc-h}: The spreadsheet is defined by the TPC-H \texttt{lineitem} dataset with $\texttt{N}$ rows and four additional columns defined by the patterns:\\[-2mm]
|
|
{\footnotesize
|
|
\begin{verbatim}
|
|
base_price[1-N] = ext_price[+0]
|
|
disc_price[1-N] = base_price[+0] * (1 - discount[+0])
|
|
charge[1-N] = disc_price[+0] * (1 + tax[+0])
|
|
sum_charge[1] = charge[1]
|
|
sum_charge[2-N] = charge[+0] + sum_charge[-1]
|
|
\end{verbatim}
|
|
}
|
|
\noindent The \texttt{sum\_charge} column is a running total, creating a dependency chain that grows linearly with row index.
|
|
As the user scrolls down the page (under normal usage), the runtime to compute visible cells grows linearly.
|
|
Each system loads the spreadsheet with a viewable area of 50 rows and updates a single cell.
|
|
We measure (i) the cost of initialization and (ii) the cost of a single update.
|
|
Time is measured until quiescence.
|
|
To emulate batch processing, we replace the formula for the $\texttt{sum\_change}[i-1]$ (where $i$ is the first visible row) with a formula that computes the analogous aggregate query.
|
|
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\partitle{Moving View}
|
|
% \begin{figure}
|
|
% \includegraphics[width=0.7\columnwidth]{results/desktop-update_one.png}
|
|
% \vspace*{-4mm}
|
|
% \caption{Performance based on viewable range.}
|
|
% \label{fig:perf-scale-visible}
|
|
% \trimfigurespacing
|
|
% \end{figure}
|
|
\Cref{fig:experiments}(a,c) shows costs for a fixed dataset size of approximately 600,000 rows, varying the viewable rows.
|
|
Due to the running sum, later rows require more computation.
|
|
Costs for Vizier and Dataspread grow significantly with the length of the dependency chain, while batch processing can compute the updated sum significantly faster.
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\partitle{Scaling Data}
|
|
\Cref{fig:experiments}(b,d) shows costs when varying data size, with the view fixed on the first cell.
|
|
Because dependencies in the visible area are of constant size, Vizier is faster.
|
|
% \Cref{fig:experiments}(c,f) show these costs when the viewport is on the last cell; as before, the costs for Vizier grow with the length of the longest visible dependency chain, supporting the value of batching.
|
|
|
|
|
|
|