In spite of the prevalence of mobile devices, relatively little attention has been paid to pocket-scale data management. We believe that this is, in large part, due to the lack of a common, overarching mechanism to evaluate potential solutions to known challenges in the space. In this section, we first explore some existing research on mobile databases, and in particular focus on how the authors evaluate their solutions. Then, we turn to existing benchmarking suites and identify specific disconnects that prevent them from being applied directly to model pocket data. In the process, we explore aspects of these benchmarks that could be drawn into a potential pocket-data benchmark.
\subsection{Pocket Data Management}
\label{sec:pocketdata:related}
Kang et al.~\cite{kang2013xftl} explored the design of a flash-aware transactional layer called X-FTL, specifically targeting the limitations of SQLite's undo/redo logging on mobile devices. To evaluate their work, the authors used the TPC-C benchmark in conjunction with a series of micro-benchmarks that evaluate the file system's response to database write operations. This workload is appropriate for their target optimizations. However, as we discuss below, TPC-C is not sufficiently representative of a pocket data workload to be used as a general-purpose mobile database benchmark.
Jeong et al.~\cite{jeong2013iostack} noted similar limitations in SQLite's transactional layer, and went about streamlining the IO stack, again primarily for the benefit of mobile devices. Again, micro-benchmarks played a significant role in the authors' evaluation of their work. To evaluate their system's behavior under real-world conditions, the authors ran the \textit{Twitter} and \textit{Facebook} apps, simulating user behavior using a mobility trace generated by MobiGen~\cite{ahmed2009mobigen}. This is perhaps the most representative benchmarking workload that we encountered in our survey of related work. Even so, it could be improved: in our traces, Facebook and Twitter do contribute substantially to the database workload of a typical smartphone, but they still perform orders of magnitude less work with SQLite than built-in apps and system services.
Many of the same issues with IO and power management that now appear in mobile phones have also historically arisen in sensor networks. Madden et al.'s work on embedded databases with TinyDB~\cite{madden2005tinydb} is emblematic of this space, where database solutions are driven by one or more specific target application domains. Naturally, evaluation benchmarks and metrics in sensor networks are typically derived from, and closely tied to, the target domain, such as distributed event monitoring in the case of TinyDB.
\subsection{Comparison to Existing Benchmarks}
Given the plethora of available benchmarking software, it is reasonable to ask what a new benchmark for pocket-scale data management brings to the table. We next compare the assumptions and workload characteristics behind a variety of popular benchmarking suites against a potential TPC-MOBILE, and identify concerns that this benchmark would need to address in order to accurately capture the workload characteristics that we have observed.
\subsubsection{Existing Mobile Benchmarks and Data Generators}
Although no explicit macro-benchmarks exist for mobile embedded databases, we note two benchmark data generators that do simulate several properties of interest: AndroBench~\cite{kim2012androbench} and MobiGen~\cite{ahmed2009mobigen}. AndroBench is a micro-benchmark capable of simulating the IO behavior of SQLite under different workloads. It is primarily designed to evaluate the file system underlying SQLite, rather than the embedded database itself. However, the structure of its micro-benchmark workloads can just as effectively be used to compare two embedded database implementations.
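
To give a sense of the structure of such a workload, the sketch below shows a minimal AndroBench-style micro-benchmark harness written against Python's built-in \texttt{sqlite3} module. The schema, row payloads, and operation counts are illustrative assumptions, not AndroBench's actual parameters.

\begin{verbatim}
import sqlite3
import time

# A minimal sketch of an AndroBench-style SQLite micro-benchmark.
# Schema, payload sizes, and operation counts are illustrative
# assumptions, not AndroBench's actual parameters.
def run_microbenchmark(db_path, n_ops=1000):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS t (k INTEGER PRIMARY KEY, v TEXT)")
    timings = {}

    start = time.perf_counter()
    for i in range(n_ops):
        # One transaction per statement, stressing the journaling path.
        conn.execute("INSERT INTO t VALUES (?, ?)", (i, "x" * 100))
        conn.commit()
    timings["insert"] = time.perf_counter() - start

    start = time.perf_counter()
    for i in range(n_ops):
        conn.execute("UPDATE t SET v = ? WHERE k = ?", ("y" * 100, i))
        conn.commit()
    timings["update"] = time.perf_counter() - start

    conn.close()
    # Report throughput in statements per second per operation type.
    return {op: n_ops / elapsed for op, elapsed in timings.items()}
\end{verbatim}

Because the same harness can be pointed at different storage back ends, or at the same back end under different journal modes, throughput comparisons of this form translate directly into comparisons of embedded database configurations.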
The second benchmark, MobiGen, has little to do with data management directly. Rather, it generates realistic traces of environmental inputs (\textit{e.g.}, signal strength, accelerometer readings), simulating the effects of a phone being carried through a physical space. Replaying these traces through a virtual machine running a realistic application workload could generate realistic conditions (\textit{e.g.}, as in the evaluation of the streamlined IO stack of Jeong et al.~\cite{jeong2013iostack}). However, it does not simulate the effects of user interactions with apps running on the device.
\subsubsection{TPC-C}
One macro-benchmark suite that bears a close resemblance to the trace workload is TPC-C~\cite{tpcc}, which simulates a supply-chain management system. It includes a variety of transactional tasks, ranging from low-latency user interactions for placing and querying orders to longer-running batch processes that simulate order fulfillment. A key feature of this benchmark workload is the level of concurrency expected and required of the system. Much of the data is neatly partitioned, but the workload is designed to force a non-trivial level of cross-talk between partitions, making concurrency a bottleneck at higher throughputs. Conversely, mobile SQLite databases are isolated into specialized app-specific silos. In our experiments, throughput remained at very manageable levels from a concurrency standpoint. The most intensive database user, \textit{Google Play services}, had 14.8 million statements attributable to it, just under half of which were writes. This equates to about one write every 3 seconds, which is substantial from a power management and latency perspective, but not from the standpoint of concurrency.
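
To make the arithmetic behind this figure explicit: just under half of 14.8 million statements is roughly $7.4\times10^{6}$ writes, so a rate of one write every 3 seconds corresponds to an aggregate trace time (summed across all traced devices, and assuming for illustration that writes are spread uniformly over the trace) of
\[
7.4\times10^{6}\ \mbox{writes} \times 3\ \mbox{s/write} \approx 2.2\times10^{7}\ \mbox{s} \approx 257\ \mbox{device-days}.
\]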
\subsubsection{YCSB}
We observed many applications using SQLite as a simple key/value store. Indeed, 13\% of the applications we observed had a read workload that consisted exclusively of key/value queries, and over half of the applications we observed had a workload that consisted of at least 80\% key/value queries.
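
The access pattern in question is easy to characterize. The sketch below, written against Python's built-in \texttt{sqlite3} module, shows the shape of the workload we observed; the database, table, and key names are hypothetical.

\begin{verbatim}
import sqlite3

# Hypothetical example of SQLite used as a simple key/value store;
# the database, table, and key names are invented for illustration.
conn = sqlite3.connect("app_prefs.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS prefs (key TEXT PRIMARY KEY, value TEXT)")

def put(key, value):
    # Upsert: the write workload is single-row, keyed by primary key.
    conn.execute("INSERT OR REPLACE INTO prefs VALUES (?, ?)", (key, value))
    conn.commit()

def get(key):
    # A "key/value query": a single-table lookup on the primary key.
    row = conn.execute(
        "SELECT value FROM prefs WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

put("last_sync", "2015-06-01T12:00:00Z")
print(get("last_sync"))  # -> 2015-06-01T12:00:00Z
\end{verbatim}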
The Yahoo! Cloud Serving Benchmark (YCSB)~\cite{ycsb} is designed to capture a variety of key/value query workloads, and could provide a foundation for a pocket-scale data benchmark in this capacity. However, it would need to be extended with support for more complex queries over the same data.
\subsubsection{Analytics}
These more complex queries include multiple levels of query nesting, wide joins, and extensive use of aggregation. As such, they more closely resemble analytics workload benchmarks such as TPC-H~\cite{tpch}, the Star Schema Benchmark~\cite{ssb}, and TPC-DS~\cite{tpcds}. This resemblance is more than passing; many of the more complex queries we encountered appeared to be preparing application run-time state for presentation to the user. For example, the \textit{Google Play Games} service tracks so-called \textit{events} and \textit{quests}, as well as participating \textit{apps}. One of the most complex queries that we encountered appeared to be linking and summarizing these features together for presentation in a list view. Additionally, we note that the presence of analytics queries in pocket data management workloads is likely to increase further as interest grows in smartphones as a platform for personal sensing~\cite{campbell2008peoplesensing,klasnja2009using,lam2009healthmonitoring}.
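
As a concrete, though hypothetical, illustration of the query shape involved, the sketch below reconstructs the flavor of such a presentation query over an invented schema; it is not the actual Google Play Games schema or query.

\begin{verbatim}
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented schema, loosely patterned on the events/quests/apps
# example above; the real schema and query are not reproduced here.
conn.executescript("""
    CREATE TABLE apps   (app_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, app_id INTEGER,
                         count INTEGER);
    CREATE TABLE quests (quest_id INTEGER PRIMARY KEY, app_id INTEGER,
                         completed INTEGER);
""")

# Nested subqueries, joins, and aggregation combine to summarize
# per-app activity for display in a single list view.
rows = conn.execute("""
    SELECT a.name,
           COALESCE(e.total_events, 0) AS total_events,
           COALESCE(q.open_quests, 0)  AS open_quests
    FROM apps a
    LEFT JOIN (SELECT app_id, SUM(count) AS total_events
               FROM events GROUP BY app_id) e ON e.app_id = a.app_id
    LEFT JOIN (SELECT app_id, COUNT(*) AS open_quests
               FROM quests WHERE completed = 0
               GROUP BY app_id) q ON q.app_id = a.app_id
    ORDER BY total_events DESC
""").fetchall()
\end{verbatim}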
\subsubsection{TPC-E}
The TPC-E benchmark emulates a brokerage firm, and includes a mix of reporting and data mining queries alongside stream-monitoring queries. It models decision support systems that involve a high level of CPU and IO load, and that examine large volumes of rapidly changing data. SQLite does not presently target or support streaming or active database applications, although such functionality may become available as personal sensing becomes more prevalent.