Readthrough with some grammar and flow tweaks. Also fixing a few typos in the abstract. Nothing serious.

2015-07-21 19:35:01 -04:00 · 2015-07-21 19:35:01 -04:00 · 02e5e77265
parent 1879c1f4c8
commit 02e5e77265
5 changed files with 24 additions and 21 deletions
--- a/sections/0-abstract.tex
+++ b/sections/0-abstract.tex
@@ -1,22 +1,23 @@
-Because embedded database engines, such as SQLite, provide a convenient data
-persistence layer, they have spread along with the applications using them to
+Embedded database engines such as SQLite provide a convenient data
+persistence layer and have spread along with the applications using them to
 many types of systems, including interactive devices such as smartphones.
 %
 Android, the most widely-distributed smartphone platform, both uses SQLite
 internally and provides interfaces encouraging apps to use SQLite to store
 their own private structured data.
 %
-As a result, embedded database performance affects the response times and
-resource consumption of both the platforms that operation billions of
+As similar functionality appears in all major mobile operating systems,
+embedded database performance affects the response times and
+resource consumption of billions of
 smartphones and the millions of apps that run on them---making it more
 important than ever to characterize smartphone embedded database workloads.
 %
 To do so, we present results from an experiment which recorded SQLite
 activity on 11~Android smartphones during one month of typical usage.
 %
-Our analysis shows that Android SQLite usage produce queries and access
+Our analysis shows that Android SQLite usage produces queries and access
 patterns quite different from canonical server workloads.
 %
-We argue that evaluating smartphone embedded database will require a new
-benchmarking suite, and we use our results to outline some of its
+We argue that evaluating smartphone embedded databases will require a new
+benchmarking suite and we use our results to outline some of its
 characteristics.
--- a/sections/2-overview.tex
+++ b/sections/2-overview.tex
@@ -1,11 +1,11 @@

 Our primary observation was that a pocket data workload includes a mix of both OLTP and OLAP characteristics.  The majority of operations performed by SQLite were simple key-value manipulations and look-ups.  However, a substantial fraction of the (comparatively read-heavy) workload consisted of far more complex OLAP-style operations involving wide, multi-table joins, nested sub-queries, complex selection predicates, and aggregation.  

-Many of these workload characteristics appeared to be motivated by factors unique to embedded databases.  For example, SQLite uses single-file databases that have a standard, platform-independent format.  As a consequence, we saw indications of entire databases, indexes and all, being transported in their entirety through web downloads or as attachments to other files~\cite{Dit2015CIDR}.  A common pattern we observed was for a cloud service to package a fragment of its state into a SQLite database, which could then be cached locally on the device for lower-latency and offline access.
+Many of these workload characteristics appeared to be motivated by factors unique to embedded databases.  For example, SQLite uses single-file databases that have a standard, platform-independent format.  As a consequence, we saw indications of entire databases, indexes and all, being transported in their entirety through web downloads or as attachments to other files~\cite{Dit2015CIDR}.  This is suggestive of a pattern where cloud services package fragments of their state into SQLite databases, which are then downloaded and cached by the app for both lower-latency and offline access.

 Query optimization goals also differ substantially for pocket data workloads. For example, latency is a primary concern, but at vastly different scales.  Over our one-month trial, the average SQL statement took 2 ms to evaluate, and even complex \texttt{SELECT} queries with 4-level deep nesting only took an average of 120 ms. 

-Finally, unlike typical server-class benchmark workloads where throughput is a key factor, embedded databases have smaller workloads --- on the order of hundreds of rows at most.  Moreover, embedded databases
+Finally, unlike typical server-class benchmark workloads, where throughput is a key factor, embedded databases have smaller workloads --- on the order of hundreds of rows at most.  Moreover, embedded databases
 need to share computing resources fairly with other processes on the same device.  This means that in stark contrast to server-class workloads, an embedded database is idle more frequently.  Periods of low-utilization are opportunities for background optimization, but must be managed against the needs of other applications running on the device, as well as the device's limited power budget. 

 Pocket data workloads represent a growing, and extremely important class of database consumers.  Unfortunately, research and development on embedded databases (\textit{e.g.},~\cite{jeong2013iostack,kang2013xftl}) is presently obligated to rely on micro-benchmarks or anecdotal observations about the needs and requirements of embedded database engines.  
--- a/sections/3-experimental.tex
+++ b/sections/3-experimental.tex
@@ -1,4 +1,4 @@
-To collect and analyze SQLite queries generated by Android, we utilized the
+To collect and analyze SQLite queries generated by Android, we used the
 unique capabilities of the \PhoneLab{} smartphone platform testbed located at
 the University at Buffalo (UB). Approximately 200~UB students, faculty, and
 staff use instrumented LG Nexus~5 smartphones as their primary device and
@@ -9,7 +9,7 @@ user population. \PhoneLab{} smartphones run a modified version of the
 Android Open Source Platform (AOSP) 4.4.4 ``KitKat" including instrumentation
 and logging developed in collaboration with the mobile systems community.
 Participating smartphones log experimental results which are uploaded to a
-centralized server when the device is charging.
+central server when the device is charging.

 We instrumented the \PhoneLab{} AOSP platform image to log SQLite activity by
 modifying the SQLite source code and distributing the updated binary library
--- a/sections/4-queryc.tex
+++ b/sections/4-queryc.tex
@@ -1,3 +1,8 @@
+
+
+In this section we discuss the query complexity we observed during our study and illustrate typical workloads over pocket data.
+Figure~\ref{fig:breakdownByCatAndFeature} summarizes all 45 million statements executed by SQLite over the 1 month period.  As might be expected, \texttt{SELECT} forms almost three quarters of the workload by volume.  \texttt{UPSERT} statements (\textit{i.e.}, \texttt{INSERT OR REPLACE}) form a similarly substantial 16\% of the workload --- more than simple \texttt{INSERT} and \texttt{UPDATE} statements combined.  Also of note is a surprising level of complexity in \texttt{DELETE} statements, many of which rely on nested sub-queries when determining which records to delete.
+
 \begin{figure}
 \centering
 \input{tables/query_breakdown}
@@ -5,9 +10,6 @@
 \label{fig:breakdownByCatAndFeature}
 \end{figure}

-In this section we discuss the query complexity we observed during our study and illustrate typical workloads over pocket data.
-Figure~\ref{fig:breakdownByCatAndFeature} summarizes all 45 million statements executed by SQLite over the 1 month period.  As might be expected, \texttt{SELECT} forms almost three quarters of the workload by volume.  \texttt{UPSERT} statements (\textit{i.e.}, \texttt{INSERT OR REPLACE}) form a similarly substantial 16\% of the workload --- more than simple \texttt{INSERT} and \texttt{UPDATE} statements combined.  Also of note is a surprising level of complexity in \texttt{DELETE} statements, many of which rely on nested sub-queries when determining which records to delete.
-
 \begin{figure*}
  \begin{subfigure}[t]{0.5\textwidth}
    \centering
@@ -49,7 +51,7 @@ Figure~\ref{fig:breakdownByCatAndFeature} summarizes all 45 million statements e
 \label{fig:topBottom10Apps}
 \end{figure*}

-Figure~\ref{fig:topBottom10Apps} shows the 10 most frequent and 10 least frequent clients of SQLite over the one month trace.  The most active SQLite clients include internal Android services that broker access to data shared between apps such as personal media, calendars and address books, as well as pre-installed and popular social media apps.  There is less of a pattern at the low end, although several infrequent SQLite clients are themselves apps that may be used only infrequently, especially on a phone-sized device.  We suspect that the distribution of apps would differ significantly for a tablet-sized device.
+Figure~\ref{fig:topBottom10Apps} shows the 10 most frequent and 10 least frequent clients of SQLite over the one month trace.  The most active SQLite clients include internal Android services that broker access to data shared between apps such as personal media, calendars, and address books; as well as pre-installed and popular social media apps.  There is less of a pattern at the low end, although several infrequent SQLite clients are themselves apps that may be used only infrequently, especially on a phone-sized device.  We suspect that the distribution of apps would differ significantly for a tablet-sized device.

 \subsection{Database Reads}
 \begin{figure*}
@@ -107,7 +109,7 @@ Not surprisingly, this suggests that the most common use-case for SQLite is as a
 \subsubsection{Other \texttt{SELECT} Queries}


-Figure~\ref{fig:allSelectConditionBreakdown} shows a similar breakdown for all 33.5 million \texttt{SELECT} queries seen.  As before, the table shows the form of all expressions that appear as one of the conjunctive terms of a \texttt{WHERE} clause, alongside the number of queries where the expression appears.  31.0 million of these queries contain an exact lookup.
+Figure~\ref{fig:allSelectConditionBreakdown} shows a similar breakdown for all 33.5 million \texttt{SELECT} queries seen.  As before, the table shows the form of all expressions that appear as one of the conjunctive terms of a \texttt{WHERE} clause, alongside the number of queries where the expression appears at least once.  31.0 million of these queries contain an exact lookup.
 1.6 million queries contain at least one multi-attribute equality expression such as an equi-join constraint, lining up nicely with the 1.7 million queries that reference at least two tables. 

 \begin{figure}
@@ -119,7 +121,7 @@ Figure~\ref{fig:allSelectConditionBreakdown} shows a similar breakdown for all 3
 \label{fig:allSelectConditionBreakdown}
 \end{figure} 

-App developers make frequent use of SQLite's dynamic typing: Where clauses include bare column references (\textit{e.g.}, \texttt{WHERE A}, implicitly equivalent to \texttt{WHERE A <> 0}) as well as bare bit-wise AND expressions (\textit{e.g.}, \texttt{A\&0xc4}).  This latter predicate appearing in a half-million queries indicates extensive use of bit-arrays packed into integers.   
+App developers make frequent use of SQLite's dynamic typing: Where clauses include bare column references (\textit{e.g.}, \texttt{WHERE A}, implicitly equivalent to \texttt{WHERE A <> 0}) as well as bare bit-wise AND expressions (\textit{e.g.}, \texttt{A\&0xc4}).  This latter predicate appearing in a half-million queries suggests extensive use of bit-arrays packed into integers.   

 \subsubsection{Functions}

@@ -174,8 +176,8 @@ SELECT property_value FROM properties WHERE property_key=?;
 \end{verbatim}
 In this query, \texttt{?} is a prepared statement parameter that acts as a place holder for values that are bound when the prepared statement is evaluated.

-To broaden the scope of our search for key/value queries, we define a key-value look-up query as a \texttt{SELECT} query over a single relation that either performs a full table scan, or performs an exact look-up on a single attribute.  
-Figure~\ref{fig:selectByApp:simple} shows the cumulative distribution of apps sorted by the percent of its queries that are key-value lookup queries.  For 24 apps (13.4\%), we observed only key-value queries during the entire, month-long trace. 
+To broaden the scope of our analysis of key/value queries, we define a key-value look-up query as a \texttt{SELECT} query over a single relation that either performs a full table scan, or performs an exact look-up on a single attribute.  
+Figure~\ref{fig:selectByApp:simple} shows the cumulative distribution of apps sorted by the fraction of each app's queries that are key-value lookup queries.  For 24 apps (13.4\%), we observed \emph{only} key-value queries during the entire, month-long trace. 

 % Adobe Reader, Barcode Scanner, BuzzFeed, Candy Crush Saga, Discover, Evernote, Foursquare, GPS Status, Google Play Newsstand, Google Sky Map, KBS kong, LTE Discovery, MX Player Pro, Muzei, My Tracks, Office Mobile, PayPal, Quickoffice, SignalCheck Lite, Titanium Backup, TuneIn Radio Pro, VLC, Weather, Wifi Analyzer

--- a/sections/6-pocketdata.tex
+++ b/sections/6-pocketdata.tex
@@ -1,4 +1,4 @@
-In spite of the prevalence of SQL on mobile devices, and a increasing interest in so-called ``small data"~\cite{Dit2015CIDR}, relatively little attention has been paid to the rapidly growing \textit{pocket data} space.
+In spite of the prevalence of SQL on mobile devices, and an increasing interest in so-called ``small data"~\cite{Dit2015CIDR}, relatively little attention has been paid to the rapidly growing \textit{pocket data} space.
 %We believe that this is, in large part, due to the lack of a common, overarching mechanism to evaluate potential solutions to known challenges in the space.  
 In this section, we first explore some existing research on mobile databases, with a focus on how the authors evaluate their solutions.  Then, we turn to existing benchmarking suites and identify specific disconnects that prevent them from being applied directly to model pocket data.  In the process, we explore aspects of these benchmarks that could be drawn into a potential pocket data benchmark.

@@ -7,7 +7,7 @@ In this section, we first explore some existing research on mobile databases, wi

 Kang et. al.~\cite{kang2013xftl} explored the design of a flash-aware transactional layer called X-FTL, specifically targeting limitations of SQLite's redo logging on mobile devices.  To evaluate their work, the authors used the TPC-C benchmark in conjunction with a series of micro-benchmarks that evaluate the file system's response to database write operations.  This workload is appropriate for their target optimizations.  However, as we discuss below, TPC-C is not sufficiently representative of a pocket data workload to be used as a general-purpose mobile database benchmark.

-Jeong et. al.~\cite{jeong2013iostack} noted similar limitations in SQLite's transactional layer, and went about streamlining the IO-stack, again primarily for the benefit of mobile devices.  Again, micro-benchmarks played a significant role in the author's evaluation of their work.  To evaluate their system's behavior under real-world conditions, the authors ran the \textit{Twitter} and \textit{Facebook} apps, simulating user behavior using a mobility trace generated by MobiGen~\cite{ahmed2009mobigen}.  This is perhaps the most representative benchmarking workload that we encountered in our survey of related work.  %However, it too could be improved.
+Jeong et. al.~\cite{jeong2013iostack} noted similar limitations in SQLite's transactional layer, and went about streamlining the IO-stack, also primarily for the benefit of mobile devices.  Again, micro-benchmarks played a significant role in the author's evaluation of their work.  To evaluate their system's behavior under real-world conditions, the authors ran the \textit{Twitter} and \textit{Facebook} apps, simulating user behavior using a mobility trace generated by MobiGen~\cite{ahmed2009mobigen}.  This is perhaps the most representative benchmarking workload that we encountered in our survey of related work.  %However, it too could be improved.
 %In our traces, Facebook and Twitter do represent a substantial contribution to the database workload of a typical smartphone, but still perform orders of magnitude less work with SQLite than built-in apps and system services.

 Many of the same issues with IO and power management that now appear in mobile phones have also historically arisen in sensor networks.  Madden et. al.'s work on embedded databases with TinyDB~\cite{madden2005tinydb} is emblematic of this space, where database solutions are driven by one or more specific target application domains.  Naturally, evaluation benchmarks and metrics in sensor networks are typically derived from, and closely tied to the target domain.% --- for example distributed event monitoring in the case of TinyDB.