paper-TPCTC-PocketData/sections/5-dba.tex

167 lines
8.6 KiB
TeX

\begin{figure*}[t]
\begin{subfigure}[t]{0.33\textwidth}
\includegraphics[width=\textwidth]{./graphs/activity/All_Devices__All_Apps__All_Queries__AllPreviousQueryCDFGraph.pdf}
\caption{}
\label{fig-overview-interarrival}
\end{subfigure}%
\begin{subfigure}[t]{0.33\textwidth}
\includegraphics[width=\textwidth]{./graphs/activity/All_Devices__All_Apps__All_Queries__AllRuntimeCDFGraph.pdf}
\caption{}
\label{fig-overview-runtime}
\end{subfigure}%
\begin{subfigure}[t]{0.33\textwidth}
\includegraphics[width=\textwidth]{./graphs/activity/All_Devices__All_Apps__All_Queries__AllRowcountCDFGraph.pdf}
\caption{}
\label{fig-overview-rowcount}
\end{subfigure}%
\caption{\textbf{Summary Statistics for Android SQLite Queries. Distributions of (a) inter-query arrival times, (b) query runtimes, and (c) rows returned per query.}}
\label{fig-overview}
\end{figure*}
Next, we look at overall runtime characteristics of the query workload observed
during our study. We examine how often queries arrive, how long they run, and
how many rows they return---all important inputs into designing the TPC-Mobile
embedded database benchmark.
\subsubsection{General Characteristics}
Figure~\ref{fig-overview} shows query interarrival times, runtimes, and
returned row counts (for \texttt{SELECT} statements) for all users,
applications, and non-informational query types (\texttt{SELECT},
\texttt{UPDATE}, \texttt{INSERT}, \texttt{DELETE}) included in our dataset.
Given that each mobile application is really generating an isolated workload
to its own embedded database, we measure query interarrival time only between
queries issued by the same application.
Examining the interarrival times shown in
Figure~\ref{fig-overview-interarrival}, it is interesting to observe that
many queries seem to arrive much more quickly than the minimum query runtime
shown in Figure~\ref{fig-overview-runtime}. Part of this may be due to apps
that use multiple separate databases, which is not yet captured by our
analysis. However, our logging is also done above any locking
performed by SQLite, and so this may demonstrate that there are many cases
where multiple application threads are issuing overlapping queries in
parallel, even if the queries are eventually serialized before results are
returned. Figure~\ref{fig-overview-interarrival} also shows that, in addition
to a standard long-tailed distribution of query inter-arrival times, about
20\% of the workload is very periodic, arriving at a rate of 0.01~Hz.
The runtime CDF shown in Figure~\ref{fig-overview-runtime} shows while
overall query runtimes show variation over several orders of magnitude, a
large fraction of queries are executed in between 100~and~1000~$\mu$s.
Further investigation into the small fraction of extremely slow queries may
discover areas for database or application improvement. Finally, the row
count CDF shown in Figure~\ref{fig-overview-rowcount} shows that 80\% of
queries return only one row, further supporting our observation that many
applications seem to be using the SQLite database almost as a key-value store.
\begin{figure*}[t]
\begin{subfigure}[t]{0.33\textwidth}
\includegraphics[width=\textwidth]{./graphs/activity/All_Devices__All_Apps__All_Queries__ByTypePreviousQueryCDFGraph.pdf}
\caption{}
\label{fig-type-interarrival-prev}
\end{subfigure}%
\begin{subfigure}[t]{0.33\textwidth}
\includegraphics[width=\textwidth]{./graphs/activity/All_Devices__All_Apps__All_Queries__ByTypeNextQueryCDFGraph.pdf}
\caption{}
\label{fig-type-interarrival-next}
\end{subfigure}%
\begin{subfigure}[t]{0.33\textwidth}
\includegraphics[width=\textwidth]{./graphs/activity/All_Devices__All_Apps__All_Queries__ByTypeRuntimeCDFGraph.pdf}
\caption{}
\label{fig-type-runtime}
\end{subfigure}%
\caption{\textbf{By-Query-Type Statistics for Android SQLite Queries. Distribution of times since the query (a) immediately preceding, and (b) immediately following the query in question. (c) Distribution of runtimes for each query.}}
\label{fig-type}
\end{figure*}
\subsubsection{Runtime Characteristics by Query Type}
Figure~\ref{fig-type} shows runtime characteristics for each of
the four types of SQL statement. Figure~\ref{fig-type-interarrival-prev} and
\ref{fig-type-interarrival-next} in particular show the time since the last
query to be issued and the time until the next query is issued (respectively),
while Figure~\ref{fig-type-runtime} shows the distribution of runtimes for each
type of query.
Examining the differences between Figures~\ref{fig-type-interarrival-prev} and
\ref{fig-type-interarrival-next}, we observe that \texttt{INSERT} queries are far
more likely to arrive shortly before another query than shortly after. Almost 80\% of
\texttt{INSERT}s are followed by another query within 100$\mu$s. A similar, but far
more subdued pattern can be seen for \texttt{UPDATE} statements. Conversely, both
\texttt{SELECT} and \texttt{DELETE} statements are slightly more likely to arrive
shortly before, rather than shortly after another query.
Figure~\ref{fig-type-runtime} shows significant deviations from the global average
runtime for \texttt{DELETE} and \texttt{UPDATE} statements. \texttt{UPDATE}
statements in particular have a bimodal distribution of runtimes, spiking at 100$\mu$s
and 10ms. We suspect that this performance distribution is related to SQLite's use
of filesystem primitives for locking and write-ahead logging~
\cite{jeong2013iostack,kang2013xftl}. This could also help to explain the 0.01Hz
query periodicity we observed above.
\begin{figure*}[t]
\centering
\includegraphics[width=0.6\textwidth]{./graphs/activity/All_Devices__Top_10__Key}
\begin{subfigure}[t]{0.33\textwidth}
\includegraphics[width=\textwidth]{./graphs/activity/All_Devices__Top_10__All_Queries__ByAppPreviousQueryCDFGraph_noKey.pdf}
\caption{}
\label{fig-app-interarrival}
\end{subfigure}%
\begin{subfigure}[t]{0.33\textwidth}
\includegraphics[width=\textwidth]{./graphs/activity/All_Devices__Top_10__All_Queries__ByAppRuntimeCDFGraph_noKey.pdf}
\caption{}
\label{fig-app-runtime}
\end{subfigure}%
\begin{subfigure}[t]{0.33\textwidth}
\includegraphics[width=\textwidth]{./graphs/activity/All_Devices__Top_10__All_Queries__ByAppRowcountCDFGraph_noKey.pdf}
\caption{}
\label{fig-app-rowcount}
\end{subfigure}%
\caption{\textbf{Per-App Summary Statistics for Android SQLite Queries. Distributions of (a) inter-query arrival times, (b) query runtimes, and (c) rows returned per query.}}
\label{fig-app}
\end{figure*}
\subsubsection{Runtime Characteristics by Application}
Figure~\ref{fig-app} shows query interarrival times, runtimes, and returned row
counts for ten of the most active SQLite clients. As seen in
Figure~\ref{fig-app-interarrival}, the 0.01Hz periodicity is not unique to any one
application, further suggesting filesystem locking as a culprit. Two of the most
prolific SQLite clients, \textit{Google Play services} and \textit{Media Storage}
appear to be very bursty: 70\% of all statements for these applications are issued
within 0.1ms of the previous statement. Also interesting is the curve for queries
issued by the \textit{Android System} itself. The interarrival time CDF appears
to be almost precisely logarithmic for rates above 10$\mu$s, but has a notable lack
of interarrival times in the 1ms to 10ms range. This could suggest caching
effects, with the cache expiring after 1ms.
As seen in Figure~\ref{fig-app-runtime}, most apps hold to the average runtime of
100$\mu$s, with several notable exceptions. Over 50\% of the
\textit{Android System}'s statements take on the order of 1ms. Just under 20\% of
\textit{Hangouts} statements take 10ms, suggesting an update-heavy workload. Also,
\textit{Contacts Storage} has a heavier-duty workload, with 30\% of statements taking
between 100$\mu$s and 1ms.
Figure~\ref{fig-app-rowcount} shows that the \textit{Android System} and
\textit{Media Storage} issue almost exclusively single-row lookup queries.
The remaining apps issue a large number of single-row queries --- Even
\textit{Contacts Storage} has a workload consisting of 45\% single-row reads ---
the number of rows returned in general varies much more widely. Many of these
apps' user interfaces have both a list and a search view that show multiple records
at a time, suggesting that these views are backed directly by SQLite. Although all
apps have long tails, two apps in particular: \textit{Gmail} and \textit{Google+} are
notable for regularly issuing queries that return on the order of 100 rows.
%% LocalWords: Android SQLite runtimes TPC interarrival apps CDF ms
%% LocalWords: SQLite's App bursty Gmail