Tweaking the flow of Section 3

master
Oliver Kennedy 2018-04-28 19:41:35 -04:00
parent eea92c041b
commit 434dfcd53c
8 changed files with 388 additions and 151 deletions


graphics/example_log.tex Normal file

@ -0,0 +1,91 @@
% -*- root: ../paper.tex -*-
{\footnotesize
\scalebox{0.7}{
\begin{tabular}{EES{3.5in}}
\textbf{ID} & \textbf{Time} & \textbf{Query}\\ \hline
1 & 16:18:47 &
{\begin{lstlisting}^^J
SELECT m, ts FROM msgs
WHERE (tk=? AND mt=?) ORDER BY ts ^^J
\end{lstlisting}
} \\
2 & 16:18:47 &
{\begin{lstlisting}^^J
SELECT * FROM er
WHERE (tk IN (?) AND ert>=?) ORDER BY ert ASC ^^J
\end{lstlisting}
} \\
3 & 16:18:48 &
{\begin{lstlisting}^^J
SELECT tk FROM
(SELECT m.\_ROWID\_ AS \_id, m.tk AS tk, m.ts AS ts, m.mt AS mt
FROM msgs AS m) WHERE ((tk=? AND mt=? AND ts>?)) ^^J
\end{lstlisting}
}
\\
4 & 16:18:49 &
{\begin{lstlisting}^^J
SELECT * FROM er
WHERE (tk IN (?) AND ert>=?)
ORDER BY ert ASC^^J
\end{lstlisting}
}
\\
5 & 16:18:50 &
{\begin{lstlisting}^^J
SELECT tk FROM
(SELECT m.\_ROWID\_ AS \_id, m.tk AS tk, m.ts AS ts, m.mt AS mt FROM msgs AS m)
WHERE ((tk=? AND mt=? AND ts>?))^^J
\end{lstlisting}
}
\\
6 & 16:20:28 &
{\begin{lstlisting}^^J
SELECT * FROM er
WHERE (tk IN (?) AND ert>=?)
ORDER BY ert ASC^^J
\end{lstlisting}
}
\\
7 & 16:20:30 &
{\begin{lstlisting}^^J
SELECT tk FROM
(SELECT m.\_ROWID\_ AS \_id, m.tk AS tk, m.ts AS ts, m.mt AS mt FROM msgs AS m)
WHERE ((tk=? AND mt=? AND ts>?))^^J
\end{lstlisting}
}
\\
8 & 16:20:33 &
{\begin{lstlisting}^^J
SELECT tk, ifc FROM
(SELECT t.\_ROWID\_ AS \_id, t.tk AS tk, t.ifc AS ifc, t.ts AS ts, t.f AS f
FROM thds AS t)
WHERE (f=?) ORDER BY ts DESC LIMIT 20^^J
\end{lstlisting}
}
\\
9 & 16:20:33 &
{\begin{lstlisting}^^J
SELECT * FROM er
WHERE (tk IN (?) AND ert>=?)
ORDER BY ert ASC^^J
\end{lstlisting}
}
\\
10 & 16:20:33 &
{\begin{lstlisting}^^J
SELECT tk FROM
(SELECT m.\_ROWID\_ AS \_id, m.tk AS tk, m.ts AS ts, m.mt AS mt
FROM msgs AS m)
WHERE ((tk=? AND mt=? AND ts>?))^^J
\end{lstlisting}
}
\\
11 & 16:22:14 &
{\begin{lstlisting}^^J
SELECT MIN(ts) FROM fs
WHERE f=? AND tk != ?^^J
\end{lstlisting}}
\\
\end{tabular}
}



@ -49,7 +49,7 @@ Gokhan Kul, Gourab Mitra, Oliver Kennedy, Lukasz Ziarek\\
\label{sec:background}
\input{sections/2-background.tex}
\section{Modeling App Workloads}
\label{sec:systemoutline}
%\input{sections/2-systemoutline.tex}
\input{sections/3-systemoverview.tex}

sections/1-introduction.tex Executable file → Normal file

@ -39,7 +39,7 @@ For example, latency sensitive operations like list scrolling can trigger rapid
Worse, such queries are frequently offloaded to background worker threads, making it hard to attribute queries to the specific event that triggered them.
In short, directly instrumenting the app is not always a viable option.
We propose a more straightforward modeling technique that only requires a log of the app's queries.
Such a log can be obtained by simply linking the app against an appropriately instrumented embedded database library~\cite{kennedy2015pocket}.
Overtly, our approach is similar to the naive one, modeling both the user's interactions with the app, as well as the effect of these interactions on the database.
By necessity, this approach can only approximate the user's exact interactions.
@ -55,10 +55,10 @@ Intuitively, a session would correspond to a specific user interaction like chec
However, directly modeling user activities is difficult for two reasons.
First, there are no explicit cues from the user or app that signal a distinct user action has started or completed.
Furthermore, some apps generate queries in background tasks, adding a consistent level of noise throughout the trace.
Instead, we adopt a simpler view, segmenting out continuous bursts of database activity rather than sessions that correspond to specific user-driven tasks.
As a result, a single user session may be composed of multiple activity bursts.
We show that in spite of this simplification, the resulting models retain their predictive power.
We also propose a strategy for identifying an appropriate segmenting threshold for each app, using similarities in query logs from across a diverse population of users.
\tinysection{Assigning Session Categories}


@ -0,0 +1,137 @@
%!TEX root = ../paper.tex
%Smartphones we so naturally carry and use today do not have a long history.
%The smartphones as we know them today started to get used worldwide by average people around 2007.
%Of course, there were earlier representatives of smartphones, but most of them did not reach to the mass crowds due to their price and lack of network coverage.
%This fundamental shift in technology pushed phone-makers into adopting their technology into the new trends as fast as possible, to be able to stay in the market.
%Once world leaders Nokia and Blackberry rapidly lost their market shares due to failing to adapt to the new trends~\cite{ali2013microsoft, gillette2013blackberryfailure}.
%However, these developments led to a chaotic environment, where creating clear and consistent standards got disregarded.
%One of the areas affected by this environment was how the data is stored on these devices.
%The data can be structured or unstructured, and the data storage methodologies got adopted from computers which had a lot more processing power and available memory.
%Although the processing power and memory barriers are fading away with the current technology in the smartphones, the applications still depend on either file-based storages like JSON and CSV or embedded SQL database systems like SQLite~\cite{wei2012android, allen2010sqlite}.
%Although there is a limited number of choices for database management systems available for smartphones, we anticipate the release of alternative systems soon.
%Mobile phones have become ubiquitous in the last few decades.
Many modern smartphone applications (apps), operating systems, and services need to persist structured data.
For this task, developers typically turn to an embedded database like SQLite, which is a part of most modern smartphone operating systems.
Embedded databases play a significant role in the performance of smartphone apps and can easily become sources of user-perceived latency~\cite{yang-icse15}.
Providing database support for good user experiences presently requires tuning indexes, schemas, or other configuration options to the needs of the app.
While the process of (automated) database tuning has received significant attention~\cite{DBLP:conf/vldb/ChaudhuriN07,DBLP:conf/vldb/AgrawalCN00,DBLP:conf/sigmod/AkenPGZ17}, each solution relies on having a representative model of the database workload.
In the server-class systems that the database community is familiar with, workloads are typically high-volume streams of homogeneous queries from a mix of simultaneous users.
Hence, while there may be shifts in workload frequency, queries in the workload itself can be modeled by a representative sample of queries.
Conversely, every smartphone app: (1) has a dedicated database, (2) is typically used by only one user, and (3) is used for a variety of tasks that are usually performed one at a time.
Simple workload samples are not representative of the bursty, variable, and noisy database access patterns of a typical app (e.g., Figure~\ref{fig:sampleFacebook}).
\begin{figure}
\centering
\includegraphics[width=0.45\textwidth]{graphics/ChangeOverTimeData}
\caption{Sample Facebook Workload}
\label{fig:sampleFacebook}
\trimfigurespacing
\end{figure}
In this paper, we develop a process for modeling smartphone database workload activity.
Nominally, this requires us first to understand how users interact with the app, and then understand how these interactions translate into database activity.
The most direct way to do this would be to instrument the app to monitor user interactions, as well as the resulting database activity.
Assuming that it is possible to modify the app --- which is not always the case --- such instrumentation is not always productive.
For example, latency sensitive operations like list scrolling can trigger rapid sequences of single-row or range queries~\cite{DBLP:conf/sigmod/EbensteinKN16}, but can be hard to instrument without affecting user experience.
Worse, such queries are frequently offloaded to background worker threads, making it hard to attribute queries to the specific event that triggered them.
In short, directly instrumenting the app is not always a viable option.
We propose a more straightforward summarization technique that only requires a log of the app's queries.
Such a log can be obtained by simply linking the app against an appropriately instrumented embedded database library~\cite{kennedy2015pocket}.
Overtly, our approach is similar to the naive one, modeling both the user's interactions with the app, as well as the effect of these interactions on the database.
By necessity, this approach can only approximate the user's exact interactions.
However, as we show experimentally, this simple two-level model fits existing training workloads well, and more importantly, reliably predicts testing workloads.
Concretely, the contribution of this paper is a process for modeling smartphone workloads.
The process consists of three stages: (1) Segmentation, (2) Session Categorization, and (3) Session Modeling.
The last of these stages mirrors the server-class workload modeling problem, so we focus our efforts on the first two stages.
\tinysection{Segmenting Query Logs}
The first step of this process is to segment the query log into a collection of \emph{sessions}.
Intuitively, a session would correspond to a specific user interaction like checking a Facebook feed or composing an email.
However, directly modeling user activities is difficult for two reasons.
First, there are no explicit cues from the user or app that signal a distinct user action has started or completed.
Furthermore, some apps generate queries in background tasks, adding a consistent level of noise throughout the trace.
Instead, we adopt a simpler view, where a session is simply defined as a continuous burst of database activity.
As a result, a single session of the log may encompass multiple user activities, or a single user activity may be split across multiple segments.
We show that in spite of this simplification, the resulting models retain their predictive power.
We also propose a strategy for identifying an appropriate time threshold for each app, using similarities in query logs from across a diverse population of users.
\tinysection{Assigning Session Categories}
The second step in the modeling process is to associate each session with a specific class of user interaction by mapping it to one of a set of \emph{session categories}.
In principle, recovering a session category could be accomplished by comparing subsets of the query log with ground truth queries generated in a controlled setting.
Although our summarization strategy does support this mode of use, we also show that it is unnecessary and that sessions can be categorized automatically, without ground truth.
Abstractly, we can accomplish this by using a clustering algorithm to group sessions with ``similar'' queries.
However, for this we first need a good definition of what makes two queries similar.
Query similarity has already received significant attention~\cite{aouiche2006,aligon2014similarity,makiyama2015text}.
A common approach is to describe queries in terms of features extracted from the query.
Common features include which columns are projected and which selection predicates are used.
As our underlying approach is agnostic to how these features are extracted, we experiment with a variety of feature extraction techniques.
% However, we do not rely on this correspondence being one-to-one (it may also be many-to-one or one-to-many).
%As we will show, the resulting session categories and their component queries reliably model smartphone app workloads.
Our final clustering approach adopts a feature extraction method proposed by Makiyama \textit{et al.}~\cite{makiyama2015text}, and addresses one further technical challenge.
In principle, each sub-sequence of queries could be described by the features of its component queries.
As we show, clustering directly on these query features is neither scalable nor reliable.
Instead, we show the need for an intermediate step that first clusters individual queries, linking related queries together.
These cluster labels reduce noise, and serve as the basis for the session-similarity metric for clustering similar sessions together.
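As a rough illustration of this style of feature extraction, the sketch below counts (clause, term) pairs with naive regular expressions; a real implementation, including Makiyama \textit{et al.}'s, would use a proper SQL parser.
\begin{lstlisting}
# Rough sketch of clause-based query feature extraction; the regular
# expressions are illustrative stand-ins for a real SQL parser.
import re
from collections import Counter

def query_features(sql):
    """Count (clause, term) pairs: projected columns, referenced
    tables, and columns used in predicates."""
    features = Counter()
    select = re.search(r"SELECT\s+(.*?)\s+FROM", sql, re.I | re.S)
    if select:
        for col in select.group(1).split(","):
            features[("select", col.strip())] += 1
    for table in re.findall(r"FROM\s+(\w+)", sql, re.I):
        features[("from", table)] += 1
    for col in re.findall(r"(\w+)\s*(?:>=|<=|!=|=|>|<)", sql):
        features[("where", col)] += 1
    return features
\end{lstlisting}
Under this encoding, two queries over the same table share its (from, table) feature even when their projected columns differ, giving the clustering step a graded notion of similarity.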
\tinysection{Target Applications}
The resulting workload models are primarily targeted at database auto-tuning systems, allowing them to more reliably adapt to highly variable smartphone workloads.
However, these models may also be used to help embedded database developers to better understand app requirements, app developers to better tune app performance, mobile phone OS designers to tune OS level services, and benchmark designers to generate synthetic workload traces.
To ensure that the resulting models will be accurate, we validate our approach using recently collected traces of smartphone query activity in the wild~\cite{kennedy2015pocket}, and show that it is able to cluster queries into meaningful summaries of real world data.
%We will model common behaviors and identify unusual patterns which can be used to realistically emulate synthetic workloads created by Android applications.
%In this paper, we introduce PocketBench, a framework to create a benchmarking tool which aims to emulate the workloads of Android applications to compare different mobile database management system implementations.
%Utilizing Android query logs, we model common behaviors and unusual patterns which can be used to realistically emulate synthetic workloads created by Android applications, allowing to test the performance of different mobile database management systems.
%\begin{scenario}
%Charles, a \textit{Mobile Systems Product Manager} at Facebook wants to find a way to improve the performance of their mobile applications and decides to find the performance bottlenecks and performance improvement opportunities. Setting up PocketBench, his team can use the query logs of alpha test users and find out if the current DBMS system in use is better or worse than other alternatives.
%\end{scenario}
%We also argue that many apps could benefit from understanding how the user uses that particular app.
%For example,
%\begin{scenario}
%Bob, an \textit{Android photographer}, may be using Instagram to post a lot of pictures for the brands, hotels and touristic places.
%Alice, on the other hand, may be using Instagram for browsing photos from the users she follows, and may not have the habit of posting too many photos.
%\end{scenario}
%The workload these two people in the scenario create on the local mobile database is different, and should be addressed accordingly.
%We can utilize this usage characteristics information to
%(1) increase performance for various workloads,
%(2) find out bugs and unnecessary database calls in the apps,
%(3) give more accurate recommendations to the user, and
%(4) explore the data flow improvement opportunities within the app.
\tinysection{Paper Overview}
Concretely, in this paper we:
\begin{enumerate*}
\item Identify the challenges posed by mobile app workloads for query log summarization,
\item Propose a two-level, activity-oriented summary format for mobile app workloads,
\item Propose and validate an automated segmentation process for extracting log sessions from query logs,
\item Design and evaluate several schemes for labeling log sessions based on clustering.
\item Evaluate our small-data workload model end-to-end, highlighting its predictive power and ability to generate representative models.
\end{enumerate*}
%An application of the core contribution of this work is the development of synthetic workload generator which could be used to create a benchmark.
%The methods described in this paper can be used to automatically generate benchmarks from query logs of an application.
%Note that although we motivate for creating a benchmark in this paper, developing the data and query emulation step is not in the scope of this work.
This paper is organized as follows.
We first present the related work in Section~\ref{sec:background} and define the scope of the problem we address in Section~\ref{sec:systemoutline}.
Then, we give a detailed description of our session identification scheme in Section~\ref{sec:sessionidentifier} and our session clustering process in Section~\ref{sec:sessionclustering}.
In Section~\ref{sec:experiments}, we introduce a sample dataset for workload characterization, and we evaluate our proposed techniques using this dataset.
We discuss the experiment results and the limitations of our framework in Section~\ref{sec:discussion}.
Finally, we conclude and explore future work in Section~\ref{sec:conclusion}.


@ -1,3 +1,5 @@
% -*- root: ../paper.tex -*-
%The lack of standards and the need for a better understanding of mobile storage systems can easily be seen by surveying through standardized and well-known mobile database system benchmarks, which in fact non-existent~\cite{kennedy2015pocket}, while traditional database systems have a few dependable benchmarks~\cite{council2008tpch, council2010tpcc}.
%Also, traditional database systems are usually managed by professional database administrators who tune-up the databases according to changing workloads while smartphone databases work with predetermined indexes and are not subject to tuning up depending on the workload they are experiencing.
%Although there were some efforts to measure the performance of SQLite and Android devices under different workloads~\cite{kim2012androbench}, these benchmarks do not specify the bottlenecks, how and where the tune ups should be performed or they do not provide any information specific to the app performance.
@ -55,29 +57,63 @@ more semantic definition of a session --- a sequence of queries issued to the d
\subsection{Session Identification}
\label{sec:backgroundSessionIdentification}
Research on identification of database sessions in database query logs approximates session identification in web search engine query logs.
It focuses on three approaches to session segmentation: (1) Per-Connection, (2) Time-Out, and (3) Semantic.
\tinysection{Per-Connection Segmentation}
The first approach assumes that all the activity (and inactivity) belongs to one session as long as the user or the application is connected to the system~\cite{oliver2010challenges}.
This approach is not suitable for mobile systems since database user sessions are local to the database.
They do not correspond to the events of an application's connection to and disconnection from the database.
Soikkeli \textit{et al.}~\cite{soikkeli2011diversity} note that even the time between an application's launch and close is not a reliable notion of an application usage session.
Applications running in the foreground are visible to the user.
Applications running in the background are not visible to the user, even though they might have been launched by the same user.
However, an app being backgrounded is not a reliable indicator of a session ending;
for example, a user might click on a link in a social media app, opening a web browser and backgrounding the social app.
% This approach is not suitable for mobiles systems since the applications tend to connect to the database when the operating system starts up, and the connection is closed when the operating system is turned off.
\tinysection{Time-Out Segmentation}
The second approach uses inter-query arrival times.
Gaps between consecutive queries that exceed a pre-defined time-out threshold, selected specifically for the given scenario, indicate session boundaries.
All queries issued between two boundaries belong to the same session~\cite{he2000detecting}.
This is the closest to our proposed approach.
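For concreteness, time-out segmentation can be sketched in a few lines; the 60-second threshold and the (timestamp, query) log representation are illustrative assumptions rather than details of the cited work.
\begin{lstlisting}
# Minimal sketch of time-out segmentation; the threshold value is an
# illustrative assumption.
from datetime import timedelta

def segment_by_timeout(log, threshold=timedelta(seconds=60)):
    """Split a chronologically ordered list of (timestamp, query)
    pairs into sessions wherever the inter-query gap exceeds the
    threshold."""
    sessions, current, prev = [], [], None
    for ts, query in log:
        if prev is not None and ts - prev > threshold:
            sessions.append(current)
            current = []
        current.append((ts, query))
        prev = ts
    if current:
        sessions.append(current)
    return sessions
\end{lstlisting}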
%\note{Our approach is not explored in the way that we did, but we can say that the closest approach to ours is timeout based approach. We will briefly discuss how the cited papers differs from our approach.}
\tinysection{Semantic Segmentation}
The third approach uses the content of the queries to detect context changes in the query workload~\cite{jones2008, huang2006, hagen2011query}.
The assumption is that, if two queries are semantically similar, they should be placed in the same session.
Conversely, if query content changes significantly, it indicates a shift in the query interest and the need for a session boundary between these queries.
Naturally, this raises the question: what makes two queries similar?
A number of similarity measures have been proposed for a variety of specific use-cases, including database performance optimization~\cite{aouiche2006}, workload exploration~\cite{makiyama2015text}, and security applications~\cite{kul2016ettu}.
Naturally, this raises the question of \textit{what makes two queries similar?} The research on this question focuses on various motivations, such as database performance optimization~\cite{aouiche2006}, workload exploration~\cite{makiyama2015text}, and security applications~\cite{kul2016ettu}.
A related approach proposed by Yao \textit{et al.}~\cite{huang2006} uses changes in information entropy to identify session boundaries.
They use a language model which is characterized by an order parameter and a threshold parameter.
The order parameter determines the granularity of the n-gram model used to break the query log into smaller sequences of queries.
The conditional probability of occurrence of those sequences is calculated from training data consisting of queries whose sessions have already been identified.
Using these probabilities, a cumulative measure of entropy is calculated for each session.
The threshold parameter determines the value at which a session cannot accommodate any more new queries.
Such queries form part of another session and contribute to its entropy.
If a sequence of queries has been observed to occur close together before, its entropy value will be low.
This indicates the presence of some link between the queries, supporting the case that they belong to the same session.
However, when a completely unrelated query is considered for inclusion in a session, the entropy value of the session rises and the system places the query in a different session.
Although this approach does not require task-specific configuration parameters like similarity measures or time-out thresholds, it requires ground truth data about sessions as an input.
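The sketch below conveys the flavor of this method under simplifying assumptions (a bigram model over queries, add-one smoothing, and a hand-picked entropy threshold); it is not Yao \textit{et al.}'s exact formulation.
\begin{lstlisting}
# Simplified sketch of entropy-based segmentation; the bigram model,
# smoothing, and threshold below are illustrative assumptions.
import math
from collections import Counter

def train_bigram(labeled_sessions):
    """Estimate P(cur | prev) from sessions with known boundaries."""
    pairs, prefixes = Counter(), Counter()
    for session in labeled_sessions:
        for prev, cur in zip(session, session[1:]):
            pairs[(prev, cur)] += 1
            prefixes[prev] += 1
    vocab = len(pairs) + 1
    return lambda p, c: (pairs[(p, c)] + 1) / (prefixes[p] + vocab)

def segment_by_entropy(queries, prob, threshold=8.0):
    """Open a new session once the cumulative entropy of the
    current one would exceed the threshold."""
    if not queries:
        return []
    sessions, current, entropy = [], [queries[0]], 0.0
    for prev, cur in zip(queries, queries[1:]):
        cost = -math.log2(prob(prev, cur))
        if entropy + cost > threshold:
            sessions.append(current)
            current, entropy = [cur], 0.0
        else:
            current.append(cur)
            entropy += cost
    sessions.append(current)
    return sessions
\end{lstlisting}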
% Even though this approach is very intuitive, it falls short of addressing specific issues in smartphone database sessions. As already described, there is no clear way to identify start and end of a session. As a result, it is not possible to obtain a training set of queries with sessions already labelled. Hence, this approach doesn't work for session identification in our case.
Hagen \textit{et al.}~\cite{hagen2011query} propose a Cascade method to improve the performance of session segmentation.
Specifically, they note that many inter-query similarity measures can be approximated by simpler measures that can be computed more efficiently.
For example, string comparison is inexpensive compared to most semantic analysis methods, and can instantly identify two similar queries.
Measures are processed in order of increasing computational cost.
When a computationally cheaper measure (string comparison) can make a reliable decision about session identification, measures with higher computational cost (semantic analysis) are skipped.
Additional measures are evaluated only when cheaper measures can't provide a reliable decision.
This approach is specifically aimed at cases where time is not a good indicator of session boundaries.
In online web searches, users frequently pause logical sessions, resuming only after an arbitrary amount of time.
They also refer to the state-of-the-art geometric method by Gayo-Avello~\cite{gayo2009survey}.
Like the geometric method, the Cascade method uses the time and lexical similarity of queries to decide session boundaries.
However, for query pairs that are chronologically very close and lexically very different, the decisions of the geometric method are not reliable.
Hence, the Cascade method invokes explicit semantic analysis and search result comparison to help decide session boundaries.
It combines strategies from both time-out based and semantic segmentation methods.
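A minimal sketch of this cascading idea follows; the concrete measures and the confidence bands are illustrative assumptions, not the features used by Hagen \textit{et al.}.
\begin{lstlisting}
# Sketch of a cascade: decide with a cheap lexical measure when it
# is confident, and only invoke an expensive one otherwise.  The
# 0.8/0.2 confidence bands are illustrative assumptions.
from difflib import SequenceMatcher

def cheap_similarity(q1, q2):
    """Inexpensive string comparison."""
    return SequenceMatcher(None, q1, q2).ratio()

def same_session(q1, q2, expensive_similarity):
    s = cheap_similarity(q1, q2)
    if s > 0.8:   # clearly similar: same session
        return True
    if s < 0.2:   # clearly dissimilar: session boundary
        return False
    # Unreliable region: fall back to the costly semantic measure.
    return expensive_similarity(q1, q2) > 0.5
\end{lstlisting}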
%Although semantic segmentation of the queries performs very well for web log session identification, it may not always be practical in a database setting, both in traditional database systems, and in mobile databases. In traditional database systems, activity oriented sessions where the user focuses on a single activity can be identified with this approach, but when the user issues queries with different interests, this approach would fail to identify a sequence of queries that could be classified as a session. Similarly, in a mobile application, an activity can create a series of wide range of queries, which renders this approach inapplicable to identify sessions for a mobile application.
@ -87,11 +123,12 @@ They also refer to the state-of-the-art geometric method by Gayo-Avello ~\cite{g
A session can include one or more activities, and activities consist of a bag of queries as illustrated in Figure~\ref{fig:session}.
Exploring the similarities of sessions could be used to identify repeating patterns.
\begin{figure}
\centering
\includegraphics[width=0.45\textwidth]{graphics/Session}
\caption{Session - Activity - Query relationship}
\label{fig:session}
\trimfigurespacing
\end{figure}
Aligon \textit{et al.}~\cite{aligon2014similarity} report that there are four approaches in the literature for computing session similarity: (1) the Edit-based approach, (2) the Subsequence-based approach, (3) the Log-based approach, and (4) the Alignment-based approach.
@ -106,38 +143,22 @@ Given two sessions \textit{A} and \textit{B}, and a treshold $\theta$, any appro
\tinysection{Alignment-Based Approach} It considers the ordering of the queries while comparing n-grams of the resulting sequences. It finds the best alignments of n-grams to maximize the similarity.
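As one concrete instance, an edit-based similarity can be computed as a normalized Levenshtein distance over two sessions' query sequences (here, sequences of query labels such as cluster IDs); this sketch is illustrative rather than a reproduction of any one surveyed method.
\begin{lstlisting}
# Sketch of edit-based session similarity: Levenshtein distance over
# sequences of query labels, normalized into [0, 1].

def edit_distance(a, b):
    dp = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # delete x
                                     dp[j - 1] + 1,    # insert y
                                     prev + (x != y))  # substitute
    return dp[-1]

def session_similarity(a, b):
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))
\end{lstlisting}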
\subsection{Database Benchmarking}
\label{sec:mobileworkloads}
Workload modeling is a central theme in database benchmarking.
There are various widely used benchmarking systems for traditional relational databases~\cite{council2008tpch, council2010tpcc, council2017tpcds, osdb2004}.
DWEB~\cite{darmont2005dweb}, a data warehouse benchmark, additionally parameterizes its data (the number of tables and tuples) and its queries (the number of queries and the attributes in each query).
It creates scalable homogeneous query workloads on fixed data, focusing on throughput and response time as performance metrics.
The usage pattern of databases in smartphones differs significantly from workloads and benchmarks targeting traditional database servers and web applications.
Hence, such benchmarks do not translate well to mobile apps.
NoSQL database benchmarks~\cite{cooper2010YCSB, council2017tpcxiot} appear more suitable for mobile database systems, as they support configuring the query workload in terms of the density of inserts, updates, deletes, and selects.
However, the queries are still pre-set in TPCx-IoT~\cite{council2017tpcxiot}, while YCSB~\cite{cooper2010YCSB} only measures the performance of key-value stores and must be extended to process more complex queries.
Additionally, the workload created is still homogeneous and sequential.
Most benchmarks, like TPC-C, focus on testing the peak performance of an OLTP system on homogeneous query workloads~\cite{rabl2009generating}.
Their goal is to analyze throughput for these homogeneous workloads.
However, smartphone query workloads cannot be faithfully emulated without emulating their intermittent bursts of query activity.
These bursts can only be detected by looking at chronological attributes like query timestamps and query inter-arrival times.
Another level of abstraction is needed to extract meaningful patterns from the query log.
There are also mobile database micro-benchmarks such as AndroBench~\cite{liu2014application}, which was designed to evaluate the storage performance of the device rather than the database management system itself.
Even though a few of the mentioned benchmarks can satisfy


@ -1,11 +1,70 @@
% -*- root: ../paper.tex -*-
In this paper, we describe a process for reliably modeling smartphone app workloads based on logs of queries (e.g., resulting from OS-level instrumentation~\cite{kennedy2015pocket}).
We consider target applications like benchmark design, database-tuning advisors, and app-level workload visualization.
In each case the goal is to create a model that captures the behaviors driving the workload, and not just the workload itself.
In other words, the model should not only be able to describe the trace on which it is trained, but should also be able to faithfully model a testing workload as well.
In the remainder of this section, we outline the problem of modeling smartphone workloads.
We first introduce key features of smartphone app database workloads that drive the design of our two-level workload model.
We then introduce the model itself.
Finally, we explore the primary challenge that we address in this paper: automatically fitting the model to a provided query log.
% Our objective is two-fold. First, we address the issue of identifying database user sessions from the query log. These sessions would contain one of more user activities. Second, we want to be able to identifying the most commonly occuring activities.
% Hence, we propose an automated, unsupervised method of extracting sessions from the query log and clustering to obtain by identifying the most commonly occuring type of sessions. We begin by looking at inter query arrival time and timestamps in the query log to identify sessions. Then we extract features from queries contained in these sessions to formulate a distance matrix. This enables us to perform hierarchical clustering over the sessions to identify semantically similar sessions. The size of each cluster is proportional to the frequency of the user activity that the sessions represent.
% \note{How do we extract sessions, defined in terms of considers attributes like query similarity metrics, timestamps, and interarrival time, from a mobile query log in an automated, unsupervised way?}
\subsection{Smartphone App Workloads}
Server-class workloads are well understood by the database community~\cite{council2008tpch, council2010tpcc, council2017tpcds, osdb2004}.
Even though database accesses can be a bottleneck for smartphone apps~\cite{yang-icse15}, the workloads that trigger these bottlenecks are far less well understood~\cite{kennedy2015pocket}.
The simplest distinguishing feature is one of scale: smartphones process on the order of 2 SQL queries per second~\cite{kennedy2015pocket}.
However, the workloads are also structurally different: (1) each app has its own dedicated database instance, and (2) each app is typically used by only a single user at a time.
% \tinysection{Smartphone Data Management}
% Although some smartphone apps rely on a web service to deliver the desired functionality, the app frequently uses a local database to cache recently accessed data, store data necessary for the app's business logic, and/or pre-fetch data that the server anticipates will be useful in the future.
% This enables the application to operate offline, with deferred updates~\cite{kennedy2015pocket}.
% Also, this helps asynchronously fetching data from the web services while still being able to show the user a consistent state of the information.
\tinysection{App-Dedicated Databases}
In contrast to server-class workloads, smartphone apps are the sole owners of dedicated databases.
When a user interacts with an application, the application starts to generate queries to this one specific database.
Since these queries are typically machine-generated by a user interface, they are frequently structurally similar.
For example, scrolling through a list may trigger a rapid-fire sequence of queries, each requesting the range of records presently displayed on the screen.
% This behavior is unique to smartphone apps, as a user's activity creates \emph{bursts} of queries that last for only short periods.
% Conversely, the database experiences little to no activity when the user is not actively engaging with the app.
\tinysection{Single-User Effects}
Workloads resulting from a single user's actions (whether or not they are mediated by an application) are typically (1) \emph{heterogeneous}, varying with the user's current goals, and (2) \emph{bursty}, with pauses as the user interprets displayed information, takes a break, or turns to different tasks.
While this is the case for some server-class databases, such databases are typically expected to serve large numbers of users simultaneously.
Hence, the overall workload experienced by the database is more uniform than any of its component parts.
Conversely, smartphone apps serve only a single user at a time and create an overall workload that is both heterogeneous and bursty\footnote{Queries may also be created to manage internal application state; we address this point as well.}.
\subsection{The Session/Query Model}
These workload characteristics have three immediate implications on the design of a workload model: two requirements and one simplification.
First, the model must capture the burstiness of the workload.
Second, the model must capture the variation in activity as the user's goals change.
Finally, the individual queries being modeled are machine-generated, and are thus comparatively easy to compare for similarity.
% Considering these observations, we believe sessions more accurately represent mobile phone workloads.
Consider a typical mobile phone user who interacts with their mobile device multiple times a day.
Each interaction typically lasts only a short interval, but similar interactions re-occur frequently.
For example, viewing a Facebook feed may result in a different collection of queries than simply posting a picture.
However, a similar pattern of queries will repeat the next time the feed is viewed, or a picture is posted.
Ideally we would model such interactions in two layers.
The first layer models a distribution over tasks (i.e., $P(Task)$) that the user can perform with the app, while the second layer models the distribution of queries associated with each task (i.e., $P(Q\;|\;Task)$).
In this idealized model, a session is defined as a logical task performed by a user on a smartphone that produces a sequence of database queries.
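To make the idealized model concrete, the sketch below draws a synthetic session from the two distributions; the task names, probabilities, and fixed session length are purely hypothetical.
\begin{lstlisting}
# Sketch of the idealized two-level model: draw a task from P(Task),
# then draw queries from P(Q | Task).  All values are hypothetical.
import random

P_TASK = {"view_feed": 0.7, "post_picture": 0.3}
P_QUERY_GIVEN_TASK = {
    "view_feed": {"SELECT * FROM er WHERE ...": 0.6,
                  "SELECT tk FROM msgs WHERE ...": 0.4},
    "post_picture": {"INSERT INTO msgs VALUES ...": 1.0},
}

def sample_session(length=5):
    task = random.choices(list(P_TASK),
                          weights=list(P_TASK.values()))[0]
    q_dist = P_QUERY_GIVEN_TASK[task]
    queries = random.choices(list(q_dist),
                             weights=list(q_dist.values()), k=length)
    return task, queries
\end{lstlisting}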
% Since smartphone applications keep switching between foreground and background,
\begin{figure*}
\centering
\includegraphics[width=0.8\textwidth]{graphics/approach}
\vspace{-0.2cm}
@ -13,126 +72,53 @@ interval of time. Such bursts of intermittent activity are captured in database
\label{fig:approach}
\end{figure*}
\subsection{Implementation Challenges}
Fitting a query log to this model requires us to: (1) Enumerate the set of tasks represented in the log, (2) Associate queries with their corresponding task and session, (3) Compute the probability distribution of each task being started, and (4) Compute the probability distribution of queries occurring within each task.
We start with the second goal and first \textbf{segment} the log into bursts of activity that each (roughly) correspond to one session.
Second, we \textbf{cluster} the resulting collection of sessions to enumerate a set of \emph{session categories} that each (roughly) correspond to one task.
The distributions needed for goals three and four can then be trivially computed from the query log and session segments.
Figure~\ref{fig:approach} provides a high level illustration of our approach.
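Once the log has been segmented and each session has been assigned a category, goals three and four reduce to relative frequencies, as in the sketch below (the input representation is an assumption).
\begin{lstlisting}
# Sketch of goals (3) and (4): estimate P(Task) and P(Q | Task) as
# relative frequencies over categorized sessions.
from collections import Counter

def fit_model(categorized_sessions):
    """categorized_sessions: a list of (category, [query, ...])."""
    task_counts = Counter(cat for cat, _ in categorized_sessions)
    total = sum(task_counts.values())
    p_task = {cat: n / total for cat, n in task_counts.items()}
    query_counts = {}
    for cat, queries in categorized_sessions:
        query_counts.setdefault(cat, Counter()).update(queries)
    p_query = {cat: {q: n / sum(ctr.values())
                     for q, n in ctr.items()}
               for cat, ctr in query_counts.items()}
    return p_task, p_query
\end{lstlisting}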
% In the first step, we take the app's query log as the input to the system. The session identifier accepts queries along with their corresponding timestamps for each user. It partitions the queries into sessions based on the timestamps. In the second step, we process these sessions to group the queries into clusters based on their \emph{similarity}. These groups would yield the most frequent session types, along with the rare ones.
% The output of this operation helps in understanding the most repetitive activities that appear together, hence allowing application developers to optimize their database interactions to run faster. It can also be used to design adaptive benchmarks for specific applications which would help in understanding what types of database management systems would be more suitable for a given app based on its workload.
\tinysection{Segmentation}
In practice, however, the queries associated with a session may be arbitrarily spaced out in time.
% In our approach, a database session is a logical unit of user interaction.
Additionally, there is no discrete indicator of the start and end of a database user session on smartphones, as most users keep apps open continuously.
Worse still, it may be possible to define user tasks at a variety of different levels of granularity: one ``session'' might be equally well expressed as a collection of simpler sessions.
To address these challenges, we observe that bursts of queries (e.g., as in Figure~\ref{fig:sampleFacebook}) \emph{typically} correspond to specific user tasks.
Hence, we adopt a simpler model built up around bursts of database activity.
In this simpler model, a session is defined as a contiguous sequence of queries where each query occurs within a fixed time threshold of the previous one.
If two queries in a log have timestamps whose difference in time exceeds a specified threshold, we consider them to be a part of different sessions.
However, the precise threshold may vary by app or by user.
Hence, we need a strategy for automatically selecting an appropriate threshold.
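Our selection strategy is described in Section~\ref{sec:sessionidentifier}; purely as an illustration of the problem, the sketch below counts the sessions induced by a range of candidate thresholds, where a plateau in the counts suggests a threshold that is robust to small perturbations.
\begin{lstlisting}
# Illustrative threshold sweep (not our actual selection strategy):
# count the sessions induced by each candidate threshold.

def session_counts(timestamps,
                   candidates=(1, 5, 10, 30, 60, 120, 300)):
    """timestamps: sorted datetimes; candidates: seconds."""
    if not timestamps:
        return {s: 0 for s in candidates}
    gaps = [(b - a).total_seconds()
            for a, b in zip(timestamps, timestamps[1:])]
    return {s: 1 + sum(g > s for g in gaps) for s in candidates}
\end{lstlisting}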
\tinysection{Clustering}
After partitioning the query log into sessions, we cluster sessions that consist of \emph{similar} activities, reporting the frequency of each detected query pattern.
A simple way to do this would be to collect and label every possible activity that can be performed in the app, and use these labels to train the system to detect those activities in the session logs.
The frequency of the activities that appear together or individually in these sessions would provide a picture of how a user utilizes an application.
This labeling approach, however, is a manual process for establishing ground truth, requiring significant human effort and time.
Our main contribution is an unsupervised methodology for session clustering and analysis, which eliminates the need for this human intervention and effort.
With this approach, we take only the workload as input, and look for the behavior of the user over real-world logs.
The frequency of certain types of queries that appear in a user session provides a summary of the expected user activity.
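A minimal sketch of this unsupervised step using off-the-shelf hierarchical clustering follows; the pairwise distance function and the cut height are assumptions to be instantiated by the techniques of Section~\ref{sec:sessionclustering}.
\begin{lstlisting}
# Sketch of unsupervised session clustering: build a pairwise
# distance matrix and cut a hierarchical clustering; the cut height
# is an illustrative assumption.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_sessions(sessions, session_distance, cut=0.5):
    n = len(sessions)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = session_distance(sessions[i], sessions[j])
            dist[i, j] = dist[j, i] = d
    tree = linkage(squareform(dist), method="average")
    # One flat cluster label per session.
    return fcluster(tree, t=cut, criterion="distance")
\end{lstlisting}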
\begin{example}[Running example]
As an illustrative example of our approach, consider a query log from the Facebook application.
We present a simplified query log in Figure~\ref{table:exampletable1}, extracted from an actual query log of the Facebook Android app.
The overall structure of individual queries has been retained, but they have been modified to reduce verbosity.
Most of these queries interact with tables named $messages$ and $event reminders$.
They deal with event reminders and messaging on Facebook.
\begin{figure}
\input{graphics/example_log.tex}
\caption{Example Query Log Input}
\label{table:exampletable1}
\trimfigurespacing
\end{figure}
@ -141,8 +127,10 @@ Database user sessions don't correspond to an application's connection and disco
Semantic segmentation methods for session identification look at the content of the queries to group them into a session.
They assume that high semantic similarity between queries indicates \textit{similar interest}.
For example, the queries with IDs 2 and 8 in Figure~\ref{table:exampletable1} are semantically similar to each other.
If we manually inspect the timestamps of the queries, we can deduce that there are two \textit{bursts} of query activity: one at 16:18 and another at 16:20.
These two \textit{bursts} should represent different database user sessions.
We observe that semantic methods would have a difficult time separating these sessions because they look for similar queries.
% \todo{add high level observations about these queries, how many sessions do there appear to be, how might we deduce that, etc}
% Note that traditional approach would have a difficult time handling such a log because ... \todo{add description why this example is hard for other approaches to correctly handle}
\end{example}
In the following sections, we describe session identification and session clustering and show how they can be performed on the sample query log in Figure~\ref{table:exampletable1}.
%% I COMMENTED OUT THE REST OF THE SECTION. WE WILL INTRODUCE CLUSTERING LATER - GOKHAN %%