Revising intro

2018-04-15 15:13:40 -04:00 · 2018-04-15 15:13:40 -04:00 · 6eb35b31cb
parent e823f33a3f
commit 6eb35b31cb
25 changed files with 66 additions and 30 deletions
--- a/.gitignore
+++ b/.gitignore
--- a/ACM-Reference-Format.bst
+++ b/ACM-Reference-Format.bst
--- a/Comments/Small-Data
+++ b/Comments/Small-Data
--- a/acmart.cls
+++ b/acmart.cls
--- a/acmart.dtx
+++ b/acmart.dtx
--- a/acmart.ins
+++ b/acmart.ins
--- a/oliver.bib
+++ b/oliver.bib
@@ -13,4 +13,39 @@
  booktitle={USENIX Annual Technical Conference},
  pages={309--320},
  year={2013}
-}
+}
+@inproceedings{DBLP:conf/vldb/ChaudhuriN07,
+	author = {Surajit Chaudhuri and Vivek R. Narasayya},
+	booktitle = {{VLDB}},
+	pages = {3--14},
+	publisher = {{ACM}},
+	title = {Self-Tuning Database Systems: {A} Decade of Progress},
+	year = 2007
+}
+
+@inproceedings{DBLP:conf/vldb/AgrawalCN00,
+	author = {Sanjay Agrawal and Surajit Chaudhuri and Vivek R. Narasayya},
+	booktitle = {{VLDB}},
+	pages = {496--505},
+	publisher = {Morgan Kaufmann},
+	title = {Automated Selection of Materialized Views and Indexes in {SQL} Databases},
+	year = 2000
+}
+
+@inproceedings{DBLP:conf/sigmod/AkenPGZ17,
+	author = {Van Aken, Dana and Pavlo, Andrew and Gordon, Geoffrey J. and Zhang, Bohan},
+	booktitle = {{SIGMOD} Conference},
+	pages = {1009--1024},
+	publisher = {{ACM}},
+	title = {Automatic Database Management System Tuning Through Large-scale Machine Learning},
+	year = 2017
+}
+
+@inproceedings{DBLP:conf/edbt/IdreosMG12,
+	author = {Stratos Idreos and Stefan Manegold and Goetz Graefe},
+	booktitle = {{EDBT}},
+	pages = {566--569},
+	publisher = {{ACM}},
+	title = {Adaptive indexing in modern database kernels},
+	year = 2012
+}
--- a/paper.tex
+++ b/paper.tex
@@ -1,12 +1,15 @@
-\documentclass[sigconf, anonymous]{acmart}
+% \documentclass[sigconf]{acmart}
 %\documentclass[sigconf]{acmart}
-%\documentclass{vldb}
+\documentclass{vldb}

 \input{preamble}
 \usepackage{balance}  % for  \balance command ON LAST PAGE  (only there!)

 % \toappear{}

+\newtheorem{example}{Example}
+\usepackage[dvipsnames]{xcolor}
+
 %\numberofauthors{1}

 \begin{document}
@@ -14,29 +17,27 @@
 \title{Summarizing Small Data Workloads} 


-% \author{
-% \alignauthor
-% Gokhan Kul, Gourab Mitra, Oliver Kennedy, Lukasz Ziarek\\
-%        \affaddr{University at Buffalo, SUNY}\\
-%        \email{\{gokhanku, gourabmi, okennedy, lziarek\}@buffalo.edu}
-% }
 \author{
-    Anonymous authors
+\alignauthor
+Gokhan Kul, Gourab Mitra, Oliver Kennedy, Lukasz Ziarek\\
+       \affaddr{University at Buffalo, SUNY}\\
+       \email{\{gokhanku, gourabmi, okennedy, lziarek\}@buffalo.edu}
 }



 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

+\maketitle
+
 \begin{abstract}
 \input{sections/0-abstract.tex}
 \end{abstract}

 %>>>> Include a list of keywords after the abstract 

-\keywords{Benchmark, Database, Workload, Mobile Systems}
+% \keywords{Benchmark, Database, Workload, Mobile Systems}

-\maketitle

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{Introduction}
--- a/sections/0-abstract.tex
+++ b/sections/0-abstract.tex
--- a/sections/1-introduction.tex
+++ b/sections/1-introduction.tex
@@ -12,35 +12,34 @@
 %Although there is a limited number of choices for database management systems available for smartphones, we anticipate the release of alternative systems soon. 

 %Mobile phones have become ubiquitous in the last few decades.
-Many modern smartphone apps, operating systems, and services need to persist structured data.
+Many modern smartphone applications (apps), operating systems, and services need to persist structured data.
 For this task, developers typically turn to an embedded database like SQLite, which is a part of most modern smartphone operating systems.
 Embedded databases play a significant role in the performance of smartphone apps and can easily become sources of user-perceived latency~\cite{yang-icse15}.
-Crafting apps with good user experiences thus often requires tuning indexes, schemas, or other configutation options to the needs of the app.
-Unfortunately, these needs can be hard to characterize and optimize for.
-The server-class workloads that the database community is familiar with are typically high-volume streams of homogeneous queries from a mix of simultaneous users.
-In contrast, each smartphone app has a dedicated database, and is typically used by only one user for a variety of tasks that are usually performed one at a time.
-As a consequence, the database workload created by a typical app (e.g., Figure~\ref{fig:sampleFacebook}) is bursty, variable, noisy, and as a result can be hard to summarize.
-Hence, it is more common for researchers and app developers to synthesize workloads to experimentally evaluate tuning options~\cite{kim2012androbench}.
-Unfortunately, these synthetic workloads are typically created in controlled settings, often without any guarantees that they are representative of real world usage.
+Providing database support for good user experiences presently requires tuning indexes, schemas, or other configuration options to the needs of the app.
+While (automated) database tuning has received significant attention~\cite{DBLP:conf/vldb/ChaudhuriN07,DBLP:conf/vldb/AgrawalCN00,DBLP:conf/sigmod/AkenPGZ17}, existing approaches rely on a representative model of the database workload.

-\begin{figure}[h!]
+In the server-class systems that the database community is familiar with, workloads are typically high-volume streams of homogeneous queries from a mix of simultaneous users.
+Hence, while there may be shifts in workload frequency, the workload itself can be modeled by a representative sample of queries.
+In contrast, each smartphone app has a dedicated database that is typically used by a single user for a variety of tasks, usually performed one at a time.
+Simple workload samples are not representative of the bursty, variable, and noisy database access patterns of a typical app (e.g., Figure~\ref{fig:sampleFacebook}).
+
+\begin{figure}
    \centering
    \includegraphics[width=0.45\textwidth]{graphics/ChangeOverTimeData}
    \caption{Sample Facebook Workload}
    \label{fig:sampleFacebook}
 \end{figure}

-The problems of workload synthesis and summarization are linked; A good summary identifies features that need to be reproduced in a synthetic workload.
-In this paper we tackle both problems, creating a process for extracting representative summaries from smartphone database workload traces, which in turn can be used to synthesize representative workloads.
-Nominally, this requires us first to understand how users interact with the app, and second how these interactions translate into database activity.
-Naively, we might do this by instrumenting the app: monitoring user interactions and the app's database activity.
-However, this is not always feasible.
+In this paper, we develop a process for modeling smartphone database workload activity.  
+Nominally, this requires us to (1) understand how users interact with the app, and (2) understand how these interactions translate into database activity.
+The most direct way to do this would be to instrument the app to monitor user interactions, as well as the resulting database activity.
+Even when the app can be modified (which is not always possible), such instrumentation is not always productive.
 For example, latency-sensitive operations like list scrolling can trigger rapid sequences of single-row or range queries~\cite{DBLP:conf/sigmod/EbensteinKN16}, but can be hard to instrument without affecting user experience.
-Similarly, queries triggered by a list scroll may be offloaded to a worker thread, making it difficult to associate them with the scrolling action.
-In short, the direct approach of app instrumentation is hard and needs to be repeated for each app nearly from scratch.
+Such queries are frequently offloaded to background worker threads, making it hard to attribute them to any specific user action.
+In short, directly instrumenting the app is not always feasible.

 We propose a more straightforward summarization technique that only requires a log of the app's queries.
-This in turn can be obtained by simply linking the app against an appropriately instrumented embedded database library~\cite{kennedy2015pocket}.
+Such a log can be obtained by simply linking the app against an appropriately instrumented embedded database library~\cite{kennedy2015pocket}.
 At a high level, our approach mirrors the direct one: we first summarize user interactions with the app and then the effect of these interactions on the database.
 To summarize user interactions, we treat the query log as a collection of \emph{sessions}, or bursts of database activity typically triggered by self-contained user activities, such as checking a Facebook feed or composing an email.
 After partitioning a log into sessions, we attempt to recover the specific class of interaction associated with each session, mapping it to one of a set of \emph{session categories}.
--- a/sections/2-background.tex
+++ b/sections/2-background.tex
--- a/sections/3-systemoverview.tex
+++ b/sections/3-systemoverview.tex
--- a/sections/3a-clustering.tex
+++ b/sections/3a-clustering.tex
--- a/sections/3b-patternmatching.tex
+++ b/sections/3b-patternmatching.tex
--- a/sections/3d-resourceutilization.tex
+++ b/sections/3d-resourceutilization.tex
--- a/sections/4-experiments.tex
+++ b/sections/4-experiments.tex
@@ -1,3 +1,4 @@
+%!TEX root=../paper.tex

 In this section, we describe the datasets, the environment in which we performed our experiments, and the experiment designs along with their results.
 All of our experiments were run on a machine with a 3.6\,GHz 6th-generation Intel i7 processor and 16\,GB of RAM. We used the Java SE 1.8 Runtime Environment and R v3.3.2 on the Ubuntu 16.04 operating system.
@@ -274,7 +275,7 @@ The aim of this experiment is to investigate if detected session clusters corres

 \begin{figure}[h!]
    \centering
-    \includegraphics[width=0.45\textwidth]{graphics/activityRecognition}
+    \includegraphics[width=0.45\textwidth]{graphics/ActivityRecognition}
    \vspace{-0.5cm}
    \caption{Activity recognition performance for different profiler methods}
    \label{fig:activityRecognition}
--- a/sections/4-experiments.tex.bak.tex
+++ b/sections/4-experiments.tex.bak.tex
--- a/sections/4-sessionclustering.tex
+++ b/sections/4-sessionclustering.tex
--- a/sections/4a-sessionidentification.tex
+++ b/sections/4a-sessionidentification.tex
--- a/sections/4b-profiler.tex
+++ b/sections/4b-profiler.tex
--- a/sections/4c-analyzer.tex
+++ b/sections/4c-analyzer.tex
--- a/sections/5-conclusion.tex
+++ b/sections/5-conclusion.tex
--- a/sections/6-discussion.tex
+++ b/sections/6-discussion.tex
--- a/sections/6-futurework.tex
+++ b/sections/6-futurework.tex
--- a/vldb.cls
+++ b/vldb.cls