Merge branch 'master' of gitlab.odin.cse.buffalo.edu:pocketdata/grant-NSF-CRI-2018

Added timeline notes
A few missing bibs
A few fixes related to Caroline's comments
main
Oliver Kennedy 2018-01-10 13:41:12 -05:00
commit 37569a093c
12 changed files with 122 additions and 90 deletions

Binary file not shown.


View File

@ -339,3 +339,21 @@
volume = 30,
year = 2005
}
@article{DBLP:journals/tods/MaddenFHH05,
author = {Samuel Madden and Michael J. Franklin and Joseph M. Hellerstein and Wei Hong},
journal = {{ACM} Trans. Database Syst.},
number = 1,
pages = {122--173},
title = {TinyDB: an acquisitional query processing system for sensor networks},
volume = 30,
year = 2005
}
@inproceedings{DBLP:conf/sensys/Group05,
booktitle = {SenSys},
pages = 320,
publisher = {{ACM}},
title = {TinyOS 2.0},
year = 2005
}

View File

@ -8,4 +8,18 @@
@MISC{apple:battery,
howpublished = {https://support.apple.com/en-us/HT208387},
title = {iPhone Battery and Performance: Understand iPhone performance and its relation to your battery.}
}
% Optional fields: author, title, howpublished, month, year, note
@MISC{google:playstorestats,
author = {Google},
title = {Play Store Dashboards},
howpublished = {https://developer.android.com/about/dashboards/index.html}
}
% Optional fields: author, title, howpublished, month, year, note
@MISC{opensignal:androidfragmentation,
author = {OpenSignal},
title = {Android Fragmentation},
howpublished = {https://opensignal.com/reports/2015/08/android-fragmentation/}
}

View File

@ -5,11 +5,12 @@
\usepackage[inline]{enumitem}
\usepackage{xspace}
\usepackage{subcaption}
\usepackage{paralist}
\setlist[itemize]{leftmargin=*,partopsep=5pt}
\setlist[enumerate]{leftmargin=*,partopsep=5pt}
\newcommand{\tinysection}[1]{\medskip \noindent \textbf{#1}\hspace{4mm}}
\newcommand{\tinysection}[1]{\noindent \textbf{#1}\hspace{4mm}}
\newcommand{\citedquote}[2]{
\begin{center}\parbox{0.95\textwidth}{\textit{#2}\hfill\mbox{-{#1}}}\end{center}
}
@ -32,6 +33,8 @@
\end{center}
}
\newcommand{\trimfigurespacing}{\vspace*{-4mm}}
\newcommand{\PhoneLab}{\textsc{PhoneLab}\xspace}
\hyphenation{Phone-Lab}

View File

@ -1,6 +1,6 @@
% !TEX root = ../proposal.tex
As part of the planning proposal, the PIs undertok several efforts to build interest in the community.
As part of the planning proposal, the PIs undertook several efforts to build interest in the community.
The PIs established a website located at \url{http://pocketdata.info}, and undertook outreach in the form of talks and faculty discussions at major universities (RIT, OSU, Columbia, Harvard, The University of Washington, The University of Michigan, SUNY Binghamton, and the University of Houston).
The aim of these talks was to describe and advertise the proposed infrastructure, and to solicit interest and feedback from the community.
As part of these efforts, PI Kennedy organized, and PI Nandi participated in a panel entitled ``Small Data'' at ICDE 2017~\cite{kennedy2017small}.
@ -41,7 +41,7 @@ As noted previously, D. Richard Hipp, creator of SQLite (the world's most widely
\hline
\textbf{researcher} & \textbf{affiliation} & \textbf{research area} & \textbf{enabled research} \\ \hline
Stratos Idreos & \emph{Harvard} & Databases & Adaptive indexes\\
Andy Pavlo & \emph{CMU} & Databases & Predictve Indexing\\
Andy Pavlo & \emph{CMU} & Databases & Predictive Indexing\\
Christoph Koch & \emph{EPFL} & Databases/Theory & Database compilers\\
D. Richard Hipp & \emph{SQLite} & Embedded Databases & SQLite Metrics\\
Eugene Wu & \emph{Columbia} & Databases & Natural Interfaces\\
@ -59,7 +59,7 @@ Reza Taheri & \emph{VMWare} & Databases & Performance measureme
\end{figure}
\tinysection{Steering Committee}
D. Richard Hipp, Stratos Idreos, and Eugene Wu have agreed to serve on a steering committe for the project.
D. Richard Hipp, Stratos Idreos, and Eugene Wu have agreed to serve on a steering committee for the project.
Once per year, the PIs will send out the infrastructure, associated documentation, build scripts, as well as benchmarking scripts for feedback.
In response, the committee will provide feedback through a short survey and a virtual committee meeting.
The PIs will present a summary of the feedback to all collaborators, discuss implementation plans for the next year, and outline salient changes to the infrastructure.
@ -75,7 +75,6 @@ Our primary direct costs beyond this point come from four sources:
\item The human resources needed to maintain and administrate the physical testbed.
\item The human resources needed to review and update our standard collection of workloads.
\end{enumerate*}
We plan to explore a variety of techniques for defraying these costs, starting with charging nominal fees for significant or sponsored use of the testbed.
For example, it is common practice for such testbeds to request that participants include a line item for the infrastructure in their grant budgets.
We also plan to explore options for industry involvement: Smartphone vendors like Google, Apple, and Microsoft have a vested interest in efficient data management on their platforms.

View File

@ -18,7 +18,7 @@ As we now discuss briefly, mobile systems break many of these assumptions, makin
\vspace{-0.3cm}
\caption{\small YCSB Benchmark A on a Nexus 6.}
\label{fig:elbow}
\vspace{-0.1in}
\trimfigurespacing
\end{wrapfigure}
Figure~\ref{fig:elbow} shows performance of three database libraries (SQLite, BerkeleyDB, and H2) on YCSB benchmark workload \textbf{A}, with Java's concurrent hash map used as an upper-bound on performance.
Each point represents another thread worth of load being offered to the database --- small points are individual runs, large points are averages of 10 runs.
@ -68,18 +68,19 @@ The result is as much as a 10x increase in measured compute cost, in spite of a
\end{subfigure}
\caption{Results for YCSB Workload C. One result is presented for each delay parameter.}
\label{fig:overview-clustered-C}
% \trimfigurespacing
\trimfigurespacing
\end{figure*}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tinysection{Assumption 3: App-level instrumentation has only a minimal impact on performance.}
\begin{wrapfigure}{r}{.5\textwidth}
\includegraphics[width=\linewidth]{graphics/Stacked_B}
\begin{wrapfigure}{r}{.45\textwidth}
\centering
\hspace*{-0.2\linewidth}\includegraphics[width=1.2\linewidth]{graphics/Stacked_B}
\caption{The Effect of Adding an Additional 1.5\% of Log Tags (to track Delay Time) from Android Userspace on Database Performance}
\label{fig:overview:2c3c-dirty-0ms-B}
% \trimfigurespacing
\trimfigurespacing
\end{wrapfigure}
In debugging frequency scaling, we wanted to measure how much more on-core time was spent processing queries.

View File

@ -6,6 +6,15 @@ In contrast, each smartphone app has a dedicated database, and is typically used
As a consequence, the database workload created by a typical app (e.g., Figure~\ref{fig:sampleFacebook}) is bursty, variable, and noisy.
Furthermore, each of the millions of apps that exist, or the dozens of apps on any given phone, will have its own workload characteristics.
\begin{wrapfigure}{r}{.45\textwidth}
\centering
\vspace*{-5mm}
\includegraphics[width=1.1\linewidth]{graphics/ChangeOverTimeData}
\caption{Sample Facebook Workload}
\label{fig:sampleFacebook}
\trimfigurespacing
\end{wrapfigure}
\tinysection{Trace-Based Workload Synthesis}
Although we will consider a range of options for synthesizing workloads, our primary approach will be to leverage traces gathered from the NSF-supported \PhoneLab platform to identify workload patterns that synthetic workloads can replicate.
Specifically, we collected traces of SQLite activity from the primary phones of students, faculty, and staff at UB: The first study involved 11 participants' phones over the course of a month in 2014~\cite{DBLP:conf/tpctc/KennedyACZ15}, while the second monitored 53 users' phones over the course of 21 days in 2017.
@ -18,14 +27,6 @@ Even if it did, it may be difficult to associate specific database operations wi
For example, latency sensitive operations like list scrolling can trigger rapid sequences of single-row or range queries~\cite{DBLP:conf/sigmod/EbensteinKN16}, but these database operations are likely to be offloaded to a worker thread so that they do not affect the user experience.
Hence, there may not be a direct correlation evident between the database activity and the triggering event.
\begin{wrapfigure}{r}{.5\textwidth}
\centering
\includegraphics[width=0.45\textwidth]{graphics/ChangeOverTimeData}
\caption{Sample Facebook Workload}
\label{fig:sampleFacebook}
\vspace{-0.1in}
\end{wrapfigure}
As a starting point, we will use a two-layer model of the trace files for each individual app: We use the trace file to first model the user's interactions with the app, and then model the effect of these interactions on the database.
Specifically, this approach treats the query log as a collection of \emph{sessions}, or bursts of database activity typically triggered by self-contained user activities, such as checking a Facebook feed or composing an email.
Hence, the two layers of the model are respectively a distribution of sessions over time, and then a distribution of query traces produced by each session.
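
The first layer of this model depends on carving the query log into sessions. One simple way to do so is sketched below, under the assumption that a session ends whenever the log goes quiet for longer than a fixed gap. The function name \texttt{split\_sessions} and the 30-second default are hypothetical choices made for illustration; they are not taken from the traces or the proposal.

```python
from typing import Iterable, List


def split_sessions(timestamps: Iterable[float], gap_seconds: float = 30.0) -> List[List[float]]:
    """Group query timestamps (in seconds) into bursts separated by idle gaps.

    Hypothetical heuristic: a new session starts whenever more than
    `gap_seconds` elapse since the previous query.
    """
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= gap_seconds:
            # Still within the current burst of activity.
            sessions[-1].append(t)
        else:
            # Idle gap exceeded (or first query): open a new session.
            sessions.append([t])
    return sessions


if __name__ == "__main__":
    log = [0.0, 1.0, 2.5, 100.0, 101.0, 500.0]
    for i, session in enumerate(split_sessions(log)):
        print(i, len(session), session[0], session[-1])
```

The session start times would feed the first layer (a distribution of sessions over time), and the queries inside each session would feed the second layer.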
@ -37,11 +38,13 @@ Third, we need to be able to label sessions that belong to the same group.
Finally, we need a way to synthesize realistic datasets for the queries to run on.
\begin{wrapfigure}{r}{.5\textwidth}
\centering
\includegraphics[width=0.45\textwidth]{graphics/WorkloadPredictibility}
\vspace{-0.5cm}
\caption{Workload Prediction Accuracy Comparison}
\label{fig:averagesimilarity}
\centering
\vspace*{-4mm}
\includegraphics[width=\linewidth]{graphics/WorkloadPredictibility}
\vspace{-0.5cm}
\caption{Workload Prediction Accuracy Comparison}
\label{fig:averagesimilarity}
\trimfigurespacing
\end{wrapfigure}
\tinysection{Defining Query Similarity}

View File

@ -13,8 +13,8 @@ For example, Android micro-manages CPU frequencies, reducing power use under lig
Counterintuitively, this means that a less efficient database can have lower latencies, as a consequence of higher load pinning the CPU at its highest frequency.
To date, there have been a few initial explorations of small-, personal-, or pocket-scale data management, both in academia and by industry:
\todo{This list is largely from the original proposal and still needs some love.}
\begin{itemize}
% \todo{This list is largely from the original proposal and still needs some love.}
\begin{enumerate*}
\item Arnab Nandi's group at Ohio State is exploring issues of query specification, database responsiveness, and database performance on smartphones and tablets.
\item Saarland University's Janiform Document project explores interactive manuscripts that include embedded, query-able research data and visualizations.
\item The vendors behind major databases including Oracle, SAP Labs, Facebook, LMDB, SQLite, MongoDB, and numerous others are all actively engaged in research on and development of embedded databases\footnote{At time of writing, Wikipedia~\cite{wikipedia:mobiledatabase} identifies ten distinct database libraries available for use.}.
@ -22,14 +22,14 @@ To date, there have been a few initial explorations of small-, personal-, or poc
\item The DAS Lab at Harvard's work on adaptive data management considers the challenges of specializing databases for small data.
\item The Just-in-Time data structures project at the University at Buffalo explores data organization techniques for highly dynamic environments.
\item ICDE 2017 hosted an active panel discussion on ``Small Data''.
\end{itemize}
\end{enumerate*}
In short, mobile data management, or \PocketData, is being recognized as an incredibly important problem. However, although it is important, the complexity of mobile systems makes research on \PocketData systems significantly more challenging compared to classical ``big-data'' research:
\begin{enumerate}
\begin{compactitem}
\item Smartphone hardware and operating systems self-regulate in unpredictable ways, making it difficult to obtain consistent, reproducible performance measures.
\item Data accesses on a smartphone may be triggered in response to a variety of touch or camera gestures, specific sensor inputs, recognized activities, network connectivity, or any of a range of other events, making it difficult to synthesize realistic workloads.
\item Small differences in hardware or operating system can lead to significant changes in system performance, but it is not reasonable to expect researchers to obtain dozens or hundreds of smartphones for testing purposes.
\end{enumerate}
\end{compactitem}
\tinysection{Infrastructure Overview}
This proposal aims to
@ -49,11 +49,12 @@ As part of the NSF-funded \PhoneLab{} testbed~\cite{DBLP:conf/sensys/NandugudiMK
For the planning phase of this proposal, we were able to gather additional log data from roughly 50 additional smartphones over the course of 20 days.
Goal 2 will provide synthetic workloads that are verifiably representative of real-world behaviors.
\textit{Component 3:} \textit{Deploy a diverse set of smartphones into a publicly accessible smartphone testbed.} \\[0.5mm]
As of September 2017, the Google play store registered 13 different versions of Android
\footnote{\url{https://developer.android.com/about/dashboards/index.html}}
running on tens of thousands of Android-compatible devices
\footnote{\url{https://opensignal.com/reports/2015/08/android-fragmentation/}}.
\textit{Component 3:} \textit{Package components 1 and 2 into a plug-and-play benchmarking tool.} \\[0.5mm]
Simply having the technology to benchmark innovations is insufficient. These tools must be simple, easy to deploy, and ideally should include resources to aid in interpreting the results.
\textit{Component 4:} \textit{Deploy a diverse set of smartphones into a publicly accessible smartphone testbed.} \\[0.5mm]
As of September 2017, the Google Play Store registered 13 different versions of Android~\cite{google:playstorestats}
running on tens of thousands of Android-compatible devices~\cite{opensignal:androidfragmentation}.
Small hardware differences like changes in power management heuristics can lead to wildly different performance measurements.
Hence, thorough performance evaluations require having access to a diverse array of different test devices.
We will develop such a resource and make it available to researchers.

View File

@ -13,27 +13,27 @@ In this section we present a few concrete projects that would benefit from \Pock
research for the PIs and the broader CISE community.
\subsection{Embedded Database Design}
Data management systems specialized to lightweight and embedded computers have arisen in two settings.
First, the TinyDB~\cite{DBLP:journals/tods/MaddenFHH05} system is part of a wireless sensor network stack called TinyOS~\cite{DBLP:conf/sensys/Group05}.
Like a stream processing system, TinyDB allows queries to be posed over (changing) data generated by sensor motes.
However, TinyDB proactively seeks out the requested data by contacting relevant motes and pushing computation into the network when appropriate.
Ultimately, while not explicitly intended for IoT devices, embedded databases like SQLite~\cite{sqlite} or BerkeleyDB~\cite{olson1999berkeley} have also been extensively adopted for use on mobile devices and embedded devices as tools for persistent data storage~\cite{DBLP:conf/tpctc/KennedyACZ15}.
Recently, researchers have started looking at real-time databases~\cite{Ramamritham:1993:RD:175320.175624} and their applicability to IoT-style workloads and applications~\cite{Tsiftes:2011:DS:2070942.2070974}. Most of this work has been done at the level of scheduling queries and transactions~\cite{DBLP:conf/rtss/Kang16,DBLP:journals/tkde/KangSS04,DBLP:journals/tc/XiongHLC08,DBLP:journals/tc/HanCXLMR14,Abbott:1988:SRT:44203.44209}, data models~\cite{Lam:2001:MDM:376868.376911}, developing query languages~\cite{Leite:2008:QLS:1621087.1621111} with
temporal constraints~\cite{Bodlaender:2000:TTR:338407.338675}, and database system design~\cite{Sivasankaran:1996:PAR:765750.765752,Kuo:1996:RDM:381854.381873,Sha:1988:CCD:44203.44210,Haque:2007:SCD:1404680.1404736,Kao:1999:UVM:319950.320020}.
\tinysection{Infrastructure Justification} Most embedded databases, real-time or otherwise, inherently operate over \PocketData{}-style workloads, as opposed to enterprise workloads. Currently, the only
available benchmarks and comparison methodologies for such database systems are those geared for enterprise workloads. While these existing workloads allow database designers to tune their infrastructures for low latency and high throughput, they provide little to no insight into the workloads that the DBMS will face once deployed. Indeed, our prior work has highlighted the shortcomings of existing benchmarking strategies. We examine this in more detail with targeted comparisons below.
Arnab says: (re DRH's letter:)
Btw this letter is a big one, and we should plaster its mention ( and that he has billions of devices as a deploy base ) across the entire grant ( if you have done that already, then great!)
\subsection{Adaptive Data Management}
Selecting the correct physical structure for a database under a given workload is an extremely challenging~\cite{Chaudhuri:1997:ECI:645923.673646,Chaudhuri:1998:ALI:276304.276337,Chaudhuri:2007:SDS:1325851.1325856,Agrawal:2000:ASM:645926.671701} part of database management.
The index selection problem becomes even harder when workload characteristics fluctuate rapidly or are not known in advance.
There is currently substantial interest in a breed of self-adjusting, adaptive index structures~\cite{idreos2007database,Idreos:2011:MWC:2002938.2002944} that address dynamic index selection by facilitating \textit{incremental, online} changes to the index.
Examples of adaptive indexes include Cracker Indexes~\cite{Idreos:2012:AIM:2247596.2247667,Idreos:2007:UCD:1247480.1247527,Halim:2012:SDC:2168651.2168652}, Adaptive Merge Trees~\cite{Graefe:2010:SSI:1739041.1739087,Graefe:2012:CCA:2180912.2180918}, SMIX~\cite{Voigt:2013:SSI:2484838.2484862}, H2O~\cite{163421}, and Just-in-Time Data Structures~\cite{kennedy2015just}.
Adaptive indexes automatically optimize their physical representation in response to incoming queries, reusing work used to answer the query to also improve subsequent queries. Given enough time, a stable workload, and queries that touch all data objects, an adaptive index eventually converges to a data representation similar to that of a static index.
Selecting the correct physical structure for a database under a given workload is an extremely challenging~\cite{Chaudhuri:1997:ECI:645923.673646,Chaudhuri:1998:ALI:276304.276337,Chaudhuri:2007:SDS:1325851.1325856,Agrawal:2000:ASM:645926.671701} part of database management. The index selection problem becomes even harder when workload characteristics fluctuate rapidly or are not known in advance. There is currently substantial interest in a breed of self-adjusting, adaptive index structures~\cite{idreos2007database,Idreos:2011:MWC:2002938.2002944} that address dynamic index selection by facilitating \textit{incremental, online} changes to the index. Examples of adaptive indexes include Cracker Indexes~\cite{Idreos:2012:AIM:2247596.2247667,Idreos:2007:UCD:1247480.1247527,Halim:2012:SDC:2168651.2168652}, Adaptive Merge Trees~\cite{Graefe:2010:SSI:1739041.1739087,Graefe:2012:CCA:2180912.2180918}, SMIX~\cite{Voigt:2013:SSI:2484838.2484862}, H2O~\cite{163421}, and Just-in-Time Data Structures~\cite{kennedy2015just}. Adaptive indexes automatically optimize their physical representation in response to incoming queries, reusing work used to answer the query to also improve subsequent queries. Given enough time, a stable workload, and queries that touch all data objects, an adaptive index eventually converges to a data representation similar to that of a static index.
\textbf{Infrastructure Justification:} Although there have been several efforts~\cite{Graefe:2010:BAI:1946050.1946063,schuhknecht2013uncracked} to develop benchmarks for adaptive indexes, these benchmarks rely on purely synthetic data and unit-tests rather than real-world scenarios.
This is in part because typical enterprise workloads rarely exhibit the type of drastic shifts that adaptive indexes target.
As a result most data management benchmarks evaluate systems under stable, steady-state workloads.
By contrast, \PocketData{} workloads often show extreme variation in both application demands and resource availability.
As a trivial example, an app might demand low-latency, low-power access to data when a user is actively using the phone, while admitting high-latency high-power organizational tasks when the phone is plugged in~\cite{Challen:2015:MWE:2699343.2699361}.
\tinysection{Infrastructure Justification} Although there have been several efforts~\cite{Graefe:2010:BAI:1946050.1946063,schuhknecht2013uncracked} to develop benchmarks for adaptive indexes, these benchmarks rely on purely synthetic data and unit-tests rather than real-world scenarios. This is in part because typical enterprise workloads rarely exhibit the type of drastic shifts that adaptive indexes target and exploit to achieve their performance benefits. As a result, most data management benchmarks evaluate systems under stable, steady-state workloads. By contrast, \PocketData{} workloads often show extreme variation in both application demands and resource availability. We believe such workloads would be a useful testbed for adaptive indexes, not only because of their changing nature, but also because the temporal gaps between query arrival times provide time for adaptation {\em prior} to query processing. As a trivial example, an app might demand low-latency, low-power access to data when a user is actively using the phone, while admitting high-latency, high-power organizational tasks when the phone is plugged in~\cite{Challen:2015:MWE:2699343.2699361}.
\textbf{Community Interest:} \textit{Stratos Idreos}'s DAS lab at Harvard will use the \PocketData{} metrics and benchmark workloads to evaluate their work on adaptive data systems.
\citedquote{Stratos Idreos (Harvard)}{I think work on adaptive data systems could benefit. I assume Pocket Data will capture diverse workloads (from various apps) and so this would be a perfect environment to test adaptive data systems.
I have a new project on easy to design systems out of modules that can be synthesized. The input is workloads. Perhaps PocketData can provide a testing framework for such work for designing data systems for mobile environments.
}
\tinysection{Community Interest} \textit{Stratos Idreos}'s DAS lab at Harvard will use the \PocketData{} metrics and benchmark workloads to evaluate their work on adaptive data systems.
\citedquote{Stratos Idreos (Harvard)}{I think work on adaptive data systems could benefit. I assume Pocket Data will capture diverse workloads (from various apps) and so this would be a perfect environment to test adaptive data systems. I have a new project on easy to design systems out of modules that can be synthesized. The input is workloads. Perhaps PocketData can provide a testing framework for such work for designing data systems for mobile environments.}
%The PIs will likewise use these resources to evaluate their own work on Just-in-Time Data Structures.
\subsection{Small-Data Analytics and Personal Internet of Things}
@ -43,36 +43,30 @@ Apple's iTunes Store has an ``Apps for Healthcare Professionals'' section with d
These apps are part of a growing number of small-data analytics and personalized IoT applications that present new and intriguing opportunities for research.
For example, smartphone and tablet touch-based interfaces require a significant redesign of the way users pose queries~\cite{nandi2013querying,Nandi:2013:GQS:2732240.2732247,Jiang:2013:GMD:2536274.2536311,Erkan:2015:EGQ:2801948.2802006}.
Embedded databases create opportunities for more detailed, interactive academic manuscripts~\cite{Dit2015CIDR,Dittrich:2015:JIA:2824032.2824114} that help to ensure reproducible results.
The relatively limited compute and memory resources available on tablets and smartphones also demand new techniques for rapidly building visualizations of medium sized databases~\cite{Jiang:2015:SPI:2809974.2809986,Singh:2012:SRS:2213836.2213858,6228146,Nobari:2013:TIS:2463676.2463700}.
The relatively limited compute and memory resources available on tablets and smartphones also demand new techniques for rapidly building visualizations of medium sized databases~\cite{Jiang:2015:SPI:2809974.2809986,Singh:2012:SRS:2213836.2213858,6228146,Nobari:2013:TIS:2463676.2463700}. We expect both interest in these applications and their number and variety to grow over the coming years as the
health care industry expands to include soft real-time applications for controlling wearable and implantable devices. Such proposed applications provide a computation platform over streaming
sensor data and rely on data analytics to assess a patient's overall health.
\textbf{Infrastructure Justification:} Small-data analytics efforts are presently siloed, with most research efforts targeting entire software stacks, from the user interface front-end to the back-end database.
\tinysection{Infrastructure Justification} Small-data analytics efforts are presently siloed, with most research efforts targeting entire software stacks, from the user interface front-end to the back-end database.
The standard evaluation tools offered by the \PocketData{} benchmark would help to decouple the research challenges involved in small-data analytics and allow a broader community of researchers to contribute.
For example, an embedded database benchmark simulating a visual query interface workload would serve as a standard for evaluating novel algorithms, indexes, and data management tools.
\textbf{Community Interest:}
\tinysection{Community Interest} Small-Data Analytics and the Personal Internet of Things are both new and quickly growing fields of research. \PocketData{} is well placed to provide a common infrastructure
for evaluating and comparing both proposed and existing systems. We have identified two researchers in this area who plan to leverage our infrastructure.
\textit{Arnab Nandi} from Ohio State has offered to contribute traces of human interactions with his tools for gestural query specification to the \PocketData{} effort.
\textit{Jens Dittrich} of Saarland University is interested in connections between PocketData and his work on Janiform Documents~\cite{Dittrich:2015:JIA:2824032.2824114}.
\subsection{Data-Driven Apps}
Virtually all access to embedded databases on smartphones occurs through SQL statements that have been procedurally generated by apps --- Smartphone users do not manually write SQL queries.
In this respect, data-driven smartphone apps are similar to data-driven enterprise applications.
However, enterprise software is typically supported by experienced database administrators who can carefully fine-tune the database to efficiently support the application.
This is not the case for smartphone apps, which instead rely on compiler tools and software libraries to efficiently mediate access to persistent data.
In this respect, data-driven smartphone apps are similar to data-driven enterprise applications. However, enterprise software is typically supported by experienced database administrators who can carefully fine-tune the database to efficiently support the application. This is not the case for smartphone apps, which instead rely on compiler tools and software libraries to efficiently mediate access to persistent data.
Consequently, \PocketData{} offers new research opportunities at the interface between imperative programming languages like C, C\#, or Java, and back-end data management tools.
Forms of inline SQL like LINQ~\cite{box2007linq,Meijer:2006:LRO:1142473.1142552} have existed for nearly a decade, but are not frequently used in the development of smartphone apps.
Instead, app developers frequently rely on higher level primitives including object-relational mappers~\cite{Melnik:2007:CMB:1247480.1247532} (ORMs) like Hibernate~\cite{hibernate} to mediate access to the database.
Unfortunately, at present, most ORMs are implemented as libraries and lack the ability to introspect the invoking program.
This creates an impedance mismatch between the available information and SQL's declarative syntax, forcing ORMs to misuse SQL, or to rely on optional hints provided by the app developer to provide efficient data access.
In our preliminary exploration~\cite{pocketdata}, we found significant anti-patterns emerging in queries to SQLite.
Examples include the use of expensive \texttt{UPSERT} operations when \texttt{UPDATE}s would be sufficient, the use of multiple \texttt{SELECT} queries to dereference foreign-keys instead of using an outer-join query, and the use of separate read-then-write queries rather than in-place updates.
Several research efforts, including StatusQuo~\cite{StatusQuo}, Sloth~\cite{Cheung:2014:SLV:2588555.2593672}, and Truffle/Graal~\cite{wimmer2012truffle} have addressed similar problems in enterprise data-driven applications and could find new challenges in the \PocketData{} space.
Other research efforts explore data-flow in smartphones for performance optimization~\cite{yang-phd15,yang-icse15,rountev-cgo14} and correctness~\cite{yan-cgo14}, and would benefit from more detailed tools for introspection and measurement.
Instead, app developers frequently rely on higher level primitives including object-relational mappers~\cite{Melnik:2007:CMB:1247480.1247532} (ORMs) like Hibernate~\cite{hibernate} to mediate access to the database. Unfortunately, at present, most ORMs are implemented as libraries and lack the ability to introspect the invoking program. This creates an impedance mismatch between the available information and SQL's declarative syntax, forcing ORMs to misuse SQL, or to rely on optional hints provided by the app developer to provide efficient data access. In our preliminary exploration~\cite{pocketdata}, we found significant anti-patterns emerging in queries to SQLite. Examples include the use of expensive \texttt{UPSERT} operations when \texttt{UPDATE}s would be sufficient, the use of multiple \texttt{SELECT} queries to dereference foreign-keys instead of using an outer-join query, and the use of separate read-then-write queries rather than in-place updates. Several research efforts, including StatusQuo~\cite{StatusQuo}, Sloth~\cite{Cheung:2014:SLV:2588555.2593672}, and Truffle/Graal~\cite{wimmer2012truffle} have addressed similar problems in enterprise data-driven applications and could find new challenges in the \PocketData{} space. Other research efforts explore data-flow in smartphones for performance optimization~\cite{yang-phd15,yang-icse15,rountev-cgo14} and correctness~\cite{yan-cgo14}, and would benefit from more detailed tools for introspection and measurement.
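
Two of the anti-patterns named above (repeated \texttt{SELECT}s to dereference a foreign key, and read-then-write updates) can be illustrated with a small sketch using Python's standard \texttt{sqlite3} module. The \texttt{author}/\texttt{post} schema and all function names are hypothetical and chosen only for illustration; they do not come from the traces analyzed in the preliminary exploration.

```python
import sqlite3

# Hypothetical two-table schema, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post(id INTEGER PRIMARY KEY,
                      author_id INTEGER REFERENCES author(id),
                      views INTEGER);
    INSERT INTO author VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO post VALUES (10, 1, 0), (11, 2, 5), (12, NULL, 7);
""")


def bump_views_read_then_write(db, post_id):
    """Anti-pattern: one query to read the value, a second to write it back."""
    (views,) = db.execute("SELECT views FROM post WHERE id = ?", (post_id,)).fetchone()
    db.execute("UPDATE post SET views = ? WHERE id = ?", (views + 1, post_id))


def bump_views_in_place(db, post_id):
    """Single in-place update; the database computes the new value."""
    db.execute("UPDATE post SET views = views + 1 WHERE id = ?", (post_id,))


def authors_by_repeated_select(db):
    """Anti-pattern: one extra SELECT per row to dereference the foreign key."""
    result = []
    rows = db.execute("SELECT id, author_id FROM post ORDER BY id").fetchall()
    for post_id, author_id in rows:
        row = db.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()
        result.append((post_id, row[0] if row else None))
    return result


def authors_by_outer_join(db):
    """Same result from a single outer-join query."""
    return db.execute(
        "SELECT post.id, author.name FROM post "
        "LEFT OUTER JOIN author ON author.id = post.author_id "
        "ORDER BY post.id"
    ).fetchall()


if __name__ == "__main__":
    bump_views_read_then_write(conn, 10)
    bump_views_in_place(conn, 10)
    print(authors_by_repeated_select(conn) == authors_by_outer_join(conn))
```

Both variants of each pair produce the same data; the difference is the number of statements the database must parse and execute, which is the cost that workload-aware tooling could help expose.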
\textbf{Infrastructure Justification:} Research on data-driven app development requires a detailed understanding of application requirements, and programming language research needs real-world workloads to demonstrate its viability.
The metrics that we propose to gather and the benchmark suite we propose to develop are critical for driving research in this space.
\tinysection{Infrastructure Justification} Research on data-driven app development requires a detailed understanding of application requirements, and programming language research needs real-world workloads to demonstrate its viability. The metrics that we propose to gather and the benchmark suite we propose to develop are critical for driving research in this space.
\textbf{Community Interest:} \textit{Atanas Rountev} of Ohio State will use \PocketData{} as part of his work on control-flow and data-flow analysis to debug GUI responsiveness issues and as part of his LeakDroid project.
\tinysection{Community Interest} \textit{Atanas Rountev} of Ohio State will use \PocketData{} as part of his work on control-flow and data-flow analysis to debug GUI responsiveness issues and as part of his LeakDroid project.
\subsection{Database-App Coupling}
Smartphone apps are integrated with the data management tools they use to a far greater degree than enterprise applications.
@ -81,32 +75,20 @@ Apps generate virtually all queries procedurally, making it possible to specify
Moreover, access to data often occurs through higher-level primitives like ORMs.
In short, although embedded databases are in principle capable of emulating stand-alone database engines, in practice they are used more as toolkits of data management building blocks.
The tight coupling between app and database promises to offer numerous avenues for workload-driven database optimization.
A leader in this area is BerkeleyDB.
Although BerkeleyDB does provide a SQL emulation front-end, its core functionality is to provide simple database building blocks like primary and secondary indexing, foreign-key consistency primitives, and transactional access to data.
Similar efforts are taking place across multiple industrial research labs and startup companies, as numerous organizations have begun to invest in embedded databases. Corporate investment in embedded databases includes MongoDB's WiredTiger~\cite{shakuntalagupta2015practical}, SAP's SqlAnywhere~\cite{4401024}, and Facebook's RocksDB, as well as open-source efforts including the H2 Database~\cite{mueller2006h2} and SQLite~\cite{sqlite}.
The tight coupling between database and the invoking application also admits possibilities for more aggressive database specialization.
Database compilers like DBToaster~\cite{kennedy2011dbtoaster,koch2013dbtoaster,Ahmad:2012:DHD:2336664.2336670}, HyPer/LLVM~\cite{Neumann:2011:ECE:2002938.2002940}, and Legorithmics~\cite{Klonatos:2013:ASO:2463676.2465334,Klonatos:2014:BEQ:2732951.2732959} aggressively compile and optimize database engines that are uniquely specialized for a specific application's query and update workload, as well as its underlying hardware.
As already noted above, many of these statistics are available at compile time, making the \PocketData{} setting an ideal candidate for deploying these applications.
A leader in this area is BerkeleyDB. Although BerkeleyDB does provide a SQL emulation front-end, its core functionality is to provide simple database building blocks like primary and secondary indexing, foreign-key consistency primitives, and transactional access to data. Similar efforts are taking place across multiple industrial research labs and startup companies, as numerous organizations have begun to invest in embedded databases. Corporate investment in embedded databases includes MongoDB's WiredTiger~\cite{shakuntalagupta2015practical}, SAP's SqlAnywhere~\cite{4401024}, and Facebook's RocksDB, as well as open-source efforts including the H2 Database~\cite{mueller2006h2} and SQLite~\cite{sqlite}. The tight coupling between database and the invoking application also admits possibilities for more aggressive database specialization. Database compilers like DBToaster~\cite{kennedy2011dbtoaster,koch2013dbtoaster,Ahmad:2012:DHD:2336664.2336670}, HyPer/LLVM~\cite{Neumann:2011:ECE:2002938.2002940}, and Legorithmics~\cite{Klonatos:2013:ASO:2463676.2465334,Klonatos:2014:BEQ:2732951.2732959} aggressively compile and optimize database engines that are uniquely specialized for a specific application's query and update workload, as well as its underlying hardware. As already noted above, many of these statistics are available at compile time, making the \PocketData{} setting an ideal candidate for deploying these applications.
\textbf{Infrastructure Justification:} Realistic evaluation of embedded databases and database compilers requires realistic workloads. Moreover, smartphones are one of the most prolific examples of embedded databases deployed in the wild. Given the variation in smartphone apps' data management requirements, even limited data releases by a single app developer will not be representative. The metrics we will gather, and the benchmark we are proposing will be key to helping researchers evaluate new embedded database tools.
\tinysection{Infrastructure Justification} Realistic evaluation of embedded databases and database compilers requires realistic workloads. Moreover, smartphones are one of the most prolific examples of embedded databases deployed in the wild. Given the variation in smartphone apps' data management requirements, even limited data releases by a single app developer will not be representative. The metrics we will gather, and the benchmark we are proposing will be key to helping researchers evaluate new embedded database tools.
\textbf{Community Interest:} \textit{Christoph Koch}'s DATA lab at EPFL is interested in using the \PocketData{} benchmark to evaluate their work on database compilers. \textit{Ashok Joshi} and \textit{Michael Brey} of Oracle are interested in participating in the \PocketData{} community to advance research on embedded databases.
\citedquote{Ashok Joshi (Senior Director at Oracle)}{I got some feedback from one of my colleagues on this topic. Yes, the real-world traces of embedded data usage would be useful; so would the benchmarking toolkit.}
\tinysection{Community Interest} \textit{Christoph Koch}'s DATA lab at EPFL is interested in using the \PocketData{} benchmark to evaluate their work on database compilers. \textit{Ashok Joshi} and \textit{Michael Brey} of Oracle are interested in participating in the \PocketData{} community to advance research on embedded databases.
% \citedquote{Ashok Joshi (Senior Director at Oracle)}{I got some feedback from one of my colleagues on this topic. Yes, the real-world traces of embedded data usage would be useful; so would the benchmarking toolkit.}
\citedquote{Michael Brey (Oracle)}{Within Oracle, we are always looking at how the industry both consumer and enterprise is using data in mobile applications. Things like db size, access patterns, single/multi user (multiple apps accessing same db), speed of access required, record size/structure etc. are all important to understand. We are also very interested in the movement of data from the device to some backend repository.}
%Additionally, PI Kennedy will make use of the same resources in his efforts on incremental computation.
\subsection{Enabled Research For the PIs}
The PIs have a joint research project aimed at exposing \emph{uncertainty} in mobile computing~\cite{Challen:2015:MWE:2699343.2699361}. The project focuses on exposing new language primitives to the programmer to specify multiple implementations of
system functionality allowing the system to pick which implementation to use at runtime. This allows the system to specialize software to a given hardware platform and more importantly to a given set of external
considerations (e.g. network connectivity, available sensors, etc.). Our proposed infrastructure will enable us to study two key aspects of uncertainty: (1) Almost all mobile applications store user data and configuration parameters in
mobile databases and access to this data can have a profound impact on the behavior of an application. \PocketData{} will allow us to more readily study this aspect of mobile uncertainty; (2) The infrastructure powering our
runtime system for exposing uncertainty is built around a mobile database that stores possible choices the software system can make. \PocketData{} will allow us to optimize this database to reduce choice latency.
The PIs have a joint research project aimed at exposing \emph{uncertainty} in mobile computing~\cite{Challen:2015:MWE:2699343.2699361}. The project focuses on exposing new language primitives to the programmer to specify multiple implementations of system functionality allowing the system to pick which implementation to use at runtime. This allows the system to specialize software to a given hardware platform and more importantly to a given set of external considerations (e.g. network connectivity, available sensors, etc.). Our proposed infrastructure will enable us to study two key aspects of uncertainty: (1) almost all mobile applications store user data and configuration parameters in mobile databases and access to this data can have a profound impact on the behavior of an application. \PocketData{} will allow us to more readily study this aspect of mobile uncertainty; (2) the infrastructure powering our runtime system for exposing uncertainty is built around a mobile database that stores possible choices the software system can make. \PocketData{} will allow us to optimize this database to reduce choice latency.
PIs Kennedy and Ziarek have a joint research project, Just-in-Time Data Structures (JITDs), focusing on adaptive indexing~\cite{kennedy2015just}.
The project explores the use of standardized, composable data structure building blocks to dynamically assemble indexes that adapt to rapidly changing workload requirements.
The level of variation in load and resource availability that occurs in \PocketData{} workloads creates an ideal use-case for JITDs.
As noted above, our proposed infrastructure will provide us with a benchmark workload that will help us to evaluate adaptive indexes under real-world conditions, rather than through purely synthetic workloads.
PIs Kennedy and Ziarek have a joint research project, Just-in-Time Data Structures (JITDs), focusing on adaptive indexing~\cite{kennedy2015just}. The project explores the use of standardized, composable data structure building blocks to dynamically assemble indexes that adapt to rapidly changing workload requirements. The level of variation in load and resource availability that occurs in \PocketData{} workloads creates an ideal use-case for JITDs. As noted above, our proposed infrastructure will provide us with a benchmark workload that will help us to evaluate adaptive indexes under real-world conditions, rather than through purely synthetic workloads. Additionally, it will open up opportunities for exploring policies of continual adaptation, that is, adaptation that occurs when no queries are actively being evaluated.
PI Kennedy is part of a collaborative research project with \textit{Shambhu Upadhyaya} (UB), \textit{Varun Chandola} (UB), \textit{Hung Ngo} (UB), and \textit{Long Nguyen} (UMich) that explores techniques for identifying insider attacks on databases (NSF-CNS-1409551).
Although the threat of insider attacks on mobile devices may be minimal, the specific methodology behind the work involves summarizing query logs by creating clusters of queries with similar ``intent.''
@ -114,4 +96,4 @@ The approach is showing promise for summarizing query logs from a corporate (ban
Having query logs from other settings like \PocketData{} would show that the approach can be generalized to domains other than Insider Threat detection (for example to the design of index selection tools).
If successful, these efforts could also contribute back to the \PocketData{} project, as a tool for quickly summarizing and clustering query logs would help to build out the visualization and benchmark design components of the proposed infrastructure.
The project will provide a foundational evaluation and test bed for projects such as PI Nandi's NSF 1453582, ``Querying Beyond Keyboards'', which focuses on a gesture and touch driven database system targeted towards smartphones and tablets.
The project will provide a foundational evaluation and test bed for PI Nandi's NSF project ``Querying Beyond Keyboards'' (IIS-1453582), which focuses on a gesture and touch driven database system targeted towards smartphones and tablets. PI Rountev's NSF projects ``LeakDroid: Exposing leaks and jank in Android applications'' (CCF-1319695) and ``Control-flow and data-flow analysis of Android software'' (CCF-1526459) aim at analyzing and improving the performance of Android apps, with a focus on resource leaks (e.g., leaking of database cursors) and poor responsiveness (e.g., due to database operations in the UI thread)~\cite{yan-issre13,yang-mobs13,wang-mobilesoft16,zhang-ast16,yang-ase15,wu-cc16}. The testbed will be used in these projects to evaluate app performance and to measure the effects of code optimizations such as automated refactoring that moves database operations outside of the app's UI thread. For the purposes of test evaluation and generation, static app code analysis will map the query events to the app components that issued them, and will compute test coverage metrics and profiling information at the code/GUI level; this information is of great value to app developers and testers.

Binary file not shown.


@ -20,22 +20,33 @@ published at\\
\url{http://www.buffalo.edu/research/research-services/ub-rates-and-facts/ub-and-rf-rates.html}.
\subsection*{Supplies and Publication Costs}
\textbf{Mobile Phone Testbed: } We aim to initially purchase 2 instances each of 10 different devices, for a total of 20 phones. Android phones such as the Android One Moto X4, Google Pixel 2, and Sony Xperia XZ range in price from \$200 to \$600. Assuming an average price of \$500, the total requested budget for phones in year 1 is \$10,000. We will purchase a server to act as a command and control system. We have tentatively selected a Dell PowerEdge T640 Tower, which costs \$4,398 in an appropriate configuration. We further budget \$602 for miscellaneous infrastructure assembly expenses such as additional power supplies, cables, cooling, or mounting brackets and surfaces.
\textbf{Mobile Phone Testbed: }
We plan for the testbed to initially consist of 10 different mobile devices, and plan to add 5 devices per year to ensure that the testbed stays up-to-date with the latest hardware advancements.
We will purchase 2 of each device to ensure that we have replacement parts on hand, as well as the ability to scale out if necessary.
In summary, we will purchase 20 devices in year 1, and a further 10 devices in each of years 2 and 3.
Android phones such as the Android One Moto X4, Google Pixel 2, and Sony Xperia XZ range in price from \$200 to \$600. Assuming an average price of \$500, the total requested budget for phones is \$10,000 in year 1 and a further \$5,000 in each of years 2 and 3.
We will purchase a server to act as a command and control system. We have tentatively selected a Dell PowerEdge T640 Tower, which costs \$4,398 in an appropriate configuration. We further budget \$602 for miscellaneous infrastructure assembly expenses such as additional power supplies, cables, cooling, or mounting brackets and surfaces.
In total, the requested budget for the mobile phone testbed is \$25,000 over 3 years.
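As a rough check on this total, using the assumed average price of \$500 per phone and the device counts stated above, the testbed budget itemizes as
\[
\underbrace{20 \times \$500}_{\mathrm{phones,\ year\ 1}}
+ \underbrace{2 \times (10 \times \$500)}_{\mathrm{phones,\ years\ 2\ and\ 3}}
+ \underbrace{\$4{,}398 + \$602}_{\mathrm{server\ and\ assembly}}
= \$25{,}000 .
\]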
\noindent \textbf{Publication Costs: } We have budgeted \$1,000 for publication costs to enable us to publicize the proposed infrastructure and to disseminate results and surveys via open access journals.
\subsection*{Travel}
The budget allocates travel funds for the PIs and/or supported graduate research assistants to advertise the infrastructure through conferences and talks at other universities. Because the location of conferences is not known more than
a year in advance, the countries that we will travel to are currently unknown. The PIs have budgeted \$6,000
of domestic travel and \$8,000 of international travel in each year of the grant. This roughly amounts to
sending two individuals to one international conference (roughly \$4,000 each) and one domestic conference (roughly \$2,000 each) per year, as well as sending one individual to give two invited talks (roughly \$1,000 each).
The budget allocates travel funds for the PIs and/or supported graduate research assistants.
These funds will be used for collaborative site meetings, and to advertise the infrastructure through conferences and talks at other universities.
Because the location of conferences is not known more than a year in advance, the countries that we will travel to are currently unknown.
The PIs have budgeted \$6,000 of domestic travel and \$8,000 of international travel in each year of the grant.
Each year, this roughly amounts to (1) sending two individuals to one international conference (roughly \$4,000 each), (2) sending two individuals to one domestic conference (roughly \$2,000 each), (3) sending two individuals to give one invited talk each (roughly \$500 each), and (4) sending three individuals to a site meeting in Columbus (roughly \$300 each).
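For reference, the per-year estimates above sum approximately to the budgeted amounts. We assume here that the invited talks and the Columbus site meeting are counted as domestic travel; all per-person figures are the rough estimates given above.
\[
2 \times \$4{,}000 = \$8{,}000 \quad \mbox{(international)}
\]
\[
2 \times \$2{,}000 + 2 \times \$500 + 3 \times \$300 = \$5{,}900 \approx \$6{,}000 \quad \mbox{(domestic)}
\]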
\subsection*{Tuition}
Tuition is charged onto the grant as per the university's tuition policy.
Tuition for the graduate research assistants is budgeted according to the
university out-of-state rates available at\\
\url{http://www.buffalo.edu/research/research-services/ub-rates-and-facts/tuition-rates-for-budgeting.html}.
university out-of-state rates available at
\begin{center}
{\small
\url{http://www.buffalo.edu/research/research-services/ub-rates-and-facts/tuition-rates-for-budgeting.html}
}
\end{center}
The total tuition budget for the grant is \$69,264.
\subsection*{Computer Services Fee}


@ -112,7 +112,7 @@ We have established a preliminary infrastructure review committee consisting of
\noindent The initial infrastructure review committee will include:
\begin{compactitem}
\item Stratos Idreos (Harvard)
\item D. Richard Hipp (Hipp, Wyrick \& Company, Inc.)
\item D. Richard Hipp (SQLite)
\item Eugene Wu (Columbia)
\end{compactitem}
\end{document}