Read-through results

master
Oliver Kennedy 2016-01-20 13:22:26 -05:00
parent 56c7396965
commit 3ae0b7ded8
16 changed files with 140 additions and 98 deletions


@ -1,8 +1,8 @@
% !TEX root = ../fullproposal.tex
In a preliminary study~\cite{pocketdata}, we instrumented Android smartphones being used as the primary device of 11 UB students, faculty and staff for a period of one month.
The SQLite embedded database included as part of the Android platform was modified to log a trace of all queries executed, along with metadata such as the number of rows returned, time taken, and the application process executing the query.
The SQLite embedded database included as part of the Android platform was modified to log a trace of all SQL statements executed, along with metadata such as the number of rows returned, time taken, and the application process that issued the statement.
To protect participant privacy, our instrumentation removed as much personally-identifying information as possible and recorded prepared statement arguments only as hash values.
With participant permission, we have made these traces publicly available.
With participant permission, we have made these traces publicly available~\cite{pocketdata}.
We conducted a preliminary analysis of these traces, the key parts of which we summarize here to provide a sense of the type of information that we will make available to the \PocketData{} community.
We captured approximately 45 million statements executed by SQLite over the 1 month period.
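The kind of record described above can be illustrated with a minimal sketch. This is not the instrumentation used in the study, which modified SQLite inside the Android platform. It is a Python wrapper written under assumed names (\texttt{hash\_arg}, \texttt{logged\_execute}, and the record fields are all hypothetical) that logs the statement text, the row count, the time taken, and the issuing application, and that stores prepared-statement arguments only as hashes.

```python
import hashlib
import sqlite3
import time


def hash_arg(value):
    """Return a short SHA-256 digest so the raw argument is never stored."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()[:16]


def logged_execute(conn, log, app, sql, args=()):
    """Run one statement and append a trace record for it to `log`."""
    start = time.perf_counter()
    rows = conn.execute(sql, args).fetchall()
    elapsed = time.perf_counter() - start
    log.append({
        "app": app,                                  # issuing application (hypothetical label)
        "sql": sql,                                  # statement text with ? placeholders
        "arg_hashes": [hash_arg(a) for a in args],   # arguments recorded only as hashes
        "rows_returned": len(rows),
        "seconds": elapsed,
    })
    return rows


# Illustrative usage against an in-memory database.
conn = sqlite3.connect(":memory:")
trace = []
logged_execute(conn, trace, "demo.app",
               "CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
logged_execute(conn, trace, "demo.app",
               "INSERT INTO contacts (name) VALUES (?)", ("Alice",))
result = logged_execute(conn, trace, "demo.app",
                        "SELECT name FROM contacts WHERE id = ?", (1,))
```

Hashing the arguments keeps repeated values comparable across statements without retaining the values themselves. A deployment concerned with guessable arguments would likely need a keyed hash, which this sketch omits.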
@ -31,7 +31,7 @@ Figure~\ref{fig:coarseSelectComplexity} shows the distribution of \texttt{SELECT
Even at this coarse-grained view of query complexity, the read-only portion of the embedded workload distinguishes itself from existing TPC benchmarks.
Like TPC-C~\cite{tpcc}, the vast majority of the workload involves simple, small requests for data that touch a small number of tables.
29.15 million, or about 87\%, of the \texttt{SELECT} queries were simple select-project-join queries. Of those, 28.72 million, or about 86\% of all queries, were simple single-table scans or look-ups. In these queries, which form the bulk of SQLite's read workload, the query engine exists simply to provide an iterator over the relationally structured data it is being used to store.
Conversely, the workload also has a tail that consists of complex, TPC-H-like~\cite{tpch} queries. Several hundred thousand queries involve at least 2 levels of nesting, and over a hundred thousand queries access 5 or more tables. As an extreme example, our trace includes 10 similar \texttt{SELECT} queries issued by the Google Play Games Service, each of which accesses up to 8 distinct tables to combine developer-provided game state, user preferences, device profile meta-data, and historical game-play results from the user.
Conversely, the workload also has a tail that consists of complex, TPC-H-like~\cite{tpch} queries. Several hundred thousand queries involve at least 2 levels of nesting, and over a hundred thousand queries access 5 or more tables. As an extreme example, our trace includes 10 similar \texttt{SELECT} queries issued by the Google Play Games Service, each of which accesses up to 8 distinct tables to combine and summarize developer-provided game state, user preferences, device profile meta-data, and historical game-play results from the user.
\begin{figure}
\centering
@ -47,13 +47,48 @@ This query would have a join width of 2 (\texttt{R}, \texttt{S}) and 2 conjuncti
% For uniformity, \texttt{NATURAL JOIN} and \texttt{JOIN ON} (\textit{e.g.}, \texttt{SELECT R.A from R JOIN S ON B}) expressions appearing in the \texttt{FROM} clause are rewritten into equivalent expressions in the \texttt{WHERE} clause.
The first column of this table indicates queries to a single relation. Just over 1 million queries were full table scans (0 where clauses), and just under 27 million queries involved only a single conjunctive term. This latter class constitutes the bulk of the simple query workload, at just over 87\% of the simple look-up queries. Single-clause queries appear to be the norm.
\begin{figure*} \begin{subfigure}[t]{0.5\textwidth} \centering \includegraphics[width=0.9\textwidth]{graphs/select_count_cdf_by_app} \caption{} \label{fig:selectByApp:all} \end{subfigure}% \begin{subfigure}[t]{0.5\textwidth} \centering \includegraphics[width=0.9\textwidth]{graphs/select_percent_simple_cdf_by_app} \caption{} \label{fig:selectByApp:simple} \end{subfigure}% \caption{\textbf{Breakdown of \texttt{SELECT} queries by app. (a) Cumulative distribution of applications by the number of \texttt{SELECT} queries issued (note the logarithmic scale). (b) Cumulative distribution of applications by the percent of the app's \texttt{SELECT} queries that are key value queries (full table scans or exact key look-ups).}} \label{fig:selectByApp} \end{figure*}
\begin{figure*}
\begin{subfigure}[t]{0.5\textwidth}
\centering
\includegraphics[width=0.9\textwidth]{graphs/select_count_cdf_by_app}
\caption{}
\label{fig:selectByApp:all}
\end{subfigure}%
\begin{subfigure}[t]{0.5\textwidth}
\centering
\includegraphics[width=0.9\textwidth]{graphs/select_percent_simple_cdf_by_app}
\caption{}
\label{fig:selectByApp:simple}
\end{subfigure}%
\caption{\textbf{Breakdown of \texttt{SELECT} queries by app. (a) Cumulative distribution of applications by the number of \texttt{SELECT} queries issued (note the logarithmic scale). (b) Cumulative distribution of applications by the percent of the app's \texttt{SELECT} queries that are key value queries (full table scans or exact key look-ups).}}
\label{fig:selectByApp}
\end{figure*}
Over the course of the one-month trace we observed 179 distinct apps, ranging from built-in Android applications such as \textit{Gmail} or \textit{YouTube}, to video players such as \textit{VLC}, to games such as \textit{3 Kingdoms}. Figure~\ref{fig:selectByApp:all} shows the cumulative distribution of apps sorted by the number of queries that the app performs. The results are extremely skewed, with the top 10\% of apps each posing more than 100 thousand queries over the one-month trace. The most query-intensive system service, \textit{Media Storage}, was responsible for 13.57 million queries, or just shy of 40 queries per minute per phone. The most query-intensive user-facing app was \textit{Google+}, which performed 1.94 million queries over the course of the month, or 5 queries per minute.
At the other end of the spectrum, the bottom 10\% of apps posed as few as 30 queries over the entire month.
We noted above that a large proportion of \texttt{SELECT} queries were exact look-ups; Indeed many applications running on the device are using SQLite as a simple key-value store. As seen in Figure~\ref{fig:selectByApp:simple}, for 24 apps (13.4\%), we observed \emph{only} key-value queries during the entire, month-long trace.
We noted above that a large proportion of \texttt{SELECT} queries were exact look-ups; indeed, many applications running on the device are using SQLite as a simple key-value store. As seen in Figure~\ref{fig:selectByApp:simple}, for 24 apps (13.4\%), we observed \emph{only} queries that would have been supported by a trivial key-value API for the full span of the month-long trace.
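A per-app breakdown of this kind could be computed roughly as follows. The sketch assumes a trace of (app, statement) pairs and uses a deliberately naive regular expression as a stand-in for a real SQL parser. The pattern, the function name \texttt{key\_value\_share}, and the sample statements are all hypothetical and are not taken from the study.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately naive pattern: a single table, and either no
# WHERE clause (full table scan) or one `column = value` term (exact look-up).
KEY_VALUE = re.compile(
    r"^\s*SELECT\s+.+?\s+FROM\s+\w+\s*"
    r"(WHERE\s+\w+\s*=\s*(\?|\d+|'[^']*'))?\s*;?\s*$",
    re.IGNORECASE,
)


def key_value_share(trace):
    """Map each app to the percent of its SELECTs that look like key-value queries."""
    totals = defaultdict(int)
    simple = defaultdict(int)
    for app, sql in trace:
        if not sql.lstrip().upper().startswith("SELECT"):
            continue  # only the read workload is considered here
        totals[app] += 1
        if KEY_VALUE.match(sql):
            simple[app] += 1
    return {app: 100.0 * simple[app] / totals[app] for app in totals}


# Made-up statements for two made-up apps.
sample_trace = [
    ("notes", "SELECT * FROM notes"),
    ("notes", "SELECT body FROM notes WHERE id = ?"),
    ("mail", "SELECT m.subject FROM messages m, labels l WHERE m.label = l.id"),
    ("mail", "SELECT subject FROM messages WHERE id = 7"),
    ("mail", "INSERT INTO messages VALUES (?)"),
]
shares = key_value_share(sample_trace)
```

Sorting the resulting percentages and plotting their cumulative distribution would give a curve of the same shape as the one in the figure. The regular expression misclassifies some statements, such as scalar subqueries in the select list, so a parser-based classifier would be preferable in practice.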
\begin{figure} \begin{subfigure}[t]{0.5\textwidth} \centering \includegraphics[width=0.9\textwidth]{graphs/data_mod_ops_cdf_by_app} \caption{} \label{fig:updateByApp:modOps} \end{subfigure}% \begin{subfigure}[t]{0.5\textwidth} \centering \includegraphics[width=0.9\textwidth]{graphs/read_write_ratio_cdf_by_app} \caption{} \label{fig:updateByApp:writeRatio} \end{subfigure}% \caption{\textbf{App-level write behavior. (a) Cumulative distribution of applications by number of data manipulation statements performed (note the logarithmic scale). (b) Cumulative distribution of applications by read/write ratio. }} \label{fig:updateByApp} \end{figure} Figure~\ref{fig:updateByApp:modOps} illustrates app-level write workloads, sorting applications by the number of \texttt{INSERT}, \texttt{UPSERT}, \texttt{UPDATE}, and \texttt{DELETE} operations that could be attributed to each. The CDF is almost perfectly exponential, suggesting that the number of write statements performed by any given app follows a long-tailed distribution, a feature to be considered in the design of a pocket data benchmark. Figure~\ref{fig:updateByApp:writeRatio} breaks apps down by their read/write ratio. Surprisingly, 25 apps (14\% of the apps seen) did not perform a single write over the course of the entire trace. Manual examination of these apps suggested two possible explanations. Several apps have reason to store state that is updated only infrequently. For example, \textit{JuiceSSH} or \textit{Key Chain} appear to use SQLite as a credential store. A second, far more interesting class of apps includes apps like \textit{Google Play Newsstand}, \textit{Eventbrite}, \textit{Wifi Analyzer}, and \textit{TuneIn Radio Pro}, which all have components that query data stored in the cloud. We suspect that the cloud data is being encapsulated into a pre-constructed SQLite database and being pushed to, or downloaded by the client applications. 
This type of behavior might be compared to a bulk ETL process or log shipment in a server-class database workload, except that here, the database has already been constructed. Pre-caching through database encapsulation is a unique feature of embedded databases, and one that is already being used in a substantial number of apps.
\begin{figure}
\begin{subfigure}[t]{0.5\textwidth}
\centering
\includegraphics[width=0.9\textwidth]{graphs/data_mod_ops_cdf_by_app}
\caption{}
\label{fig:updateByApp:modOps}
\end{subfigure}%
\begin{subfigure}[t]{0.5\textwidth}
\centering
\includegraphics[width=0.9\textwidth]{graphs/read_write_ratio_cdf_by_app}
\caption{}
\label{fig:updateByApp:writeRatio}
\end{subfigure}%
\caption{\textbf{App-level write behavior. (a) Cumulative distribution of applications by number of data manipulation statements performed (note the logarithmic scale). (b) Cumulative distribution of applications by read/write ratio. }}
\label{fig:updateByApp}
\end{figure}
Figure~\ref{fig:updateByApp:modOps} illustrates app-level write workloads, sorting applications by the number of \texttt{INSERT}, \texttt{UPSERT}, \texttt{UPDATE}, and \texttt{DELETE} operations that could be attributed to each. The CDF is almost perfectly exponential, suggesting that the number of write statements performed by any given app follows a long-tailed distribution, a feature to be considered in the design of a pocket data benchmark.
Figure~\ref{fig:updateByApp:writeRatio} breaks apps down by their read/write ratio. Surprisingly, 25 apps (14\% of the apps seen) did not perform a single write over the course of the entire trace. Manual examination of these apps suggested two possible explanations. Several apps have reason to store state that is updated only infrequently. For example, \textit{JuiceSSH} and \textit{Key Chain} appear to use SQLite as a credential store. A second, far more interesting class of apps includes apps like \textit{Google Play Newsstand}, \textit{Eventbrite}, \textit{Wifi Analyzer}, and \textit{TuneIn Radio Pro}, all of which have components that query data stored in the cloud. We suspect that the cloud data is being encapsulated into a pre-constructed SQLite database and being pushed to, or downloaded by, the client applications.
This type of behavior might be compared to a bulk ETL process or log shipment in a server-class database workload, except that here, the database has already been constructed. Pre-caching through database encapsulation is a unique feature of embedded databases, and one that is already being used in a substantial number of apps.
\begin{figure*}[t]
\centering
@ -79,15 +114,15 @@ We noted above that a large proportion of \texttt{SELECT} queries were exact loo
Figure~\ref{fig:app} shows query interarrival times, runtimes, and returned row
counts for ten of the most active SQLite clients. As seen in
Figure~\ref{fig:app:interarrival}, a 0.01Hz periodicity in arrival times is not unique to any one
application, suggesting filesystem locking as a culprit. Two of the most
Figure~\ref{fig:app:interarrival}, a 0.01Hz periodicity in arrival times is common to all
applications, suggesting filesystem locking as a culprit. Two of the most
prolific SQLite clients, \textit{Google Play services} and \textit{Media Storage}
appear to be very bursty: 70\% of all statements for these applications are issued
within 0.1ms of the previous statement. Also interesting is the curve for queries
issued by the \textit{Android System} itself. The interarrival time CDF appears
to be almost precisely logarithmic for rates above 10$\mu$s, but has a notable lack
of interarrival times in the 1ms to 10ms range. This could suggest caching
effects, with the cache expiring after 1ms.
effects, with the cache expiring after 1ms.
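The burstiness figures quoted above are points on an empirical CDF of interarrival times. A minimal sketch of that computation follows, assuming statement timestamps in integer microseconds. The function names and the sample timestamps are invented for illustration, and the 100$\mu$s threshold corresponds to the 0.1ms mentioned in the text.

```python
def interarrival_times(timestamps):
    """Gaps between consecutive statement timestamps (same unit as the input)."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def fraction_within(gaps, threshold):
    """Empirical CDF value at `threshold`: share of gaps at or below it."""
    if not gaps:
        return 0.0
    return sum(1 for gap in gaps if gap <= threshold) / len(gaps)


# Invented timestamps (microseconds) forming three short bursts.
timestamps_us = [0, 50, 90, 150, 5000, 5040, 5100, 20000, 20030, 20080, 20100]
gaps_us = interarrival_times(timestamps_us)
bursty_share = fraction_within(gaps_us, 100)  # share of gaps within 0.1ms
```

Evaluating \texttt{fraction\_within} over a logarithmically spaced range of thresholds yields the full CDF for one application.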
As seen in Figure~\ref{fig:app:runtime}, most apps hold to the average runtime of
100$\mu$s, with several notable exceptions. Over 50\% of the
@ -103,7 +138,7 @@ the number of rows returned in general varies much more widely. Many of these
apps' user interfaces have both a list and a search view that show multiple records
at a time, suggesting that these views are backed directly by SQLite. Although all
apps have long tails, two apps in particular: \textit{Gmail} and \textit{Google+} are
notable for regularly issuing queries that return on the order of 100 rows.
notable for regularly issuing queries that return on the order of hundreds of rows.
\begin{figure*}
\centering
@ -119,18 +154,19 @@ notable for regularly issuing queries that return on the order of 100 rows.
\label{fig:burstiness}
\end{figure*}
Figure~\ref{fig:burstiness} shows variations in query burstiness across multiple apps and users\footnote{The PIs have already incorporated material from this proposal into their coursework. Figure \ref{fig:burstiness} is from a student report~\cite{ramamurthy2015pocketdata} from UB's CSE-662, jointly instructed by PIs Kennedy and Ziarek.}.
Figure~\ref{fig:burstiness} shows variations in query burstiness across multiple apps and users\footnote{The PIs have already incorporated material from this proposal into their coursework. Figure \ref{fig:burstiness} is from a student report~\cite{ramamurthy2015pocketdata} from UB's CSE-662, jointly instructed by PIs Kennedy and Ziarek. The student group performed an app-centric analysis of the query traces.}.
Two features immediately emerge from this data.
First, \PocketData{} workloads are extremely bursty; the default steady state is completely idle, with infrequent bursts of hundreds of operations per second.
Second, the nature of these bursts varies significantly by the calling app; in this trace Facebook generates a read-only workload, while WhatsApp produces two bursts, each with a distinct mix of updates, inserts, deletes, and selects.
\medskip
We plan to freely releasing aggregate metrics about database usage patterns in
We will freely release aggregate metrics about database usage patterns in
embedded smartphone databases.
We also plan to make our source traces available under IRB-approved conditions.
We also plan to make our source traces available to researchers with approval
from their institution's IRB.
By doing this, we will enable other researchers to begin exploring the bottlenecks
in and practical limitations of existing embedded databases and abstraction layers
like object-relational mappers developed over them.
in and practical limitations of existing embedded databases, as well as in abstraction layers
like object-relational mappers.
Better understanding the space will help to identify new research challenges, and
help to encourage researchers to join the \PocketData{} community.


@ -3,7 +3,8 @@
We will provide an instrumentation toolkit for the \PocketData{} community. The goal of this toolkit is twofold: (1) Gathering usage traces and metrics from phones deployed in real-world settings, and (2) Reliably measuring system performance on simulated and replayed \PocketData{} workloads.
There are several challenges unique to the \PocketData{} setting that make instrumenting smartphone embedded databases difficult.
The simplest of these is that smartphones rely on specialized operating systems, hardware, and virtualization that can make it difficult to deploy existing measurement tools designed for desktops.
Many of these tools are easily portable, but there are several more subtle and difficult challenges involved in instrumentation.
Many of these tools can be ported and we will endeavor to supplement existing community efforts in doing so.
There are also several more subtle challenges specific to instrumenting \PocketData{}.
A key challenge is the types of bottlenecks that \PocketData{} workloads encounter.
Typical metrics for enterprise benchmarks include throughput at saturation, joules per unit of throughput, and throughput vs latency curves.


@ -5,7 +5,7 @@ suite, which will serve three roles for the \PocketData{} community.
First, a benchmark will foster research on embedded databases by
creating a realistic standard for evaluation, allowing for fair comparisons
across competing research efforts.
Second, by providing a precise set of metrics to optimize, a benchmark
Second, by providing a precise set of metrics to optimize for, a benchmark
will serve to guide the research community's efforts towards pertinent
real-world challenges faced by smartphone applications.
@ -16,9 +16,9 @@ effort to track changes in app usage behaviors and bottlenecks.
We will develop a modular benchmark along the lines of
PolePosition~\cite{poleposition}, driven by modules that
capture the semantics and behavior of a class of applications.
Using the metrics data that we gather and release, we will lead a
an effort to continually monitor for changes in app usage patterns,
and how phone users engage with data-driven apps.
Based on the metrics gathering efforts discussed above, we will lead
an effort to continually monitor apps' data usage patterns for changes,
as well as for changes in how phone users engage with data-driven apps.
As new patterns are discovered by the \PocketData{} community,
we will maintain \textit{a repository of modules describing these
behaviors}.
@ -31,8 +31,8 @@ Ideally, we will be able to link individual queries to triggering events (user i
Although we hope to automate this process eventually, our initial approach will be to focus on one app at a time.
This will not only help us to better understand the space, but also to generate realistic datasets by being able to analyze the specific app's schema and updates/inserts.
The Application tier of the full benchmark will consist of a representative cover of the 179 apps that we encountered in our preliminary analysis, as well as apps that we encounter in subsequent data gathering efforts.
The User tier will simulate the complete phone environment; Statistics for single user include of a cluster of app modules, and patterns of charging (when is the phone plugged in?), network access (when is the phone on the internet, and with what quality?), and other behavioral traits that impact app data access patterns.
To simulate users, we will use standard clustering techniques on our trace data to create both canonical user profiles, and to identify natural variation around those profiles.
The User tier will simulate the complete phone environment. Statistics for a single user include a cluster of app modules, patterns of charging behavior (when is the phone plugged in?), network access (when is the phone on the internet, and with what quality?), and other behavioral traits that impact app data access patterns.
To simulate users, we will use standard clustering techniques on our trace data first to create canonical user profiles, and then to identify natural variation around those profiles.
It is reasonable to ask why a specialized \PocketData{} database
@ -45,12 +45,12 @@ and mobile software.
Although AndroBench does include a component for simulating
the filesystem access patterns of SQLite, neither of these
benchmarks explicitly generates the structured data access patterns
necessary to evaluate a data management system.
necessary to evaluate a complete data management system.
Previous research efforts~\cite{jeong2013iostack} have used mobility
traces generated by MobiGen, fed to a virtual machine running
standard apps to generate semi-realistic traces of embedded
common apps such as Facebook to generate semi-realistic traces of embedded
database access patterns.
Although our approach follows a more principled approach based
Although \PocketData{} follows a more principled approach based on
real-world traces, the metrics we release could be used to validate
and standardize data generation tricks of this sort.
@ -68,17 +68,17 @@ The most intensive database user in our preliminary study,
\textit{Google Play services} had 14.8 million statements attributed
to it, just under half of which were writes.
This equates to about one write every 3 seconds, which is substantial
from a power management and latency perspective, but for concurrency.
from a power management and latency perspective, but which is unlikely
to create a concurrency bottleneck.
Second, many OLAP benchmarks focus on comparatively simple
queries.
This is reasonably descriptive of a notable portion of the workload we
observed in our preliminary study:
A notable portion of the workload we observed in our preliminary study can indeed be described as simple:
13\% of the applications we observed had a read workload that
consisted exclusively of key/value queries, and over half of the applications
we observed had a workload that consisted of at least 80\% key/value queries.
However, the remaining queries are not as simple.
The more complex queries we observed in our preliminary study include
However, the trace also exhibited a long tail of extremely complex queries.
A small but significant number of queries we observed in our preliminary study include
multiple levels of query nesting, wide joins, and extensive use of aggregation.
As such, they more closely resemble analytics (OLAP) workload benchmarks
such as TPC-H~\cite{tpch}, The Star-Schema Benchmark~\cite{ssb}, and
@ -98,11 +98,11 @@ PolePosition simulates the behavior of specific data structure abstractions
that need to be backed by a data management system.
Because data structures are defined using higher-level operational
semantics rather than through a fixed database API, databases are
allowed to specialize for specific access patterns that the database may
allowed to specialize benchmarks to specific access patterns that the database may
be optimized for.
The fundamental goals of PolePosition and the \PocketData{} benchmark
are similar, but \PocketData{} will operate at a higher level of abstraction,
capturing the behavior of entire apps and users engaging with those apps.
capturing the behavior of entire apps, as well as users that engage with those apps.


@ -1,10 +1,10 @@
% !TEX root = ../fullproposal.tex
Even the month-long trace of 11 user's queries on which our preliminary study was based included over 45 million SQL statements.
Even the short, month-long query trace with only 11 users on which our preliminary study was based included over 45 million SQL statements.
As the experiment is scaled up, analyzing these query traces will become increasingly difficult.
Compounding the issue, the comparatively high complexity many of the the queries makes it difficult to flatten the SQL parse trees into a simple relational format for analysis.
Our preliminary analysis required repeated iterations of our feature extraction process: We would define a procedure for extracting interesting features of a sql query parse tree, construct a visualization from the extracted feature, and then identify a new feature of interest.
Compounding the issue, the comparatively high complexity of many of the queries makes it difficult to flatten the SQL parse trees into a simple relational format for analysis.
Our preliminary analysis required repeated iterations of our feature extraction process: We would define a procedure for extracting interesting features of a SQL statement's parse tree, construct a visualization from the extracted feature, and then identify a new feature of interest.
As part of the proposed work, we will release tools for analyzing query logs that streamline this iterative process by making it easy to define new feature extractors.
As feature extraction is an embarrassingly parallel task, simple optimizations like caching, parallelism, and incremental computation~\cite{kennedy2011dbtoaster} can be used to make these tools extremely efficient\footnote{As a comment on the utility of specialized tools for log analysis, the previously mentioned CSE-662 project involved analyzing \PocketData{} logs. The four students began with a naive analysis tool (written by the students in Java) that took multiple hours to complete one iteration of the analytics cycle. By the end of the course, they had optimized the tool to run in under 10 seconds~\cite{ramamurthy2015pocketdata}.}.
As feature extraction is an embarrassingly parallel task, simple optimizations like caching, parallelism, and incremental computation~\cite{kennedy2011dbtoaster} can be used to make these tools extremely efficient\footnote{As a comment on the utility of specialized tools for log analysis, we return to the CSE-662 project involving analyzing \PocketData{} logs. The four students began with a naive analysis tool (written by the students in Java) that took multiple hours to complete one iteration of the analytics cycle. By the end of the course, they had optimized the tool to run in under 10 seconds~\cite{ramamurthy2015pocketdata}.}.
Source code for all visualizations that we release as part of our summary metrics will be released to the public to further encourage community participation in \PocketData{}.
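To make the idea of easily defined feature extractors concrete, the following is a minimal sketch rather than the proposed tool. It assumes extractors are plain functions registered under a name, caches results per distinct statement text, and uses a thread pool as a stand-in for parallel workers. The registry, the two regex-based extractors, and the sample statements are all hypothetical, and a real tool would operate on SQL parse trees instead of regular expressions.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

EXTRACTORS = {}  # hypothetical registry: feature name -> function(sql) -> value


def extractor(name):
    """Decorator that registers a feature extractor under `name`."""
    def register(fn):
        EXTRACTORS[name] = fn
        return fn
    return register


@extractor("join_width")
def join_width(sql):
    """Naive: count comma-separated entries in the first FROM clause."""
    match = re.search(r"\bFROM\b(.*?)(\bWHERE\b|$)", sql, re.IGNORECASE | re.DOTALL)
    return len(match.group(1).split(",")) if match else 0


@extractor("conjuncts")
def conjuncts(sql):
    """Naive: count AND-separated terms in the WHERE clause (0 if absent)."""
    match = re.search(r"\bWHERE\b(.*)$", sql, re.IGNORECASE | re.DOTALL)
    if not match:
        return 0
    return len(re.split(r"\bAND\b", match.group(1), flags=re.IGNORECASE))


@lru_cache(maxsize=None)
def extract(sql):
    """Run every registered extractor once per distinct statement text.

    Call extract.cache_clear() after registering a new extractor.
    """
    return tuple(sorted((name, fn(sql)) for name, fn in EXTRACTORS.items()))


def extract_all(statements, workers=4):
    """Extract features for each statement; statements are independent of each other."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [dict(features) for features in pool.map(extract, statements)]


# Illustrative statements; the first has join width 2 and 2 conjunctive terms.
statements = [
    "SELECT R.A FROM R, S WHERE R.B = S.B AND S.C = 10",
    "SELECT * FROM kv",
    "SELECT * FROM kv",
]
features = extract_all(statements)
```

Because traces repeat the same statement text many times, caching by statement is likely to remove most of the work before parallelism is even needed. Adding a new feature then amounts to writing one more decorated function.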


@ -1,6 +1,6 @@
% !TEX root = ../fullproposal.tex
We will build an initial \PocketData{} community and facilitate engagement with the broader CISE community through outreach efforts including attending poster sessions and hosting workshops and tutorials
We will build an initial \PocketData{} community and facilitate engagement with the broader CISE community through outreach efforts including attending poster and demo sessions and hosting workshops and tutorials
co-located with major conferences in databases (VLDB, SIGMOD, ICDE), mobile and real-time systems (MobiSys, OSDI, RTSS, RTAS), and programming languages (POPL, PLDI, OOPSLA).
Poster sessions provide an ideal opportunity to meet researchers in related areas, to advertise the resources we plan to offer, and gather feedback about the needs of potential \PocketData{} community members.


@ -56,12 +56,11 @@ the challenges of specializing databases for small data.
smartphone apps interact with embedded-databases.
\end{itemize}
%
There is clearly interest in data management challenges that arise at the small- and
pocket-scales.
There is clearly interest in data management challenges that arise in small-scale data management.
%
Unfortunately, unlike the largely homogeneous workloads and platforms that
common to research on classical monolithic enterprise databases, \PocketData{}
is far more diverse.
Unfortunately, unlike the largely homogeneous workloads and platforms that are
standard in research on classical enterprise databases, this new \PocketData{}
setting is far more diverse.
%
Data access patterns are extremely bursty and can vary wildly by user, time of day, mix of installed apps,
network accessibility, and many other factors.
@ -70,7 +69,7 @@ Platform properties such as RAM, persistent storage, CPU performance, and networ
bandwidth also exhibit extreme variations across phones, sometimes by multiple orders of magnitude.
%
Resource availability can also vary; some users keep their phones constantly
charged, while others go multiple days without charging.
charged, while others go multiple days without plugging their phones in.
%%%%%%%%%%%%%%%%
@ -78,7 +77,11 @@ The heterogeneity of the \PocketData{} setting makes it challenging for research
understand the tradeoffs and requirements of the setting.
%
This lack of clear high-level goals, in turn, makes it difficult to clearly identify successful
research contributions and creates a daunting environment for new research efforts.
research contributions.
%
Unfortunately, pinning down specific goals first requires a concerted effort to gather (and analyze)
traces of data usage patterns from real-world settings, creating a high barrier to entry for new
researchers.
%
Lacking the resources necessary to better understand and adapt to the \PocketData{} scale,
research efforts in the area are presently limited.
@ -100,12 +103,12 @@ as their fundamental computation engine (e.g. the `\texttt{maybe}' system develo
consider the performance characteristics of database systems (e.g. power modeling).
This proposal aims to create a community research infrastructure around our \PocketData{} toolchain
to enable a myriad of research activities for above mentioned communities. Additionally,
to enable a myriad of research activities for the above-mentioned communities. Additionally,
in this planning grant, we will explore the precise needs of these communities to ensure an
infrastructure that has broad applicability.
We will reach out to researchers in closely related areas including Internet of Things,
Adaptive Data Management, Sensor Networks, and help them to explore how \PocketData{}
can impact their research.
can help to improve their research.
As part of these outreach efforts, we will provide resources that will simultaneously support
researchers' existing projects, while also helping to enable new projects with a focus on
\PocketData{}.
@ -119,16 +122,16 @@ During this planning grant we will focus our efforts in three key areas:
\item \textbf{Growth of the Mobile Embedded Database community}: We have established an
initial community of interested CISE researchers for \PocketData{} from both
academia and industry. We believe that this community
shows that there is sufficient interest in CISE to pursue our proposed \PocketData{} infrastructure.
shows that there is sufficient interest within CISE to pursue our proposed \PocketData{} infrastructure.
However, for long term success we would like to expand this community to ensure that the infrastructure
meets the needs of the broader community and not just a specific research niche.
\item \textbf{Expansion to IoT}: Our current efforts have focused primarily exploring questions
\item \textbf{Expansion to IoT}: Our preliminary efforts have focused on questions
relating to \PocketData{} in the mobile domain, specifically Android. Although a \PocketData{}
infrastructure based solely in Android is valuable, we believe a more comprehensive infrastructure
must take into account recent developments in IoT. There are similarities between how mobile
applications leverage embedded databases and how proposed IoT applications would use embedded
databases, specifically in the areas of personal health care devices that aggregate and summarize a user's personal data and smart city deployments where small devices process data before sending \emph{relevant} data for more centralized big data processing.
databases, specifically in the areas of personal health care devices that aggregate and summarize a user's personal data and smart city deployments where small devices process data before sending \emph{relevant} data for more centralized big data analytics.
We propose to expand and modify our \PocketData{} infrastructure to meet the needs of the IoT community.
\item \textbf{Workshops and Tutorials}:
@ -163,17 +166,17 @@ summarizing those datasets.
\item \textbf{Standards and Benchmarks}:
We will create a toolkit to establish a set of standards for evaluating research efforts on
\PocketData{} for both Android and IoT.
The \PocketData{} setting requires unique metrics that can be difficult to reliably measure and attribute on the Android platform.
The \PocketData{} setting requires unique metrics that can be difficult to reliably measure on the Android platform.
The toolkit will include instrumentation for Android that will make it easier for researchers
to measure the performance through as-yet-uncommon measures like availability of idle time, thread scheduling, power consumption, and other metrics that can be hard to measure reliably on the Android platform like CPU and memory usage for specific libraries.
to measure performance through rarely used metrics like availability of idle time, thread scheduling, power consumption, and other measures that can be hard to gather reliably on the Android platform like CPU and memory usage for specific libraries.
Second, to standardize comparisons across different research efforts, the toolkit will
include a benchmark suite.
This benchmark will create clearly defined metrics for evaluating success. Moreover, by
This benchmark will establish clearly defined metrics for evaluating data management solutions. Moreover, by
making it extensible, the benchmark will act as a clearinghouse for app behaviors discovered
in the wild and changing database requirements.
\item \textbf{Visualization}: We will create a data visualization tool and associated queries
to help researchers understand and navigate the data. The raw traces we plan to offer researchers are very
large and the rich structure and variability of SQL queries generated by smartphone apps do not admit traditional indexing strategies for analytics. By providing database-driven tools that aid in the analysis and visualization of the resulting queries, we will enable researchers to more quickly and accurately explore
large. Moreover, the rich structure and variability of SQL queries generated by smartphone apps do not admit traditional indexing strategies often used for analytics. By providing database-driven tools that aid in the analysis and visualization of the resulting queries, we will enable researchers to more quickly and accurately explore
relevant characteristics of real-world \PocketData{} workloads.
\end{enumerate}

@ -7,18 +7,18 @@
%\item Existing related resources along with a justification that the proposed research cannot be accomplished with these resources at the institution or elsewhere
%\end{itemize}}
In this section we present a few concrete projects that would benefit from \PocketData{} and then describe how the proposed infrastructure will enable
reach for the PIs and the broader CISE community.
In this section we present a few concrete projects that would benefit from \PocketData{} and describe how the proposed infrastructure will enable
research for the PIs and the broader CISE community.
\subsection{Adaptive Data Management}
Selecting the correct physical structure for a database under a given workload is an extremely challenging~\cite{Chaudhuri:1997:ECI:645923.673646,Chaudhuri:1998:ALI:276304.276337,Chaudhuri:2007:SDS:1325851.1325856,Agrawal:2000:ASM:645926.671701} part of database management.
The index selection problem becomes even harder when workload characteristics fluctuate rapidly or are not known in advance.
There is currently substantial interest in a breed of self-adapting, adaptive index structures~\cite{idreos2007database,Idreos:2011:MWC:2002938.2002944} that address dynamic index selection by facilitating \textit{incremental, online} changes to the index.
There is currently substantial interest in a breed of self-adjusting, adaptive index structures~\cite{idreos2007database,Idreos:2011:MWC:2002938.2002944} that address dynamic index selection by facilitating \textit{incremental, online} changes to the index.
Examples of adaptive indexes include Cracker Indexes~\cite{Idreos:2012:AIM:2247596.2247667,Idreos:2007:UCD:1247480.1247527,Halim:2012:SDC:2168651.2168652}, Adaptive Merge Trees~\cite{Graefe:2010:SSI:1739041.1739087,Graefe:2012:CCA:2180912.2180918}, SMIX~\cite{Voigt:2013:SSI:2484838.2484862}, H2O~\cite{163421}, and Just-in-Time Data Structures~\cite{kennedy2015just}.
Adaptive indexes automatically optimize their physical representation in response to incoming queries, reusing work used to answer the query to also improve subsequent queries. Given enough time, a stable workload, and queries that touch all data objects, an adaptive index eventually converges to a data representation similar to that of a static index.
\textbf{Infrastructure Justification:} Although there have been several efforts~\cite{Graefe:2010:BAI:1946050.1946063,schuhknecht2013uncracked} to develop benchmarks for adaptive indexes, these benchmarks rely on purely synthetic data and unit-tests rather than real-world scenarios.
This is in part because the typical enterprise workloads that rarely exhibit the type of drastic shifts that adaptive indexes target.
This is in part because typical enterprise workloads rarely exhibit the type of drastic shifts that adaptive indexes target.
As a result most data management benchmarks evaluate systems under stable, steady-state workloads.
By contrast, \PocketData{} workloads often show extreme variation in both application demands and resource availability.
As a trivial example, an app might demand low-latency, low-power access to data when a user is actively using the phone, while admitting high-latency high-power organizational tasks when the phone is plugged in~\cite{Challen:2015:MWE:2699343.2699361}.
@ -40,7 +40,7 @@ The relatively limited compute and memory resources available on tablets and sma
\textbf{Infrastructure Justification:} Small-data analytics efforts are presently siloed, with most research efforts targeting entire software stacks, from the user interface front-end to the back-end database.
The standard evaluation tools offered by the \PocketData{} benchmark would help to that decouple the research challenges involved in small-data analytics, and allow a broader community of researchers to contribute.
The standard evaluation tools offered by the \PocketData{} benchmark would help to decouple the research challenges involved in small-data analytics and allow a broader community of researchers to contribute.
For example, an embedded database benchmark simulating a visual query interface workload would serve as a standard for evaluating novel algorithms, indexes, and data management tools.
\textbf{Community Interest:}
@ -53,11 +53,11 @@ In this respect, data-driven smartphone apps are similar to data-driven enterpri
However, enterprise software is typically supported by experienced database administrators who can carefully fine-tune the database to efficiently support the application.
This is not the case for smartphone apps, which instead rely on compiler tools and software libraries to efficiently mediate access to persistent data.
Consequently, \PocketData{} offers new research opportunities at the interface between imperative programming languages like C, C\#, or Java, and back-end data management tools.
Forms of inline SQL like LinQ~\cite{box2007linq,Meijer:2006:LRO:1142473.1142552} have existed for nearly a decade, but are not frequently used in the design smartphone apps.
Forms of inline SQL like LinQ~\cite{box2007linq,Meijer:2006:LRO:1142473.1142552} have existed for nearly a decade, but are not frequently used in the development of smartphone apps.
Instead, app developers frequently rely on higher level primitives including object-relational mappers~\cite{Melnik:2007:CMB:1247480.1247532} (ORMs) like Hibernate~\cite{hibernate} to mediate access to the database.
Unfortunately, at present, most ORMs are implemented as libraries and lack the ability to introspect the invoking program.
This creates an impedance mismatch between the available information and SQL's declarative syntax, forcing ORMs to misuse SQL, or to rely on optional hints provided by the app developer to provide efficient data access.
In our preliminary exploration~\cite{pocketdata}, we found significant anti-patterns emerging in data access patterns.
In our preliminary exploration~\cite{pocketdata}, we found significant anti-patterns emerging in queries to SQLite.
Examples include the use of expensive \texttt{UPSERT} operations when \texttt{UPDATE}s would be sufficient, the use of multiple \texttt{SELECT} queries to dereference foreign-keys instead of using an outer-join query, and the use of separate read-then-write queries rather than in-place updates.
Several research efforts, including StatusQuo~\cite{StatusQuo}, Sloth~\cite{Cheung:2014:SLV:2588555.2593672}, and Truffle/Graal~\cite{wimmer2012truffle} have addressed similar problems in enterprise data-driven applications and could find new challenges in the \PocketData{} space.
Other research efforts explore data-flow in smartphones for performance optimization~\cite{yang-phd15,yang-icse15,rountev-cgo14} and correctness~\cite{yan-cgo14}, and would benefit from more detailed tools for introspection and measurement.
@ -65,35 +65,35 @@ Other research efforts explore data-flow in smartphones for performance optimiza
\textbf{Infrastructure Justification:} Research on data-driven app development requires a detailed understanding of application requirements, and programming language research needs real-world workloads to demonstrate its viability.
The metrics that we propose to gather and the benchmark suite we propose to develop are critical for driving research in this space.
\textbf{Community Interest:} \textit{Nasko Rountev} of Ohio State will use \PocketData{} as part of the Presto group's work on data-flow analysis to debug of GUI responsiveness issues and as part of his LeakDroid project.
\textbf{Community Interest:} \textit{Nasko Rountev} of Ohio State will use \PocketData{} as part of his work on data-flow analysis to debug GUI responsiveness issues and as part of his LeakDroid project.
\subsection{Database-App Coupling}
Smartphone apps are integrated with the data management tools they use to a far greater degree than enterprise applications.
Embedded databases are libraries that operate within the app's memory space, and not external tools.
Apps generate virtually all queries procedurally, making it possible to specify their data management requirements extremely precisely at compile time.
Moreover, access to data often occurs through higher-level primitives that are supported by their own library wrappers.
Moreover, access to data often occurs through higher-level primitives like ORMs.
In short, although embedded databases are in principle capable of emulating stand-alone database engines, in practice they are used more as toolkits of data management building blocks.
The tight coupling between app and database promises to offer numerous avenues for workload-driven database optimization.
A leader in this area is BerkeleyDB.
Although BerkeleyDB does provide a SQL emulation front-end, its core functionality is to provide simple database building blocks like primary and secondary indexing, foreign-key consistency primitives, and transactional access to data.
Similar efforts are taking place across multiple industrial research labs and startup companies, as numerous organizations have begun to invest into embedded databases, including MongoDB's WiredTiger~\cite{shakuntalagupta2015practical}, SAP's SqlAnywhere~\cite{4401024}, and Facebook's RocksDB, as well as open-source efforts including the H2 Database~\cite{mueller2006h2} and SQLite~\cite{sqlite}.
Similar efforts are taking place across multiple industrial research labs and startup companies, as numerous organizations have begun to invest in embedded databases. Corporate investment in embedded databases includes MongoDB's WiredTiger~\cite{shakuntalagupta2015practical}, SAP's SqlAnywhere~\cite{4401024}, and Facebook's RocksDB, as well as open-source efforts including the H2 Database~\cite{mueller2006h2} and SQLite~\cite{sqlite}.
The tight coupling between database and the invoking application also admits possibilities for more aggressive database specialization.
Database compilers like DBToaster~\cite{kennedy2011dbtoaster,koch2013dbtoaster,Ahmad:2012:DHD:2336664.2336670}, HyPer/LLVM~\cite{Neumann:2011:ECE:2002938.2002940}, and Legorithmics~\cite{Klonatos:2013:ASO:2463676.2465334,Klonatos:2014:BEQ:2732951.2732959} use aggressive compilation to create a database uniquely specialized for a specific application's query and update workload.
Database compilers like DBToaster~\cite{kennedy2011dbtoaster,koch2013dbtoaster,Ahmad:2012:DHD:2336664.2336670}, HyPer/LLVM~\cite{Neumann:2011:ECE:2002938.2002940}, and Legorithmics~\cite{Klonatos:2013:ASO:2463676.2465334,Klonatos:2014:BEQ:2732951.2732959} aggressively compile and optimize database engines that are uniquely specialized for a specific application's query and update workload, as well as its underlying hardware.
As already noted above, many of these statistics are available at compile time, making the \PocketData{} setting an ideal candidate for deploying these applications.
\textbf{Infrastructure Justification:} Realistic evaluation of embedded databases and database compilers requires realistic workloads. Moreover, smartphones are one of the most prolific examples of embedded databases deployed in the wild. Given the variation in smartphone apps' data management requirements, even limited data releases by a single app developer will not be representative. The metrics we will gather and the benchmark we are proposing will be key to helping researchers evaluate new embedded database tools.
\textbf{Community Interest:} \textit{Ashok Joshi} and \textit{Michael Brey} of Oracle are interested in participating in the \PocketData{} community to advance research on embedded databases.
\textbf{Community Interest:} \textit{Christoph Koch}'s DATA lab at EPFL is interested in using the \PocketData{} benchmark to evaluate their work on database compilers. \textit{Ashok Joshi} and \textit{Michael Brey} of Oracle are interested in participating in the \PocketData{} community to advance research on embedded databases.
\citedquote{Ashok Joshi (Senior Director at Oracle)}{I got some feedback from one of my colleagues on this topic. Yes, the real-world traces of embedded data usage would be useful; so would the benchmarking toolkit.}
\citedquote{Michael Brey (Oracle's BerkeleyDB Team)}{Within Oracle, we are always looking at how the industry both consumer and enterprise is using data in mobile applications. Things like db size, access patterns, single/multi user (multiple apps accessing same db), speed of access required, record size/structure etc. are all important to understand. We are also very interested in the movement of data from the device to some backend repository.}
\citedquote{Michael Brey (Oracle)}{Within Oracle, we are always looking at how the industry both consumer and enterprise is using data in mobile applications. Things like db size, access patterns, single/multi user (multiple apps accessing same db), speed of access required, record size/structure etc. are all important to understand. We are also very interested in the movement of data from the device to some backend repository.}
%Additionally, PI Kennedy will make use of the same resources in his efforts on incremental computation.
\subsection{Enabled Research For the PIs}
The PIs have a joint research project aimed at exposing \emph{uncertainty} in mobile computing~\cite{Challen:2015:MWE:2699343.2699361}. The project focuses on exposing new language primitives to the programmer to specify multiple implementation for
a given functionality allowing the system to pick which implementation to use at runtime. This allows the system to specialize software to a given hardware platform and more importantly to a given set of external
considerations (e.g. network connectivity, available sensors, etc.). Our proposed infrastructure will enable us to study two key aspects of uncertainty: (1) almost all mobile applications store user data and configuration parameters in
mobile databases, access to this data can have a profound impact on the behavior of an application, \PocketData{} will allow us to more readily study this aspect of mobile uncertainty; (2) the infrastructure powering our
The PIs have a joint research project aimed at exposing \emph{uncertainty} in mobile computing~\cite{Challen:2015:MWE:2699343.2699361}. The project focuses on exposing new language primitives to the programmer to specify multiple implementations of
system functionality allowing the system to pick which implementation to use at runtime. This allows the system to specialize software to a given hardware platform and more importantly to a given set of external
considerations (e.g. network connectivity, available sensors, etc.). Our proposed infrastructure will enable us to study two key aspects of uncertainty: (1) Almost all mobile applications store user data and configuration parameters in
mobile databases, and access to this data can have a profound impact on the behavior of an application. \PocketData{} will allow us to more readily study this aspect of mobile uncertainty; (2) The infrastructure powering our
runtime system for exposing uncertainty is built around a mobile database that stores possible choices the software system can make. \PocketData{} will allow us to optimize this database to reduce choice latency.
PIs Kennedy and Ziarek have a joint research project, Just-in-Time Data Structures (JITDs), focusing on adaptive indexing~\cite{kennedy2015just}.
@ -102,7 +102,7 @@ The level of variation in load and resource availability that occurs in \PocketD
As noted above, our proposed infrastructure will provide us with a benchmark workload that will help us to evaluate adaptive indexes under real-world conditions, rather than through purely synthetic workloads.
PI Kennedy is part of a collaborative research project with \textit{Shambhu Upadhyaya} (UB), \textit{Varun Chandola} (UB), \textit{Hung Ngo} (UB), and \textit{Long Nguyen} (UMich) that explores techniques for identifying insider attacks on databases (NSF-CNS-1409551).
Although the threat of insider attacks on mobile devices is minimal, the specific methodology behind the work involves summarizing query logs by creating clusters of queries with similar ``intent.''
Although the threat of insider attacks on mobile devices may be minimal, the specific methodology behind the work involves summarizing query logs by creating clusters of queries with similar ``intent.''
The approach is showing promise for summarizing query logs from a corporate (banking) setting.
Having query logs from other settings like \PocketData{} would show that the approach can be generalized to domains other than Insider Threat detection (for example to the design of index selection tools).
If successful, these efforts could also contribute back to the \PocketData{} project, as a tool for quickly summarizing and clustering query logs would help to build out the visualization and benchmark design components of the proposed infrastructure.

@ -2,7 +2,7 @@
The PIs have already reached out to the database and mobile systems communities for feedback on the current infrastructure, providing the PIs with an initial community and a preliminary source of
feedback on design, APIs, and features. A detailed description can be found above, in Section~\ref{sec:research}. In summary, there is interest from researchers working on embedded databases, small-scale data management, personal sensing, query interfaces, and several closely related areas.
The \PocketData benchmark will serve as a focal point for the community involvement by providing the community with a way to explore, discuss, and disseminate new data management use cases, and offering a standard way to evaluate systems on those use cases.
The \PocketData{} benchmark will serve as a focal point for the community's involvement by providing the community with a way to explore, discuss, and disseminate new data management use cases, and by offering a standard way to evaluate systems on those use cases.
Preliminary work on characterizing differences between \PocketData{} and traditional benchmarking infrastructures
@ -10,14 +10,13 @@ was presented at the TPC-TC symposium, which has allowed the PIs to solicit indu
From this starting point the PIs will also broaden their target communities to include researchers from programming languages as well as real-time and embedded systems.
The PIs believe that expanding the purview of \PocketData{} to also include IoT will broaden the utility of the proposed infrastructure.
IoT has recently renewed interest in databases systems (stream databases, in-network query processors) that are specialized for IoT and are capable of running on embedded devices (\textit{e.g.}, TinyDB~\cite{madden2005tinydb}).
IoT has recently renewed interest in database systems that are specialized for IoT (stream databases, in-network query processors) and/or are capable of running on embedded devices (\textit{e.g.}, TinyDB~\cite{madden2005tinydb}).
As embedded processors become more capable, with
larger amounts of main memory available (e.g. Intel's Edison platform), there is a growing push from the embedded
and also from the real-time communities to explore including database
systems and query processing systems in small scale embedded systems. The PIs believe that emerging research in smart cities and personalized
and also from the real-time communities to explore including databases and query processing in small scale embedded systems. The PIs believe that emerging research in smart cities and personalized
medical devices that aggregate and process biometric data would benefit from \PocketData{}.
Domain specific languages (DSLs) are becoming more pervasive as solutions proposed by the programming language community as mechanisms
to both easy programmer effort for specialized systems, but to also greatly improve performance in time, space, and even energy consumption.
Domain specific languages (DSLs) are becoming more pervasive as mechanisms
to both amplify programmer effort for specialized systems and to greatly improve performance in time, space, and even energy consumption.
The PIs believe that \PocketData{} will be of interest to both database and programming language researchers in the IoT space.
\citedquote{Ashok Joshi (Senior Director; Oracle NoSQL Database, Berkeley DB, Database Mobile Server)}{I think synchronizing device data with server data is a very common occurrence in this space. As a simple example, you should be able to synchronize your `contacts' database on your cell phone with a server repository. Recently, Mike Brey, Raghu Nambiar and I proposed a ``strawman'' IoT benchmark~\cite{ashok2015benchmarking} --- I think extending your work to include large-scale data synchronization would be worth considering.}
@ -34,7 +33,8 @@ our current efforts toward building a community and the community's support for
Stratos Idreos & \emph{Harvard} & Databases & Adaptive indexes\\
Arnab Nandi & \emph{Ohio State} & Databases/HCI & Interactive analytics\\
Nasko Rountev & \emph{Ohio State}& Programming Languages & Mobile data flow analysis\\
Ashok Joshi & \emph{Oracle} & Databases/IoT & IoT metrics\\
Christoph Koch & \emph{EPFL} & Databases/Theory & Database compilers\\
Ashok Joshi & \emph{Oracle} & Databases/IoT & IoT performance\\
Michael Brey & \emph{Oracle} & Databases/Mobile Systems & Embedded DB performance\\
Meikel Poess & \emph{Oracle} & Databases & Performance measurement \\
Raghunath Nambiar & \emph{Cisco} & Databases & Performance measurement \\

@ -2,26 +2,26 @@
Our planning process will consist of a development effort and an outreach effort.
First and foremost, the centerpiece of our development side community-building efforts is the \PocketData{} benchmark.
In addition to acting as a standard for evaluating research efforts that overcome bottlenecks and limitations of existing technology, the benchmark will serve as a hub for the community to discuss and describe these limitations.
In addition to acting as a standard for evaluating research efforts that overcome bottlenecks and limitations of existing technology, the \textit{modular} benchmark will serve as a hub for the community to discuss and describe these limitations.
Under the guidance of the PIs, the graduate student supported by this proposal will be responsible for developing a preliminary prototype benchmark.
The first version of this benchmark will stress bottlenecks identified in our preliminary study~\cite{pocketdata} by simulating the behavior of a small number of smartphone apps.
Using data and query logs derived from our preliminary study, we hope to have version one of the benchmark ready within 4-6 months.
The benchmark will be released and advertised over community mailing lists like DBWorld~\cite{dbworld}.
By this point, we expect to have expanded the \PocketData{} community through our outreach efforts.
After releasing the benchmark we will hold a 3 month community feedback process, allowing us to release version 2 of the benchmark based on community feedback before the end of the planning period.
Additionally we will pursue feedback from the IoT community to understand \PocketData{} needs to extend the benchmark infrastructure to be suited to the IoT's community needs. We envision that these
needs will vary depending on the aspect of IoT a given community is interested (e.g. language runtime design vs. embedded databases). To avoid creating an infrastructure only suited to the needs
Additionally we will pursue feedback from the IoT community to understand how \PocketData{} can be extended to meet the IoT community's needs. We envision that these
needs will vary depending on the aspect of IoT a given community is interested in (e.g. language runtime design vs. embedded databases). To avoid creating an infrastructure only suited to the needs
of a particular niche, we will solicit feedback from many sources.
In addition to the \PocketData{} community, we will leverage interest from the Transaction Processing Performance Council (TPC) in developing an embedded database benchmark.
The TPC represents one of the most prominent names in database benchmarking, and is responsible for benchmarks like TPC-C~\cite{tpcc}, TPC-H~\cite{tpch}, and TPC-DS~\cite{tpcds} that are canonical tools for evaluating research in databases.
The TPC represents one of the most prominent names in database benchmarking, and is responsible for benchmarks like TPC-C~\cite{tpcc}, TPC-H~\cite{tpch}, and TPC-DS~\cite{tpcds} that are touchstones for evaluating research in databases.
After presenting our preliminary work at the TPC's annual symposium colocated with VLDB 2015, \textit{Raghunath Nambiar} (Cisco), \textit{Reza Taheri} (VMware), and \textit{Meikel Poess} (Oracle) of the TPC expressed interest in helping us to develop \PocketData{} as an eventual TPC benchmark. The PIs hope to also participate in TPC discussions on IoT concerns. The TPC discussions will provide the PIs with both industry and
academic perspectives on both embedded databases and IoT. The PIs hope to leverage this information in the design of the proposed \PocketData{} infrastructure.
Although all PIs will be responsible for communicating with the TPC as a joint benchmark is fleshed out, PI Kennedy will act as a lead point of contact.
Although all PIs will be involved in communications with the TPC and its members, PI Kennedy will act as a lead point of contact.
To continue building our current community as well as expanding to include IoT researchers, the PIs expect to travel to top conferences in a variant of fields.
To continue building our current community and to expand it to include IoT researchers, the PIs expect to travel to top conferences in a variety of fields.
Our outreach efforts will begin with poster sessions, tutorials, and demos presented at prominent database conferences. One candidate is ICDE 2017, which takes place early in the planning period. PI Kennedy will coordinate efforts to perform a demonstration at a database conference to incite discussion and interest in \PocketData{} from the database community. PI Ziarek will coordinate efforts for a demonstration or poster presentation initially targeting SPLASH 2016 to reach out to the PL community, and PI Challen will coordinate efforts for a demonstration or poster presentation initially targeting MobiSys 2017 to reach out to the mobile systems community.
At these conferences the PIs will network with researchers who work on IoT. In addition, there are many new conferences focusing on IoT that are emerging. The PIs expect to attend
At these conferences the PIs will network with researchers who work on IoT as well. In addition, there are many new conferences focusing on IoT that are emerging. The PIs expect to attend
IoTA, IoTDI, and WF-IoT. Towards the end of the first year of the proposal, the PIs will begin to develop a tutorial on embedded databases and plan for a \PocketData{} workshop.
The PIs will submit a \textbf{CI-NEW} proposal for \PocketData{} in Fall of 2017, approximately 15 months after the start of the planning proposal.

@ -1,10 +1,10 @@
% !TEX root = ../fullproposal.tex
With 2 billion smartphones in the world and more being added every day, mobile platforms together form the most pervasive distributed systems on the planet.
People are increasingly relying on smartphones to manage their lives, from contacts and todo lists to their health, their homes, and the contents of their wallets.
This proliferation of data-driven smartphone apps is driving a need to create more, better, faster, more user-friendly, and more power-aware techniques for managing their data.
With 2 billion smartphones in the world and more being added every day, mobile platforms together form the most pervasive distributed system on the planet.
People are increasingly relying on smartphones to manage their lives, from contacts and todo lists, to their health, their homes, and the contents of their wallets.
This proliferation of data-driven smartphone apps is causing a need for more, faster, more user-friendly, and more power-aware techniques for managing data on smartphones and embedded devices.
To meet the challenges of this new frontier in data management, it is critical that we begin understand how smartphone apps store and retrieve structured state and establish standards for evaluating potential advances based on this understanding. Our proposal lays the groundwork for research on pocket-scale data management. We have interest from the Transaction Processing Council for our proposed benchmark, and even now several members of the database, systems, and programming language communities have expressed interest in the resources we propose to offer.
To meet the challenges of this new frontier in data management, it is critical that we begin to understand how smartphone apps store and retrieve structured state and establish standards for evaluating potential advances based on this understanding. Our proposal lays the groundwork for research on pocket-scale data management. We have interest from the Transaction Processing Performance Council for our proposed benchmark, and even before the planning stage, several members of the database, systems, and programming language communities have expressed interest in the resources we propose to offer.
In addition to supporting research in a critical area, this proposal will support one graduate student during the planning phase and up to two graduate students in later phases, resulting in between one and two PhD theses. We anticipate that the proposed work may also lead to one or two MS theses, and if funded, plan to apply for an REU grant for this proposal.
The resources created by this proposal will also be integrated into courses taught by the PIs; This has already happened: PIs Kennedy and Ziarek co-taught a project-oriented course entitled ``CSE-662: Languages and Runtimes for Big Data.'' The course included material related to \PocketData{} research, and three of the seven groups in the course worked on projects based on \PocketData{} and the Internet of Things.
In addition to supporting research in a critical area, this proposal will support one graduate student during the planning phase and up to two graduate students in later phases, contributing to between one and two PhD theses. We anticipate that the proposed work may also lead to one or two MS theses, and if funded, plan to apply for an REU supplement for this proposal.
The resources created by this proposal will also be integrated into courses taught by the PIs, a process that has already started: PIs Kennedy and Ziarek recently co-taught a project-oriented course entitled ``CSE-662: Languages and Runtimes for Big Data.'' The course included material related to \PocketData{} research, and three of the seven groups in the course worked on projects based on \PocketData{} and the Internet of Things.

Binary file not shown.


@ -53,4 +53,6 @@ The DB/PL lab at the University at Buffalo maintains additional resources specif
The DB/PL lab at the University at Buffalo maintains additional resources for internal use, including multiple x86 workstations, laptops, and low-power development boards (Raspberry Pis and Intel Galileos) for general student and PI use. Server infrastructure for the lab includes an application server supporting a lab project management system, teaching support applications, and trial deployments of lab-developed software; an Oracle database server testbed; a 32-core AMD Opteron, a 64-core AMD Opteron, and a 12-core Intel Xeon-based testbed server; and a 16-node Hadoop cluster shared with 3 other labs. Lab workstations and laptops are configured with OS X or Windows. Servers are configured with Red Hat Enterprise Linux.
PI Challen is collaborating on this project without support for the duration of the planning phase. He will apply his expertise in mobile systems and operating systems, and will assist in advising students working on this project.
\end{document}

Binary file not shown.


@ -23,7 +23,7 @@
~~\\
\section*{Senior Personnel}
PIs Kennedy and Ziarek are each budgeted half a month of summer salary. PI Kennedy will apply his expertise and experience in the areas of databases, incremental computation, web applications, and security. PI Ziarek will apply his expertise and experience in the areas of programming languages, distributed computation, and security. PI Challen will participate without support and will apply his expertise in mobile systems and operating systems. All three PIs will take responsibility for (1) advising and coordinating student-driven efforts as described below, (2) reaching out to their respective research communities to build interest in research on \PocketData{}, and (3) organizing a \PocketData{} workshop.
PIs Kennedy and Ziarek are each budgeted half a month of summer salary. PI Kennedy will apply his expertise and experience in the areas of databases, incremental computation, web applications, and security. PI Ziarek will apply his expertise and experience in the areas of programming languages, distributed computation, and security. All PIs will take responsibility for (1) advising and coordinating student-driven efforts as described below, (2) reaching out to their respective research communities to build interest in research on \PocketData{}, and (3) organizing a \PocketData{} workshop.
\section*{Other Personnel}
Funding is requested for one computer science graduate student assistant for one year. The two-semester and summer salary for the student is \$22,000.
@ -40,11 +40,11 @@ N/A
Travel may include trips to NSF meetings, conferences and workshops, and any PI meetings. Major conferences such as SIGMOD, VLDB, POPL, PLDI, and ICDE typically last 4-5 days and are held both domestically and internationally. Workshops are often affiliated with major conferences, and attendees frequently attend both. We have budgeted for up to 3 conference visits.
\noindent \textbf{Domestic Conferences} As an example of a domestic conference, we use SIGMOD 2016 being held in San Francisco, CA. We anticipate a lodging cost of \$99 per night and a \$59 per diem. The subtotal for 2 attendees over 5 nights is \$2,310. We expect airfare of \$630 and average conference registration fees of \$600 per person for a total domestic travel cost of \$4000.
\noindent \textbf{Domestic Conferences} As an example of a domestic conference, we use SIGMOD 2016 being held in San Francisco, CA. We anticipate a lodging cost of \$99 per night and a \$59 per diem. The subtotal for 4 attendees over 5 nights is \$2,310. We expect airfare of \$630 and average conference registration fees of \$600 per person for a total domestic travel cost of \$8000.
\noindent \textbf{Other Domestic Travel} We have budgeted an additional \$2000 for travel to NSF PI meetings and for outreach efforts. Outreach efforts include travel support to allow the PIs to visit potential community members, and travel support for community members to visit UB and present on their work.
\noindent \textbf{Foreign Conferences} As an example of a foreign conference, we use ICDE 2016 being held in Helsinki, Finland. We anticipate a lodging cost of \$200 per person, and a \$260 per diem. The subtotal for 1 attendee over 5 nights is \$5,200. We expect airfare of \$1000 and average conference registration fees of \$700 per person for a total domestic travel cost of \$4,000.
\noindent \textbf{Foreign Conferences} As an example of a foreign conference, we use ICDE 2016 being held in Helsinki, Finland. We anticipate a lodging cost of \$200 per person, and a \$260 per diem. The subtotal for 2 attendees over 5 nights is \$5,200 per person. We expect airfare of \$1000 and average conference registration fees of \$700 per person for a total international travel cost of \$8,000.
\section*{Other Direct Costs}
@ -52,11 +52,11 @@ Travel may include trips to NSF meetings, conferences and workshops, and any PI
The negotiated rate with the Department of Computer Science and Engineering for computer services is \$156 per month of effort from faculty and students, or \$2028 for 12 months of student effort and 1 month of faculty effort.
\subsection*{Materials and Supplies}
\$1,268 is requested per year for Materials and Supplies to purchase desktop computers for the graduate research students and faculty working on this project. The computers will be used for code development, experimental evaluation, paper writing and typesetting and other efforts related to this project.
\$852 is requested per year for Materials and Supplies to purchase desktop computers for the graduate research students and faculty working on this project. The computers will be used for code development, experimental evaluation, paper writing and typesetting and other efforts related to this project.
\subsection*{Other}
Tuition is budgeted at the standard University at Buffalo rates for the Graduate Research Assistant at 9 credit hours per GRA per semester.
The anticipated out-of-state student tuition is \$18,144 for one student for one year.
Tuition is budgeted at the standard University at Buffalo rates for a senior Graduate Research Assistant at 3 credit hours per semester.
The anticipated out-of-state student tuition is \$6,048 for one student for one year.
\subsection*{Indirect Costs}
Indirect cost rates are based on the applicable federally negotiated rates published at \url{http://www.research.buffalo.edu/sps/about/rates.cfm}.


@ -1,6 +1,6 @@
---- Overview ----
A common requirement of the 4 million apps running on the world's 2 billion smartphones is persisting structured data. Embedded databases such as SQLite are heavily used for this purpose, with a single typical Android smartphone averaging more than two SQLite queries per second. The fundamental challenges experienced by mobile apps using embedded databases — minimizing energy consumption, latency, and disk utilization — are familiar ground for database researchers. However, in spite of active research in the areas of smartphone query processing and embedded databases, the specific tradoffs introduced by this new domain of pocket-scale data are far less well understood.
A common requirement of the 4 million apps running on the world's 2 billion smartphones is persisting structured data. Embedded databases such as SQLite are heavily used for this purpose, with a single typical Android smartphone averaging more than two SQLite queries per second. The fundamental challenges experienced by mobile apps using embedded databases - minimizing energy consumption, latency, and disk utilization - are familiar ground for database researchers. However, in spite of active research in the areas of smartphone query processing and embedded databases, the specific tradeoffs introduced by this new domain of pocket-scale data are far less well understood.
Key challenges in this space include the lack of publicly available data regarding smartphone database usage patterns in the real world, concrete high-level optimization targets, and tools and methodologies for reliably measuring database performance along axes relevant to smartphone apps. We propose infrastructure support and community-building efforts that will both improve existing research on embedded databases, and help to encourage new and innovative research in the area. This infrastructure support will take the form of real-world smartphone usage traces, a benchmarking suite for pocket-scale data, visualization tools, and instrumentation for mobile embedded databases.