Enabled research for the PIs

Oliver Kennedy 2016-01-19 13:19:32 -05:00
parent 0fd140dda0
commit 2a88818877
2 changed files with 25 additions and 19 deletions

@ -1,7 +1,7 @@
% !TEX root = ../fullproposal.tex
We will build an initial \PocketData{} community and facilitate engagement with the broader CISE community through outreach efforts including attending poster sessions and hosting workshops and tutorials
co-located with major conferences in databases (VLDB, SIGMOD, ICDE), mobile and real-time systems (MobiSys, OSDI, \todo{...?}), and programming languages (POPL, PLDI, \todo{...?}).
co-located with major conferences in databases (VLDB, SIGMOD, ICDE), mobile and real-time systems (MobiSys, OSDI, RTSS, RTAS), and programming languages (POPL, PLDI, OOPSLA).
Poster sessions provide an ideal opportunity to meet researchers in related areas, to advertise the resources we plan to offer, and to gather feedback about the needs of potential \PocketData{} community members.
Tutorials offer a more formal, extended opportunity to introduce members of broader communities in CISE to technologies related to \PocketData{}, and to train them in the technology's use. As a side effect, a tutorial offers an opportunity to advertise the proposed resources as tools for conducting research in these areas.

@ -17,17 +17,17 @@ There is currently substantial interest in a breed of self-adapting, adaptive in
Examples of adaptive indexes include Cracker Indexes~\cite{Idreos:2012:AIM:2247596.2247667,Idreos:2007:UCD:1247480.1247527,Halim:2012:SDC:2168651.2168652}, Adaptive Merge Trees~\cite{Graefe:2010:SSI:1739041.1739087,Graefe:2012:CCA:2180912.2180918}, SMIX~\cite{Voigt:2013:SSI:2484838.2484862}, H2O~\cite{163421}, and Just-in-Time Data Structures~\cite{kennedy2015just}.
Adaptive indexes automatically optimize their physical representation in response to incoming queries, reusing the work done to answer each query to speed up subsequent queries. Given enough time, a stable workload, and queries that touch all data objects, an adaptive index eventually converges to a data representation similar to that of a static index.
\textbf{Infrastructure Needs:} Although there have been several efforts~\cite{Graefe:2010:BAI:1946050.1946063,schuhknecht2013uncracked} to develop benchmarks for adaptive indexes, these benchmarks rely on purely synthetic data and unit-tests rather than real-world scenarios.
\textbf{Infrastructure Justification:} Although there have been several efforts~\cite{Graefe:2010:BAI:1946050.1946063,schuhknecht2013uncracked} to develop benchmarks for adaptive indexes, these benchmarks rely on purely synthetic data and unit-tests rather than real-world scenarios.
This is in part because typical enterprise workloads rarely exhibit the type of drastic shifts that adaptive indexes target.
As a result, most data management benchmarks evaluate systems under stable, steady-state workloads.
By contrast, \PocketData{} workloads often show extreme variation in both application demands and resource availability.
As a trivial example, an app might demand low-latency, low-power access to data when a user is actively using the phone, while admitting high-latency, high-power organizational tasks when the phone is plugged in~\cite{Challen:2015:MWE:2699343.2699361}.
\textbf{Community Interest:} Stratos Idreos from the DAS lab at Harvard will use the \PocketData{} metrics and benchmark workloads to evaluate his group's work on adaptive data systems.
\textbf{Community Interest:} \textit{Stratos Idreos} from the DAS lab at Harvard will use the \PocketData{} metrics and benchmark workloads to evaluate his group's work on adaptive data systems.
\citedquote{Stratos Idreos (Harvard)}{I think work on adaptive data systems could benefit. I assume Pocket Data will capture diverse workloads (from various apps) and so this would be a perfect environment to test adaptive data systems.
I have a new project on easy to design systems out of modules that can be synthesized. The input is workloads. Perhaps PocketData can provide a testing framework for such work for designing data systems for mobile environments.
}
The PIs will likewise use these resources to evaluate their own work on Just-in-Time Data Structures.
%The PIs will likewise use these resources to evaluate their own work on Just-in-Time Data Structures.
\subsection{Small-Data Analytics and Personal Internet of Things}
The prevalence of tablet and smartphone computing devices makes them an ideal analytics front-end.
@ -39,13 +39,13 @@ Embedded databases create opportunities for more detailed, interactive academic
The relatively limited compute and memory resources available on tablets and smartphones also demand new techniques for rapidly building visualizations of medium-sized databases~\cite{Jiang:2015:SPI:2809974.2809986,Singh:2012:SRS:2213836.2213858,6228146,Nobari:2013:TIS:2463676.2463700}.
\textbf{Infrastructure Needs:} Small-data analytics efforts are presently siloed, with most research efforts targeting entire software stacks, from the user interface front-end to the back-end database.
\textbf{Infrastructure Justification:} Small-data analytics efforts are presently siloed, with most research efforts targeting entire software stacks, from the user interface front-end to the back-end database.
The standard evaluation tools offered by the \PocketData{} benchmark would help to decouple the research challenges involved in small-data analytics, and allow a broader community of researchers to contribute.
For example, an embedded database benchmark simulating a visual query interface workload would serve as a standard for evaluating novel algorithms, indexes, and data management tools.
\textbf{Community Interest:}
Arnab Nandi from Ohio State has offered to contribute traces of human interactions with his tools for gestural query specification to the \PocketData{} effort.
Jens Dittrich of Saarland University is interested in connections between PocketData and his work on Janiform Documents~\cite{Dittrich:2015:JIA:2824032.2824114}.
\textit{Arnab Nandi} from Ohio State has offered to contribute traces of human interactions with his tools for gestural query specification to the \PocketData{} effort.
\textit{Jens Dittrich} of Saarland University is interested in connections between PocketData and his work on Janiform Documents~\cite{Dittrich:2015:JIA:2824032.2824114}.
\subsection{Data-Driven Apps}
Virtually all access to embedded databases on smartphones occurs through SQL statements that have been procedurally generated by apps; smartphone users do not manually write SQL queries.
@ -62,10 +62,10 @@ Examples include the use of expensive \texttt{UPSERT} operations when \texttt{UP
Several research efforts, including StatusQuo~\cite{StatusQuo}, Sloth~\cite{Cheung:2014:SLV:2588555.2593672}, and Truffle/Graal~\cite{wimmer2012truffle} have addressed similar problems in enterprise data-driven applications and could find new challenges in the \PocketData{} space.
Other research efforts explore data-flow in smartphones for performance optimization~\cite{yang-phd15,yang-icse15,rountev-cgo14} and correctness~\cite{yan-cgo14}, and would benefit from more detailed tools for introspection and measurement.
\textbf{Infrastructure Needs:} Research on data-driven app development requires a detailed understanding of application requirements, and programming language research needs real-world workloads to demonstrate its viability.
\textbf{Infrastructure Justification:} Research on data-driven app development requires a detailed understanding of application requirements, and programming language research needs real-world workloads to demonstrate its viability.
The metrics that we propose to gather and the benchmark suite we propose to develop are critical for driving research in this space.
\textbf{Community Interest:} Nasko Rountev of Ohio State will use \PocketData{} as part of the Presto group's work on data-flow analysis to debug GUI responsiveness issues and as part of his LeakDroid project.
\textbf{Community Interest:} \textit{Nasko Rountev} of Ohio State will use \PocketData{} as part of the Presto group's work on data-flow analysis to debug GUI responsiveness issues and as part of his LeakDroid project.
\subsection{Database-App Coupling}
Smartphone apps are integrated with the data management tools they use to a far greater degree than enterprise applications.
@ -81,25 +81,31 @@ The tight coupling between database and the invoking application also admits pos
Database compilers like DBToaster~\cite{kennedy2011dbtoaster,koch2013dbtoaster,Ahmad:2012:DHD:2336664.2336670}, HyPer/LLVM~\cite{Neumann:2011:ECE:2002938.2002940}, and Legorithmics~\cite{Klonatos:2013:ASO:2463676.2465334,Klonatos:2014:BEQ:2732951.2732959} use aggressive compilation to create a database uniquely specialized for a specific application's query and update workload.
As already noted above, many of these statistics are available at compile time, making the \PocketData{} setting an ideal candidate for deploying these applications.
\textbf{Infrastructure Needs:} Realistic evaluation of embedded databases and database compilers requires realistic workloads. Moreover, smartphones are one of the most prolific examples of embedded databases deployed in the wild. Given the variation in smartphone apps' data management requirements, even limited data releases by a single app developer will not be representative. The metrics we will gather and the benchmark we are proposing will be key to helping researchers evaluate new embedded database tools.
\textbf{Infrastructure Justification:} Realistic evaluation of embedded databases and database compilers requires realistic workloads. Moreover, smartphones are one of the most prolific examples of embedded databases deployed in the wild. Given the variation in smartphone apps' data management requirements, even limited data releases by a single app developer will not be representative. The metrics we will gather and the benchmark we are proposing will be key to helping researchers evaluate new embedded database tools.
\textbf{Community Interest:} Michael Brey of Oracle is interested in participating in the \PocketData{} community to advance research on embedded databases.
\textbf{Community Interest:} \textit{Michael Brey} of Oracle is interested in participating in the \PocketData{} community to advance research on embedded databases.
\citedquote{Michael Brey (Oracle's BerkeleyDB Team)}{Within Oracle, we are always looking at how the industry both consumer and enterprise is using data in mobile applications. Things like db size, access patterns, single/multi user (multiple apps accessing same db), speed of access required, record size/structure etc. are all important to understand. We are also very interested in the movement of data from the device to some backend repository.}
Additionally, PI Kennedy will make use of the same resources in his efforts on incremental computation.
%Additionally, PI Kennedy will make use of the same resources in his efforts on incremental computation.
\subsection{Enabled Research for the PIs}
The PIs have a joint research project aimed at exposing \emph{uncertainty} in mobile computing. The project focuses on exposing new language primitives to the programmer to specify multiple implementations for
The PIs have a joint research project aimed at exposing \emph{uncertainty} in mobile computing~\cite{Challen:2015:MWE:2699343.2699361}. The project focuses on exposing new language primitives to the programmer to specify multiple implementations for
a given functionality allowing the system to pick which implementation to use at runtime. This allows the system to specialize software to a given hardware platform and more importantly to a given set of external
considerations (e.g. network connectivity, available sensors, etc.). Our proposed infrastructure will enable us to study two key aspects of uncertainty: (1) almost all mobile applications store user data and configuration parameters in
mobile databases, and access to this data can have a profound impact on the behavior of an application; \PocketData will allow us to more readily study this aspect of mobile uncertainty; (2) the infrastructure powering our
runtime system for exposing uncertainty is built around a mobile database that stores possible choices the software system can make. \PocketData will allow us to optimize this database to reduce choice latency.
mobile databases, and access to this data can have a profound impact on the behavior of an application; \PocketData{} will allow us to more readily study this aspect of mobile uncertainty; (2) the infrastructure powering our
runtime system for exposing uncertainty is built around a mobile database that stores possible choices the software system can make. \PocketData{} will allow us to optimize this database to reduce choice latency.
PIs Kennedy and Ziarek have a joint research project, Just-in-Time Data Structures (JITDs), focusing on adaptive indexing~\cite{kennedy2015just}.
The project explores the use of standardized, composable data structure building blocks to dynamically assemble indexes that adapt to rapidly changing workload requirements.
The level of variation in load and resource availability that occurs in \PocketData{} workloads creates an ideal use-case for JITDs.
As noted above, our proposed infrastructure will provide us with a benchmark workload that will help us to evaluate adaptive indexes under real-world conditions, rather than through purely synthetic workloads.
PI Kennedy is part of a collaborative research project with \textit{Shambhu Upadhyaya} (UB), \textit{Varun Chandola} (UB), \textit{Hung Ngo} (UB), and \textit{Long Nguyen} (UMich) that explores techniques for identifying insider attacks on databases (NSF-CNS-1409551).
Although the threat of insider attacks on mobile devices is minimal, the specific methodology behind the work involves summarizing query logs by clustering queries into groups of queries with similar ``intent.''
The approach is showing promise for summarizing query logs from a corporate (banking) setting.
Having query logs from other settings like \PocketData{} would show that the approach can be generalized and may have applications beyond insider threat detection (for example, to the design of index selection tools).
If successful, these efforts could also contribute back to the \PocketData{} project, as a tool for quickly summarizing and clustering query logs would help to build out the visualization and benchmark design components of the proposed infrastructure.
\subsection{Enabled Research for the Broader Community}
%\subsection{Smartphone Systems}
%\todo{The two papers we cited in PocketData}