Work towards a research projects section

master
Oliver Kennedy 2016-01-17 20:11:14 -05:00
parent c467ce830e
commit 5073d4d1b8
11 changed files with 2403 additions and 12 deletions


@ -6,7 +6,7 @@
\usepackage{graphicx}
\newcommand{\todo}[1]{\textcolor{red}{[! {#1} !]}}
\newcommand{\tinysection}[1]{\medskip \noindent \textbf{#1}. }
\newcommand{\tinysection}[1]{\smallskip \noindent \textbf{#1}. }
\setlist[itemize]{leftmargin=*,partopsep=5pt}
\setlist[enumerate]{leftmargin=*,partopsep=5pt}
@ -42,7 +42,7 @@ Lukasz Ziarek (Univ. of Buffalo, Dept. of Comp. Sci. and Eng.)}
\chapterstyle{proposal}
\section{Infrastructure Description}
\input{sections/1-description.tex}
\input{sections/1-description}
\section{Community Involvement}
\todo{Communities that will use the proposed NEW infrastructure or that have used the existing infrastructure}
@ -52,16 +52,10 @@ Lukasz Ziarek (Univ. of Buffalo, Dept. of Comp. Sci. and Eng.)}
\todo{Evidence that the new or enhanced infrastructure has community support and that any planned extensions meet the needs of the community (note that planning proposals for future CI-EN projects should include evidence that the current infrastructure that is to be enhanced has been used by CISE research communities and that these communities now desire the extensions envisioned)}
\section{New Research Opportunities}
\todo{Compelling new CISE research opportunities enabled by the infrastructure}
\todo{Existing related resources along with a justification that the proposed research cannot be accomplished with these resources at the institution or elsewhere}
\input{sections/2-research}
\section{Planning Activities}
\todo{Planning activities and timeline, including ways in which the related CISE research community will be involved in the design and creation of the infrastructure}
\todo{Clear identification of individuals involved in the planning process and associated community interactions}
\todo{Indications of plans for a future CI-NEW or CI-EN proposal (the timeline and activities should be clearly arranged to align with future CRI submission dates and criteria).}
\input{sections/3-planning}
{

2147
main.bib

File diff suppressed because it is too large


@ -110,7 +110,7 @@
\end{minipage}
\small
\theauthors\\
Type: CI-New; CISE Core Division: IIS; Keywords: databases, smartphones,
Type: CI-P; CISE Core Division: IIS; Keywords: databases, smartphones,
benchmarking
}
\renewcommand{\afterchaptertitle}{\vspace{0.5\onelineskip} \hrule \vspace{0.3\onelineskip}}


@ -36,6 +36,8 @@ and disk utilization
produced in this domain of \textit{pocket-scale data} are far less well
understood.
\todo{Make a stronger case that Small Data is the wave of the future.}
%%%%%%%%%%%%%%%%
To date, there have been some initial explorations of small-scale data

66
sections/2-research.tex Normal file

@ -0,0 +1,66 @@
% !TEX root = ../fullproposal.tex
\todo{
\begin{itemize}
\item Compelling new CISE research opportunities enabled by the infrastructure;
\item Existing related resources along with a justification that the proposed research cannot be accomplished with these resources at the institution or elsewhere
\end{itemize}}
\subsection{Adaptive Indexes}
Selecting the correct physical structure for a database under a given workload is an extremely challenging~\cite{Chaudhuri:1997:ECI:645923.673646,Chaudhuri:1998:ALI:276304.276337,Chaudhuri:2007:SDS:1325851.1325856,Agrawal:2000:ASM:645926.671701} part of database management.
The index selection problem becomes even harder when workload characteristics fluctuate rapidly or are not known in advance.
There is currently substantial interest in adaptive index structures~\cite{idreos2007database,Idreos:2011:MWC:2002938.2002944} that address dynamic index selection by facilitating \textit{incremental, online} changes to the index.
Examples of adaptive indexes include Cracker Indexes~\cite{Idreos:2012:AIM:2247596.2247667,Idreos:2007:UCD:1247480.1247527,Halim:2012:SDC:2168651.2168652}, Adaptive Merge Trees~\cite{Graefe:2010:SSI:1739041.1739087,Graefe:2012:CCA:2180912.2180918}, SMIX~\cite{Voigt:2013:SSI:2484838.2484862}, H2O~\cite{163421}, and Just-in-Time Data Structures~\cite{kennedy2015just}.
Adaptive indexes automatically optimize their physical representation in response to incoming queries, reusing the work done to answer each query to speed up subsequent queries. Given enough time, a stable workload, and queries that touch all data objects, an adaptive index eventually converges to a data representation similar to that of a static index.
\textbf{Infrastructure Needs:} Although there have been several efforts~\cite{Graefe:2010:BAI:1946050.1946063,schuhknecht2013uncracked} to develop benchmarks for adaptive indexes, these benchmarks rely on purely synthetic data and unit-tests rather than real-world scenarios.
This is in part because typical enterprise workloads rarely exhibit the type of drastic shifts that adaptive indexes target.
As a result, most data management benchmarks evaluate systems under stable, steady-state workloads.
By contrast, \PocketData{} workloads often show extreme variation in both application demands and resource availability.
As a simple example, an app might demand low-latency, low-power access to data when a user is actively using the phone, while admitting high-latency, high-power organizational tasks when the phone is plugged in~\cite{Challen:2015:MWE:2699343.2699361}.
\textbf{Community Interest:}
Stratos Idreos from the DAS lab at Harvard, one of the best-known researchers in adaptive indexes, will use the \PocketData{} metrics and benchmark workloads to evaluate his group's work on adaptive data systems.
The PIs will likewise use these resources to evaluate their own work on Just-in-Time Data Structures.
\subsection{Small-Data Analytics}
The prevalence of tablet and smartphone computing devices makes them an ideal analytics front-end.
Apps such as Zillow, Google Earth, and MapMyRun provide specialized front-ends for data exploration.
Apple's iTunes Store has an ``Apps for Healthcare Professionals'' section with dozens of apps for visualizing and exploring patient statistics.
These apps are part of a growing number of small-data~\cite{Dit2015CIDR} analytics applications that present new and intriguing opportunities for research.
For example, smartphone and tablet touch-based interfaces require a significant redesign of the way users pose queries~\cite{nandi2013querying,Nandi:2013:GQS:2732240.2732247,Jiang:2013:GMD:2536274.2536311,Erkan:2015:EGQ:2801948.2802006}.
Embedded databases create opportunities for more detailed, interactive academic manuscripts~\cite{Dit2015CIDR,Dittrich:2015:JIA:2824032.2824114} that help to ensure reproducible results.
The relatively limited compute and memory resources available on tablets and smartphones also demand new techniques for rapidly building visualizations of medium-sized databases~\cite{Jiang:2015:SPI:2809974.2809986,Singh:2012:SRS:2213836.2213858,6228146,Nobari:2013:TIS:2463676.2463700}.
\textbf{Infrastructure Needs:} Small-data analytics efforts are presently siloed, with most research efforts addressing the entire software stack, from the user interface front-end to the back-end database.
The standard evaluation tools offered by the \PocketData{} benchmark would help to decouple the research challenges involved in small-data analytics and allow a broader community of researchers to contribute.
For example, an embedded database benchmark simulating a visual query interface workload would serve as a standard for evaluating novel algorithms, indexes, and data management tools.
\textbf{Community Interest:}
Arnab Nandi from Ohio State has offered to contribute traces of human interactions with his tools for gestural query specification to the \PocketData{} effort.
Jens Dittrich of Saarland University is interested in connections between \PocketData{} and his work on Janiform Documents~\cite{Dittrich:2015:JIA:2824032.2824114}.
\subsection{Data Management for Programs}
\todo{PL Research on ORMs, Database Compilers/Embedded Databases
\begin{itemize}
\item Alvin Cheung - UWash
\item Atanas Rountev - Ohio State (He's supporting)
\item DBToaster/Legorithmics\cite{Klonatos:2013:ASO:2463676.2465334}\cite{kennedy2011dbtoaster,koch2013dbtoaster,Ahmad:2012:DHD:2336664.2336670}
\item Oracle's efforts on Truffle/Graal
\end{itemize}
}
\subsection{Embedded Databases}
\todo{BerkeleyDB (Michael Brey ?and Margo Seltzer? supporting)}
\todo{SQLite and all the other corporate DBs}
\subsection{Smartphone Systems}
\todo{The two papers we cited in PocketData}

8
sections/3-planning.tex Normal file

@ -0,0 +1,8 @@
% !TEX root = ../fullproposal.tex
\todo{Planning activities and timeline, including ways in which the related CISE research community will be involved in the design and creation of the infrastructure}
\todo{Clear identification of individuals involved in the planning process and associated community interactions}
\todo{Indications of plans for a future CI-NEW or CI-EN proposal (the timeline and activities should be clearly arranged to align with future CRI submission dates and criteria).}

Binary file not shown.


@ -0,0 +1,59 @@
\documentclass[11pt]{article} %%Margins must be 1 inch, top & bottom 1 in%%
\marginparwidth 0pt
\oddsidemargin 0pt
\evensidemargin 0pt
\marginparsep 0pt
\topmargin -.50in
\textwidth 6.5in
\textheight 9in
\usepackage{url,hyperref}
\begin{document}
\begin{center}
{\huge \sc Data Management Plan}
\vspace{0.05in}
\hrule
\end{center}
~~\\
\begin{enumerate}
\item \textbf{Types of data and other materials produced during the course
of the project.} \\
There are several main types of materials that the investigators will produce during the course of this project:
\begin{enumerate}
\item smartphone database log traces, associated metadata, and documentation;
\item a public binary and source-code release of a smartphone instrumentation toolkit and embedded database benchmark, developer documentation for relevant APIs and interfaces, and application examples;
\item minutes and proceedings of the proposed workshop;
\item scientific papers describing the tools and datasets generated;
\item course materials, including lecture notes, problem sets, and demonstrations.
\end{enumerate}
\item \textbf{Standards to be used for data format and content.}\\
The standard format for the dissemination of research results in the computer science community is to provide access to the research papers in Portable Document Format (i.e., .pdf) in several online repositories such as \url{www.arXiv.org} and the IEEE and Association for Computing Machinery digital libraries (\url{http://ieeexplore.ieee.org/Xplore} and \url{http://dl.acm.org/}). It is also standard practice to post a copy on a personal webpage (depending on the copyright restrictions of peer-reviewed journals). Source code is also stored on personal webpages or at public repositories such as \url{www.github.com}. Demonstration deployments of the proposed systems will be made available through SUNY Buffalo or a cloud-hosting platform such as Amazon EC2 (\url{http://aws.amazon.com}).
Log traces will be stored in standard formats such as CSV and JSON, and schema documentation will be stored in plaintext, HTML, Markdown, and/or PDF.
Course materials, in formats ranging from PDF files of lecture notes and problem sets to PowerPoint and PDF slides, will be posted on the personal webpages of the course instructors. Interactive course materials built on top of Astral will be made publicly accessible through the demonstration deployment.
System implementations, log parsers, and other programs produced as part of this project will be in standard programming languages such as Java, JavaScript, and C.
\item \textbf{Methods and policies for providing access and enabling sharing.} \\
All public data will be stored either in SUNY Buffalo databases or on a publicly available website.
To preserve participant privacy, an approval letter from the researcher's institutional review board will be required before we can release query traces.
\item \textbf{Provisions for re-use, re-distribution, and the production of derivatives.} \\
We will use Creative Commons licenses (\url{http://creativecommons.org/licenses/}) for the re-use, re-distribution, and the production of derivatives of the project data, while respecting the limitations on copyrighted material from published journals. We will use Apache licenses (\url{http://www.apache.org/licenses/}) for the re-use, re-distribution, and the production of derivatives of all project source code, while respecting the licenses of any library dependencies.
\item \textbf{Methods for archiving and preserving access to data and materials.} \\
All data and materials will be stored on backup systems indefinitely.
The investigators will mainly rely upon the Computer Science department of SUNY Buffalo to maintain web access to the course materials, source code, and papers. These are SUNY Buffalo supported facilities.
\end{enumerate}
\end{document}
%% LocalWords: APIs pdf webpage webpages SUNY EC PowerPoint website
%% LocalWords: JavaScript

BIN
supplements/facilities.pdf Normal file

Binary file not shown.


@ -0,0 +1,56 @@
\documentclass[11pt, oneside]{article} % use "amsart" instead of "article" for AMSLaTeX format
\usepackage{geometry} % See geometry.pdf to learn the layout options. There are lots.
\geometry{letterpaper} % ... or a4paper or a5paper or ...
%\geometry{landscape} % Activate for rotated page geometry
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
\usepackage{graphicx} % Use pdf, png, jpg, or eps with pdflatex; use eps in DVI mode
% TeX will automatically convert eps --> pdf in pdflatex
\usepackage{amssymb}
\title{Brief Article}
\author{The Author}
%\date{} % Activate to display a given date or no date
\begin{document}
\begin{center}
{\LARGE
\textsc{Facilities Statement}
}
\hrule
\end{center}
Computer Science and Engineering (CSE) maintains multiple information technology services and facilities to support its research mission. These resources and facilities include (but are not limited to):
storage infrastructures, compute services, lab and conference facilities, desktop infrastructures, application and database hosting, network and firewalling, disaster recovery, asset and license management / procurement, print and digital imaging services, and security systems and environmental monitoring.
Computer Science and Engineering (CSE) data storage facilities include vulcan, a NetApp FAS 2050A storage area network that provides 12 TB of installed storage capacity, redundant network access paths, and RAID 6 data redundancy. Several network-attached storage devices, exceeding 30 TB, are used by researchers for various purposes. A variety of timeshare machines and virtualized servers running Samba provide data access services.
CSE faculty compute systems include castor, a Sun Blade 1000; citrix[1-3], a load-balanced Citrix farm of Dell PowerEdge 2650 servers; the-who, a Sun Fire V20z desktop virtualization server; benatar, a virtualized general compute server; and the underground cluster, a 4-node compute cluster composed of Dell 1425s. CSE faculty also have use of all CSE student systems (below).
CSE student compute systems include timberlake, a Dell PowerEdge R600 compute server; metallica, a Dell PowerEdge R500 compute server; pollux, a Sun SPARC Enterprise T5220 compute server; coldplay, a Sun Fire V20z compute server; fork, a Sun Fire V20z dedicated to the Operating Systems course; nickelback, a Dell PowerEdge 1950 desktop virtualization server; dragonforce, a Dell PowerEdge R720 desktop virtualization server; and styx, a Dell PowerEdge R400 desktop virtualization server.
CSE research groups occupy 6628 square feet of research lab space ranging from secure, monitored, temperature-controlled data centers to specialized experimental facilities. CSE instructional labs occupy 4096 square feet, each configured to serve the characteristic needs of the courses they host. The Patricia Eberlein lab is the CSE general student computing lab, which occupies 1056 square feet.
More than 270 Windows, MacOS and Linux PCs and thin client terminals are available across the multiple research, instructional and student labs. Each lab is equipped with printing/imaging (accessible remotely) and presentation equipment. Internet connectivity to all lab spaces is provided by 1 Gb/s Ethernet network connections to Cisco Catalyst gigabit layer 3 switches along with an 802.11 b/g/n encrypted wireless network. Various level 3 switches are all connected to the campus 10 Gb/s fiber backbone.
CSE faculty, researchers and students also have access to compute labs administered by School of Engineering Node Services (SENS) and Computing and Information Technology (CIT).
Our four CSE conference rooms occupy 2075 square feet, and all are equipped with presentation equipment and data terminals to enable research presentations.
Our four state-of-the-art data centers occupy 2,173 square feet, and all are environmentally conditioned and monitored 24/7. Power redundancy is handled both by an emergency generator located in the mechanical penthouse and by 48 double-conversion Liebert GXT3 uninterruptible power supplies. Network connectivity to the data centers is provided by 11 Cisco Catalyst gigabit layer 3 switches. Various level 3 switches are all connected to the campus 10 Gb/s fiber backbone. Each data center has incrementally increasing access levels, ranging from student or research group access to authorized-personnel-only access.
Access to all CSE facilities is handled by a BASIS card access system which CSE has customized with automated update scripts and web forms to enable low management and administration overhead.
Both Windows and Linux based desktop virtualization infrastructures allow CSE faculty, researchers and students to connect to their hosted workspaces, applications and data from anywhere an internet connection is available. The University provides the Cisco AnyConnect virtual private network client for secure off-campus access to resources.
Advanced web services and several application frameworks are provided to all CSE faculty, researchers and students on cheshire, a Dell Precision 530. MySQL and PostgreSQL database services are provided by tethys, a Dell Precision 530. Students also have access to the enterprise-wide Academic Oracle Service (AOS) and enterprise-wide compute and e-mail servers. Open source course management, submission and automated grading services are provided on web-cat, a virtual machine running on a Dell R900.
CSE provides networking and firewalling services with a model that seeks to balance researcher flexibility and the need to keep University and CSE services and networks safe, secure, and functional. This is accomplished with the use of several sub-networks separated by router/firewall devices (Dell PowerEdge SC1425s), automated network monitoring tools, and the provision of network helper services such as domain name services, dynamic host configuration protocol services, network time protocol services, directory services, trivial file transfer protocol services, simple mail transfer protocol relays, and file transfer protocol services.
Our departmental systems are backed up using multiple Dell PowerEdge SC1425s that communicate with UB's enterprise-wide IBM Tivoli Storage Manager (ITSM), which backs up data to an IBM 3494 Tape Library and an off-site tape library, as well as to a locally-managed Sun L-180 LTO tape library and virtual machine provide point in time.
The DB/PL lab at the University at Buffalo maintains additional resources specifically for internal use, including multiple x86 workstations and laptops, and low-power development boards (Raspberry Pis and Intel Galileos) for general student and PI use, 12- and 16-core Intel Xeon servers, 32- and 64-core AMD Opteron servers, as well as a 16-node Hadoop cluster shared with 3 other labs. Lab workstations and laptops are configured with OS X or Windows. Servers are configured with Red Hat Enterprise Linux.
The DB/PL lab at the University at Buffalo maintains additional resources for internal use, including multiple x86 workstations, laptops, and low-power development boards (Raspberry Pis and Intel Galileos) for general student and PI use. Server infrastructure for the lab includes an application server supporting a lab project management system, teaching support applications, and trial deployments of lab-developed software, an Oracle database server testbed, a 32-core and a 64-core AMD Opteron and a 12-core Intel Xeon-based testbed server, as well as a 16-node Hadoop cluster shared with 3 other labs. Lab workstations and laptops are configured with OS X or Windows. Servers are configured with Red Hat Enterprise Linux.
\end{document}


@ -0,0 +1,61 @@
\documentclass[11pt]{article} %%Margins must be 1 inch, top & bottom 1 in%%
\marginparwidth 0pt
\oddsidemargin 0pt
\evensidemargin 0pt
\marginparsep 0pt
\topmargin -.50in
\textwidth 6.5in
\textheight 9in
\usepackage{url,hyperref}
\begin{document}
~~\\
\begin{center}
{\huge \sc Budget Justification}
\vspace{0.05in}
\hrule
\end{center}
~~\\
\section*{Senior Personnel}
PIs Kennedy and Ziarek are each budgeted one month summer salary. PI Kennedy will apply his expertise and experience in the areas of databases, incremental computation, web applications, and security. PI Ziarek will apply his expertise and experience in the areas of programming languages, distributed computation, and security. Both PIs will take responsibility for (1) advising and coordinating student-driven research as described below, (2) assisting students in publishing their work in workshops, conferences, journals, and other media, and (3) assisting in the development and evaluation of prototype implementations as necessary.
\section*{Other Personnel}
Funding is requested for two computer science graduate student assistants to start in year 1 and in the middle of year 2, respectively. The two-semester and summer salary for the first student in the first year is \$22,000, with a 2\% increase every year (\$22,440 and \$22,889 in years two and three, respectively). The salary for the second graduate student for half of the second year is \$11,000 and \$22,440 in year 3.
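As a check on the first student's figures, and assuming the 2\% increase compounds annually on the year-one salary (our reading of the escalation described above), the year-two and year-three amounts work out as
\[
\$22{,}000 \times 1.02 = \$22{,}440,
\qquad
\$22{,}000 \times 1.02^{2} = \$22{,}888.80 \approx \$22{,}889 .
\]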
Graduate student assistants will be expected to (1) participate in the development of theoretical constructs and system designs, (2) drive the development and evaluation of the prototype implementation and subsequent novel innovations, (3) participate in the identification of research challenges and novel solutions, and (4) document their findings through workshop, conference, and journal papers, blog posts, and other media as appropriate.
\section*{Fringe Benefits and Indirect Costs}
Fringe benefit rates are based on the applicable federally negotiated rates published at \\
\url{http://www.research.buffalo.edu/sps/about/rates.cfm}
\section*{Equipment}
N/A
\section*{Travel}
Travel may include trips to NSF meetings, conferences and workshops, and any PI meetings. Major conferences such as SIGMOD, VLDB, POPL, PLDI, and ICDE typically last 4--5 days and are held both domestically and internationally. Workshops are often affiliated with major conferences, and attendees frequently attend both. We have budgeted for 2 conference visits per year.
\noindent \textbf{Domestic Travel} As an example of a domestic conference, we use SIGMOD 2016 being held in San Francisco, CA. We anticipate a lodging cost of \$99 per night and a \$59 per diem. The subtotal for 3 attendees over 5 nights is \$2,310. We expect airfare of \$630 and average conference registration fees of \$600 per person for a total domestic travel cost of \$6,000.
\noindent \textbf{Foreign Travel} As an example of a foreign conference, we use ICDE 2016 being held in Helsinki, Finland. We anticipate a lodging cost of \$200 per person, and a \$260 per diem. The subtotal for 2 attendees over 5 nights is \$5,200. We expect airfare of \$1,000 and average conference registration fees of \$700 per person for a total foreign travel cost of \$8,000.
\section*{Other Direct Costs}
\subsection*{Computer Services}
The negotiated rate with the Department of Computer Science and Engineering for computer services is \$156 per month of effort from faculty and students, or \$2,184 in year one, \$3,120 in year two, and \$4,056 in year three.
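These yearly totals are consistent with the following person-month counts. The counts are an assumption inferred from the staffing described above (one month per PI per year, twelve months per full-year graduate student, and six months for the second student in year two), not figures stated separately in this budget:
\[
\$156 \times (2 + 12) = \$2{,}184,
\qquad
\$156 \times (2 + 12 + 6) = \$3{,}120,
\qquad
\$156 \times (2 + 24) = \$4{,}056 .
\]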
\subsection*{Materials and Supplies}
\$2,000 is requested per year for Materials and Supplies to purchase desktop computers for the graduate research students and faculty working on this project. The computers will be used for code development, experimental evaluation, paper writing and typesetting and other efforts related to this project.
\subsection*{Other}
Tuition is budgeted at the standard University at Buffalo rates for the Graduate Research Assistant at 9 credit hours per GRA per semester.
The anticipated out-of-state student tuition is \$18,144 for one student in year one, \$29,673 for one and a half students in year two, and \$43,128 for two students in year three.
\subsection*{Indirect Costs}
Indirect cost rates are based on the applicable federally negotiated rates published at \url{http://www.research.buffalo.edu/sps/about/rates.cfm}.
\subsection*{Budget Overview}
With the standard fringe rates, student tuition, and university overhead, the requested budget is \$494,274.
\end{document}
%% LocalWords: PIs Ziarek blog UB SIGMOD VLDB POPL PLDI ICDE