First draft of strategy doc

Oliver Kennedy 2024-05-28 17:56:58 -04:00
commit e7776a232f
Signed by: okennedy
GPG key ID: 3E5F9B3ABD3FDB60
2 changed files with 127 additions and 0 deletions

BIN
db_2024.pdf Normal file

Binary file not shown.

127
db_2024.typ Normal file

@@ -0,0 +1,127 @@
= UB Database Group: Strategic Planning
== Overview
The University at Buffalo's database group is...
=== Mission Statement
To understand points of friction experienced by users trying to access, query, and comprehend data, both big and small; and to develop new techniques that leverage formal problem specifications to guide the design of high performance, low-friction data management systems.
We believe that, with investment and continued collaboration within the department, the DB group is poised to become one of the top DB compilers groups in the country.
Sub areas related to research in the DB group include:
- Approximate and Uncertain Query Processing
- Query Compilers and Optimizers
- Distributed Systems
- Provenance / Lineage
- Incremental View Maintenance
- Data Structures
- Human Interactions
- API / Language Design
- Statistics
=== Stakeholders
*Core DB Group Members*
- Zhuoyue Zhao
- Oliver Kennedy
- Haonan Lu
*Affiliated Individuals*
- Atri Rudra$*$
- Murat Demirbas$*$
- Andrew Hirsch
- Lukasz Ziarek
- Jaric Zola
$*:$ Publishes at DB venues
=== Strengths
*Approximate and Uncertain Data*:
For several decades now, UB has been one of only a few institutions looking closely at the intersection of data management and statistical approximation: the closely related sub-fields of uncertain and/or approximate data management. In particular, UB database faculty excel at working in formalisms that walk back notions of precision and offer probabilistic and/or bounds-based guarantees.
Although these have been niche areas of the database community, both are poised to become critical foundations for areas of growing interest, including explainable AI and scalable AI.
Peer institutions in the US with significant influence in these areas include UPenn, U. Utah, UMass Amherst, U. Washington, Berkeley, Rice U, and U. Maryland.
*Focus on Collaboration-Driven Research*:
Members of the DB group have a history of working closely with stakeholders outside of their direct sub-fields to understand practical data management problems that arise in the field. As evidence, note extensive grant support from Oracle and Google; collaborations with Xanalytix, LogicBlox, and Amazon; a startup (Breadcrumb Analytics); and a heavy reliance on high-funding CS+X programs like NSF DIBBS.
Peer institutions in the US that visibly demonstrate needs-driven research include UC Berkeley and UPenn.
*Systems with Principled Foundations*:
Most research emerging from the UB database group tightly couples systems techniques (understanding of hardware- and user-driven constraints and how they influence system design) with formal methods (language design; code complexity; statistical models). This close integration of systems techniques and modeling is not unique to UB, but it is uncommon: other institutions well known for their Theory and/or Systems groups are rarely known for close interactions between them. By contrast, UB features extensive collaboration between DB faculty and theory faculty, including Atri Rudra and Andrew Hirsch.
Peer institutions in the US that have strongly intertwined Systems and Theory groups include U. Washington and UPenn.
*Strong Collaborators, Especially in PL*:
Although UB has never had more than 2 core DB faculty, a substantial number of our faculty publish in DB venues. Examples include Murat Demirbas (several SIGMOD and VLDB papers), Atri Rudra (PODS Best Paper), and several former faculty, including Lu Su and Hung Ngo.
Of particular note, the DB and PL groups at UB are tightly coupled, sharing a regular seminar, lab space, and many informal activities. The two groups also have a long history of collaboration (e.g., Oliver Kennedy and Lukasz Ziarek have worked together on numerous grants and publications).
*Empire AI*:
Data is central to the development of modern AI. With our group's focus on working with data at scale and managing data quality at scale, we are well positioned to support the development of novel artificial intelligence techniques, and to leverage recent investments in artificial intelligence flowing into Buffalo from Albany and Washington.
=== Weaknesses
*No Core DB Theory Faculty*:
All three currently active Database faculty are heavily systems-oriented. Since the retirement of Jan Chomicki, we have had no formal methods / logic-oriented database faculty. Although Atri Rudra remains an active collaborator, the lack of a formal methods perspective internal to the DB group could substantially weaken the group in the longer run.
*Systems-Track Capstone*:
At present, the MS systems track requires CSE 562 as a capstone. Combined with growing interest in data-structures- and database-systems-style courses (e.g., the new CSE 350), this requirement means we may not be able to staff relevant courses in the near future.
*Weak Systems Backgrounds in New PhD Students*:
It is becoming increasingly difficult to recruit and retain PhD students who are capable software developers.
*Weak Offerings in the DB/Data Structures Space*:
At present, the majority of the database-oriented coursework at UB is CSE 4/560, CSE 4/562, and the occasionally offered CSE 662. This leads to several concrete challenges:
- Without CSE 460 as a prerequisite, undergraduates taking CSE 462 end up under-prepared for the course, lacking experience with SQL, data modeling, and/or functional-style collection programming. However, because 460 is a senior-level course with large enrollments from the graduate program, it is rare to find students interested in taking 460 solely to prepare for CSE 462.
- UB offers very few data structures courses at the graduate level. Many MS students arrive without a clear understanding of material we cover in CSE 250, and leave never having picked it up.
=== Main Research Drivers
*Industry (Scalable Systems)*:
*Bioinformatics (Data Structures)*:
*Social Sciences (Data Quality/Integration)*:
=== Desiderata For Students
The database group at UB would like students who graduate with a UB degree to have the following skills, depending on their area of specialization.
All graduates of UB should know:
- Which ADT applies in which situation. *[CSE 250]*
- Which data structures implement each ADT and what their tradeoffs are. *[CSE 250]*
- SQL or another formal DBQL. *[CSE 116$dagger$, CSE 350, CSE 4/560]*
- Sufficient extensions to SQL to be useful (e.g., spatial data). *[CSE 4/560]*
It should be possible for a student interested in specializing in database systems to graduate with:
- Collection programming as a mental model. *[CSE 305, CSE 350, CSE 4/562]*
- The ability to model data in terms of relations and/or nested collections. *[CSE 4/560, CSE 4/562]*
- An understanding of the cost tradeoffs of data structures, asymptotically in terms of runtime complexity, IO complexity, and memory complexity; as well as in terms of practical considerations. *[CSE 250, CSE 350, CSE 4/562, CSE 662]*
- An understanding of manual memory management and object ownership. *[CSE 220]*
A student entering from UB with an intent to conduct research should understand, or be prepared to quickly learn:
- The mathematical foundations of collections (e.g., category theory). *[???]*
- Fundamental abstractions behind data structures (e.g., pointers, collections, partitions). *[CSE 250, ???]*
- Database compilers (relational algebra, Datalog, abstract syntax trees, basic type theory). *[CSE 450$dagger$, CSE 4/562]*
- Experience hacking on an existing database engine (e.g., PostgreSQL, Spark). *[CSE 4/562]*
== Objectives
=== Near-Term (1-3 years)
- Hire a faculty member working in DB Theory (Logic, Formal Methods).
- Cross-hire a faculty member working at the intersection of databases and compilers (e.g., Datalog and/or one of the recent PL/DB offshoots like Egglog).
=== Long-Term (5-10 years)
- Become the go-to resource for UB's data management challenges.
- Leverage close collaboration with the PL group to become one of the top research universities for DB Compilers in the country.
- Develop an industry-supported DB Systems group at UB.