Website/src/teaching/cse-662/index.md
2016-08-17 09:55:06 -04:00


CSE 662 - Fall 2016

Addressing the challenges of big data requires a combination of human intuition and automation. Rather than tackling these challenges head-on with build-from-scratch solutions, or through general-purpose database systems, developer and analyst communities are turning to building blocks: specialized languages, runtimes, data structures, services, compilers, and frameworks that simplify the task of creating systems that are powerful enough to handle terabytes of data or more, or efficient enough to run on your smartphone. In this class, we will explore these fundamental building blocks and how they relate to the constraints imposed by workloads and the platforms they run on.

Coursework consists of lectures and a multi-stage final project. Students are expected to attend all lectures. Projects may be performed individually or in groups. Projects will be evaluated in three stages through code deliverables, reports, and group meetings with either or both of the instructors. During these meetings, instructors will question the entire group extensively about the group's report, deliverables, and any related tools and technology.

  1. At the initial stage, students are expected to demonstrate a high level of proficiency with the tools, techniques, data structures, algorithms, and source code that will form the basis of their project. The group is expected to submit and defend a roughly 5-page report surveying the space in which their project will be performed. This report and presentation constitute 15% of the final grade.
  2. At the second stage, students are expected to provide a detailed design for their final project. A roughly 5-page report should outline the group's proposed design, any algorithms or data structures being introduced, and a strategy for evaluating the resulting project against the current state of the art. This report and presentation constitute 35% of the final grade.
  3. At the final stage, students are expected to provide a roughly 5-page report detailing their project, any algorithms or data structures developed, and evaluating their project against any comparable state-of-the-art systems and techniques. Groups will also be expected to demonstrate their project and present their findings in class, or in a meeting with both instructors if necessitated by time constraints. This report and presentation constitute 50% of the final grade.

Course Objectives

After taking the course, students should be able to:

  • Design domain-specific query languages, by first developing an understanding of the common tropes of a target domain, exploring ways of allowing users to efficiently express those tropes, and developing ways of mapping the resulting programs to an efficient evaluation strategy.
  • Identify concurrency challenges in data-intensive computing tasks, and address them through locking, code associativity, and correctness analysis.
  • Understand a variety of index data structures, as well as their application and use in data management systems for high velocity, volume, veracity, and/or variety data.
  • Understand query and program compilation techniques, including the design of intermediate representations, subexpression equivalence, cost estimation, and the construction of target-representation code.

Instructors

  • Lukasz Ziarek (Davis 338E; Office Hours TBD)
  • Oliver Kennedy (Davis 338H; Office Hours TBD)

Course Schedule

  1. Datastructures and Indexes
    • Functional Datastructures
    • Indexes Review
      • Tree vs Hash
      • Clustered vs Unclustered
    • Adaptive Indexes
      • Cracker Indexes
      • Adaptive Merge Trees
      • Just-in-Time Datastructures
  2. Emerging Workload Challenges
    • PocketData
    • Object-Relational Mappers
  3. Probabilistic Languages & Data
    • Probabilistic DBs
      • Possible Worlds
      • C-Tables vs PC-Tables vs VC-Tables
    • Probabilistic Programming Languages
  4. Transactions & Synchrony
    • CAP and CALM
      • BloomL
      • "Partial results in database systems"
    • Software Transactional Memory
  5. Incremental Computation
    • Incremental View Maintenance
      • DBToaster
    • Self-Adapting Computation
  6. Program Analysis & Optimization (Time Permitting)
    • PL/Compiler Optimization Principles
    • DSLs for Data-Driven Applications
      • Declarative Games
      • Truffle/Graal/LMS

Academic Content

The course will involve lectures and readings drawn from an assortment of academic papers selected by the instructors. There is no textbook for the course.

Academic Integrity

Students may discuss and advise one another on their projects, but each group is expected to turn in its own work. Cheating on any course deliverable will result in automatic failure of the course. The University policy on academic integrity can be reviewed at:

http://academicintegrity.buffalo.edu/policies/index.php

Accessibility Resources

If you have a diagnosed disability (physical, learning, or psychological) that will make it difficult for you to carry out the course work as outlined, or that requires accommodations such as recruiting note-takers, readers, or extended time on exams or assignments, please advise the instructors during the first two weeks of the course so that we may review possible arrangements for reasonable accommodations. In addition, if you have not yet done so, contact the Office of Accessibility Resources (formerly the Office of Disability Services).