---
title: CSE-4/562 Database Systems (Spring 2021)
schedule:
  - date: "Feb. 2"
    topic: "Introduction, SQL, Checkpoint 0"
    materials:
      lecture: https://youtu.be/KtabZcW0w7g
      slides: slide/2021-02-02-Intro.html
  - date: "Feb. 4"
    topic: "Scala Primer"
    materials:
      lecture: https://youtu.be/0TndAT2MkPs
      slides: slide/2021-02-04-Scala.html
  - date: "Feb. 9"
    topic: "Relational Algebra + Spark"
    materials:
      lecture: https://youtu.be/xnJNTTirgoY
      slides: slide/2021-02-09-RA-Basics-and-Spark.html
  - date: "Feb. 11"
    topic: "Relational Algebra Equivalence Rules"
    materials:
      lecture: https://youtu.be/IJLLCB6tdCk
      slides: slide/2021-02-11-RA-Equivs.html
  - date: "Feb. 16"
    topic: "Algorithms, Checkpoint 1"
    due: "Checkpoint 0"
    materials:
      checkpoint1: "checkpoint1.html"
      lecture: https://youtu.be/kCAUs09D1tQ
      slides: slide/2021-02-16-Checkpoint1.html
  - date: "Feb. 18"
    topic: "Relational Algebra Algorithms"
    materials:
      homework1: https://autograder.cse.buffalo.edu/courses/CSE562-s21/assessments/homework1
      lecture: https://youtu.be/z_rleGf-6zE
      slides: slide/2021-02-18-QueryAlgorithms.html
  - date: "Feb. 23"
    topic: "Extended Relational Algebra"
    materials:
      slides: slide/2021-02-23-ExtRA.html
  - date: "Feb. 25"
    topic: "Physical Data Layout"
    materials:
      slides: slide/2021-02-25-PhysicalLayout.html
  - date: "Mar. 2"
    topic: "Indexes: Tree-Based, Hash"
    materials:
      slides: slide/2021-03-02-Indexing1.html
  - date: "Mar. 4"
    topic: "Indexes: View-Based, Modern"
  - date: "Mar. 9"
    topic: "Spark's Optimizer + Checkpoint 2"
    due: "Checkpoint 1"
  - date: "Mar. 11"
    topic: "Cost-Based Optimization"
  - date: "Mar. 16"
    topic: "Cost-Based Optimization (contd.)"
  - date: "Mar. 18"
    topic: "Distributed Queries: Challenges + Partitioning"
  - date: "Mar. 23"
    topic: "Distributed Queries: Semi + Bloom Join"
  - date: "Mar. 25"
    topic: "Aggregation + Checkpoint 3"
    due: "Checkpoint 2"
  - date: "Mar. 30"
    topic: "Online Aggregation/Approximate Query Processing"
  - date: "Apr. 1"
    topic: "Streaming Queries"
  - date: "Apr. 6"
    topic: "Data Updates + Incremental View Maintenance"
  - date: "Apr. 8"
    topic: "Indexing Review + Checkpoint 4"
    due: "Checkpoint 3"
  - date: "Apr. 13"
    topic: "Transactions: Intro + Concepts"
  - date: "Apr. 15"
    topic: "Transactions: Pessimistic"
  - date: "Apr. 20"
    topic: "Transactions: Optimistic"
  - date: "Apr. 22"
    topic: "Logging + Recovery"
  - date: "Apr. 27"
    topic: "Distributed Commit"
  - date: "Apr. 29"
    topic: "Distributed Commit (contd.)"
  - date: "May 4"
    topic: "Data Sketching"
  - date: "May 6"
    topic: "Provenance"
---
<h1 style="text-align: center;"><%= title %></h1>
<p style="text-align: justify;">Data Management Systems (including Relational Databases, Non-Relational Databases, and NoSQL storage systems) form the basis of the Big Data Economy we now live in. A data management system is responsible for storing data, enabling efficient access to that data, and mediating concurrent modifications. This class approaches the challenges of designing a data management system from a standpoint that is both principled and practical. The course revolves around a term-long programming assignment, in which you will build a system that answers SQL queries efficiently. Course lectures will focus on the conceptual basis for this system and will discuss how the techniques you learn generalize (e.g., to the use of NoSQL systems).</p>
In this course, you will learn...
<ul>
<li>... how to efficiently store and retrieve data programmatically.</li>
<li>... how to optimize big-data computations.</li>
<li>... how to use index structures to accelerate computations.</li>
<li>... how to safely and efficiently manipulate data concurrently.</li>
<li>... how to recover state after software and hardware failures.</li>
<li>... how to query and update distributed data consistently.</li>
</ul>
<h2>Course Details</h2>
<ul>
<li><strong>Class</strong> T/R 12:45-2:00 PM on YouTube</li>
<li><strong>Instructors: </strong><ul>
<li><a href="https://odin.cse.buffalo.edu/people/oliver_kennedy.html">Oliver Kennedy</a>. Office Hours: <a href="https://buffalo.zoom.us/j/91788396230">Mondays, 1 PM-3 PM, starting Feb. 8</a> (Note: You must be logged into your UB Zoom account)</li>
</ul></li>
<li><strong>TAs: </strong><ul>
<li>Darshana Balakrishnan. Office Hours: <a href="https://buffalo.zoom.us/j/94779036185">Wednesday and Friday: 10am-11:30am ET</a> (Note: You must be logged into your UB account)</li>
</ul></li>
<li><strong>Course Discussions: </strong> <a href="https://piazza.com/buffalo/spring2021/cse462562">Piazza</a></li>
<li><strong>No Required Textbook</strong></li>
<li><strong>Optional Database Concepts References</strong>: <ul>
<li>"Database Systems: The Complete Book" 2e. by Garcia-Molina, Ullman, and Widom</li>
<li>"<a href="https://smile.amazon.com/Patterns-Data-Management-Flipped-Textbook/dp/1523853964/ref=sr_1_1?ie=UTF8&qid=1483409680&sr=8-1&keywords=patterns+in+data+management">Patterns in Data Management</a>" by Jens Dittrich</li>
<li>"<a href="http://www.redbook.io/">The Red Book: Readings in Databases</a>" ed. Bailis, Hellerstein, and Stonebraker</li>
</ul></li>
<li><strong>Optional SQL References</strong>: <ul>
<li>"SAMS Teach Yourself SQL in 10 Minutes" 4e. by Ben Forta</li>
</ul></li>
<li><strong>Optional Scala References</strong>: <ul>
<li>"<a href="https://docs.scala-lang.org/overviews/scala-book/introduction.html">The Scala Book</a>"</li>
<li>"<a href="https://www.manning.com/books/functional-programming-in-scala">Functional Programming in Scala</a>" by Chiusano and Bjarnason</li>
</ul></li>
<li><strong>Homework Submission: </strong> <a href="https://autograder.cse.buffalo.edu">Autolab</a></li>
<li><strong>Project Submission</strong>: <a href="https://autograder.cse.buffalo.edu">Autolab</a></li>
<li><strong>Git Repository Management</strong>: <a href="https://microbase.odin.cse.buffalo.edu">Microbase</a></li>
<li><strong>Software</strong>: <ul>
<li>Scala 2.12 (
<a href="https://docs.scala-lang.org/">Documentation</a> |
<a href="https://www.scala-lang.org/api/2.12.13/">ScalaDoc</a>
)</li>
<li>Catalyzer (
<a href="https://doc.odin.cse.buffalo.edu/catalyzer/org/apache/spark/index.html">ScalaDoc</a> |
<a href="https://gitlab.odin.cse.buffalo.edu/okennedy/catalyzer">Source Code</a>
)</li>
</ul></li>
<li><strong>Grading</strong>:
<ul>
<li>50% theory<ul>
<li>20% Group Homeworks (Group size: 1-4; you may skip any 4 submissions, for any reason)</li>
<li>20% Comprehensive Final (see HUB for time/location)</li>
<li>An extra 10%, applied to whichever of the above two is better</li>
</ul>
</li>
<li>50% projects (Group size: 1)
<ul>
<li>5% <a href="checkpoint0.html">Checkpoint 0</a> due on Feb. 16</li>
<li>10% <a href="checkpoint1.html">Checkpoint 1</a> due on Mar. 9</li>
<li>12% <a>Checkpoint 2</a> due on Mar. 30</li>
<li>8% <a>Checkpoint 3</a> due on Apr. 13</li>
<li>15% <a>Checkpoint 4</a> due on May 7</li>
</ul>
</li>
</ul>
</li>
</ul>
<h2>Lecture Schedule</h2>
<ul>
<% schedule.each do |data| %>
<li style="list-style-type: none; margin-top: 3px;">
<div style="display: inline-block; vertical-align: top; text-align: right; padding-right: 20px; width: 100px; font-style: italic;"><%=data["date"]%></div>
<div style="display: inline-block; padding-left: 10px; border-left: solid 1px black;">
<% if data.has_key? "due" %>Due: <u><%= data["due"] %></u><br/><% end %>
<%=data["topic"]%>
<% unless data.fetch("materials", {}).empty? %> ( <%= data["materials"].map { |r,url| "<a href=\"#{url}\">#{r}</a>" }.join(" | ") %> )<% end %>
<% if data.has_key? "textbook" then %>
<div style="font-size:70%; margin-left:20px;"><%= data["textbook"] %></div>
<% end %>
</div>
</li>
<% end %>
</ul>
<h2>Respect</h2>
I expect students in this class to show respect for each other and for themselves. This includes, but is not limited to, the following forms of respect:
<dl>
<dt>Respect each other's humanity</dt>
<dd>Especially since we are not meeting in person, it's easy to lose track of the fact that the folks you're interacting with (fellow students, TAs, and everyone else) are humans too. Think about how what you're saying will be interpreted before you speak or write. Avoid insulting language, and focus on the merits of the ideas being discussed. Avoid dismissing ideas outright (if you can't come up with a good counterargument, maybe it's not actually a bad idea?)</dd>
<dt>Respect each other's intent</dt>
<dd>Especially given how poorly text conveys emotion, try not to assume that others are attacking you personally. Try to view what others are saying to you in the best possible light. Always ask for clarification before you get angry.</dd>
<dt>Respect yourself and your limits</dt>
<dd>Most students in 4/562 put a lot of work into this course, so, unsurprisingly, students occasionally decide that it is too much work. If and when this happens to you, talk to me or another member of the course staff. It may be something as simple as having spaced out and missed a critical bit of a lecture; armed with that information, you can go on to ace the class! Maybe the course actually is too much work for you this semester, in which case we'll still be able to come up with a strategy that lets you move forward with your education. Whatever the case, talk to me or the course staff and we will figure something out.</dd>
<dt>Respect each other's effort</dt>
<dd>The flip side is that, since your classmates are putting in the effort, you should do the same. Do not unilaterally decide that you do not have to do the same work they do. Do not copy code or answers from the internet or from other students. If you do not complete the same work as your classmates, do not expect to earn the same grade.</dd>
</dl>
<h2>Academic Integrity</h2>
<p style="text-align: justify;">Students may discuss and advise one another on their lab projects, but groups are expected to turn in their own work. Discussing concepts is permitted. Referencing another group's code is not. Cheating on an exam or project submission will result in a grade of F in the course for all involved. It is the CSE department's policy not to provide financial support to any student disciplined for plagiarism. University policies on academic integrity can be reviewed at:</p>
<p style="text-align: center;"><a href="https://engineering.buffalo.edu/computer-science-engineering/information-for-students/policies/academic-integrity.html">CSE Departmental Policy on Academic Integrity</a></p>
<p style="text-align: center;"><a href="https://catalog.buffalo.edu/policies/integrity.html">UB's University-Wide Undergraduate Academic Integrity Policy</a></p>
<p style="text-align: center;"><a href="http://grad.buffalo.edu/succeed/current-students/policy-library.html">The Graduate School Policy Library</a></p>
<h2>Medical Emergencies</h2>
<p style="text-align: justify;">Accommodations for medical emergencies will be made on a case-by-case basis. Requests for extensions based on medical emergencies must be accompanied by documentation of the emergency <b>from student health services</b>:</p>
<p style="text-align: center;"><a href="http://www.buffalo.edu/studentlife/who-we-are/departments/health.html">Student Health Services</a></p>
<h2>Accessibility Resources</h2>
<p style="text-align: justify;">If you have a diagnosed disability (physical, learning, or psychological) that will make it difficult for you to carry out the course work as outlined, or that requires accommodations such as recruiting note-takers, readers, or extended time on exams or assignments, please advise the instructor during the first two weeks of the course so that we may review possible arrangements for reasonable accommodations. In addition, if you have not yet done so, contact:</p>
<p style="text-align: center;"><a href="https://www.buffalo.edu/studentlife/who-we-are/departments/accessibility.html">The Office of Accessibility Resources</a>.</p>