Merge branch 'master' of git.odin.cse.buffalo.edu:ODIn/Website

master
Oliver Kennedy 2024-02-21 13:32:40 -05:00
commit 41f4bf9141
Signed by: okennedy
GPG Key ID: 3E5F9B3ABD3FDB60
2 changed files with 211 additions and 0 deletions


@@ -0,0 +1,203 @@
# P1 - Binary Search (On Disk)
**Deadline**: Sunday, March 10; 11:59 PM
**Accept Assignment**: https://classroom.github.com/a/gd95nzgc
**Submit Assignment**: https://autolab.cse.buffalo.edu/courses/cse410-s24/assessments/P1-B-Trees
In this assignment, you will implement an on-disk B+Tree.
This assignment is intended to:
- Give you experience building a paged, on-disk data structure
- Give you experience in enforcing data structure constraints
- Give you experience working with an existing codebase
You should expect to spend approximately 20-30 hours on this assignment. Plan accordingly.
----
To complete this assignment, you should:
1. Accept this assignment through [GitHub Classroom](https://classroom.github.com/a/gd95nzgc).
2. Modify the file `src/bplus_tree.rs`, implementing the functions labeled `todo!()`.
3. Commit your changes and push them to GitHub.
4. Go to [Autolab](https://autolab.cse.buffalo.edu/courses/cse410-s24/assessments/P1-B-Trees), select your repository, acknowledge the course AI Policy, and click Submit.
You may repeat steps 2-4 as many times as desired. You may also modify any of the files in the `page` module.
----
## Overview
In this assignment, you will implement key parts of an on-disk B+Tree.
----
## Documentation
You may find the following documentation useful:
* [The Rust Book](https://doc.rust-lang.org/book/)
* [std::fs::File](https://doc.rust-lang.org/std/fs/struct.File.html)
* Run `cargo doc --open`
----
The following utility methods are provided and may be useful:
#### `bplus_tree::BPlusTree::init(path)`
Initialize a fresh BPlusTree backed by the file at the specified path.
#### `bplus_tree::BPlusTree::open(path)`
Open an existing BPlusTree backed by the file at the specified path.
#### `bplus_tree::BPlusTree::get_page(&self, idx)`
Retrieve the page at the specified index. The type of the page read is determined by Rust's type system: the caller selects it either with a type annotation or with an explicit type parameter. Both of the following approaches work:
```
let page: DirectoryPage = self.get_page(idx);
let page = self.get_page::<DirectoryPage>(idx);
```
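To see why both spellings work, here is a minimal, self-contained sketch of return-type-driven dispatch. Everything in it is a hypothetical stand-in (`Store`, the `decode` method, the 16-byte `PAGE_SIZE`, and the one-field page structs); the project's real `Page` trait and page layouts will differ.

```rust
// Hypothetical stand-ins for the project's page types; the real definitions
// live in the `page` module and will differ.
const PAGE_SIZE: usize = 16; // assumed tiny page size, for illustration only

trait Page: Sized {
    /// Interpret a raw page buffer as this page type.
    fn decode(bytes: &[u8; PAGE_SIZE]) -> Self;
}

#[derive(Debug, PartialEq)]
struct LeafPage {
    first_byte: u8,
}

#[derive(Debug, PartialEq)]
struct DirectoryPage {
    last_byte: u8,
}

impl Page for LeafPage {
    fn decode(bytes: &[u8; PAGE_SIZE]) -> Self {
        LeafPage { first_byte: bytes[0] }
    }
}

impl Page for DirectoryPage {
    fn decode(bytes: &[u8; PAGE_SIZE]) -> Self {
        DirectoryPage { last_byte: bytes[PAGE_SIZE - 1] }
    }
}

/// An in-memory stand-in for the backing file.
struct Store {
    pages: Vec<[u8; PAGE_SIZE]>,
}

impl Store {
    /// The caller chooses `P`, either with a type annotation on the binding
    /// or with a turbofish (`get_page::<P>`); the same bytes are decoded
    /// differently depending on that choice.
    fn get_page<P: Page>(&self, idx: usize) -> P {
        P::decode(&self.pages[idx])
    }
}
```

Note that nothing in this sketch checks that the bytes at `idx` really hold the requested page type; the caller is responsible for asking for the right one.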
#### `bplus_tree::BPlusTree::put_page(&self, idx, &page)`
Write the provided page to disk at the provided index.
#### `bplus_tree::BPlusTree::check_tree(&self)`
Sanity check the contents of the tree. If any standard assumptions are not met, this method returns `Ok(Some(err_msg))`.
#### `bplus_tree::BPlusTree::print_tree(&self)`
Print the tree to standard output.
#### The `page` module
The `page` module (`src/page/mod.rs`) provides functionality for reading and writing different types of pages. Each implementation of the page trait (`DirectoryPage`, `LeafPage`, `FreePage`, and `MetadataPage`) provides functionality for manipulating the page. See the project documentation (`cargo doc --open`) for more details.
----
## General File Structure
We're going to assume that page 0 contains a `MetadataPage`. After calling `::init()` or `::open()`, the metadata page contents will be available as the `.meta` field of the returned tree.
Note that you *must* manually write changes to the metadata page back to disk. There is a convenience method for this: `put_meta()`
Note the contents of the `MetadataPage` object. Test cases assume the following:
- `next_free_page`: A pointer to the first free page, or `NULL_IDX` if there are no free pages.
- `root_page`: A pointer to the root directory page.
- `data_head`: A pointer to the first leaf page.
- `data_tail`: A pointer to the last leaf page.
- `pages_allocated`: The number of pages allocated in the file (including the metadata page).
- `depth`: The number of levels of directory pages in the file.
There must always be at least one directory page and one leaf page, even in an empty file.
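The following sketch illustrates the two points above: what the metadata of an empty file might look like, and why in-memory changes must be written back explicitly. It is only an illustration under assumptions. The field types, the `NULL_IDX` value, the choice of pages 1 and 2 for the root and the first leaf, and the `disk_meta` field standing in for page 0 on disk are all hypothetical.

```rust
// Assumed sentinel value; the project defines its own `NULL_IDX`.
const NULL_IDX: u64 = u64::MAX;

// Hypothetical mirror of the fields listed above; the real `MetadataPage`
// may use different types.
#[derive(Clone, Debug, PartialEq)]
struct Meta {
    next_free_page: u64,
    root_page: u64,
    data_head: u64,
    data_tail: u64,
    pages_allocated: u64,
    depth: u64,
}

struct Tree {
    /// In-memory copy, like the `.meta` field of the returned tree.
    meta: Meta,
    /// Stand-in for the contents of page 0 on disk.
    disk_meta: Meta,
}

impl Tree {
    /// An empty file still has three pages: metadata, one root directory
    /// page, and one leaf page (page indices 1 and 2 are assumptions).
    fn init() -> Tree {
        let meta = Meta {
            next_free_page: NULL_IDX,
            root_page: 1,
            data_head: 2,
            data_tail: 2,
            pages_allocated: 3,
            depth: 1,
        };
        Tree { meta: meta.clone(), disk_meta: meta }
    }

    /// Changes to `meta` reach the "disk" only when this is called.
    fn put_meta(&mut self) {
        self.disk_meta = self.meta.clone();
    }
}
```

Forgetting the write-back is easy to miss in testing, because the in-memory copy looks correct until the file is re-opened.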
Finally, recall the B+Tree constraint: A Directory/Leaf page must be at least 50% full at all times. The only exceptions to this rule are:
- The root directory page may contain fewer than 50% entries, but must contain at least one key.
- The root directory page of a depth=1 tree may be completely empty.
- If a tree contains only a single leaf page, this leaf page may contain fewer than 50% entries.
The `.is_underfull()` and `.can_allow_stolen_key()` methods on `DirectoryPage` and `LeafPage` can help to enforce these constraints.
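As a rough illustration of the rule and its three exceptions, the checks can be written as plain functions over entry counts. These free functions are hypothetical and are not the signatures of the provided `.is_underfull()` or `.can_allow_stolen_key()` methods; the capacities are set to a debugging-sized `4`.

```rust
// Debugging-sized capacities (assumed values for this sketch).
const DIR_KEY_COUNT: usize = 4;
const LEAF_RECORD_COUNT: usize = 4;

/// A leaf is underfull when it is less than 50% full, unless it is the
/// only leaf in the tree.
fn leaf_is_underfull(records: usize, is_only_leaf: bool) -> bool {
    !is_only_leaf && records * 2 < LEAF_RECORD_COUNT
}

/// A non-root directory page must be at least 50% full. The root needs at
/// least one key, except in a depth=1 tree, where it may be empty.
fn dir_is_underfull(keys: usize, is_root: bool, depth: usize) -> bool {
    if is_root {
        depth > 1 && keys == 0
    } else {
        keys * 2 < DIR_KEY_COUNT
    }
}

/// A leaf can give up one record only if it stays at least 50% full
/// afterwards.
fn leaf_can_spare_one(records: usize) -> bool {
    records >= 1 && (records - 1) * 2 >= LEAF_RECORD_COUNT
}
```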
----
## Objectives
In this assignment, you will implement four functions:
#### `bplus_tree::BPlusTree::alloc_page(&self, &page)`
This method should allocate a new page for use by the caller and write the provided page to it. If a previously freed page is available, this should be used first. Otherwise, `alloc_page` should write the page to the end of the file.
`alloc_page` should ensure that the metadata page is appropriately updated:
- `free_page` should
**Complexity:**
- Memory: O(1)
- IO: O(1)
#### `bplus_tree::BPlusTree::free_page(&self, &page)`
This method should release a page after use. The page contents should be overwritten with a `FreePage` and the page should be made available for use by a subsequent call to `alloc_page` (e.g., by marking it in the `MetadataPage`'s `next_free_page` field).
**Complexity:**
- Memory: O(1)
- IO: O(1)
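One common way to meet the O(1) bounds for both allocation and release is an intrusive free list: each freed page stores the index of the next free page, and the metadata holds the head of the list. The sketch below models this in memory. `PagedFile`, `Slot`, and the `u32` payload are hypothetical, and the sketch does not use the project's `get_page`/`put_page`/`put_meta` API or claim to match the expected `FreePage` layout.

```rust
// Assumed sentinel value; the project defines its own `NULL_IDX`.
const NULL_IDX: usize = usize::MAX;

#[derive(Clone, Debug, PartialEq)]
enum Slot {
    /// A live page (the payload is a placeholder for real page contents).
    Data(u32),
    /// A freed page that remembers the next free page, forming a list.
    Free { next: usize },
}

/// In-memory stand-in for the file plus the `next_free_page` metadata field.
struct PagedFile {
    pages: Vec<Slot>,
    next_free_page: usize,
}

impl PagedFile {
    fn new() -> Self {
        PagedFile { pages: Vec::new(), next_free_page: NULL_IDX }
    }

    /// Reuse the head of the free list if there is one; otherwise append.
    fn alloc_page(&mut self, payload: u32) -> usize {
        if self.next_free_page != NULL_IDX {
            let idx = self.next_free_page;
            // One read: learn which page becomes the new head.
            let next = match self.pages[idx] {
                Slot::Free { next } => next,
                Slot::Data(_) => panic!("free list points at a live page"),
            };
            self.next_free_page = next;
            // One write: store the caller's page in the reused slot.
            self.pages[idx] = Slot::Data(payload);
            idx
        } else {
            self.pages.push(Slot::Data(payload));
            self.pages.len() - 1
        }
    }

    /// Push the page onto the front of the free list.
    fn free_page(&mut self, idx: usize) {
        self.pages[idx] = Slot::Free { next: self.next_free_page };
        self.next_free_page = idx;
    }
}
```

In an on-disk version, every change to the head of the list would also need to be written back to the metadata page.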
#### `bplus_tree::BPlusTree::put(&self, key, value)`
This method should insert a key/value pair into the tree. Subsequent calls to `get(key)` should return `value`. If `key` is already present, the prior value should be overwritten.
The write should persist across restarts; if the same file is later re-opened, `get(key)` should still return `value`.
**Note:** Recall that if a leaf page splits, it may also trigger splits in the ancestors.
**Complexity:**
- Memory: O(log_K(N))
- IO: O(log_K(N)) reads, amortized O(1) writes (worst case O(log_K(N)))
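The core of a leaf split can be sketched on a sorted in-memory vector. This is only an illustration under assumptions: `insert_into_leaf` is a hypothetical helper, keys and values are assumed to be `u32`, the capacity is a debugging-sized `4`, and propagating the returned separator key into the parent directory page (and possibly splitting it in turn) is left out.

```rust
// Debugging-sized capacity (assumed value for this sketch).
const LEAF_RECORD_COUNT: usize = 4;

/// Insert `key` (or overwrite its value) in a sorted leaf. If the leaf
/// overflows, move its upper half into a new right-hand leaf and return
/// that leaf together with the separator key the parent would need.
fn insert_into_leaf(
    leaf: &mut Vec<(u32, u32)>,
    key: u32,
    value: u32,
) -> Option<(u32, Vec<(u32, u32)>)> {
    match leaf.binary_search_by_key(&key, |&(k, _)| k) {
        Ok(pos) => leaf[pos].1 = value,            // key present: overwrite
        Err(pos) => leaf.insert(pos, (key, value)), // key absent: keep sorted
    }
    if leaf.len() <= LEAF_RECORD_COUNT {
        return None; // still fits: no split needed
    }
    // Split roughly in half so both pages stay at least 50% full.
    let right = leaf.split_off(leaf.len() / 2);
    // Here the smallest key of the right page serves as the separator.
    Some((right[0].0, right))
}
```

Whether the separator is the smallest key of the right page or the largest key of the left page depends on the directory page's conventions, so check the project documentation before relying on this choice.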
#### `bplus_tree::BPlusTree::delete(&self, key)`
This method should remove a key from the tree. Subsequent calls to `get(key)` should return `None`.
The delete should persist across restarts.
Recall from the notes above that no page (leaf or directory) should be less than 50% full (see `is_underfull()`). An underfull page can be addressed in one of the following ways:
- 'Stealing' a record from the preceding or following sibling page (i.e., page with the same parent).
- 'Merging' with the preceding or following sibling page.
Since stealing is possible from any sibling that is more than 50% full, and merging is possible for any two pages that are each at most 50% full, one of the two options will always be possible.
**Complexity:**
- Memory: O(log_K(N))
- IO: O(log_K(N)) reads, amortized O(1) writes (worst case O(log_K(N)))
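The choice between stealing and merging can be sketched as a decision over record counts alone. The `Fix` enum and `choose_fix` function are hypothetical, the capacity is a debugging-sized `4`, and preferring the left sibling over the right one is an arbitrary choice made for this sketch.

```rust
// Debugging-sized capacity (assumed value for this sketch).
const LEAF_RECORD_COUNT: usize = 4;
const MIN_RECORDS: usize = LEAF_RECORD_COUNT / 2;

#[derive(Debug, PartialEq)]
enum Fix {
    Nothing,
    StealFromLeft,
    StealFromRight,
    MergeWithLeft,
    MergeWithRight,
}

/// Decide how to repair a leaf holding `len` records after a delete.
/// `left` and `right` are the record counts of the siblings under the same
/// parent, or `None` where no such sibling exists.
fn choose_fix(len: usize, left: Option<usize>, right: Option<usize>) -> Fix {
    if len >= MIN_RECORDS {
        return Fix::Nothing; // still at least 50% full
    }
    match (left, right) {
        // A sibling above 50% can give up a record and remain legal.
        (Some(l), _) if l > MIN_RECORDS => Fix::StealFromLeft,
        (_, Some(r)) if r > MIN_RECORDS => Fix::StealFromRight,
        // Otherwise the sibling is at most 50% full, so both pages fit
        // together in a single page.
        (Some(_), _) => Fix::MergeWithLeft,
        (_, Some(_)) => Fix::MergeWithRight,
        // No siblings: this is the only leaf, which may be underfull.
        (None, None) => Fix::Nothing,
    }
}
```

A merge removes a key from the parent, so the same decision would then have to be repeated one level up.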
---
## Strategy
Note the `DIR_KEY_COUNT` and `LEAF_RECORD_COUNT` constants defined in `src/page/dir_page.rs` and `src/page/leaf_page.rs` respectively. In the template file, these are set as high as possible. However, for the purpose of debugging, you may find it convenient to set them to lower values (e.g., `4`).
A compile-time assertion prevents you from setting these values so high that a page's contents would be greater than the size of a single page.
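For reference, a compile-time check of this kind can be expressed in Rust with `assert!` in a `const` context. The sketch below is not the template's actual assertion: `PAGE_SIZE`, `LEAF_HEADER_SIZE`, and `RECORD_SIZE` are assumed values chosen for illustration.

```rust
// All sizes below are assumptions for this sketch, not the template's values.
const PAGE_SIZE: usize = 4096;
const LEAF_HEADER_SIZE: usize = 16; // hypothetical per-page header
const RECORD_SIZE: usize = 8; // hypothetical: one u32 key plus one u32 value

// Lowered to a small value for debugging, as suggested above.
const LEAF_RECORD_COUNT: usize = 4;

// Evaluated by the compiler: the build fails if the records cannot fit.
const _: () = assert!(
    LEAF_HEADER_SIZE + LEAF_RECORD_COUNT * RECORD_SIZE <= PAGE_SIZE,
    "leaf records do not fit in one page"
);

/// The largest record count that would still pass the assertion.
const fn max_leaf_records() -> usize {
    (PAGE_SIZE - LEAF_HEADER_SIZE) / RECORD_SIZE
}
```

With a small record count, splits and merges happen after only a handful of inserts and deletes, which makes `print_tree` output much easier to read.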
You are encouraged to subdivide the problem into 3 phases. The provided test cases are designed accordingly:
1. Implement a page allocator (`alloc_page`, `free_page`) and pass the first provided test case.
2. Implement the `put` method and pass the second provided test case.
3. Implement the `delete` method and pass the third provided test case.
The `put` and `delete` methods also benefit from being broken down into smaller cases. You may find it convenient to solve these cases one at a time. The `todo!()` macro can be very helpful here, allowing you to test implementations of one case at a time, as you develop them.
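As a small illustration of this workflow, the sketch below handles only the simplest insertion case and leaves the rest as `todo!()`. The `put_sketch` function and `PutOutcome` enum are hypothetical, and the leaf is modeled as a sorted vector of keys rather than a real page.

```rust
// Debugging-sized capacity (assumed value for this sketch).
const LEAF_RECORD_COUNT: usize = 4;

#[derive(Debug, PartialEq)]
enum PutOutcome {
    Inserted,
    AlreadyPresent,
}

/// Handle the easy cases now; leave the harder ones as `todo!()` so the
/// code compiles and the finished cases can be tested straight away.
fn put_sketch(leaf: &mut Vec<u32>, key: u32) -> PutOutcome {
    match leaf.binary_search(&key) {
        Ok(_) => PutOutcome::AlreadyPresent,
        // Case 1: there is enough space on the leaf page.
        Err(pos) if leaf.len() < LEAF_RECORD_COUNT => {
            leaf.insert(pos, key);
            PutOutcome::Inserted
        }
        // Case 2 and beyond: not written yet. Reaching this arm panics
        // with a "not yet implemented" message that names the missing case.
        Err(_) => todo!("case 2: split the leaf"),
    }
}
```

A test that stays within the implemented cases passes, while a test that reaches an unfinished case fails with a message pointing at exactly what remains to be written.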
#### Put
1. There is enough space on the leaf page.
2. The leaf page needs to split, but its parent directory page has enough space.
3. The leaf page and its immediate parent need to split, and the immediate parent is the root page.
4. The leaf page and its immediate parent need to split, and the immediate parent is not the root page.
5. The leaf page and one or more of its ancestors need to split.
#### Delete
1. The leaf page is not underfull after deletion.
2. The leaf page is underfull, but one of its siblings can be stolen from.
3. The leaf page is underfull, neither of its siblings can be stolen from, but the immediate parent doesn't become underfull after merging the leaf pages.
4. The leaf page is underfull, neither of its siblings can be stolen from, and the immediate parent can steal from one of its siblings.
5. The leaf page is underfull, neither of its siblings can be stolen from, the immediate parent is underfull, and neither of its siblings can be stolen from.
6. The leaf page and one or more of its ancestors need to merge.
----
## Additional Notes
- In addition to `src/bplus_tree.rs`, you may modify any of the existing files in `src/page`.
- You may modify the structure of the files and page layouts, as long as you pass the provided test cases.
- You may **not** add new crates without permission.


@@ -82,12 +82,20 @@ deliverables:
  - item: "Project 1: Binary Search On Disk"
    due: Feb 11
    links:
      assignment: assignments/p1.html
      accept: https://classroom.github.com/a/hsxNiYoj
      submit: https://autolab.cse.buffalo.edu/courses/cse410-s24/assessments/P1-Binary
  - item: "Written 1: EM Analysis"
    due: Feb 18
    links:
      accept: https://classroom.github.com/a/hsxNiYoj
      submit: https://autolab.cse.buffalo.edu/courses/cse410-s24/assessments/P1-Binary
  - item: "Project 2: B+ Tree"
    due: Mar 10
    links:
      assignment: assignments/p2.html
      accept: https://classroom.github.com/a/gd95nzgc
      submit: https://autolab.cse.buffalo.edu/courses/cse410-s24/assessments/P1-B-Trees
  - item: "Written 2: B+ Tree Analysis"
    due: Mar 10
  - item: "Project 3: LSM Tree"