Put things in a predictable location or a specific order.
+
("clustering" the data)
+
+
+
+
Wrap the data.
+
Record where specific data values live
+
("indexing" the data).
+
+
+
+
+
+
Data Organization
+
+
+
+
Unordered Heap
+
No organization at all. $O(N)$ reads.
+
+
+
+
(Secondary) Index
+
Index structure over unorganized data. $O(\ll N)$ random reads for some queries.
+
+
+
+
Clustered (Primary) Index
+
Index structure over clustered data. $O(\ll N)$ sequential reads for some queries.
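
As a rough illustration of the three organizations above, the following sketch contrasts a full scan of an unordered heap with range lookups through a secondary and a clustered index. The record layout, the sample data, and all function names are assumptions made for this example, and sorted Python lists stand in for real index structures.

```python
from bisect import bisect_left, bisect_right

# Hypothetical records (key, payload), stored in arrival order: an unordered heap.
heap = [(17, "q"), (3, "a"), (42, "z"), (8, "c"), (23, "m")]

def heap_lookup(records, key):
    """Unordered heap: every record must be examined, so this is O(N)."""
    return [r for r in records if r[0] == key]

# Secondary index: sorted (key, position) pairs pointing into the unorganized heap.
secondary = sorted((k, pos) for pos, (k, _) in enumerate(heap))
secondary_keys = [k for k, _ in secondary]

def secondary_range(lo, hi):
    """Binary search the index, then do one random read into the heap per match."""
    start = bisect_left(secondary_keys, lo)
    end = bisect_right(secondary_keys, hi)
    return [heap[pos] for _, pos in secondary[start:end]]

# Clustered index: the data itself is kept in key order.
clustered = sorted(heap)
clustered_keys = [k for k, _ in clustered]

def clustered_range(lo, hi):
    """Binary search once, then read one contiguous (sequential) slice."""
    start = bisect_left(clustered_keys, lo)
    end = bisect_right(clustered_keys, hi)
    return clustered[start:end]
```

Both range functions return the same records; the difference is that the secondary index jumps around the heap while the clustered layout reads a single contiguous run.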
+
+
+
+
+
+
Data Organization
+
+
+
+
+
Data Organization
+
+
+
+
+
Index Types
+
+
Tree-Based
+
A hierarchy of decisions leads to data at the leaves.
+
+
+
Hash-Based
+
A hash function puts data in predictable locations.
+
+
CDF-Based (new)
+
A more complex function predicts where data lives.
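
To make the last two index types concrete, here is a minimal sketch of "a function predicts where data lives". The modulo hash, the bucket count, and the straight-line approximation of the CDF are all assumptions chosen for brevity; a real CDF-based index would fit a better model.

```python
from bisect import bisect_left

keys = [3, 8, 17, 23, 42, 57, 61, 90]  # hypothetical sorted key column

# Hash-based: a hash function maps each key to a predictable bucket.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]
for k in keys:
    buckets[k % NUM_BUCKETS].append(k)

def hash_contains(key):
    """Look only in the one bucket the hash function points at."""
    return key in buckets[key % NUM_BUCKETS]

# CDF-based: approximate the key's rank by a straight line from min to max key.
def predict(key):
    lo, hi = keys[0], keys[-1]
    frac = (key - lo) / (hi - lo)
    return min(len(keys) - 1, max(0, round(frac * (len(keys) - 1))))

# Largest distance between a predicted and an actual position over the stored keys.
MAX_ERROR = max(abs(predict(k) - i) for i, k in enumerate(keys))

def cdf_lookup(key):
    """Predict a position, then binary search only the small window around it."""
    guess = predict(key)
    start = max(0, guess - MAX_ERROR)
    end = min(len(keys), guess + MAX_ERROR + 1)
    i = bisect_left(keys, key, start, end)
    return i if i < end and keys[i] == key else None
```

The window search is what makes the prediction safe: the model only has to be approximately right, as long as its maximum error over the stored keys is known.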
+
+
+
+
+
+
+
+
Tree-Based Indexes
+
+
+
+
+
+
Tree-Based Indexes
+
+
+
+
+
+
Challenges
+
+
+
Balance
+
Bad question orders lead to poor performance!
+
+
IO
+
Each access to a binary tree node is a random access.
+
+
Which Dimension
+
Why limit ourselves to asking about one dimension?
+
+
+
+
+
+
+
+
+
Worst-Case Tree?
+
$O(N)$ when the tree degenerates into a left- or right-deep chain
+
Best-Case Tree?
+
$O(\log N)$ with the tree perfectly balanced
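
The gap between the two cases can be reproduced with a plain, non-rebalancing binary search tree. This is only a sketch: the `Node` class and the helper names are made up for the example, and the "good" insertion order is produced by inserting medians first.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain BST insert with no rebalancing (iterative)."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key)
                return root
            cur = cur.right

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    best, stack = 0, [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        best = max(best, depth)
        stack.append((node.left, depth + 1))
        stack.append((node.right, depth + 1))
    return best

def balanced_order(keys):
    """Reorder sorted keys so that each median is inserted before its halves."""
    if not keys:
        return []
    mid = len(keys) // 2
    return [keys[mid]] + balanced_order(keys[:mid]) + balanced_order(keys[mid + 1:])

keys = list(range(1, 16))                        # 15 keys, already sorted
worst = height(build(keys))                      # sorted inserts: right-deep chain
best = height(build(balanced_order(keys)))       # median-first inserts: balanced
```

With 15 keys, `worst` is 15 (one node per level) and `best` is 4, matching $O(N)$ versus $O(\log N)$.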
+
+
+
+
+
+
Binary Trees are Bad for IO
+
Every step of binary search is a random access
+
Every tree node access is a random access
+
+
+
+
Random access IO is bad.
+
+
+
+
Idea: Load a bunch of binary tree nodes together.
+
+
+
+
Binary Tree: $1$ separator & $2$ pointers
+
$\log_2(N)$ Deep
+
$K$-ary Tree: $(K-1)$ separators & $K$ pointers
+
$\log_K(N)$ Deep
+
+
+
+
Important: You still need to do a binary search within each node of a $K$-ary tree, but those random accesses now hit memory (or cache) instead of disk (or memory).
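
A small sketch of both points: the search inside a single $K$-ary node, and how fan-out changes the number of levels. The function names are hypothetical, and sending a key equal to a separator to the right-hand child is an assumed convention.

```python
from bisect import bisect_right

def child_slot(separators, key):
    """Binary search inside one node: index of the child pointer to follow.

    A node with K-1 sorted separators has K children. Assumed convention:
    child i holds keys with separators[i-1] <= key < separators[i].
    """
    return bisect_right(separators, key)

def levels(n, fanout):
    """Smallest depth d such that fanout**d >= n (integer arithmetic only)."""
    depth, reach = 0, 1
    while reach < n:
        reach *= fanout
        depth += 1
    return depth
```

For one million records, `levels(1_000_000, 2)` is 20 while `levels(1_000_000, 100)` is 3, so a wide node trades many slow node fetches for a few fetches plus cheap in-node searches.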
+
+
+
+
ISAM Trees
+
+
+
+
+
+
+
How do you handle updates?
+
B+Tree = ISAM + Updates
+
+
+
+
Challenges
+
+
+
Finding space for new records
+
Keeping the tree balanced as new records are added
+
+
+
+
+
Idea 1: Reserve space for new records
+
+
+
+
+
+
+
+
Just maintaining open space won't work forever...
+
+
+
+
Rules of B+Trees
+
+
+
Keep space open for insertions in inner/data nodes.
+
‘Split’ nodes when they’re full
+
+
Avoid under-using space
+
‘Merge’ nodes when they’re under-filled
+
+
+
Maintain Invariant: All Nodes ≥ 50% Full
+
(Exception: The Root)
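
The split rule can be sketched for a single leaf as follows. The capacity of 4 keys, the function name, and the choice to copy the right half's first key up as the separator are assumptions for illustration; updating the parent node is left out.

```python
from bisect import insort

CAPACITY = 4  # hypothetical maximum number of keys per leaf

def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf (modified in place).

    Returns (left, separator, right). When the leaf still fits, separator and
    right are None. On overflow the leaf is split in half, so both halves are
    at least 50% full, and the separator is the first key of the right half.
    """
    insort(leaf, key)
    if len(leaf) <= CAPACITY:
        return leaf, None, None
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    return left, right[0], right
```

For example, inserting 25 into the full leaf `[10, 20, 30, 40]` yields `[10, 20]` and `[25, 30, 40]` with separator 25; the caller would then add that separator to the parent, which may split in turn.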
+
+
+
+
+
+
+
+
+
+
+
+
+
Deletions reverse this process (at 50% fill).
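
A matching sketch for deletion, under the same assumed capacity of 4 keys per leaf. Borrowing a key from the sibling before merging is a common refinement that the slides do not mention, and it is included here only as an assumption; the parent's separator update is again omitted.

```python
CAPACITY = 4              # hypothetical maximum number of keys per leaf
MIN_FILL = CAPACITY // 2  # the 50% invariant

def delete_from_leaf(leaf, right_sibling, key):
    """Remove key from leaf, then restore the 50% invariant if it was broken.

    Returns (leaf, right_sibling); right_sibling is None after a merge.
    """
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= MIN_FILL:
        return leaf, right_sibling            # still at least half full
    if len(right_sibling) > MIN_FILL:
        # Assumed refinement: borrow the sibling's smallest key instead of merging.
        return leaf + [right_sibling[0]], right_sibling[1:]
    return leaf + right_sibling, None         # merge the two under-filled leaves
```

Deleting 20 from `[10, 20]` next to `[30, 40]` merges them into `[10, 30, 40]`, the reverse of a split.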
+
+
+
+
Next Class: Hash- and CDF-Based Indexes
+
+
+
+
+
+
+
+
+
+
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-BTree-Reserved.svg b/slides/cse4562sp2018/graphics/2018-02-19-BTree-Reserved.svg
new file mode 100644
index 00000000..6ddc4ea6
--- /dev/null
+++ b/slides/cse4562sp2018/graphics/2018-02-19-BTree-Reserved.svg
@@ -0,0 +1,827 @@
+
+
+
+
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-ISAM.png b/slides/cse4562sp2018/graphics/2018-02-19-ISAM.png
new file mode 100644
index 00000000..cfa043b4
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-ISAM.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-Index-Types.svg b/slides/cse4562sp2018/graphics/2018-02-19-Index-Types.svg
new file mode 100644
index 00000000..b4561d2b
--- /dev/null
+++ b/slides/cse4562sp2018/graphics/2018-02-19-Index-Types.svg
@@ -0,0 +1,572 @@
+
+
+
+
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-1.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-1.png
new file mode 100644
index 00000000..0d3e235e
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-1.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-2.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-2.png
new file mode 100644
index 00000000..45bf4677
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-2.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-3.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-3.png
new file mode 100644
index 00000000..9869a288
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-3.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-4.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-4.png
new file mode 100644
index 00000000..14be8e31
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-4.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-5.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-5.png
new file mode 100644
index 00000000..c8f6a914
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-5.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-6.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-6.png
new file mode 100644
index 00000000..4736c5ec
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-6.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-7.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-7.png
new file mode 100644
index 00000000..a7768154
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-7.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-8.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-8.png
new file mode 100644
index 00000000..3365ad3d
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-8.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-PrimaryVsSecondary.png b/slides/cse4562sp2018/graphics/2018-02-19-PrimaryVsSecondary.png
new file mode 100644
index 00000000..609b5e14
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-PrimaryVsSecondary.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-Tree-BinSearch.svg b/slides/cse4562sp2018/graphics/2018-02-19-Tree-BinSearch.svg
new file mode 100644
index 00000000..0efe642b
--- /dev/null
+++ b/slides/cse4562sp2018/graphics/2018-02-19-Tree-BinSearch.svg
@@ -0,0 +1,346 @@
+
+
+
+
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-Tree-Motivation.svg b/slides/cse4562sp2018/graphics/2018-02-19-Tree-Motivation.svg
new file mode 100644
index 00000000..2e8a6720
--- /dev/null
+++ b/slides/cse4562sp2018/graphics/2018-02-19-Tree-Motivation.svg
@@ -0,0 +1,767 @@
+
+
+
+
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-Tree-Unbalanced.svg b/slides/cse4562sp2018/graphics/2018-02-19-Tree-Unbalanced.svg
new file mode 100644
index 00000000..8c3cfd81
--- /dev/null
+++ b/slides/cse4562sp2018/graphics/2018-02-19-Tree-Unbalanced.svg
@@ -0,0 +1,745 @@
+
+
+
+