diff --git a/slides/cse4562sp2018/2018-02-19-Indexing1.html b/slides/cse4562sp2018/2018-02-19-Indexing1.html
new file mode 100644
index 00000000..4adf99fb
--- /dev/null
+++ b/slides/cse4562sp2018/2018-02-19-Indexing1.html
@@ -0,0 +1,355 @@
+ CSE 4/562 - Spring 2018
+
+ + +
+ + CSE 4/562 - Database Systems +
+ +
+ +
+

Indexes

+

CSE 4/562 – Database Systems

+
February 19, 2018
+
+ +
+
+ + + + + + + + + + + + + +
+ + + +
$150 / $50
Index
ToC
No Index
ToC Summary
+
+ +
+

Today's Focus

+ +

+ $\sigma_C(R)$ and $(\ldots \bowtie_C R)$ +

+

(Finding records in a table really fast)

+ + +
+ +
+

Indexing Strategies

+ +
+
+
Rearrange the data.
+
Put things in a predictable location or a specific order.
+
("clustering" the data)
+
+ +
+
Wrap the data.
+
Record where specific data values live.
+
("indexing" the data).
+
+
+
+ +
+

Data Organization

+ +
+
+
Unordered Heap
+
No organization at all. $O(N)$ reads.
+
+ +
+
(Secondary) Index
+
Index structure over unorganized data. $O(\ll N)$ random reads for some queries.
+
+ +
+
Clustered (Primary) Index
+
Index structure over clustered data. $O(\ll N)$ sequential reads for some queries.
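
The three organizations above can be contrasted with a minimal in-memory sketch. The table contents and the helper names (`scan`, `build_secondary_index`, `secondary_lookup`, `clustered_range`) are hypothetical, and Python lists stand in for pages on disk, so this only illustrates the access patterns, not real IO costs.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical table of (id, grade) rows stored in arrival order (an unordered heap).
heap = [(7, "B"), (2, "A"), (9, "C"), (4, "A"), (1, "B")]

def scan(rows, key):
    """Unordered heap: every row must be checked, O(N) reads."""
    return [r for r in rows if r[0] == key]

def build_secondary_index(rows):
    """Secondary index: map each key to its row positions in the unorganized heap."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[0]].append(pos)
    return index

def secondary_lookup(rows, index, key):
    """Each matching position is a separate (random) read into the heap."""
    return [rows[pos] for pos in index.get(key, [])]

def clustered_range(sorted_rows, keys, lo, hi):
    """Clustered data: all rows with lo <= key <= hi form one contiguous run."""
    start = bisect_left(keys, lo)
    end = bisect_right(keys, hi)
    return sorted_rows[start:end]

index = build_secondary_index(heap)
clustered = sorted(heap)                  # rearrange ("cluster") the data by id
clustered_keys = [r[0] for r in clustered]
```

For example, `clustered_range(clustered, clustered_keys, 2, 7)` reads one contiguous slice, while answering the same range through the secondary index would touch scattered heap positions.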
+
+
+
+ +
+

Data Organization

+ +
+ +
+

Data Organization

+ +
+ +
+

Index Types

+
+
Tree-Based
+
A hierarchy of decisions leads to data at the leaves.
+ +
+
Hash-Based
+
A hash function puts data in predictable locations.
+ +
CDF-Based (new)
+
A more complex function predicts where data lives.
+
+
+
+
+ +
+
+

Tree-Based Indexes

+ + +
+ +
+

Tree-Based Indexes

+ + +
+ +
+

Challenges

+ +
+
Balance
+
Bad question orders lead to poor performance!
+ +
IO
+
Each access to a binary tree node is a random access.
+ +
Which Dimension
+
Why limit ourselves to asking about one dimension?
+
+
+ +
+ +
+ +
+

Worst-Case Tree?

+
$O(N)$ when the tree is fully left-deep or right-deep
+

Best-Case Tree?

+
$O(\log N)$ with the tree perfectly balanced
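
The gap between the two cases can be seen by building the same keys in two insertion orders. This is a sketch using a plain, non-self-balancing binary search tree; the `Node` class and helper names are assumptions made for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain (unbalanced) BST insert, written iteratively."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key <= node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right

def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

def balanced_order(keys):
    """Reorder sorted keys so that each median is inserted before its halves."""
    if not keys:
        return []
    mid = len(keys) // 2
    return [keys[mid]] + balanced_order(keys[:mid]) + balanced_order(keys[mid + 1:])

keys = list(range(1, 1024))              # 1023 sorted keys
right_deep = build(keys)                 # sorted insertion order: every key goes right
balanced = build(balanced_order(keys))   # median-first insertion order
```

With these 1023 keys, `height(right_deep)` is 1023 while `height(balanced)` is 10, which is the $O(N)$ versus $O(\log N)$ difference.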
+
+
+ +
+
+

Binary Trees are Bad for IO

+

Every step of binary search is a random access

+

Every tree node access is a random access

+
+ +
+

Random access IO is bad.

+
+ +
+

Idea: Load a bunch of binary tree nodes together.

+
+ +
+

Binary Tree: $1$ separator & $2$ pointers

+

$\log_2(N)$ Deep

+

$K$-ary Tree: $(K-1)$ separators & $K$ pointers

+

$\log_K(N)$ Deep
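
To get a feel for the depth difference, the following sketch counts levels with integer arithmetic. The record count and the fanout of 256 are assumed values chosen only for illustration.

```python
def levels(n, fanout):
    """Smallest depth d such that fanout**d >= n (integers avoid float rounding)."""
    depth, reach = 0, 1
    while reach < n:
        reach *= fanout
        depth += 1
    return depth

N = 1_000_000_000               # assumed number of records
binary_depth = levels(N, 2)     # 30 levels
wide_depth = levels(N, 256)     # 4 levels
```

Under these assumptions a lookup touches 4 nodes instead of 30, so it pays for far fewer random accesses.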

+
+ +
+

Important: You still need to binary search within each node of a $K$-ary tree, but now the random accesses hit memory (or cache) instead of disk (or memory).
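
A minimal sketch of that per-node search, assuming a hypothetical in-memory `KaryNode` with sorted separators and the "keys ≤ separator go left" convention used in the tree figures:

```python
from bisect import bisect_left

class KaryNode:
    """Hypothetical node holding K-1 sorted separators and K child pointers."""
    def __init__(self, separators, children):
        assert len(children) == len(separators) + 1
        self.separators = separators
        self.children = children  # KaryNode objects, or plain lists of keys at the leaves

def find_leaf(node, key):
    """Visit one node per level; binary search the separators inside each node."""
    while isinstance(node, KaryNode):
        # Child i holds keys <= separators[i]; the last child holds everything larger.
        node = node.children[bisect_left(node.separators, key)]
    return node

leaves = [[1, 2], [3, 4], [5, 6], [7, 8]]
root = KaryNode([2, 4, 6], leaves)
```

Here `find_leaf(root, 5)` returns the leaf `[5, 6]` after a single node visit, where a binary tree over the same eight keys would need several.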

+
+ +
+

ISAM Trees

+ +
+
+ +
+
+

How do you handle updates?

+

B+Tree = ISAM + Updates

+
+ +
+

Challenges

+ +
    +
  • Finding space for new records
  • +
  • Keeping the tree balanced as new records are added
  • +
+
+ +
+

Idea 1: Reserve space for new records

+
+ +
+ +
+ +
+

Just maintaining open space won't work forever...

+
+ +
+

Rules of B+Trees

+ +
+
Keep space open for insertions in inner/data nodes.
+
‘Split’ nodes when they’re full
+ +
Avoid under-using space
+
‘Merge’ nodes when they’re under-filled
+
+ +

Maintain Invariant: All Nodes ≥ 50% Full

+

(Exception: The Root)
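
The split rule can be sketched for a single leaf as follows. `CAPACITY`, the function names, and the choice of separator are assumptions for illustration; a full B+Tree would also insert the separator into the parent, which may split in turn.

```python
from bisect import insort

CAPACITY = 4  # hypothetical maximum number of keys per leaf

def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf, splitting it if it overflows.

    Returns (left, right, separator); right and separator are None
    when no split was needed.
    """
    insort(leaf, key)
    if len(leaf) <= CAPACITY:
        return leaf, None, None
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    # Copy the largest key of the left half up as the separator (keys <= separator go left).
    return left, right, left[-1]

def at_least_half_full(node):
    """The invariant from the slide: every non-root node is at least 50% full."""
    return len(node) * 2 >= CAPACITY
```

For instance, inserting 15 into the full leaf `[5, 7, 9, 13]` yields `[5, 7]` and `[9, 13, 15]` with separator 7, and both halves satisfy the 50% invariant.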

+
+
+ +
+
+
+
+
+
+
+
+
+

Deletions reverse this process (at 50% fill).
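
A sketch of the reverse direction for one leaf and its right sibling. `CAPACITY` and `delete_from_leaf` are hypothetical names, and the redistribution step used when a merge would not fit is an assumption that the slide does not spell out. Updating the parent's separator is omitted.

```python
CAPACITY = 4  # hypothetical maximum number of keys per leaf

def delete_from_leaf(leaf, sibling, key):
    """Remove key from leaf; sibling is the adjacent leaf to its right.

    Returns the leaves that remain: one after a merge, otherwise two.
    """
    leaf = [k for k in leaf if k != key]
    if len(leaf) * 2 >= CAPACITY:
        return [leaf, sibling]           # still at least 50% full
    if len(leaf) + len(sibling) <= CAPACITY:
        return [leaf + sibling]          # merge: the reverse of a split
    # Assumption: the sibling is too full to merge, so rebalance the keys instead.
    combined = leaf + sibling
    mid = len(combined) // 2
    return [combined[:mid], combined[mid:]]
```

Deleting 7 from `[5, 7]` next to `[9, 13]` leaves the under-filled `[5]`, so the two leaves merge into `[5, 9, 13]`.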

+
+ +
+

Next Class: Hash- and CDF-Based Indexes

+
+ +
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-BTree-Reserved.svg b/slides/cse4562sp2018/graphics/2018-02-19-BTree-Reserved.svg
new file mode 100644
index 00000000..6ddc4ea6
--- /dev/null
+++ b/slides/cse4562sp2018/graphics/2018-02-19-BTree-Reserved.svg
@@ -0,0 +1,827 @@
+ [SVG figure; text labels: 1,3 | 5,7 | 15 | 5, 9 | 15 | 13 | 15, 13 | 9 | ...12 | 21, 27 | 30, 40, 50]
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-ISAM.png b/slides/cse4562sp2018/graphics/2018-02-19-ISAM.png
new file mode 100644
index 00000000..cfa043b4
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-ISAM.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-Index-Types.svg b/slides/cse4562sp2018/graphics/2018-02-19-Index-Types.svg
new file mode 100644
index 00000000..b4561d2b
--- /dev/null
+++ b/slides/cse4562sp2018/graphics/2018-02-19-Index-Types.svg
@@ -0,0 +1,572 @@
+ [SVG figure; text labels: Heap, Index, Clustered Index, Sorted]
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-1.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-1.png
new file mode 100644
index 00000000..0d3e235e
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-1.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-2.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-2.png
new file mode 100644
index 00000000..45bf4677
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-2.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-3.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-3.png
new file mode 100644
index 00000000..9869a288
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-3.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-4.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-4.png
new file mode 100644
index 00000000..14be8e31
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-4.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-5.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-5.png
new file mode 100644
index 00000000..c8f6a914
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-5.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-6.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-6.png
new file mode 100644
index 00000000..4736c5ec
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-6.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-7.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-7.png
new file mode 100644
index 00000000..a7768154
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-7.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-8.png b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-8.png
new file mode 100644
index 00000000..3365ad3d
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-InsertExample-8.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-PrimaryVsSecondary.png b/slides/cse4562sp2018/graphics/2018-02-19-PrimaryVsSecondary.png
new file mode 100644
index 00000000..609b5e14
Binary files /dev/null and b/slides/cse4562sp2018/graphics/2018-02-19-PrimaryVsSecondary.png differ
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-Tree-BinSearch.svg b/slides/cse4562sp2018/graphics/2018-02-19-Tree-BinSearch.svg
new file mode 100644
index 00000000..0efe642b
--- /dev/null
+++ b/slides/cse4562sp2018/graphics/2018-02-19-Tree-BinSearch.svg
@@ -0,0 +1,346 @@
+ [SVG figure; text labels: 2, 1, 3, 4, 5, 6, 7, 8; "All things"; ≤4; ≤2]
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-Tree-Motivation.svg b/slides/cse4562sp2018/graphics/2018-02-19-Tree-Motivation.svg
new file mode 100644
index 00000000..2e8a6720
--- /dev/null
+++ b/slides/cse4562sp2018/graphics/2018-02-19-Tree-Motivation.svg
@@ -0,0 +1,767 @@
+ [SVG figure; decision labels: ≤4?, ≤2?, ≤1?, ≤6?, ≤3?, ≤5?, ≤7?; edge labels: y / n; leaf labels: 1 through 8]
diff --git a/slides/cse4562sp2018/graphics/2018-02-19-Tree-Unbalanced.svg b/slides/cse4562sp2018/graphics/2018-02-19-Tree-Unbalanced.svg
new file mode 100644
index 00000000..8c3cfd81
--- /dev/null
+++ b/slides/cse4562sp2018/graphics/2018-02-19-Tree-Unbalanced.svg
@@ -0,0 +1,745 @@
+ [SVG figure; decision labels: ≤1?, ≤2?, ≤3?, ≤4?, ≤5?, ≤6?, ≤7?; edge labels: y / n; leaf labels: 1 through 8]