CSE 662 - Database Languages & Runtimes

Adaptive Indexing

CSE 662 - September 15

You've got an initially unsorted collection of records indexed by a key. You need to run range queries over it:
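As a baseline, assuming records are stored as `(key, value)` pairs in a plain list, every range query can be answered with a full scan. The function name `range_scan` and the sample data are hypothetical, chosen only for illustration:

```python
def range_scan(records, lo, hi):
    """Return every (key, value) record with lo <= key < hi.

    No index is used or built: each call reads all N records,
    and nothing learned by one query is kept for the next.
    """
    return [(key, value) for (key, value) in records if lo <= key < hi]


# Hypothetical sample data: five records in arbitrary (unsorted) order.
records = [(7, "g"), (2, "b"), (9, "i"), (4, "d"), (1, "a")]
matches = range_scan(records, 2, 8)
```

Each query costs $O(N)$ no matter how many queries came before it.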

Can we do better?

How else can we re-use query work?

Pick $k$ so that $\frac{N}{k}$ is a constant, and your startup work is linear!
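One way to see this, assuming $k$ is the number of partitions, each holding $\frac{N}{k}$ records, and the startup work is a comparison sort of each partition on its own:

$$
\underbrace{k}_{\text{partitions}} \cdot \underbrace{\frac{N}{k} \log \frac{N}{k}}_{\text{sort one partition}} = N \log \frac{N}{k}
$$

If $\frac{N}{k} = c$ for some constant $c$, this is $N \log c = O(N)$, compared with $O(N \log N)$ for sorting the whole collection up front.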

A little more upfront work, faster responses

Much faster convergence!

Why do we get fast convergence?

Start with partitioned data
Extract goal data from each partition independently
- Crack each partition independently
- Sort each partition independently then merge
- Radix "sort" each partition then crack
Merge cracked regions
Postprocess merged data
- "Crack" the merged data if needed
- Sort the merged data
- Radix partition the merged data
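
The steps above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: it picks one option at each stage (crack each partition, then sort the merged data), and the names `HybridIndex` and `crack_out` are hypothetical. The "crack" here is simplified to a filter that moves qualifying keys out of a partition; a real cracker would reorganize the partition in place and remember the pivots.

```python
import bisect
from heapq import merge


def crack_out(partition, lo, hi):
    """Split one partition into (kept, extracted), where extracted
    holds the keys with lo <= key < hi. Simplified stand-in for cracking."""
    kept, extracted = [], []
    for key in partition:
        (extracted if lo <= key < hi else kept).append(key)
    return kept, extracted


class HybridIndex:
    def __init__(self, keys, partition_size):
        # Start with partitioned data: fixed-size chunks of the unsorted input.
        self.partitions = [
            list(keys[i:i + partition_size])
            for i in range(0, len(keys), partition_size)
        ]
        self.final = []  # sorted keys that earlier queries already moved out

    def query(self, lo, hi):
        # Extract goal data from each partition independently.
        merged = []
        for i, partition in enumerate(self.partitions):
            kept, extracted = crack_out(partition, lo, hi)
            self.partitions[i] = kept
            merged.extend(extracted)  # merge the cracked regions
        # Postprocess the merged data: sort it, then fold it into the final run.
        merged.sort()
        self.final = list(merge(self.final, merged))
        # Every qualifying key now lives in the sorted final run.
        start = bisect.bisect_left(self.final, lo)
        end = bisect.bisect_left(self.final, hi)
        return self.final[start:end]


index = HybridIndex([13, 4, 55, 9, 27, 1, 30, 18, 42, 6], partition_size=4)
first = index.query(5, 20)
second = index.query(0, 10)
```

Keys that a query touches leave their partitions for good, so later queries scan less unsorted data and ranges that have already been queried are answered from the sorted run alone.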