Adaptive Indexing

Adaptive Merge & Generalized Cracking

CSE 662 - September 15

To Index or Not

You've got an initially unsorted collection of records indexed by a key. You need to run range queries over it:

| Scan Type    | Prep Cost        | Query Cost |
|--------------|------------------|------------|
| Linear Scan  | None             | Expensive  |
| Index Scan   | "Very" Expensive | Cheap      |
| Cracker Scan | None             | Decreasing |
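To make the first two rows concrete, here is a minimal sketch, assuming the records are plain comparable keys and a range query asks for the half-open interval [low, high). The function names `linear_scan`, `build_index`, and `index_scan` are illustrative, not from the slides.

```python
import bisect

def linear_scan(records, low, high):
    """No preparation, but every query reads all N records."""
    return [r for r in records if low <= r < high]

def build_index(records):
    """One-time preparation: sort all keys, O(N log N)."""
    return sorted(records)

def index_scan(index, low, high):
    """Two binary searches locate the range in O(log N)."""
    start = bisect.bisect_left(index, low)
    end = bisect.bisect_left(index, high)
    return index[start:end]
```

A sorted list stands in for the index here; a B+ tree or similar structure would give the same prep-versus-query trade-off.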

Can we do better?

Cracker Index

  • Partition on each range bound like quicksort
  • A full range query partitions twice
  • Save boundaries for future queries
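The three bullets above can be sketched roughly as follows. This is a simplified illustration, assuming integer keys and half-open ranges [low, high) with low <= high; the class name `CrackerIndex` and its fields are hypothetical. For brevity it rebuilds each piece with list comprehensions, where a real cracker would swap elements in place as quicksort does.

```python
import bisect

class CrackerIndex:
    """Hypothetical minimal cracker index over a list of keys."""

    def __init__(self, keys):
        self.data = list(keys)   # reorganized a little by every query
        self.bounds = []         # sorted pivot values seen so far
        self.positions = []      # positions[i]: first index with key >= bounds[i]

    def _crack(self, pivot):
        """Partition around pivot; return the first index with key >= pivot."""
        i = bisect.bisect_left(self.bounds, pivot)
        if i < len(self.bounds) and self.bounds[i] == pivot:
            return self.positions[i]      # boundary saved by an earlier query
        # Only the piece between the neighbouring saved boundaries is touched.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.bounds) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [x for x in piece if x < pivot]
        larger = [x for x in piece if x >= pivot]
        self.data[lo:hi] = smaller + larger
        split = lo + len(smaller)
        self.bounds.insert(i, pivot)      # save the boundary for future queries
        self.positions.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return all keys in [low, high), cracking once per bound."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]
```

Repeating a query with the same bounds touches no data at all, and nearby queries only partition the small pieces left between saved boundaries, which is where the decreasing query cost comes from.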

Adaptive Indexing

How else can we re-use query work?

Adaptive Merge Trees

  • Observation 1: Sorting a list is $O(N \cdot \log(N))$
  • Observation 2: Sorting $k$ partitions of $\frac{N}{k}$ records each is $k \cdot O(\frac{N}{k} \cdot \log(\frac{N}{k})) = O(N \cdot \log(\frac{N}{k}))$

Pick $k$ so that $\frac{N}{k}$ is a constant, and your startup work is linear!
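One way to picture the idea is the sketch below, assuming fixed-size sorted runs and a single final partition that absorbs whatever range each query touches. The class name `AdaptiveMergeIndex`, the `run_size` parameter, and the use of Python lists are all illustrative assumptions; an actual adaptive merge tree keeps its runs in a partitioned B-tree.

```python
import bisect
import heapq

class AdaptiveMergeIndex:
    """Hypothetical sketch: sorted runs plus one fully merged partition."""

    def __init__(self, keys, run_size=4):
        keys = list(keys)
        # Startup work: sort constant-size runs, linear in N overall.
        self.runs = [sorted(keys[i:i + run_size])
                     for i in range(0, len(keys), run_size)]
        self.final = []   # sorted keys already merged by earlier queries

    def range_query(self, low, high):
        """Return sorted keys in [low, high), merging them out of the runs."""
        pieces = []
        for run in self.runs:
            start = bisect.bisect_left(run, low)
            end = bisect.bisect_left(run, high)
            if start < end:
                pieces.append(run[start:end])
                del run[start:end]        # these keys now live in self.final
        if pieces:
            moved = list(heapq.merge(*pieces))
            self.final = list(heapq.merge(self.final, moved))
        start = bisect.bisect_left(self.final, low)
        end = bisect.bisect_left(self.final, high)
        return self.final[start:end]
```

Once a key range has been queried, it is fully sorted in `final`, so a repeat query over that range costs only two binary searches.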

Adaptive Merge Trees

A little more upfront work, faster responses

Adaptive Merge Trees

Much faster convergence!

In Practice...

  • A factor of $O(\log(N))$ improvement is not huge
  • "Merging" requires a huge amount of bookkeeping
  • but... that fast convergence is nice!

Why do we get fast convergence?

Mixing Cracking and Merging

  1. Start with partitioned data
  2. Extract goal data from each partition independently
    • Crack each partition independently
    • Sort each partition independently then merge
    • Radix "sort" each partition then crack
  3. Merge cracked regions
  4. Postprocess merged data
    • "Crack" the merged data if needed
    • Sort the merged data
    • Radix partition the merged data

Crack + Crack

Crack+Crack vs Radix+Radix

Hybrid Algorithms