Just-in-Time Data Structures

Oliver Kennedy


Saurav Singhi, Darshana Balakrishnan,
Hank Lin, Ankur Upadhyay, Lukasz Ziarek
(PhD In Progress) (MS In Progress) (BS 2017) (MS 2014)

What is best in life?

(for organizing your data)

    Insert $\lt key, value\gt$

    Query for $key \in [low, high)$

Available Structures

Binary Tree, Linked List, Sorted Array

Other Tradeoffs

  • Support for Threads
  • Lookup vs Full Scan vs Range Scan
  • Optimal Update Size
[Best] [Best] [Best] [Best?] [Best?]

Interactive Analytics

  1. User Opens CSV File
  2. User Poses Query as File Loads
  3. Lots More Queries
  4. User Adds More Data

Even in a single session, there may be more than one "optimal" data structure.

State of the Art

  • Find a jack-of-all-trades data structure
  • Trash and re-build structures for different workloads
  • Use a bespoke data structure

What would it take to enable incremental transitions from one set of tradeoffs to another one?

Incremental Transitions

  1. What does it mean for a data structure to be halfway between a Binary Tree and a Linked List?
  2. How would we access and manipulate such a data structure?
  3. When and how should a data structure transition?
  4. How do we automatically generate bespoke data-structures?

Incremental Transitions

  1. A Universal Instance Language
  2. Realizing Universal Data Structures
  3. Just-In-Time Data Structure Optimization
  4. Optimization Policy Discovery

Logical Content

Physical Structure

A Bag of $\lt Key \rightarrow Value \gt$ Pairs

One Physical Realization of the Bag

Core Idea: A grammar of physical realizations.


  • A Key ($\mathbb K$)
  • A Record ($\mathbb R$)
    Logically a single record
  • A Pointer ($\mathbb P$)
    Logically a bag of records


\begin{align} \mathbb P :=\; &|\;Sng(\mathbb R) \\ &|\uplus(\mathbb P, \mathbb P) \\ &|\;BT_{\mathbb K}(\mathbb P, \mathbb P) \\ &|\;Array_N(\mathbb R \ldots \mathbb R) \\ &|\;Sorted_N(\mathbb R \ldots \mathbb R) \end{align}
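The grammar above can be mirrored as a small set of Java node types. This is a minimal sketch, not the authors' implementation: the class names (`UilSketch`, `Node`, `Sng`, `Union`, `BT`, `Array`) and the `contents` helper are hypothetical, a record is reduced to its `long` key, and `Array_N` and `Sorted_N` share one class distinguished by a flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the UIL grammar; a record is reduced to its long key. */
final class UilSketch {
  /** A pointer P: logically a bag of records. */
  interface Node {
    /** Appends this node's logical contents to out (duplicates are kept). */
    void collect(List<Long> out);
  }

  /** Sng(R): exactly one record. */
  static final class Sng implements Node {
    final long key;
    Sng(long key) { this.key = key; }
    public void collect(List<Long> out) { out.add(key); }
  }

  /** Union(P, P): the bag union of two children. */
  static final class Union implements Node {
    final Node a, b;
    Union(Node a, Node b) { this.a = a; this.b = b; }
    public void collect(List<Long> out) { a.collect(out); b.collect(out); }
  }

  /** BT_k(P, P): logically a union, physically tagged with a separator key k. */
  static final class BT implements Node {
    final long k;
    final Node a, b;
    BT(long k, Node a, Node b) { this.k = k; this.a = a; this.b = b; }
    public void collect(List<Long> out) { a.collect(out); b.collect(out); }
  }

  /** Array_N(R ... R), or Sorted_N(R ... R) when sorted is true. */
  static final class Array implements Node {
    final boolean sorted;
    final long[] keys;
    Array(boolean sorted, long... keys) { this.sorted = sorted; this.keys = keys; }
    public void collect(List<Long> out) { for (long x : keys) out.add(x); }
  }

  /** The logical view of any physical instance: the bag of keys it holds. */
  static List<Long> contents(Node p) {
    List<Long> out = new ArrayList<>();
    p.collect(out);
    return out;
  }
}
```

Two instances with different shapes but the same `contents` are different physical realizations of the same logical bag.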


UIL:$Sng(x: \mathbb R)$
Logical:$\{ x \}$

Union Node

UIL:$\uplus(a: \mathbb P, b: \mathbb P)$
Logical:$a \uplus b$

Combining Primitives: Linked List

UIL:\begin{align}LL :=\;&|\;\uplus(Sng(x: \mathbb R), a: LL)\\&|\;Sng(x: \mathbb R)\end{align}
Logical:$\{ x \} \uplus a$ or $\{ x \}$

Existing data structures can be expressed as syntactic restrictions on this grammar.

Extension 1: Semantic Constraints

UIL:$BT_{k: \mathbb K}(a: \mathbb P, b: \mathbb P)$
Logical:$a \uplus b$
Constraint:$\forall r \in a: r.key \lt k$
$\forall r \in b: r.key \geq k$

Nodes can define semantic constraints on the contents of descendants.

Combining Primitives: Binary Tree

\begin{align} BinTree :=\;&|\;BT_{k: \mathbb K}(a: BinTree, b: BinTree)\\&|\;Sng(x: \mathbb R) \end{align}

Extension 2: Repetition

UIL:$Array_{N : \mathbb N}(x_1: \mathbb R, \ldots, x_N: \mathbb R)$
Logical:$\{ x_1, \ldots, x_N \}$

Can repeat structures for efficiency (e.g., B+Tree vs BinTree)

Combining Extensions

UIL:$Sorted_{N : \mathbb N}(x_1: \mathbb R, \ldots, x_N: \mathbb R)$
Logical:$\{ x_1, \ldots, x_N \}$
Constraint:$\forall i \lt j: x_i.key \leq x_j.key$
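The $BT$ and $Sorted$ constraints can be checked mechanically. The sketch below is one way to do so, under assumptions: the names (`ConstraintSketch`, `valid`, `allKeys`) are hypothetical, records are plain `long` keys, and the $BT$ check scans each whole subtree for clarity rather than efficiency.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical validity checker for the BT and Sorted constraints. */
final class ConstraintSketch {
  static abstract class Node {
    abstract void collect(List<Long> out);
    abstract boolean valid();
    /** Every key stored at or below this node. */
    List<Long> allKeys() {
      List<Long> out = new ArrayList<>();
      collect(out);
      return out;
    }
  }

  static final class Sng extends Node {
    final long key;
    Sng(long key) { this.key = key; }
    void collect(List<Long> out) { out.add(key); }
    boolean valid() { return true; }  // a single record has nothing to violate
  }

  static final class BT extends Node {
    final long k;
    final Node a, b;
    BT(long k, Node a, Node b) { this.k = k; this.a = a; this.b = b; }
    void collect(List<Long> out) { a.collect(out); b.collect(out); }
    boolean valid() {
      for (long x : a.allKeys()) if (x >= k) return false;  // left side: key < k
      for (long x : b.allKeys()) if (x < k) return false;   // right side: key >= k
      return a.valid() && b.valid();
    }
  }

  static final class Sorted extends Node {
    final long[] keys;
    Sorted(long... keys) { this.keys = keys; }
    void collect(List<Long> out) { for (long x : keys) out.add(x); }
    boolean valid() {
      // Adjacent pairs in order implies all pairs i < j in order.
      for (int i = 1; i < keys.length; i++) if (keys[i - 1] > keys[i]) return false;
      return true;
    }
  }
}
```

For instance, `new BT(6, new Sorted(3, 5), new Sng(6)).valid()` holds, while replacing the left child with `new Sorted(3, 7)` violates the separator constraint.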


$\uplus(Sng(1),\; \uplus(Array_3(2,4,7),\; BT_6(Sorted_2(3, 5),\; Sng(6))))$

Universal Data Structures

  • Physiological Morphisms
    • Queries
    • Updates
  • Purely Physical Morphisms
    • Optimization

Example: Range Queries

$Q_{\ell,h} : \mathbb P \mapsto \mathbb P$

Return tuples in $[\ell,h)$

\begin{align} Q_{\ell,h}(\uplus(a, b)) \rightarrow &\;\uplus(Q_{\ell,h}(a), Q_{\ell,h}(b))\\[10px] Q_{\ell,h}(BT_k(a, b)) \rightarrow &\; \begin{cases} Q_{\ell,h}(a) & \text{if } h \lt k\\ Q_{\ell,h}(b) & \text{if } \ell \geq k\\ BT_k(Q_{\ell,h}(a), Q_{\ell,h}(b)) & \text{otherwise} \end{cases}\\[10px] Q_{\ell,h}(Array_N(x_1,\ldots,x_N)) \rightarrow &\; Array_{|Y|}(y_1, \ldots, y_{|Y|}) \\&\;\;\text{ s.t. } Y = \{\;x_i\;|\;\ell \leq x_i.key \lt h\;\}\\[10px] Q_{\ell,h}(Sorted_N(x_1,\ldots,x_N)) \rightarrow &\; Sorted_{j-i+1}(x_i, \ldots, x_j) \\&\;\;\text{ s.t. } i = argmin_i(x_i.key \geq \ell); \\&\;\;\;\;\;\;\;\;j = argmax_j(x_j.key \lt h); \end{align}
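The four query rules translate almost line for line into a recursive method. The following is a sketch under assumptions, not the authors' code: all names are hypothetical, records are `long` keys, and `Sng` is left out because no rule for it is given above. The `Sorted` case uses binary search to find the slice boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of Q_{l,h} over Union, BT, Array and Sorted nodes. */
final class RangeQuerySketch {
  interface Node {
    Node query(long lo, long hi);      // keeps keys in [lo, hi)
    void collect(List<Long> out);
  }

  static final class Union implements Node {
    final Node a, b;
    Union(Node a, Node b) { this.a = a; this.b = b; }
    public Node query(long lo, long hi) { return new Union(a.query(lo, hi), b.query(lo, hi)); }
    public void collect(List<Long> out) { a.collect(out); b.collect(out); }
  }

  static final class BT implements Node {
    final long k;
    final Node a, b;
    BT(long k, Node a, Node b) { this.k = k; this.a = a; this.b = b; }
    public Node query(long lo, long hi) {
      if (hi < k) return a.query(lo, hi);   // range lies entirely left of k
      if (lo >= k) return b.query(lo, hi);  // range lies entirely right of k
      return new BT(k, a.query(lo, hi), b.query(lo, hi));
    }
    public void collect(List<Long> out) { a.collect(out); b.collect(out); }
  }

  static final class Array implements Node {
    final long[] keys;
    Array(long... keys) { this.keys = keys; }
    public Node query(long lo, long hi) {   // full scan: O(N)
      return new Array(Arrays.stream(keys).filter(x -> lo <= x && x < hi).toArray());
    }
    public void collect(List<Long> out) { for (long x : keys) out.add(x); }
  }

  static final class Sorted implements Node {
    final long[] keys;
    Sorted(long... keys) { this.keys = keys; }
    public Node query(long lo, long hi) {   // two binary searches, then a slice
      return new Sorted(Arrays.copyOfRange(keys, lowerBound(keys, lo), lowerBound(keys, hi)));
    }
    public void collect(List<Long> out) { for (long x : keys) out.add(x); }
  }

  /** Index of the first key >= target, or keys.length if there is none. */
  static int lowerBound(long[] keys, long target) {
    int lo = 0, hi = keys.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (keys[mid] < target) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  static List<Long> contents(Node p) {
    List<Long> out = new ArrayList<>();
    p.collect(out);
    return out;
  }
}
```

Note that the result is itself a UIL instance, so a query can be answered on any hybrid structure without first converting it to a single form.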


$$Insert: \mathbb P \times \mathbb P \rightarrow \mathbb P$$

Do the least work possible (optimize later)

$$Insert(a, b) \rightarrow \uplus(a, b)$$
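A minimal sketch of this insert rule, with hypothetical names (`InsertSketch`, `size`, `depth`) and `long` keys standing in for records. It illustrates why the rule is cheap now and why optimization is needed later: each insert allocates one node and moves no data, but the chain of unions keeps growing.

```java
/** Hypothetical sketch: Insert(a, b) -> Union(a, b). */
final class InsertSketch {
  interface Node {
    int size();   // number of records
    int depth();  // longest path from this node to a leaf
  }

  static final class Sng implements Node {
    final long key;
    Sng(long key) { this.key = key; }
    public int size() { return 1; }
    public int depth() { return 1; }
  }

  static final class Union implements Node {
    final Node a, b;
    Union(Node a, Node b) { this.a = a; this.b = b; }
    public int size() { return a.size() + b.size(); }
    public int depth() { return 1 + Math.max(a.depth(), b.depth()); }
  }

  /** Constant work: wrap the old root and the new data in one Union node. */
  static Node insert(Node a, Node b) {
    return new Union(a, b);
  }
}
```

Starting from a single record and inserting three more singletons one at a time yields a structure with 4 records and depth 4, which is effectively a linked list.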

Core Idea: Physical layout as a compiler optimization problem.

Example: Organize A Hybrid Data Structure

$$\uplus(Sng(x), BT_k(a, b)) \rightarrow \begin{cases} BT_k(\uplus(Sng(x), a), b) & \text{if } x.key \lt k\\ BT_k(a, \uplus(Sng(x), b)) & \text{if } x.key \geq k\end{cases}$$
$$\uplus(Sng(x), Sorted_N(y_1, \ldots, y_N)) \rightarrow Sorted_{N+1}(y_1, \ldots, y_i, x, y_{i+1}, \ldots, y_N)$$ $$\text{ where }y_i.key \leq x.key \leq y_{i+1}.key$$
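Both rewrites above are local pattern matches on the root of a subtree. The sketch below shows one possible encoding; the names (`PushdownSketch`, `pushdown`) are hypothetical, records are `long` keys, and a node that matches neither pattern is returned unchanged.

```java
/** Hypothetical sketch of the two pushdown rewrites for Union(Sng(x), ...). */
final class PushdownSketch {
  interface Node {}

  static final class Sng implements Node {
    final long key;
    Sng(long key) { this.key = key; }
  }

  static final class Union implements Node {
    final Node a, b;
    Union(Node a, Node b) { this.a = a; this.b = b; }
  }

  static final class BT implements Node {
    final long k;
    final Node a, b;
    BT(long k, Node a, Node b) { this.k = k; this.a = a; this.b = b; }
  }

  static final class Sorted implements Node {
    final long[] keys;
    Sorted(long... keys) { this.keys = keys; }
  }

  /** Applies one rewrite step at n, or returns n itself if no pattern matches. */
  static Node pushdown(Node n) {
    if (!(n instanceof Union) || !(((Union) n).a instanceof Sng)) return n;
    Sng s = (Sng) ((Union) n).a;
    Node rest = ((Union) n).b;
    if (rest instanceof BT) {
      BT t = (BT) rest;
      // Route the singleton to the side of the separator where it belongs.
      return s.key < t.k
          ? new BT(t.k, new Union(s, t.a), t.b)
          : new BT(t.k, t.a, new Union(s, t.b));
    }
    if (rest instanceof Sorted) {
      long[] old = ((Sorted) rest).keys;
      int i = 0;
      while (i < old.length && old[i] <= s.key) i++;  // insertion point
      long[] out = new long[old.length + 1];
      System.arraycopy(old, 0, out, 0, i);
      out[i] = s.key;
      System.arraycopy(old, i, out, i + 1, old.length - i);
      return new Sorted(out);
    }
    return n;
  }
}
```

Applying `pushdown` once to `Union(Sng(2), BT_6(Sorted(3, 5), Sng(6)))` moves the singleton into the left child; applying it again to that child merges it into the sorted array.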


Rewrites

A pattern/replacement pair.

  • Crack-Array
  • Sort-Array
  • Sort-Merge
  • Pushdown-Array
  • Pushdown-BT
  • Pushdown-Sorted
  • ...


Events

A trigger for applying a rewrite.

  • Before-Scan
  • After-Scan
  • Before-Visit
  • After-Visit
  • Before-Insert
  • After-Insert
  • Idle-Tick

Policies (Take 1)

A set of Rewrite/Event pairs.

  • Cracker (Implements [Idreos et al.-CIDR 2007])
  • Adaptive Merge (Implements [Graefe/Kuno-EDBT 2010])
  • Swap (Heuristic Hybrid: Switch after 2000 events)
  • Transition (Heuristic Hybrid: Gradient from 1-3k events)
[Kennedy/Ziarek-CIDR 2015]; https://github.com/UBOdin/jitd
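A policy as "a set of Rewrite/Event pairs" can be pictured as a table from events to rewrite functions. The sketch below is an illustration only and does not reflect the API of the jitd repository: `PolicySketch`, `Policy`, `on` and `fire` are hypothetical names, and the structure type `T` is left generic.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of a policy as a set of Rewrite/Event pairs. */
final class PolicySketch {
  /** The triggers listed on the Events slide. */
  enum Event {
    BEFORE_SCAN, AFTER_SCAN, BEFORE_VISIT, AFTER_VISIT,
    BEFORE_INSERT, AFTER_INSERT, IDLE_TICK
  }

  static final class Policy<T> {
    private final Map<Event, List<UnaryOperator<T>>> rules = new EnumMap<>(Event.class);

    /** Registers one Rewrite/Event pair. */
    Policy<T> on(Event e, UnaryOperator<T> rewrite) {
      rules.computeIfAbsent(e, x -> new ArrayList<>()).add(rewrite);
      return this;
    }

    /** Applies, in registration order, every rewrite paired with the event. */
    T fire(Event e, T structure) {
      List<UnaryOperator<T>> rewrites = rules.get(e);
      if (rewrites == null) return structure;  // no pair for this event
      for (UnaryOperator<T> r : rewrites) structure = r.apply(structure);
      return structure;
    }
  }
}
```

Under this view, hybridizing two policies amounts to merging their tables, since both operate on the same universal structure.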

The Entire Transition Policy

package jitd;

import java.util.*;

public class TransitionMode extends Mode {
  int stepsTotal;
  int stepsTaken = 0;
  Random rand = new Random();
  Mode source, target;

  public TransitionMode(Mode source, Mode target, int steps) {
    this.stepsTotal = steps;
    this.source = source;
    this.target = target;
  }

  // Pick target with probability stepsTaken / stepsTotal, otherwise source.
  public Mode pick() {
    if(rand.nextInt(stepsTotal) < stepsTaken){
      return target;
    } else {
      return source;
    }
  }

  public KeyValueIterator scan(Driver driver, long low, long high) {
    return pick().scan(driver, low, high);
  }

  public void insert(Driver driver, Cog values) {
    pick().insert(driver, values);
  }

  public void idle(Driver driver) {
    // Remainder of the class is not shown on the slide.
  }
}

(40 lines of Java)
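Read as written, `pick()` implements a linear gradient between the two policies. `rand.nextInt(stepsTotal)` is uniform over $\{0, \ldots, stepsTotal - 1\}$, and exactly $stepsTaken$ of those values are below $stepsTaken$, so for $0 \leq stepsTaken \leq stepsTotal$:

$$\Pr[\,pick() = target\,] = \frac{stepsTaken}{stepsTotal}$$

The excerpt does not show where `stepsTaken` is advanced. Assuming it grows by one per operation in the lines not shown, the structure behaves purely like `source` at the start, purely like `target` after `stepsTotal` operations, and like a probabilistic mix in between.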

Cracker Policy

Adaptive Merge Policy

(first read: 33s)

Swap Policy

Transition Policy

Universal data structures allow us to
hybridize policies "for free".

Policies (Take 2)

Core Idea: Physical layout as a just-in-time compiler optimization problem.

Just-in-Time Data Structures

A background thread incrementally optimizes the data structure.


  1. Which rewrite to apply?
  2. Which data to rewrite?

Two simple transforms: Crack or Sort


\begin{align} Array_N(x_1, \ldots, x_N) \rightarrow BT_{x_j.key}(\;\;&Array_{|Y|}(y_1, \ldots, y_{|Y|}), \\&Array_{|Z|}(z_1, \ldots, z_{|Z|})\;\;) \end{align}

where $j \in [1, N]$, $Y = \{x_i | x_i.key \lt x_j.key\}$, $Z = \{x_i | x_i.key \geq x_j.key\}$


$$Array_N(x_1, \ldots, x_N) \rightarrow Sorted_N(x_{f(1)}, \ldots, x_{f(N)})$$

where $f : [N] \rightarrow [N]$ is a permutation and $x_{f(i)}.key \leq x_{f(i+1)}.key$
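The two transforms can be sketched on bare key arrays. The names below (`CrackSortSketch`, `Cracked`, `crack`, `sort`) are hypothetical, records are `long` keys, and the pivot index `j` is 0-based here rather than 1-based as in the formula.

```java
import java.util.Arrays;

/** Hypothetical sketch of the Crack and Sort transforms on an Array node. */
final class CrackSortSketch {
  /** The result of Crack: BT_pivot(Array(below), Array(atOrAbove)). */
  static final class Cracked {
    final long pivot;
    final long[] below, atOrAbove;
    Cracked(long pivot, long[] below, long[] atOrAbove) {
      this.pivot = pivot;
      this.below = below;
      this.atOrAbove = atOrAbove;
    }
  }

  /** Crack: partition around the key at index j; one pass per side, O(N). */
  static Cracked crack(long[] keys, int j) {
    long pivot = keys[j];
    long[] below = Arrays.stream(keys).filter(x -> x < pivot).toArray();
    long[] atOrAbove = Arrays.stream(keys).filter(x -> x >= pivot).toArray();
    return new Cracked(pivot, below, atOrAbove);
  }

  /** Sort: a permuted copy in non-decreasing key order, O(N log N). */
  static long[] sort(long[] keys) {
    long[] out = keys.clone();
    Arrays.sort(out);
    return out;
  }
}
```

Crack is cheaper per step but leaves two unsorted arrays that may need further work; Sort costs more up front but finishes the node in one step.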


Dequeue: 1x Array

Enqueue: 2x Array


Dequeue: 1x Array

Enqueue: 1x Sorted Array

Option 1: Crack($Array_8(1 \ldots 8)$)

Option 2: Sort($Array_8(1 \ldots 8)$)

Option 1: Crack($Array_4(1 \ldots 4)$)

Option 2: Sort($Array_4(1 \ldots 4)$)

Option 3: Crack($Array_4(5 \ldots 8)$)

Option 4: Sort($Array_4(5 \ldots 8)$)

Option 1: Crack($Array_4(1 \ldots 4)$)

Option 2: Sort($Array_4(1 \ldots 4)$)

How to prioritize rewrites?

Cost Model

Array_N: $O(N)$

Sorted_N: $O(N\cdot \log(N))$

BT: Negligible

Compute expected utility of a static state.


  1. Throughput
  2. (Negative) Latency
  3. Time spent with latency below 300ms

Heuristic: Sort Below Threshold Size

Short-term value vs long-term performance.

Deriving Policies

  1. Start with a heuristic and optimize parameters.
    • e.g., Pick a threshold to sort at.
  2. Model the expected cumulative utility of each candidate rewrite
    • e.g., Priority queue of Array nodes remaining.
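The two approaches can be combined in a small simulation: a priority queue holds the remaining Array nodes, and a threshold decides between Crack and Sort. This is a sketch under stated assumptions rather than the authors' optimizer: all names are hypothetical, "largest array first" stands in for a real utility estimate, the pivot is the larger of the first two distinct keys (so both sides of a crack are non-empty), and only the work queue is tracked, not the resulting tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: crack large arrays, sort arrays at or below a threshold. */
final class ThresholdPolicySketch {
  /** Runs the background optimizer to completion and returns the sorted runs it emits. */
  static List<long[]> optimize(long[] keys, int threshold) {
    // Largest Array node first, as a stand-in for highest expected utility.
    PriorityQueue<long[]> todo =
        new PriorityQueue<>((x, y) -> Integer.compare(y.length, x.length));
    List<long[]> sortedRuns = new ArrayList<>();
    todo.add(keys);
    while (!todo.isEmpty()) {
      long[] a = todo.poll();                       // Dequeue: 1x Array
      int d = firstDifferent(a);
      if (a.length <= threshold || d < 0) {         // small, or all keys equal
        long[] s = a.clone();
        Arrays.sort(s);
        sortedRuns.add(s);                          // Sort: node is finished
      } else {
        long pivot = Math.max(a[0], a[d]);          // a key present in the array
        todo.add(Arrays.stream(a).filter(x -> x < pivot).toArray());   // Enqueue:
        todo.add(Arrays.stream(a).filter(x -> x >= pivot).toArray());  // 2x Array
      }
    }
    return sortedRuns;
  }

  /** Index of the first key that differs from a[0], or -1 if there is none. */
  static int firstDifferent(long[] a) {
    for (int i = 1; i < a.length; i++) if (a[i] != a[0]) return i;
    return -1;
  }
}
```

Varying `threshold` in such a simulation, and scoring each run with a cost model and utility function, is one way to search for a good parameter setting.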

Just-in-Time Data Structures

  • The Universal Instance Language can describe the intermediate state of a data structure in transition.
  • Localized, event-driven rewrites can emulate the behaviors of existing data structures and be hybridized.
  • Simulation + Cost-Analysis can be used to derive policies to drive direct rewrites.
