Saurav Singhi | Darshana Balakrishnan | Hank Lin | Ankur Upadhyay | Lukasz Ziarek |
(PhD In Progress) | (MS In Progress) | (BS 2017) | (MS 2014) | (Prof @ UB) |
(for organizing your data)
Insert $\lt key, value\gt$
Query for $key \in [low, high)$
Binary Tree, Linked List, Sorted Array
Even in a single session, there may be more than one "optimal" data structure.
What would it take to enable incremental transitions from one set of tradeoffs to another one?
Logical Content
↑
Physical Structure
A Bag of $\lt Key \rightarrow Value \gt$ Pairs
↑
One Physical Realization of the Bag
\begin{align} \mathbb P :=\; &|\;Sng(\mathbb R) \\ &|\uplus(\mathbb P, \mathbb P) \\ &|\;BT_{\mathbb K}(\mathbb P, \mathbb P) \\ &|\;Array_N(\mathbb R \ldots \mathbb R) \\ &|\;Sorted_N(\mathbb R \ldots \mathbb R) \end{align}
Visual: | |
UIL: | $Sng(x: \mathbb R)$ |
Logical: | $\{ x \}$ |
Visual: | |
UIL: | $\uplus(a: \mathbb P, b: \mathbb P)$ |
Logical: | $a \uplus b$ |
Visual: | |
UIL: | \begin{align}LL :=\;&|\;\uplus(Sng(x: \mathbb R), a: LL)\\&|\;Sng(x: \mathbb R)\end{align} |
Logical: | $\{ x \} \uplus a$ or $\{ x \}$ |
Existing data structures can be expressed as syntactic restrictions on this grammar.
Visual: | |
UIL: | $BT_{k: \mathbb K}(a: \mathbb P, b: \mathbb P)$ |
Logical: | $a \uplus b$ |
Constraint: | $\forall r \in a: r.key \lt k$ $\forall r \in b: r.key \geq k$ |
Nodes can define syntactic constraints on the contents of their descendants.
Visual: | |
UIL: | $Array_{N : \mathbb N}(x_1: \mathbb R, \ldots, x_N: \mathbb R)$ |
Logical: | $\{ x_1, \ldots, x_N \}$ |
Can repeat structures for efficiency (e.g., B+Tree vs BinTree)
Visual: | |
UIL: | $Sorted_{N : \mathbb N}(x_1: \mathbb R, \ldots, x_N: \mathbb R)$ |
Logical: | $\{ x_1, \ldots, x_N \}$ |
Constraint: | $\forall i \lt j: x_i.key \leq x_j.key$ |
$\uplus(Sng(1), \uplus(Array_3(2,4,7), BT_6(Sorted_2(3, 5), Sng(6))))$
Return tuples in $[\ell,h)$
Do the least work possible (optimize later)
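The range query over this grammar can be sketched in Java as below. This is a minimal illustration, not the JITD implementation: the class names (`Node`, `Sng`, `Union`, `BT`, `ArrayNode`, `SortedNode`, `ScanDemo`) are hypothetical, and records are reduced to bare `long` keys. Each node does only the work its constraint lets it skip: `BT` prunes a side that cannot overlap $[\ell,h)$, and `SortedNode` binary-searches for its starting point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node types mirroring the grammar; a record is just a long key here.
interface Node {
    // Append every key in [low, high) stored under this node to out.
    void scan(long low, long high, List<Long> out);
}

final class Sng implements Node {
    final long key;
    Sng(long key) { this.key = key; }
    public void scan(long low, long high, List<Long> out) {
        if (key >= low && key < high) out.add(key);
    }
}

final class Union implements Node {
    final Node a, b;
    Union(Node a, Node b) { this.a = a; this.b = b; }
    public void scan(long low, long high, List<Long> out) {
        // No constraint relates the children, so both must be visited.
        a.scan(low, high, out);
        b.scan(low, high, out);
    }
}

final class BT implements Node {
    final long k;
    final Node a, b;
    BT(long k, Node a, Node b) { this.k = k; this.a = a; this.b = b; }
    public void scan(long low, long high, List<Long> out) {
        // Keys under a are < k and keys under b are >= k,
        // so a side that cannot overlap [low, high) is skipped.
        if (low < k) a.scan(low, high, out);
        if (high > k) b.scan(low, high, out);
    }
}

final class ArrayNode implements Node {
    final long[] keys;
    ArrayNode(long... keys) { this.keys = keys; }
    public void scan(long low, long high, List<Long> out) {
        // Unordered, so every element is inspected.
        for (long key : keys) if (key >= low && key < high) out.add(key);
    }
}

final class SortedNode implements Node {
    final long[] keys; // caller must supply keys in ascending order
    SortedNode(long... keys) { this.keys = keys; }
    public void scan(long low, long high, List<Long> out) {
        // Binary search for the first key >= low, then read until a key >= high.
        int lo = 0, hi = keys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] < low) lo = mid + 1; else hi = mid;
        }
        for (int i = lo; i < keys.length && keys[i] < high; i++) out.add(keys[i]);
    }
}

final class ScanDemo {
    // Builds a small mixed instance (singleton, array, binary-tree node,
    // sorted array) and scans it for keys in [low, high).
    static List<Long> example(long low, long high) {
        Node root = new Union(new Sng(1),
            new Union(new ArrayNode(2, 4, 7),
                new BT(6, new SortedNode(3, 5), new Sng(6))));
        List<Long> out = new ArrayList<>();
        root.scan(low, high, out);
        return out;
    }
}
```

Calling `ScanDemo.example(3, 7)` visits the array in full but descends into both sides of the `BT` node only because the range straddles its separator; results come back in traversal order, not key order.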
Core Idea: Physical layout as a compiler optimization problem.
A pattern/replacement pair.
A trigger for applying a rewrite.
A set of Rewrite/Event pairs.
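One way to picture these three definitions is the Java sketch below. Every name in it (`Rewrite`, `Event`, `Policy`, `on`, `fire`, and the three event constants) is a hypothetical stand-in chosen for illustration, and `T` is whatever node type the index uses. A rewrite returns a replacement when its pattern matches, and a policy is a list of rewrite/event pairs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// A rewrite: returns the replacement if the pattern matches, otherwise empty.
final class Rewrite<T> {
    final String name;
    final Function<T, Optional<T>> rule;
    Rewrite(String name, Function<T, Optional<T>> rule) {
        this.name = name;
        this.rule = rule;
    }
}

// An event: the trigger for applying a rewrite (illustrative choices only).
enum Event { BEFORE_SCAN, AFTER_INSERT, IDLE }

// A policy: a set of rewrite/event pairs.
final class Policy<T> {
    private final List<Event> events = new ArrayList<>();
    private final List<Rewrite<T>> rewrites = new ArrayList<>();

    Policy<T> on(Event event, Rewrite<T> rewrite) {
        events.add(event);
        rewrites.add(rewrite);
        return this;
    }

    // Apply, in registration order, every rewrite paired with the fired event.
    T fire(Event event, T node) {
        T current = node;
        for (int i = 0; i < events.size(); i++) {
            if (events.get(i) == event) {
                current = rewrites.get(i).rule.apply(current).orElse(current);
            }
        }
        return current;
    }
}

final class PolicyDemo {
    // A toy policy over plain lists: sort an unsorted list whenever the system is idle.
    static Policy<List<Integer>> policy() {
        Rewrite<List<Integer>> sortIfUnsorted =
            new Rewrite<List<Integer>>("sort", xs -> {
                List<Integer> sorted = new ArrayList<>(xs);
                Collections.sort(sorted);
                if (sorted.equals(xs)) return Optional.empty(); // pattern does not match
                return Optional.of(sorted);
            });
        return new Policy<List<Integer>>().on(Event.IDLE, sortIfUnsorted);
    }
}
```

With this toy policy, firing `Event.IDLE` on an unsorted list yields its sorted copy, while any other event leaves the list untouched.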
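package jitd;

import java.util.*;

// A mode that gradually shifts work from a source mode to a target mode.
public class TransitionMode extends Mode {
  int stepsTotal;      // number of operations over which the transition completes
  int stepsTaken = 0;  // operations handled so far
  Random rand = new Random();
  Mode source, target;

  public TransitionMode(Mode source, Mode target, int steps)
  {
    this.stepsTotal = steps;
    this.source = source;
    this.target = target;
  }

  // Choose the target with probability stepsTaken / stepsTotal;
  // once stepsTaken reaches stepsTotal, the target is always chosen.
  public Mode pick()
  {
    stepsTaken++;
    if(rand.nextInt(stepsTotal) < stepsTaken){
      return target;
    } else {
      return source;
    }
  }

  // Every operation is delegated to whichever mode pick() selects.
  public KeyValueIterator scan(Driver driver, long low, long high)
  {
    return pick().scan(driver, low, high);
  }

  public void insert(Driver driver, Cog values)
  {
    pick().insert(driver, values);
  }

  public void idle(Driver driver)
  {
    pick().idle(driver);
  }
}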
(40 lines of Java)
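The blend that `pick()` produces can be worked out directly. Write $T$ for `stepsTotal` and $t$ for the value of `stepsTaken` after the increment on the $t$-th call (the symbols $T$ and $t$ are introduced here for the calculation only). Since `rand.nextInt(T)` is uniform over $\{0, \ldots, T-1\}$, the test `rand.nextInt(T) < t` succeeds for exactly $\min(t, T)$ of the $T$ equally likely outcomes:

\begin{align} \Pr[\text{call } t \text{ uses target}] = \frac{\min(t, T)}{T} \end{align}

For example, with $T = 4$ the first four calls use the target mode with probability $\frac{1}{4}, \frac{1}{2}, \frac{3}{4}, 1$, and every later call uses it with certainty, so the transition is linear in the number of operations.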
Universal data structures allow us to
hybridize policies "for free".
Core Idea: Physical layout as a just-in-time compiler optimization problem.
A background thread incrementally optimizes the data structure.
Two simple transforms: Crack or Sort
where $j \in [1, N]$, $Y = \{x_i \mid x_i.key \lt x_j.key\}$, $Z = \{x_i \mid x_i.key \geq x_j.key\}$
where $f : [N] \rightarrow [N]$ is a permutation and $x_{f(i)}.key \leq x_{f(i+1)}.key$
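The two transforms can be sketched in Java as follows. This is an illustration under simplifying assumptions: records are bare `long` keys, the names `Transforms`, `Cracked`, `crack`, and `sort` are hypothetical, and the pivot index `j` is 0-based here where the definition above is 1-based. Crack splits an array into the keys below the pivot and the keys at or above it (the $Y$ and $Z$ sets); Sort returns an ascending copy.

```java
import java.util.Arrays;

final class Transforms {
    // Result of a crack: the two children of the new binary-tree node.
    static final class Cracked {
        final long pivot;   // separator: the key of the chosen element
        final long[] left;  // Y: keys strictly below the pivot
        final long[] right; // Z: keys at or above the pivot
        Cracked(long pivot, long[] left, long[] right) {
            this.pivot = pivot;
            this.left = left;
            this.right = right;
        }
    }

    // Crack: partition the array around keys[j] (0-based index).
    static Cracked crack(long[] keys, int j) {
        long pivot = keys[j];
        long[] left = Arrays.stream(keys).filter(k -> k < pivot).toArray();
        long[] right = Arrays.stream(keys).filter(k -> k >= pivot).toArray();
        return new Cracked(pivot, left, right);
    }

    // Sort: return an ascending copy; the input array is left unmodified.
    static long[] sort(long[] keys) {
        long[] sorted = keys.clone();
        Arrays.sort(sorted);
        return sorted;
    }
}
```

For instance, `Transforms.crack(new long[]{5, 2, 8, 1, 6}, 0)` uses 5 as the pivot and yields the arrays `{2, 1}` and `{5, 8, 6}`; neither side is sorted, which is what keeps a crack cheaper than a sort.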
Dequeue: 1x Array
Enqueue: 2x Array
Dequeue: 1x Array
Enqueue: 1x Sorted Array
Option 1: Crack($Array_8(1 \ldots 8)$)
Option 2: Sort($Array_8(1 \ldots 8)$)
Option 1: Crack($Array_4(1 \ldots 4)$)
Option 2: Sort($Array_4(1 \ldots 4)$)
Option 3: Crack($Array_4(5 \ldots 8)$)
Option 4: Sort($Array_4(5 \ldots 8)$)
Option 1: Crack($Array_4(1 \ldots 4)$)
Option 2: Sort($Array_4(1 \ldots 4)$)
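The queue bookkeeping in this walkthrough can be sketched as a small Java work loop. The names (`BackgroundOptimizer`, `step`, `sortThreshold`) are hypothetical, and the rule used to choose between the two options (crack arrays larger than a threshold, sort the rest) is an assumption made only so the loop has something to do; it is not a prioritization scheme from these slides. Sorted arrays are moved to a finished list here instead of being re-enqueued, since no further rewrite applies to them in this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class BackgroundOptimizer {
    // Arrays that still have Crack/Sort options open.
    private final Deque<long[]> pending = new ArrayDeque<>();
    // Arrays that have been sorted, in the order they were finished.
    final List<long[]> sorted = new ArrayList<>();
    // Assumed policy knob: arrays longer than this are cracked, others sorted.
    private final int sortThreshold;

    BackgroundOptimizer(long[] initial, int sortThreshold) {
        this.sortThreshold = sortThreshold;
        pending.add(initial);
    }

    // Perform one rewrite; returns false when nothing is left to optimize.
    boolean step() {
        long[] keys = pending.poll();
        if (keys == null) return false;
        if (keys.length > sortThreshold) {
            // Crack around the middle element: dequeue 1 array, enqueue 2.
            long pivot = keys[keys.length / 2];
            long[] left = Arrays.stream(keys).filter(k -> k < pivot).toArray();
            long[] right = Arrays.stream(keys).filter(k -> k >= pivot).toArray();
            if (left.length > 0) {
                pending.add(left);
                pending.add(right);
                return true;
            }
            // The pivot was the minimum, so cracking makes no progress: sort instead.
        }
        // Sort: dequeue 1 array, produce 1 sorted array.
        long[] copy = keys.clone();
        Arrays.sort(copy);
        sorted.add(copy);
        return true;
    }
}
```

Starting from an eight-element array with `sortThreshold` set to 4, the loop performs one crack followed by two sorts, mirroring the sequence of option lists above.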
How to prioritize rewrites?
Array_N: $O(N)$
Sorted_N: $O(N\cdot \log(N))$
BT: Negligible
Compute expected utility of a static state.
Short-term value vs long-term performance.
Questions?