CSE 4/562 - Database Systems

Cost Based Optimization

February 28, 2018

General Query Optimizers

  1. Apply blind heuristics (e.g., push down selections)
  2. Enumerate all possible execution plans (or a reasonable subset) by varying:
    • Join/Union Evaluation Order (commutativity, associativity, distributivity)
    • Algorithms for Joins, Aggregates, Sort, Distinct, and others
    • Data Access Paths
  3. Estimate the cost of each execution plan
  4. Pick the execution plan with the lowest cost
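The enumerate / estimate / pick loop in steps 2 to 4 can be sketched in a few lines. This is a minimal illustration, not how any particular optimizer is implemented: the function names, the restriction to left-deep cross-product orders, and the toy cost model (sum of intermediate result sizes) are all assumptions made for the example.

```python
from itertools import permutations


def enumerate_join_orders(tables):
    """Step 2 (toy version): yield every left-deep join order of the tables."""
    yield from permutations(tables)


def estimate_cost(order, sizes):
    """Step 3 (assumed toy model): sum of the intermediate result sizes
    produced by joining the tables left to right as cross products."""
    running = sizes[order[0]]
    cost = 0
    for name in order[1:]:
        running *= sizes[name]
        cost += running
    return cost


def pick_plan(tables, sizes):
    """Step 4: return the enumerated order with the lowest estimated cost."""
    return min(enumerate_join_orders(tables),
               key=lambda order: estimate_cost(order, sizes))


# Hypothetical table sizes (in tuples), chosen only for illustration.
sizes = {"R": 1000, "S": 10, "T": 100}
best = pick_plan(["R", "S", "T"], sizes)
```

Under this toy model the cheapest orders join the two small tables first and the large table last, which is the kind of ranking the optimizer is after.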

Idea 1: Run each plan

[Image: © Paramount Pictures]

If we can't get the exact cost of a plan, what can we do?

Idea 2: Run each plan on a small sample of the data.

Idea 3: Analytically estimate the cost of a plan.

Plan Cost

  • CPU Time: How much time is spent processing.
  • # of IOs: How many random reads and writes go to disk.
  • Memory Required: How much memory the plan needs.

[Comic: Randall Munroe (cc-by-nc)]

Remember the Real Goals

  1. Accurately rank the plans.
  2. Don't spend more time optimizing than you get back.
  3. Don't pick a plan that uses more memory than you have.
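Goals 1 and 3 together amount to a constrained minimum: discard plans whose estimated memory exceeds what is available, then take the cheapest of the rest. The sketch below assumes each candidate already carries an IO estimate and a memory estimate; the `PlanEstimate` type, its field names, and the numbers are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class PlanEstimate(NamedTuple):
    """Hypothetical container for one plan's estimated costs."""
    name: str
    ios: float            # estimated page IOs
    memory_tuples: float  # estimated working memory, in tuples


def choose_plan(candidates: Sequence[PlanEstimate],
                memory_budget: float) -> Optional[PlanEstimate]:
    """Return the lowest-IO plan that fits in memory, or None if none fits."""
    feasible = [p for p in candidates if p.memory_tuples <= memory_budget]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p.ios)


# Illustrative numbers only: the in-memory plan is cheaper in IOs
# but does not fit in the assumed budget of 10,000 tuples.
candidates = [
    PlanEstimate("1-pass hash join", ios=0, memory_tuples=1_000_000),
    PlanEstimate("2-pass hash join", ios=4_000, memory_tuples=1),
]
chosen = choose_plan(candidates, memory_budget=10_000)
```

Here the plan with more IOs is chosen because the cheaper one would need more memory than the budget allows.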

Accounting

Figure out the cost of each individual operator.

Only count the number of IOs added by each operator.

Operation                  RA               IOs Added (#pages)                                            Memory (#tuples)
Table Scan                 $R$              $\frac{|R|}{\mathcal P}$                                      $O(1)$
Projection                 $\pi(R)$         $0$                                                           $O(1)$
Selection                  $\sigma(R)$      $0$                                                           $O(1)$
Union                      $R \cup S$       $0$                                                           $O(1)$
Sort (in-memory)           $\tau(R)$        $0$                                                           $O(|R|)$
Sort (external)            $\tau(R)$        $2 \cdot \lfloor \log_{\mathcal B}(|R|) \rfloor$              $O(\mathcal B)$
Index Scan (tree index)    $\sigma_c(R)$    $\log_{\mathcal I}(|R|) + \frac{|\sigma_c(R)|}{\mathcal P}$   $O(1)$
Index Scan (hash index)    $\sigma_c(R)$    $1$                                                           $O(1)$

  1. Tuples per Page ($\mathcal P$) – Normally defined per-schema
  2. Size of $R$ ($|R|$)
  3. Pages of Buffer ($\mathcal B$)
  4. Keys per Index Page ($\mathcal I$)
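As a worked example, three of the IO formulas above (table scan, the $2 \cdot \lfloor \log_{\mathcal B}(|R|) \rfloor$ sort entry, and the $\log_{\mathcal I}(|R|)$ index scan entry) can be transcribed directly into code. The function and parameter names are hypothetical, and the input values are made up purely to show the arithmetic.

```python
import math


def table_scan_ios(num_tuples, tuples_per_page):
    """|R| / P: every page of R is read once."""
    return num_tuples / tuples_per_page


def external_sort_ios(num_tuples, buffer_pages):
    """2 * floor(log_B(|R|)), transcribed as written in the table."""
    return 2 * math.floor(math.log(num_tuples, buffer_pages))


def tree_index_scan_ios(num_tuples, matching_tuples,
                        keys_per_index_page, tuples_per_page):
    """log_I(|R|) + |sigma_c(R)| / P: descend the index, then read matches."""
    descend = math.log(num_tuples, keys_per_index_page)
    fetch = matching_tuples / tuples_per_page
    return descend + fetch


# Assumed inputs: |R| = 500,000 tuples, P = 100, B = 100, I = 100,
# and a selection that matches 200 tuples.
scan = table_scan_ios(500_000, 100)
sort = external_sort_ios(500_000, 100)
index = tree_index_scan_ios(500_000, 200, 100, 100)
```

With these assumed inputs the full scan costs 5,000 page IOs while the index scan costs fewer than 5, which shows why the choice of data access path matters.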

Operation                                  RA                 IOs Added (#pages)                                                          Memory (#tuples)
Nested Loop Join (S buffered in memory)    $R \times S$       $0$                                                                         $O(|S|)$
Nested Loop Join (S buffered on disk)      $R \times S$       $\frac{|S|}{\mathcal P}$                                                    $O(1)$
1-Pass Hash Join                           $R \bowtie S$      $0$                                                                         $O(|S|)$
2-Pass Hash Join                           $R \bowtie S$      $2|R| + 2|S|$                                                               $O(1)$
Sort-Merge Join                            $R \bowtie S$      $0$ + Sort                                                                  $O(1)$ + Sort
Index Nested Loop (tree index)             $R \bowtie_c S$    $|R| \cdot (\log_{\mathcal I}(|S|) + \frac{|\sigma_c(S)|}{\mathcal P})$     $O(1)$
Index Nested Loop (hash index)             $R \bowtie_c S$    $|R| \cdot 1$                                                               $O(1)$
Aggregate (in-memory)                      $\gamma_A(R)$      $0$                                                                         $adom(A)$
Aggregate (sort-based)                     $\gamma_A(R)$      $0$ + Sort                                                                  $O(1)$ + Sort

  1. Tuples per Page ($\mathcal P$) – Normally defined per-schema
  2. Size of $R$ ($|R|$)
  3. Pages of Buffer ($\mathcal B$)
  4. Keys per Index Page ($\mathcal I$)
  5. Number of distinct values of $A$ ($adom(A)$)
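The join entries above can be compared the same way. The sketch below transcribes the 1-pass hash join, 2-pass hash join, and $\log_{\mathcal I}(|S|)$ index nested loop formulas and returns an (IOs, memory) pair for each. The function names are hypothetical, $O(1)$ memory is represented by the constant 1 and $O(|S|)$ by $|S|$ itself, and the inputs are illustrative assumptions.

```python
import math


def one_pass_hash_join(r_tuples, s_tuples):
    """0 added IOs; S is held in memory, so memory grows with |S|."""
    return 0, s_tuples


def two_pass_hash_join(r_tuples, s_tuples):
    """2|R| + 2|S| added IOs as written in the table; constant memory."""
    return 2 * r_tuples + 2 * s_tuples, 1


def index_nested_loop_join(r_tuples, s_tuples, matches_per_probe,
                           keys_per_index_page, tuples_per_page):
    """|R| * (log_I(|S|) + |sigma_c(S)| / P): one index lookup per tuple of R."""
    per_probe = (math.log(s_tuples, keys_per_index_page)
                 + matches_per_probe / tuples_per_page)
    return r_tuples * per_probe, 1


# Assumed inputs: |R| = 1,000, |S| = 500,000, I = 100, P = 100,
# and one matching S tuple per probe.
hash_1pass = one_pass_hash_join(1_000, 500_000)
hash_2pass = two_pass_hash_join(1_000, 500_000)
index_nlj = index_nested_loop_join(1_000, 500_000, 1, 100, 100)
```

With a small $R$ and a large indexed $S$, the index nested loop adds far fewer IOs than the 2-pass hash join while using constant memory, whereas the 1-pass hash join adds no IOs but needs memory proportional to $|S|$.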

Next Class: How to estimate $|R|$