slides

2021-03-07 22:39:50 -05:00 · 2021-03-07 22:39:50 -05:00 · 7f013655b4
parent 864eb55909
commit 7f013655b4
7 changed files with 1466 additions and 5 deletions
--- a/src/teaching/cse-562/2021sp/index.erb
+++ b/src/teaching/cse-562/2021sp/index.erb
@@ -51,12 +51,16 @@ schedule:
    materials: 
      slides: slide/2021-03-04-Indexing2.html
  - date: "Mar. 9"
-    topic: "Spark's Optimizer + Checkpoint 2"
-    due: "Checkpoint 1"
-  - date: "Mar. 11"
    topic: "Cost-Based Optimization"
-  - date: "Mar. 16"
+    due: "Checkpoint 1"
+    materials: 
+      slides: slide/2021-03-09-CostOpt1.html
+  - date: "Mar. 11"
    topic: "Cost-Based Optimization (contd.)"
+    materials: 
+      slides: slide/2021-03-11-CostOpt2.html
+  - date: "Mar. 16"
+    topic: "Spark's Optimizer + Checkpoint 2"
  - date: "Mar. 18"
    topic: "Distributed Queries: Challenges + Partitioning"
  - date: "Mar. 23"
--- a/src/teaching/cse-562/2021sp/slide/2021-02-18-QueryAlgorithms.erb
+++ b/src/teaching/cse-562/2021sp/slide/2021-02-18-QueryAlgorithms.erb
@@ -17,6 +17,23 @@ textbook: "Ch. 15.1-15.5, 16.7"
  More similar examples with Union and Cross would also help.

  Might help to tighten up the time spent a little too.  I had to cut out before introducing Sort-Merge Joins
+
+
+-------
+  2021 by OK:
+
+  Applied changes above.  Things went better.
+
+  Looking at costs in terms of the "overhead" of each operator is proving to be *really*
+  hard for the students to grasp.  I suspect it might be easier for the students to grasp
+  a recursive definition.  
+
+  e.g., cost(\pi(R)) = cost(R)
+
+  This would, among other things, make the (B)NLJ cost a lot easier to specify.
+
+  I made these changes already to 03-09-CostOpt1, so they should probably be backported here next time I teach the class.
+
 -->

 <section>
--- a/src/teaching/cse-562/2021sp/slide/2021-03-04-Indexing2.html
+++ b/src/teaching/cse-562/2021sp/slide/2021-03-04-Indexing2.html
@@ -1,5 +1,5 @@
 ---
-template: templates/cse4562_2019_slides.erb
+template: templates/cse4562_2021_slides.erb
 title: "Indexing (Part 2) and Views"
 date: March 4, 2021
 textbook: "Papers and Ch. 8.1-8.2"
--- a/src/teaching/cse-562/2021sp/slide/2021-03-09-CostOpt1.erb
+++ b/src/teaching/cse-562/2021sp/slide/2021-03-09-CostOpt1.erb
@@ -0,0 +1,555 @@
+---
+template: templates/cse4562_2021_slides.erb
+title: "Cost-Based Optimization"
+date: March 9, 2021
+textbook: Ch. 16
+---
+
+<!-- 2019 by OK
+  This went pretty well.  If anything, it might be nice to adjust the \select \distinct propagation formula (uniform prior section) lists into animated tables for consistency with the rest of the presentation.
+-->
+
+<section>
+  <section>
+    <h3>General Query Optimizers</h3>
+    <ol style="font-size: 60%">
+      <li>Apply blind heuristics (e.g., push down selections)</li>
+      <li>Enumerate all possible <i>execution plans</i> (or a reasonable subset) by varying:
+        <ul>
+          <li>Join/Union Evaluation Order (commutativity, associativity, distributivity)</li>
+          <li>Algorithms for Joins, Aggregates, Sort, Distinct, and others</li>
+          <li>Data Access Paths</li>
+        </ul>
+      </li>
+      <li class="fragment highlight-blue">Estimate the cost of each execution plan</li>
+      <li>Pick the execution plan with the lowest cost</li>
+    </ol>
+  </section>
+</section>
+
+<section>
+  <section>
+    <p><b>Idea 1: </b> Run each plan</p>
+  </section>
+
+  <section>
+    <img src="graphics/Clipart/facepalm.jpg" class="stretch" />
+    <attribution>&copy; Paramount Pictures</attribution>
+  </section>
+
+  <section>
+    <p>If we can't get the exact cost of a plan, what can we do?</p>
+  </section>
+
+  <section>
+    <p class="fragment highlight-grey"><b>Idea 2: </b> Run each plan on a small sample of the data.</p>
+    <p style="margin-top: 50px;"><b>Idea 3: </b> Analytically estimate the cost of a plan.</p>
+  </section>
+
+  <section>
+    <h3>Plan Cost</h3>
+    <dl>
+      <div class="fragment" data-fragment-index="1"><div class="fragment highlight-grey" data-fragment-index="4">
+        <dt>CPU Time</dt>
+        <dd>How much time is spent processing.</dd>
+      </div></div>
+
+      <div class="fragment" data-fragment-index="2">
+        <dt># of IOs</dt>
+        <dd>How many random reads + writes go to disk.</dd>
+      </div>
+
+      <div class="fragment" data-fragment-index="3">
+        <dt>Memory Required</dt>
+        <dd>How much memory do you need.</dd>
+      </div>
+    </dl>
+  </section>
+
+  <section>
+    <img src="2021-03-09/EstimationXKCD.png">
+    <attribution>Randall Munroe (<a href="https://creativecommons.org/licenses/by-nc/2.5/">cc-by-nc</a>)</attribution>
+  </section>
+
+  <section>
+    <h3>Remember the Real Goals</h3>
+    <ol>
+      <li class="fragment">Accurately <b>rank</b> the plans.</li>
+      <li class="fragment">Don't spend more time optimizing than you get back.</li>
+      <li class="fragment">Don't pick a plan that uses more memory than you have.</li>
+    </ol>
+  </section>
+</section>
+
+<!-- ============================================ -->
+
+<section>
+  <section>
+    <h3>Accounting</h3>
+    <p class="fragment" data-fragment-index="1" style="margin-top: 50px;">Figure out the IO cost of the <b>entire</b><span class="fragment" data-fragment-index="2">*</span> subtree.</p>
+
+    <p class="fragment" style="margin-top: 50px;" data-fragment-index="3">Only count the amount of memory <b>added</b> by each operator.</p>
+
+
+    <p class="fragment" data-fragment-index="2" style="margin-top: 50px; font-size: 80%">* Different from earlier in the semester.</p>
+
+  </section>
+
+  <section>
+    <table style="font-size: 70%">
+      <tr><th>Operation</th><th>RA</th><th>Total IOs (#pages)</th><th>Memory (#tuples)</th></tr>
+      <tr class="fragment" data-fragment-index="0">
+        <td>Table Scan</td>
+        <td>$R$</td>
+        <td class="fragment" data-fragment-index="1">$\frac{|R|}{\mathcal P}$</td>
+        <td class="fragment" data-fragment-index="2">$O(1)$</td>
+      </tr>
+      <tr class="fragment" data-fragment-index="3">
+        <td>Projection</td>
+        <td>$\pi(R)$</td>
+        <td class="fragment" data-fragment-index="4">$\textbf{io}(R)$</td>
+        <td class="fragment" data-fragment-index="4">$O(1)$</td>
+      </tr>
+      <tr class="fragment" data-fragment-index="5">
+        <td>Selection</td>
+        <td>$\sigma(R)$</td>
+        <td>$\textbf{io}(R)$</td>
+        <td>$O(1)$</td>
+      </tr>
+      <tr class="fragment" data-fragment-index="6">
+        <td>Union</td>
+        <td>$R \uplus S$</td>
+        <td>$\textbf{io}(R) + \textbf{io}(S)$</td>
+        <td>$O(1)$</td>
+      </tr>
+      <tr class="fragment" data-fragment-index="7">
+        <td style="vertical-align: middle;">Sort <span class="fragment" data-fragment-index="8">(In-Mem)</span></td>
+        <td style="vertical-align: middle;">$\tau(R)$</td>
+        <td class="fragment" data-fragment-index="8">$\textbf{io}(R)$</td>
+        <td class="fragment" data-fragment-index="9">$O(|R|)$</td>
+      </tr>
+      <tr>
+        <td class="fragment" data-fragment-index="10">Sort (On-Disk)</td>
+        <td class="fragment" data-fragment-index="10">$\tau(R)$</td>
+        <td class="fragment" data-fragment-index="11">$\frac{2 \cdot \lfloor \log_{\mathcal B}(|R|) \rfloor}{\mathcal P} + \textbf{io}(R)$</td>
+        <td class="fragment" data-fragment-index="10">$O(\mathcal B)$</td>
+      </tr>
+      <tr class="fragment" data-fragment-index="12">
+        <td><span class="fragment" data-fragment-index="13">(B+Tree)</span> Index Scan</td>
+        <td>$Index(R, c)$</td>
+        <td class="fragment" data-fragment-index="13">$\log_{\mathcal I}(|R|) + \frac{|\sigma_c(R)|}{\mathcal P}$</td>
+        <td class="fragment" data-fragment-index="14">$O(1)$</td>
+      </tr>
+      <tr>
+        <td class="fragment" data-fragment-index="15">(Hash) Index Scan</td>
+        <td class="fragment" data-fragment-index="15">$Index(R, c)$</td>
+        <td class="fragment" data-fragment-index="15">$1$</td>
+        <td class="fragment" data-fragment-index="16">$O(1)$</td>
+      </tr>
+    </table>
+
+    <ol style="font-size: 50%; margin-top: 50px;">
+      <li class="fragment" data-fragment-index="1">Tuples per Page ($\mathcal P$) <span>– Normally defined per-schema</span></li>
+      <li class="fragment" data-fragment-index="1">Size of $R$ ($|R|$)</li>
+      <li class="fragment" data-fragment-index="10">Pages of Buffer ($\mathcal B$)</li>
+      <li class="fragment" data-fragment-index="13">Keys per Index Page ($\mathcal I$)</li>
+    </ol>
+  </section>
+  <section>
+    <table style="font-size: 70%">
+      <tr><th width="300px">Operation</th><th>RA</th><th>Total IOs (#pages)</th><th style="font-size: 80%;">Mem (#tuples)</th></tr>
+      <tr class="fragment" data-fragment-index="1">
+        <td style="font-size: 60%">Nested Loop Join <span class="fragment" data-fragment-index="2">(Buffer $S$ in mem)</span></td>
+        <td>$R \times_{mem} S$</td>
+        <td class="fragment" data-fragment-index="2">$\textbf{io}(R)+\textbf{io}(S)$</td>
+        <td class="fragment" data-fragment-index="3">$O(|S|)$</td>
+      </tr>
+      <tr>
+        <td class="fragment" data-fragment-index="4" style="font-size: 60%">Block NLJ (Buffer $S$ on disk)</td>
+        <td class="fragment" data-fragment-index="4">$R \times_{disk} S$</td>
+        <td class="fragment" data-fragment-index="5">$\frac{|R|}{\mathcal B} \cdot \frac{|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$</td>
+        <td class="fragment" data-fragment-index="4">$O(1)$</td>
+      </tr>
+      <tr>
+        <td class="fragment" data-fragment-index="4" style="font-size: 60%">Block NLJ (Recompute $S$)</td>
+        <td class="fragment" data-fragment-index="4">$R \times_{redo} S$</td>
+        <td class="fragment" data-fragment-index="6">$\textbf{io}(R) + \frac{|R|}{\mathcal B} \cdot \textbf{io}(S)$</td>
+        <td class="fragment" data-fragment-index="4">$O(1)$</td>
+      </tr>
+      <tr class="fragment" data-fragment-index="7">
+        <td>1-Pass Hash Join</td>
+        <td>$R \bowtie_{1PH, c} S$</td>
+        <td class="fragment" data-fragment-index="8">$\textbf{io}(R) + \textbf{io}(S)$</td>
+        <td class="fragment" data-fragment-index="8">$O(|S|)$</td>
+      </tr>
+      <tr class="fragment" data-fragment-index="9">
+        <td>2-Pass Hash Join</td>
+        <td>$R \bowtie_{2PH, c} S$</td>
+        <td class="fragment" data-fragment-index="10">$\frac{2|R| + 2|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$</td>
+        <td class="fragment" data-fragment-index="10">$O(1)$</td>
+      </tr>
+      <tr class="fragment" data-fragment-index="11">
+        <td>Sort-Merge Join </td>
+        <td>$R \bowtie_{SM, c} S$</td>
+        <td class="fragment" data-fragment-index="12">[Sort]</td>
+        <td class="fragment" data-fragment-index="12">[Sort]</td>
+      </tr>
+      <tr class="fragment" data-fragment-index="13">
+        <td><span class="fragment" data-fragment-index="14">(Tree)</span> Index NLJ</td>
+        <td>$R \bowtie_{INL, c} S$</td>
+        <td class="fragment" data-fragment-index="14">$|R| \cdot (\log_{\mathcal I}(|S|) + \frac{|\sigma_c(S)|}{\mathcal P})$</td>
+        <td class="fragment" data-fragment-index="15">$O(1)$</td>
+      </tr>
+      <tr>
+        <td class="fragment" data-fragment-index="16">(Hash) Index NLJ</td>
+        <td class="fragment" data-fragment-index="16">$R \bowtie_{INL, c} S$</td>
+        <td class="fragment" data-fragment-index="16">$|R| \cdot 1$</td>
+        <td class="fragment" data-fragment-index="17">$O(1)$</td>
+      </tr>
+      <tr class="fragment" data-fragment-index="18">
+        <td><span class="fragment" data-fragment-index="19">(In-Mem)</span> Aggregate</td>
+        <td>$\gamma_A(R)$</td>
+        <td class="fragment" data-fragment-index="19">$\textbf{io}(R)$</td>
+        <td class="fragment" data-fragment-index="20">$adom(A)$</td>
+      </tr>
+      <tr>
+        <td class="fragment" data-fragment-index="21" style="font-size: 90%">(Sort/Merge) Aggregate</td>
+        <td class="fragment" data-fragment-index="21">$\gamma_A(R)$</td>
+        <td class="fragment" data-fragment-index="21">[Sort]</td>
+        <td class="fragment" data-fragment-index="21">[Sort]</td>
+      </tr>
+    </table>
+
+    <ol style="font-size: 50%;">
+      <li>Tuples per Page ($\mathcal P$) <span>– Normally defined per-schema</span></li>
+      <li>Size of $R$ ($|R|$)</li>
+      <li>Pages of Buffer ($\mathcal B$)</li>
+      <li>Keys per Index Page ($\mathcal I$)</li>
+      <li class="fragment" data-fragment-index="20">Number of distinct values of $A$ ($adom(A)$)</li>
+    </ol>
+  </section>
+
+  <section>
+    <table style="font-size: 70%">
+      <tr><th>Symbol</th><th>Parameter</th><th>Type</th></tr>
+      <tr>
+        <td>$\mathcal P$</td><td>Tuples Per Page</td>
+        <td class="fragment" data-fragment-index="1">Fixed ($\frac{|\text{page}|}{|\text{tuple}|}$)</td>
+      </tr>
+      <tr>
+        <td>$|R|$</td><td>Size of $R$</td>
+        <td class="fragment" data-fragment-index="2">Precomputed<span  class="fragment" data-fragment-index="6">$^*$</span> ($|R|$)</td>
+      </tr>
+      <tr>
+        <td>$\mathcal B$</td><td>Pages of Buffer</td>
+        <td class="fragment" data-fragment-index="3">Configurable Parameter</td>
+      </tr>
+      <tr>
+        <td>$\mathcal I$</td><td>Keys per Index Page</td>
+        <td class="fragment" data-fragment-index="4">Fixed ($\frac{|\text{page}|}{|\text{key+pointer}|}$)</td>
+      </tr>
+      <tr>
+        <td>$adom(A)$</td><td>Number of distinct values of $A$</td>
+        <td class="fragment" data-fragment-index="5">Precomputed<span class="fragment" data-fragment-index="6">$^*$</span> ($|\delta_A(R)|$)</td>
+      </tr>
+    </table>
+    <p class="fragment" data-fragment-index="6" style="font-size: 50%">* unless $R$ is a query</p>
+  </section>
+
+</section>
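The recursive "total IOs" accounting in the tables above can be sketched in a few lines of Python. This is only an illustration: the plan classes, the `io` and `size` helpers, and the values chosen for `P` and `B` are assumptions made for this sketch and do not appear in the slides. The Block NLJ (Recompute $S$) case uses the $\frac{|R|}{\mathcal B}$ term exactly as written in the table.

```python
from dataclasses import dataclass

P = 100  # tuples per page (assumed value for illustration)
B = 50   # buffer size, used exactly as in the table's |R| / B term (assumed value)


@dataclass
class Table:
    tuples: int  # |R|


@dataclass
class Project:
    child: object


@dataclass
class Select:
    child: object


@dataclass
class Union:
    left: object
    right: object


@dataclass
class BlockNLJRedo:
    left: object   # R, the outer input
    right: object  # S, recomputed once per block of R


def size(plan):
    """Output size in tuples, for operators that need no extra statistics."""
    if isinstance(plan, Table):
        return plan.tuples
    if isinstance(plan, Project):
        return size(plan.child)
    if isinstance(plan, Union):
        return size(plan.left) + size(plan.right)
    # Selections and joins need a selectivity estimate, which is out of scope here.
    raise NotImplementedError(type(plan).__name__)


def io(plan):
    """Total page IOs of the entire subtree rooted at `plan`."""
    if isinstance(plan, Table):
        return plan.tuples / P
    if isinstance(plan, (Project, Select)):
        return io(plan.child)  # streaming operators add no IOs of their own
    if isinstance(plan, Union):
        return io(plan.left) + io(plan.right)
    if isinstance(plan, BlockNLJRedo):
        return io(plan.left) + size(plan.left) / B * io(plan.right)
    raise NotImplementedError(type(plan).__name__)
```

For example, with `R = Table(10_000)` and `S = Table(5_000)`, `io(Union(R, S))` is 150 pages, while `io(BlockNLJRedo(R, S))` is 100 + 200 * 50 = 10100 pages under the assumed `P` and `B`.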
+<!-- ============================================ -->
+
+<section>
+  <section>
+    <p>Estimating IOs requires Estimating $|Q(R)|$, $|\delta_A(Q(R))|$</p>
+  </section>
+
+  <section>
+    <h3>Cardinality Estimation</h3>
+    <p class="fragment">Unlike estimating IOs, cardinality estimation doesn't care about the algorithm, so we'll just be working with raw RA.</p>
+
+    <p class="fragment">Also unlike estimating IOs, we care about the cardinality of $Q(R)$ as a whole, rather than the contribution of each individual operator.</p>
+  </section>
+
+  <section>
+    <table style="font-size: 70%">
+      <tr>
+        <th>Operator</th>
+        <th>RA</th>
+        <th>Estimated Size</th>
+      </tr>
+
+      <tr>
+        <td>Table</td>
+        <td>$R$</td>
+        <td class="fragment" data-fragment-index="1">$|R|$</td>
+      </tr>
+
+      <tr>
+        <td>Projection</td>
+        <td>$\pi(Q)$</td>
+        <td class="fragment" data-fragment-index="2">$|Q|$</td>
+      </tr>
+
+      <tr>
+        <td>Union</td>
+        <td>$Q_1 \uplus Q_2$</td>
+        <td class="fragment" data-fragment-index="3">$|Q_1| + |Q_2|$</td>
+      </tr>
+
+      <tr>
+        <td>Cross Product</td>
+        <td>$Q_1 \times Q_2$</td>
+        <td class="fragment" data-fragment-index="4">$|Q_1| \times |Q_2|$</td>
+      </tr>
+
+      <tr>
+        <td>Sort</td>
+        <td>$\tau(Q)$</td>
+        <td class="fragment" data-fragment-index="5">$|Q|$</td>
+      </tr>
+
+      <tr>
+        <td>Limit</td>
+        <td>$\texttt{LIMIT}_N(Q)$</td>
+        <td class="fragment" data-fragment-index="6">$N$</td>
+      </tr>
+
+      <tr>
+        <td>Selection</td>
+        <td>$\sigma_c(Q)$</td>
+        <td class="fragment" data-fragment-index="8">$|Q| \times \texttt{SEL}(c, Q)$</td>
+      </tr>
+
+      <tr>
+        <td>Join</td>
+        <td>$Q_1 \bowtie_c Q_2$</td>
+        <td class="fragment" data-fragment-index="9">$|Q_1| \times |Q_2| \times \texttt{SEL}(c, Q_1\times Q_2)$</td>
+      </tr>
+
+      <tr>
+        <td>Distinct</td>
+        <td>$\delta_A(Q)$</td>
+        <td class="fragment" data-fragment-index="11">$\texttt{UNIQ}(A, Q)$</td>
+      </tr>
+
+      <tr>
+        <td>Aggregate</td>
+        <td>$\gamma_{A, B \leftarrow \Sigma}(Q)$</td>
+        <td class="fragment" data-fragment-index="12">$\texttt{UNIQ}(A, Q)$</td>
+      </tr>
+    </table>
+
+    <ul style="font-size: 50%; margin-top: 20px">
+      <li class="fragment" data-fragment-index="7">$\texttt{SEL}(c, Q)$: Selectivity of $c$ on $Q$, or $\frac{|\sigma_c(Q)|}{|Q|}$</li>
+      <li class="fragment" data-fragment-index="10">$\texttt{UNIQ}(A, Q)$: # of distinct values of $A$ in $Q$.</li>
+    </ul>
+  </section>
+  
+  <section>
+    <h3>Cardinality Estimation</h3>
+    <h4>(The Hard Parts)</h4>
+
+    <dl>
+      <dt style="margin-top: 50px;">$\sigma_c(Q)$ (Cardinality Estimation)</dt>
+      <dd>How many tuples will a condition $c$ allow to pass?</dd>
+
+      <dt style="margin-top: 50px;">$\delta_A(Q)$ (Distinct Values Estimation)</dt>
+      <dd>How many distinct values of attribute(s) $A$ exist?</dd>
+    </dl>
+  </section>
+</section>
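The estimated-size table above is itself a recursive definition, which the following sketch spells out. The nested-tuple plan encoding and the `estimate` function are hypothetical, chosen only to keep the example short; $\texttt{SEL}$ and $\texttt{UNIQ}$ are passed in as plain numbers because estimating them is the hard part discussed on the last slide.

```python
def estimate(plan):
    """Estimated output size of a plan encoded as nested tuples.

    Assumed encodings (illustrative only):
      ("table", n)                   ("project", child)     ("sort", child)
      ("union", left, right)         ("cross", left, right) ("limit", n, child)
      ("select", sel, child)         ("join", sel, left, right)
      ("distinct", uniq, child)      ("aggregate", uniq, child)
    """
    op = plan[0]
    if op == "table":
        return plan[1]
    if op in ("project", "sort"):
        return estimate(plan[1])
    if op == "union":
        return estimate(plan[1]) + estimate(plan[2])
    if op == "cross":
        return estimate(plan[1]) * estimate(plan[2])
    if op == "limit":
        return plan[1]  # N, as in the table
    if op == "select":
        return estimate(plan[2]) * plan[1]
    if op == "join":
        return estimate(plan[2]) * estimate(plan[3]) * plan[1]
    if op in ("distinct", "aggregate"):
        return plan[1]  # UNIQ(A, Q)
    raise ValueError(f"unknown operator: {op}")
```

With `r = ("table", 1000)` and `s = ("table", 2000)`, the plan `("select", 0.1, ("join", 0.001, r, s))` is estimated at roughly 200 tuples.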
+
+<section>
+  <section>
+    <p><b>Idea 1:</b> Assume each selection filters down to 10% of the data.</p>
+  </section>
+
+  <section>
+    <img src="graphics/Clipart/facepalm.jpg" class="stretch" />
+    <p class="fragment">no... really!</p>
+    <attribution>&copy; Paramount Pictures</attribution>
+  </section>
+
+  <section>
+    <h3>... there are problems</h3>
+    <div class="fragment">
+      <h4>Inconsistent estimation</h4>
+      <p style="font-size:70%;">$|\sigma_{c_1}(\sigma_{c_2}(R))| \neq |\sigma_{c_1 \wedge c_2}(R)|$</p>
+    </div>
+    <div class="fragment">
+      <h4>Too consistent estimation</h4>
+      <p style="font-size:70%;">$|\sigma_{id = 1}(\texttt{STUDENTS})| = |\sigma_{residence = 'NY'}(\texttt{STUDENTS})|$</p>
+    </div>
+    <p style="margin-top: 100px" class="fragment">... but remember that all we need is to <u>rank</u> plans.</p>
+  </section>
+
+  <section>
+    <p>Many major databases (Oracle, Postgres, Teradata, etc.) use something like the 10% rule if they have nothing better.</p>
+
+
+    <p class="fragment" style="font-size: 80%; margin-top: 20px;">(The specific % varies by DBMS.)</p>
+
+    <p class="fragment" style="font-size: 80%; margin-top: 20px;">(Teradata uses 10% for the first <code>AND</code> clause,<br/>cut by another 75% for every subsequent clause)</p>
+  </section>
+
+  <section>
+    <h3>(Some) Estimation Techniques</h3>
+
+    <dl style="font-size: 80%">
+      <div>
+        <dt>The 10% rule</dt>
+        <dd>Rules of thumb if you have no other options...</dd>
+      </div>
+
+      <div class="fragment">
+        <dt>Uniform Prior</dt>
+        <dd>Use basic statistics to make a very rough guess.</dd>
+      </div>
+
+      <div class="fragment">
+        <dt>Sampling / History</dt>
+        <dd>Small, Quick Sampling Runs (or prior executions of the query).</dd>
+      </div>
+
+      <div class="fragment">
+        <dt>Histograms</dt>
+        <dd>Using more detailed statistics for improved guesses.</dd>
+      </div>
+
+      <div class="fragment">
+        <dt>Constraints</dt>
+        <dd>Using rules about the data for improved guesses.</dd>
+      </div>
+    </dl>
+  </section>
+</section>
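Both problems with the 10% rule are easy to reproduce. The sketch below uses a hypothetical nested-tuple plan encoding and a made-up table size; the only thing taken from the slides is the rule itself, which multiplies by 0.1 once per selection operator and never looks at the condition.

```python
def ten_percent(plan):
    """Estimate output size with a flat 10% factor per selection.

    Assumed encodings (illustrative only):
      ("table", n)
      ("select", condition, child)   # the condition text is ignored
    """
    if plan[0] == "table":
        return plan[1]
    if plan[0] == "select":
        return 0.1 * ten_percent(plan[2])
    raise ValueError(f"unknown operator: {plan[0]}")


students = ("table", 30_000)  # made-up size

# Inconsistent: two logically equivalent plans get different estimates.
stacked = ("select", "c1", ("select", "c2", students))
fused = ("select", "c1 AND c2", students)

# Too consistent: a key lookup and a broad filter get the same estimate.
by_id = ("select", "id = 1", students)
by_state = ("select", "residence = 'NY'", students)
```

Here `ten_percent(stacked)` is about 300 while `ten_percent(fused)` is about 3000, and `by_id` and `by_state` both come out at about 3000.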
+
+<!-- ============================================ -->
+
+<section>
+
+  <section>
+    <h3>Uniform Prior</h3>
+
+    <p style="text-align: left; margin-bottom: 0px; font-weight: bold;">We assume that for $\sigma_c(Q)$ or $\delta_A(Q)$...</p>
+    <ol>
+      <li>Basic statistics are known about $Q$: <ul>
+        <li style="margin-top: 0px;"><code>COUNT(*)</code></li>
+        <li style="margin-top: 0px;"><code>COUNT(DISTINCT A)</code> (for each A)</li>
+        <li style="margin-top: 0px;"><code>MIN(A)</code>, <code>MAX(A)</code> (for each numeric A)</li>
+      </ul></li>
+      <li>Attribute values are uniformly distributed.</li>
+      <li>No inter-attribute correlations.</li>
+    </ol>
+    <p class="fragment" style="font-size: 80%; font-weight: bold; margin-top: 20px;">
+      If necessary statistics aren't available (point 1), fall back to the 10% rule.  
+    </p>
+    <p class="fragment" style="font-size: 80%; font-weight: bold; margin-top: 20px;">
+      If statistical assumptions (points 2, 3) aren't perfectly true, we'll still likely be getting a better estimate than the 10% rule.
+    </p>
+  </section>
+
+  <section>
+    <h3>COUNT(DISTINCT A)</h3>
+    <p class="fragment" style="font-size: 70%; margin-top: 50px;">$\texttt{UNIQ}(A, \pi_{A, \ldots}(R)) = \texttt{UNIQ}(A, R)$</p>
+    <p class="fragment" style="font-size: 70%; margin-top: 50px;">$\texttt{UNIQ}(A, \sigma(R)) \approx \texttt{UNIQ}(A, R)$</p>
+    <p class="fragment" style="font-size: 70%; margin-top: 50px;">$\texttt{UNIQ}(A, R \times S) = \texttt{UNIQ}(A, R)$ or $\texttt{UNIQ}(A, S)$</p>
+    <p class="fragment" style="font-size: 70%; margin-top: 50px;">$$max(\texttt{UNIQ}(A, R), \texttt{UNIQ}(A, S)) \leq\\ \texttt{UNIQ}(A, R \uplus S)\\ \leq \texttt{UNIQ}(A, R) + \texttt{UNIQ}(A, S)$$</p>
+  </section>
+  
+  <section>
+    <h3>MIN(A), MAX(A)</h3>
+    <p class="fragment" style="font-size: 70%; margin-top: 50px;">$min_A(\pi_{A, \ldots}(R)) = min_A(R)$</p>
+    <p class="fragment" style="font-size: 70%; margin-top: 50px;">$min_A(\sigma(R)) \approx min_A(R)$</p>
+    <p class="fragment" style="font-size: 70%; margin-top: 50px;">$min_A(R \times S) = min_A(R)$ or $min_A(S)$</p>
+    <p class="fragment" style="font-size: 70%; margin-top: 50px;">$min_A(R \uplus S) = min(min_A(R), min_A(S))$</p>
+  </section>
+
+  <section>
+    <p>Estimating $\delta_A(Q)$ requires only <code>COUNT(DISTINCT A)</code></p>
+  </section>
+
+  <section>
+    <h3>Estimating Selectivity</h3>
+
+    <p>Selectivity is a probability ($\texttt{SEL}(c, Q) = P(c)$)</p>
+    <table style="font-size: 85%">
+      <tr class="fragment">
+        <td>$P(A = x_1)$</td>
+        <td>$=$</td>
+        <td class="fragment">$\frac{1}{\texttt{COUNT(DISTINCT A)}}$</td>
+      </tr>
+
+      <tr class="fragment">
+        <td>$P(A \in (x_1, x_2, \ldots, x_N))$</td>
+        <td>$=$</td>
+        <td class="fragment">$\frac{N}{\texttt{COUNT(DISTINCT A)}}$</td>
+      </tr>
+
+      <tr class="fragment">
+        <td>$P(A \leq x_1)$</td>
+        <td>$=$</td>
+        <td class="fragment">$\frac{x_1 - \texttt{MIN(A)}}{\texttt{MAX(A)} - \texttt{MIN(A)}}$</td>
+      </tr>
+
+      <tr class="fragment">
+        <td>$P(x_1 \leq A \leq x_2)$</td>
+        <td>$=$</td>
+        <td class="fragment">$\frac{x_2 - x_1}{\texttt{MAX(A)} - \texttt{MIN(A)}}$</td>
+      </tr>
+
+      <tr class="fragment">
+        <td>$P(A = B)$</td>
+        <td>$=$</td>
+        <td class="fragment" style="font-size: 60%">$\textbf{min}\left( \frac{1}{\texttt{COUNT(DISTINCT A)}}, \frac{1}{\texttt{COUNT(DISTINCT B)}} \right)$</td>
+      </tr>
+
+      <tr class="fragment">
+        <td>$P(c_1 \wedge c_2)$</td>
+        <td>$=$</td>
+        <td class="fragment" >$P(c_1) \cdot P(c_2)$</td>
+      </tr>
+
+      <tr class="fragment">
+        <td>$P(c_1 \vee c_2)$</td>
+        <td>$=$</td>
+        <td class="fragment" >$1 - (1 - P(c_1)) \cdot (1 - P(c_2))$</td>
+      </tr>
+    </table>
+
+    <p style="font-size: 60%">(With constants $x_1$, $x_2$, ...)</p>
+  </section>
+
+  <section>
+    <h3>Limitations</h3>
+
+    <dl>
+      <div>
+        <dt>Don't always have statistics for $Q$</dt>
+        <dd>For example, $\pi_{A \leftarrow (B \cdot C)}(R)$</dd>
+      </div>
+
+      <div>
+        <dt>Don't always have clear rules for $c$</dt>
+        <dd>For example, $\sigma_{\texttt{FitsModel}(A, B, C)}(R)$</dd>
+      </div>
+
+      <div>
+        <dt>Attribute values are not always uniformly distributed.</dt>
+        <dd>For example, <span style="font-size: 60%"> $|\sigma_{SPC\_COMMON = 'pin\ oak'}(T)|$ vs $|\sigma_{SPC\_COMMON = 'honeylocust'}(T)|$</span></dd>
+      </div>
+
+      <div>
+        <dt>Attribute values are sometimes correlated.</dt>
+        <dd>For example, $\sigma_{(stump < 5) \wedge (diam > 3)}(T)$</dd>
+      </div>
+    </dl>
+    <p class="fragment">...but handles <b>most</b> usage patterns</p>
+  </section>
+
+  <section>
+    ... more next class!
+  </section>
+
+</section>
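The selectivity rules from the "Estimating Selectivity" slide translate almost directly into code. The following is a minimal sketch: `ColumnStats` and the `sel_*` function names are assumptions made for illustration, and the `clamp` helper is an addition not on the slides, there to keep out-of-range constants from producing probabilities outside $[0, 1]$. It also assumes `MAX(A) > MIN(A)`.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    distinct: int   # COUNT(DISTINCT A)
    minimum: float  # MIN(A)
    maximum: float  # MAX(A)


def clamp(p):
    """Keep an estimate inside [0, 1] (an addition, not part of the slides)."""
    return max(0.0, min(1.0, p))


def sel_eq(a):
    """P(A = x)"""
    return 1 / a.distinct


def sel_in(a, n):
    """P(A IN (x_1, ..., x_n))"""
    return clamp(n / a.distinct)


def sel_le(a, x):
    """P(A <= x)"""
    return clamp((x - a.minimum) / (a.maximum - a.minimum))


def sel_between(a, x1, x2):
    """P(x1 <= A <= x2)"""
    return clamp((x2 - x1) / (a.maximum - a.minimum))


def sel_eq_attr(a, b):
    """P(A = B)"""
    return min(1 / a.distinct, 1 / b.distinct)


def sel_and(p1, p2):
    """P(c1 AND c2), assuming no correlation between c1 and c2"""
    return p1 * p2


def sel_or(p1, p2):
    """P(c1 OR c2), assuming no correlation between c1 and c2"""
    return 1 - (1 - p1) * (1 - p2)
```

For a column with 50 distinct values ranging over 0 to 100, `sel_eq` gives 0.02, `sel_le(a, 25)` gives 0.25, and combining the two with `sel_and` gives 0.005.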
--- a/src/teaching/cse-562/2021sp/slide/2021-03-09/EstimationXKCD.png
+++ b/src/teaching/cse-562/2021sp/slide/2021-03-09/EstimationXKCD.png
--- a/src/teaching/cse-562/2021sp/slide/2021-03-11-CostOpt2.erb
+++ b/src/teaching/cse-562/2021sp/slide/2021-03-11-CostOpt2.erb
@@ -0,0 +1,606 @@
+---
+template: templates/cse4562_2021_slides.erb
+title: "Cost-Based Optimization"
+date: March 11, 2021
+textbook: Ch. 16
+---
+
+<section>
+  <section>
+    <h3>Remember the Real Goals</h3>
+    <ol>
+      <li>Accurately <b>rank</b> the plans.</li>
+      <li>Don't spend more time optimizing than you get back.</li>
+      <li>Don't pick a plan that uses more memory than you have.</li>
+    </ol>
+  </section>
+
+  <section>
+    <h3>Accounting</h3>
+    <p style="margin-top: 50px;">Figure out the IO cost of the <b>entire</b> subtree.</p>
+    <p style="margin-top: 50px;">Only count the amount of memory <b>added</b> by each operator.</p>
+  </section>
+
+  <section>
+    <table style="font-size: 70%">
+      <tr><th>Operation</th><th>RA</th><th>Total IOs (#pages)</th><th>Memory (#tuples)</th></tr>
+      <tr>
+        <td>Table Scan</td>
+        <td>$R$</td>
+        <td>$\frac{|R|}{\mathcal P}$</td>
+        <td>$O(1)$</td>
+      </tr>
+      <tr>
+        <td>Projection</td>
+        <td>$\pi(R)$</td>
+        <td>$\textbf{io}(R)$</td>
+        <td>$O(1)$</td>
+      </tr>
+      <tr>
+        <td>Selection</td>
+        <td>$\sigma(R)$</td>
+        <td>$\textbf{io}(R)$</td>
+        <td>$O(1)$</td>
+      </tr>
+      <tr>
+        <td>Union</td>
+        <td>$R \uplus S$</td>
+        <td>$\textbf{io}(R) + \textbf{io}(S)$</td>
+        <td>$O(1)$</td>
+      </tr>
+      <tr>
+        <td style="vertical-align: middle;">Sort <span>(In-Mem)</span></td>
+        <td style="vertical-align: middle;">$\tau(R)$</td>
+        <td>$\textbf{io}(R)$</td>
+        <td>$O(|R|)$</td>
+      </tr>
+      <tr>
+        <td>Sort (On-Disk)</td>
+        <td>$\tau(R)$</td>
+        <td>$\frac{2 \cdot \lfloor \log_{\mathcal B}(|R|) \rfloor}{\mathcal P} + \textbf{io}(R)$</td>
+        <td>$O(\mathcal B)$</td>
+      </tr>
+      <tr>
+        <td><span>(B+Tree)</span> Index Scan</td>
+        <td>$Index(R, c)$</td>
+        <td>$\log_{\mathcal I}(|R|) + \frac{|\sigma_c(R)|}{\mathcal P}$</td>
+        <td>$O(1)$</td>
+      </tr>
+      <tr>
+        <td>(Hash) Index Scan</td>
+        <td>$Index(R, c)$</td>
+        <td>$1$</td>
+        <td>$O(1)$</td>
+      </tr>
+    </table>
+
+    <ol style="font-size: 50%; margin-top: 50px;">
+      <li>Tuples per Page ($\mathcal P$) <span>– Normally defined per-schema</span></li>
+      <li>Size of $R$ ($|R|$)</li>
+      <li>Pages of Buffer ($\mathcal B$)</li>
+      <li>Keys per Index Page ($\mathcal I$)</li>
+    </ol>
+  </section>
+  <section>
+    <table style="font-size: 70%">
+      <tr><th width="300px">Operation</th><th>RA</th><th>Total IOs (#pages)</th><th style="font-size: 80%;">Mem (#tuples)</th></tr>
+      <tr>
+        <td style="font-size: 60%">Nested Loop Join <span>(Buffer $S$ in mem)</span></td>
+        <td>$R \times_{mem} S$</td>
+        <td>$\textbf{io}(R)+\textbf{io}(S)$</td>
+        <td>$O(|S|)$</td>
+      </tr>
+      <tr>
+        <td   style="font-size: 60%">Block NLJ (Buffer $S$ on disk)</td>
+        <td>$R \times_{disk} S$</td>
+        <td>$\frac{|R|}{\mathcal B} \cdot \frac{|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$</td>
+        <td>$O(1)$</td>
+      </tr>
+      <tr>
+        <td   style="font-size: 60%">Block NLJ (Recompute $S$)</td>
+        <td>$R \times_{redo} S$</td>
+        <td>$\textbf{io}(R) + \frac{|R|}{\mathcal B} \cdot \textbf{io}(S)$</td>
+        <td>$O(1)$</td>
+      </tr>
+      <tr>
+        <td>1-Pass Hash Join</td>
+        <td>$R \bowtie_{1PH, c} S$</td>
+        <td>$\textbf{io}(R) + \textbf{io}(S)$</td>
+        <td>$O(|S|)$</td>
+      </tr>
+      <tr>
+        <td>2-Pass Hash Join</td>
+        <td>$R \bowtie_{2PH, c} S$</td>
+        <td>$\frac{2|R| + 2|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$</td>
+        <td>$O(1)$</td>
+      </tr>
+      <tr>
+        <td>Sort-Merge Join </td>
+        <td>$R \bowtie_{SM, c} S$</td>
+        <td>[Sort]</td>
+        <td>[Sort]</td>
+      </tr>
+      <tr>
+        <td><span>(Tree)</span> Index NLJ</td>
+        <td>$R \bowtie_{INL, c} S$</td>
+        <td>$|R| \cdot (\log_{\mathcal I}(|S|) + \frac{|\sigma_c(S)|}{\mathcal P})$</td>
+        <td>$O(1)$</td>
+      </tr>
+      <tr>
+        <td>(Hash) Index NLJ</td>
+        <td>$R \bowtie_{INL, c} S$</td>
+        <td>$|R| \cdot 1$</td>
+        <td>$O(1)$</td>
+      </tr>
+      <tr>
+        <td><span>(In-Mem)</span> Aggregate</td>
+        <td>$\gamma_A(R)$</td>
+        <td>$\textbf{io}(R)$</td>
+        <td>$adom(A)$</td>
+      </tr>
+      <tr>
+        <td   style="font-size: 90%">(Sort/Merge) Aggregate</td>
+        <td>$\gamma_A(R)$</td>
+        <td>[Sort]</td>
+        <td>[Sort]</td>
+      </tr>
+    </table>
+
+    <ol style="font-size: 50%;">
+      <li>Tuples per Page ($\mathcal P$) <span>– Normally defined per-schema</span></li>
+      <li>Size of $R$ ($|R|$)</li>
+      <li>Pages of Buffer ($\mathcal B$)</li>
+      <li>Keys per Index Page ($\mathcal I$)</li>
+      <li>Number of distinct values of $A$ ($adom(A)$)</li>
+    </ol>
+  </section>
+</section>
+
+<section>
+  <section>
+    <h3>Cardinality Estimation</h3>
+    <h4>(The Hard Parts)</h4>
+
+    <dl>
+      <dt style="margin-top: 50px;">$\sigma_c(Q)$ (Cardinality Estimation)</dt>
+      <dd>How many tuples will a condition $c$ allow to pass?</dd>
+
+      <dt style="margin-top: 50px;">$\delta_A(Q)$ (Distinct Values Estimation)</dt>
+      <dd>How many distinct values of attribute(s) $A$ exist?</dd>
+    </dl>
+  </section>
+
+  <section>
+    <h3>Remember the Real Goals</h3>
+    <ol>
+      <li>Accurately <b>rank</b> the plans.</li>
+      <li>Don't spend more time optimizing than you get back.</li>
+    </ol>
+  </section>
+
+  <section>
+    <h3>(Some) Estimation Techniques</h3>
+
+    <dl style="font-size: 80%">
+      <div class="fragment highlight-grey" data-fragment-index="1">
+        <dt>Guess Randomly</dt>
+        <dd>Rules of thumb if you have no other options...</dd>
+      </div>
+
+      <div class="fragment highlight-grey" data-fragment-index="1">
+        <dt>Uniform Prior</dt>
+        <dd>Use basic statistics to make a very rough guess.</dd>
+      </div>
+
+      <div>
+        <dt>Sampling / History</dt>
+        <dd>Small, Quick Sampling Runs (or prior executions of the query).</dd>
+      </div>
+
+      <div>
+        <dt>Histograms</dt>
+        <dd>Using more detailed statistics for improved guesses.</dd>
+      </div>
+
+      <div>
+        <dt>Constraints</dt>
+        <dd>Using rules about the data for improved guesses.</dd>
+      </div>
+    </dl>
+  </section>
+</section>
+
+
+<section>
+  <section>
+    <h3>(Some) Estimation Techniques</h3>
+
+    <dl style="font-size: 80%">
+      <dt style="color: grey;">Guess Randomly</dt>
+      <dd style="color: grey;">Rules of thumb if you have no other options...</dd>
+
+      <dt style="color: grey;">Uniform Prior</dt>
+      <dd style="color: grey;">Use basic statistics to make a very rough guess.</dd>
+
+      <dt style="color: blue;">Sampling / History</dt>
+      <dd style="color: blue;">Small, Quick Sampling Runs (or prior executions of the query).</dd>
+
+      <dt style="color: grey;">Histograms</dt>
+      <dd style="color: grey;">Using more detailed statistics for improved guesses.</dd>
+
+      <dt style="color: grey;">Constraints</dt>
+      <dd style="color: grey;">Using rules about the data for improved guesses.</dd>
+    </dl>
+  </section>
+
+  <section>
+    <p><b>Idea 1:</b> Pick 100 tuples at random from each input table.</p>
+  </section>
+
+  <section>
+    <svg data-src="2021-03-11/JoinIssue.svg" />
+  </section>
+
+  <section>
+    <h3>The Birthday Paradox</h3>
+
+    <p style="margin-top: 50px;">
+      Assume: $\texttt{UNIQ}(A, R) = \texttt{UNIQ}(A, S) = N$
+    </p>
+
+    <p style="margin-top: 50px;">
+      It takes $O(\sqrt{N})$ samples from both $R$ and $S$ <br/> to get even <b>one match.</b>
+    </p>
+  </section>
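+
+  <section>
+    <p style="font-size: 80%">
+      A rough sketch of why, under the simplifying (hypothetical) assumption that each of the $N$ values appears exactly once in $R$ and once in $S$, and that we draw $n$ tuples uniformly at random from each. Any sampled pair matches with probability $\frac{1}{N}$, so
+      $$E[\text{matches}] = n \cdot n \cdot \frac{1}{N} = \frac{n^2}{N}$$
+      Expecting even one match needs $\frac{n^2}{N} \approx 1$, that is, $n \approx \sqrt{N}$.
+    </p>
+  </section>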
+
+  <section>
+    <p>To be resumed later in the term when we talk about Approximate Query Processing (AQP).</p>
+  </section>
+
+  <section>
+    <p><b>How DBs Do It</b>: Instrument queries while running them.</p>
+    <ul>
+      <li class="fragment">The first time you run a query it <i>might</i> be slow.</li>
+      <li class="fragment">The second, third, fourth, etc. times it'll be fast.</li>
+    </ul>
+  </section>
+</section>
+
+<section>
+
+  <section>
+    <h3>(Some) Estimation Techniques</h3>
+
+    <dl style="font-size: 80%">
+      <dt style="color: grey;">Guess Randomly</dt>
+      <dd style="color: grey;">Rules of thumb if you have no other options...</dd>
+
+      <dt style="color: grey;">Uniform Prior</dt>
+      <dd style="color: grey;">Use basic statistics to make a very rough guess.</dd>
+
+      <dt style="color: grey;">Sampling / History</dt>
+      <dd style="color: grey;">Small, Quick Sampling Runs (or prior executions of the query).</dd>
+
+      <dt style="color: blue;">Histograms</dt>
+      <dd style="color: blue;">Using more detailed statistics for improved guesses.</dd>
+
+      <dt style="color: grey;">Constraints</dt>
+      <dd style="color: grey;">Using rules about the data for improved guesses.</dd>
+    </dl>
+  </section>
+
+  <section>
+    <h3>Limitations of Uniform Prior</h3>
+
+    <dl>
+      <div class="fragment highlight-grey" data-fragment-index="1">
+        <dt>Don't always have statistics for $Q$</dt>
+        <dd>For example, $\pi_{A \leftarrow (B \times C)}(R)$</dd>
+      </div>
+
+      <div class="fragment highlight-grey" data-fragment-index="1">
+        <dt>Don't always have clear rules for $c$</dt>
+        <dd>For example, $\sigma_{\texttt{FitsModel}(A, B, C)}(R)$</dd>
+      </div>
+
+      <div class="fragment highlight-blue" data-fragment-index="1">
+        <dt>Attribute values are not always uniformly distributed.</dt>
+        <dd>For example, <span style="font-size: 60%"> $|\sigma_{SPC\_COMMON = 'pin\ oak'}(T)|$ vs $|\sigma_{SPC\_COMMON = 'honeylocust'}(T)|$</span></dd>
+      </div>
+
+      <div class="fragment highlight-grey" data-fragment-index="1">
+        <dt>Attribute values are sometimes correlated.</dt>
+        <dd>For example, $\sigma_{(stump < 5) \wedge (diam > 3)}(T)$</dd>
+      </div>
+
+    </dl>
+  </section>
+
+  <section>
+    <p class="fragment highlight-grey" data-fragment-index="1">
+      <b>Ideal Case:</b> You have some 
+      $$f(x) = \left(\texttt{SELECT COUNT(*) WHERE A = x}\right)$$
+      (and similarly for the other aggregates)
+    </p>
+    <p class="fragment" data-fragment-index="1">
+      <b>Slightly Less Ideal Case:</b> You have some 
+      $$f(x) \approx \left(\texttt{SELECT COUNT(*) WHERE A = x}\right)$$
+    </p>
+  </section>
+
+  <section>
+    <p>If this sounds like CDF-based indexing... you're right!</p>
+
+    <p class="fragment">... but we're not going to talk about NNs today</p>
+  </section>
+</section>
+
+<section>
+  <section>
+    <p>
+      <b>Simpler/Faster Idea: </b> Break $f(x)$ into chunks
+    </p>
+  </section>
+
+  <section>
+    <h3>Example Data</h3>
+    <table style="font-size: 80%">
+      <tr><th>Name</th>      <th>YearsEmployed</th>  <th>Role</th></tr>
+      <tr><td>'Alice'</td>   <td>3</td>              <td>1</td></tr>
+      <tr><td>'Bob'</td>     <td>2</td>              <td>2</td></tr>
+      <tr><td>'Carol'</td>   <td>3</td>              <td>1</td></tr>
+      <tr><td>'Dave'</td>    <td>1</td>              <td>3</td></tr>
+      <tr><td>'Eve'</td>     <td>2</td>              <td>2</td></tr>
+      <tr><td>'Fred'</td>    <td>2</td>              <td>3</td></tr>
+      <tr><td>'Gwen'</td>    <td>4</td>              <td>1</td></tr>
+      <tr><td>'Harry'</td>   <td>2</td>              <td>3</td></tr>
+    </table>
+  </section>
+
+  <section>
+    <h3>Histograms</h3>
+    <table style="font-size: 70%">
+      <tr><th>YearsEmployed</th><th>COUNT</th></tr>
+      <tr><td>1</td>            <td>1</td>    </tr>
+      <tr><td>2</td>            <td>4</td>    </tr>
+      <tr><td>3</td>            <td>2</td>    </tr>
+      <tr><td>4</td>            <td>1</td>    </tr>
+    </table>
+
+    <table>
+      <tr class="fragment"><td style="font-size: 70%"><code>COUNT(DISTINCT YearsEmployed)</code> </td><td class="fragment">$= 4$</td></tr>
+      <tr class="fragment"><td style="font-size: 70%"><code>MIN(YearsEmployed)</code>            </td><td class="fragment">$= 1$</td></tr>
+      <tr class="fragment"><td style="font-size: 70%"><code>MAX(YearsEmployed)</code>            </td><td class="fragment">$= 4$</td></tr>
+      <tr class="fragment"><td style="font-size: 70%"><code>COUNT(*) WHERE YearsEmployed = 2</code> </td><td class="fragment">$= 4$</td></tr>
+    </table>
+  </section>
+
+  <section>
+    <h3>Histograms</h3>
+    <table style="font-size: 70%">
+      <tr><th>YearsEmployed</th><th>COUNT</th></tr>
+      <tr><td>1-2</td>          <td>5</td>    </tr>
+      <tr><td>3-4</td>          <td>3</td>    </tr>
+    </table>
+
+    <table>
+      <tr class="fragment"><td style="font-size: 70%"><code>COUNT(DISTINCT YearsEmployed)</code> </td><td class="fragment">$= 4$</td></tr>
+      <tr class="fragment"><td style="font-size: 70%"><code>MIN(YearsEmployed)</code>            </td><td class="fragment">$= 1$</td></tr>
+      <tr class="fragment"><td style="font-size: 70%"><code>MAX(YearsEmployed)</code>            </td><td class="fragment">$= 4$</td></tr>
+      <tr class="fragment"><td style="font-size: 70%"><code>COUNT(*) WHERE YearsEmployed = 2</code> </td><td class="fragment">$= \frac{5}{2}$</td></tr>
+    </table>
+  </section>
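+
+  <section>
+    <h3>Histograms</h3>
+    <p style="font-size: 70%">A minimal sketch of how the two-bucket histogram above might be computed, assuming the example data lives in a table named <code>Employees</code> (a hypothetical name; the slides do not name the table):</p>
+    <pre><code class="sql">
+      -- Employees is an assumed table name.
+      -- Bucket '1-2' covers YearsEmployed 1 and 2; '3-4' covers 3 and 4.
+      SELECT CASE WHEN YearsEmployed &lt;= 2 THEN '1-2' ELSE '3-4' END AS bucket,
+             COUNT(*) AS cnt
+      FROM Employees
+      GROUP BY CASE WHEN YearsEmployed &lt;= 2 THEN '1-2' ELSE '3-4' END
+      ORDER BY bucket;
+    </code></pre>
+    <p style="font-size: 70%">On the example data this yields counts of 5 and 3. An equality query such as <code>YearsEmployed = 2</code> is then estimated by spreading the bucket's count evenly over the 2 values the bucket covers.</p>
+  </section>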
+
+  <section>
+    <h3>The Extreme Case</h3>
+    <table style="font-size: 70%">
+      <tr><th>YearsEmployed</th><th>COUNT</th></tr>
+      <tr><td>1-4</td>          <td>8</td>    </tr>
+    </table>
+
+    <table>
+      <tr class="fragment"><td style="font-size: 70%"><code>COUNT(DISTINCT YearsEmployed)</code> </td><td class="fragment">$= 4$</td></tr>
+      <tr class="fragment"><td style="font-size: 70%"><code>MIN(YearsEmployed)</code>            </td><td class="fragment">$= 1$</td></tr>
+      <tr class="fragment"><td style="font-size: 70%"><code>MAX(YearsEmployed)</code>            </td><td class="fragment">$= 4$</td></tr>
+      <tr class="fragment"><td style="font-size: 70%"><code>COUNT(*) WHERE YearsEmployed = 2</code> </td><td class="fragment">$= \frac{8}{4}$</td></tr>
+    </table>
+  </section>
+
+  <section>
+    <h3>More Example Data</h3>
+    <table style="font-size: 80%; float: left;">
+      <tr><th>Value</th>  <th>COUNT</th>  </tr>
+      <tr><td> 1-10</td>  <td>20</td>     </tr>
+      <tr><td>11-20</td>  <td> 0</td>     </tr>
+      <tr><td>21-30</td>  <td>15</td>     </tr>
+      <tr><td>31-40</td>  <td>30</td>     </tr>
+      <tr><td>41-50</td>  <td>22</td>     </tr>
+      <tr><td>51-60</td>  <td>63</td>     </tr>
+      <tr><td>61-70</td>  <td>10</td>     </tr>
+      <tr><td>71-80</td>  <td>10</td>     </tr>
+    </table>
+
+    <table style="margin-top: 100px;">
+      <tr class="fragment">
+        <td style="font-size: 70%; width: 350px;"><code>SELECT … WHERE A = 33</code> </td>
+        <td class="fragment" style="font-size: 80%; text-align: left; width: 200px;">$= \frac{1}{40-30}\cdot 30 = 3$</td>
+      </tr>
+      <tr><td style="height: 70px;"></td><td></td></tr>
+      <tr class="fragment">
+        <td style="font-size: 70%; width: 350px;"><code>SELECT … WHERE A > 33</code> </td>
+        <td class="fragment" style="font-size: 80%; text-align: left; width: 200px;">$= \frac{40-33}{40-30}\cdot 30+22$ $\;\;\;+63+10+10$ $= 126$ </td>
+      </tr>
+    </table>
+  </section>
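+
+  <section>
+    <p style="font-size: 70%">
+      The range estimate above follows a general pattern. As a sketch, assume the bucket containing $v$ covers $(lo, hi]$ with count $c$, and that values are spread uniformly within each bucket (the symbols $lo$, $hi$, $c$, and $c_b$ are introduced here only for illustration):
+      $$|\sigma_{A > v}(R)| \approx \frac{hi - v}{hi - lo}\cdot c + \sum_{b \text{ above } v} c_b$$
+      With $v = 33$, $(lo, hi] = (30, 40]$, and $c = 30$, this gives $\frac{7}{10}\cdot 30 + (22 + 63 + 10 + 10) = 21 + 105 = 126$.
+    </p>
+  </section>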
+</section>
+
+<section>
+  <section>
+    <h3>(Some) Estimation Techniques</h3>
+
+    <dl style="font-size: 80%">
+      <dt style="color: grey;">Guess Randomly</dt>
+      <dd style="color: grey;">Rules of thumb if you have no other options...</dd>
+
+      <dt style="color: grey;">Uniform Prior</dt>
+      <dd style="color: grey;">Use basic statistics to make a very rough guess.</dd>
+
+      <dt style="color: grey;">Sampling / History</dt>
+      <dd style="color: grey;">Small, Quick Sampling Runs (or prior executions of the query).</dd>
+
+      <dt style="color: grey;">Histograms</dt>
+      <dd style="color: grey;">Using more detailed statistics for improved guesses.</dd>
+
+      <dt style="color: blue;">Constraints</dt>
+      <dd style="color: blue;">Using rules about the data for improved guesses.</dd>
+    </dl>
+  </section>
+
+  <section>
+    <h3>Key / Unique Constraints</h3>
+    <pre style="margin-top: 50px;"><code class="sql">
+      CREATE TABLE R ( 
+        A int,
+        B int UNIQUE,
+        ... 
+        PRIMARY KEY (A)
+      );
+    </code></pre>
+    <p style="margin-top: 50px;">
+      No duplicate values in the column.
+      $$\texttt{COUNT(DISTINCT A)} = \texttt{COUNT(*)}$$
+    </p>
+  </section>
+
+  <section>
+    <h3>Foreign Key Constraints</h3>
+    <pre style="margin-top: 50px;"><code class="sql">
+      CREATE TABLE S ( 
+        B int,
+        ... 
+        FOREIGN KEY (B) REFERENCES R(B)
+      );
+    </code></pre>
+    <p style="margin-top: 50px;">
+      All values in the column appear in another table.
+      $$\pi_{attrs(S)}\left(S \bowtie_B R\right) \subseteq S$$
+    </p>
+  </section>
+
+  <section>
+    <h3>Functional Dependencies</h3>
+
+    <pre style="margin-top: 50px;"><code class="sql">
+      Not expressible in SQL
+    </code></pre>
+
+    <p style="margin-top: 50px;">
+      One set of columns uniquely determines another.<br/>
+      $\pi_{A}(\delta(\pi_{A, B}(R)))$ has no duplicates and...
+      $$\pi_{attrs(R)-B}(R) \bowtie_A \delta(\pi_{A, B}(R)) = R$$
+    </p>
+  </section>
+
+  <section>
+    <h3>Constraints</h3>
+
+    <h4>The Good</h4>
+    <ul>
+      <li style="font-size: 70%" class="fragment">Sanity check on your data: Inconsistent data triggers failures.</li>
+      <li style="font-size: 70%" class="fragment">More opportunities for query optimization.</li>
+    </ul>
+
+    <h4 style="margin-top: 50px;" class="fragment">The Not-So Good</h4>
+    <ul>
+      <li style="font-size: 70%" class="fragment">Validating constraints whenever data changes is (usually) expensive.</li>
+      <li style="font-size: 70%" class="fragment">Inconsistent data triggers failures.</li>
+    </ul>
+
+  </section>
+
+  <section>
+    <h3>Foreign Key Constraints</h3>
+
+    <p style="margin-top: 50px;">Foreign keys are like pointers.  What happens with broken pointers?</p>
+  </section>
+
+  <section>
+    <h3>Foreign Key Enforcement</h3>
+
+    <p>Foreign keys are declared with referential actions <code>ON UPDATE [X]</code> and <code>ON DELETE [X]</code>.  Depending on what [X] is, the constraint is enforced differently:</p>
+
+    <dl style="font-size: 80%">
+      <dt><code>CASCADE</code></dt>
+      <dd>Update/delete referencing rows as needed to avoid invalid foreign keys.</dd>
+
+      <dt><code>NO ACTION</code></dt>
+      <dd>Abort any transaction that ends with an invalid foreign key reference.</dd>
+
+      <dt><code>SET NULL</code></dt>
+      <dd>Automatically replace any invalid foreign key references with NULL.</dd>
+    </dl>
+  </section>
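+
+  <section>
+    <h3>Foreign Key Enforcement</h3>
+    <p style="font-size: 70%">A sketch of how these actions are typically attached to a constraint, reusing the tables <code>R</code> and <code>S</code> from the earlier slides. The particular choice of actions is only an illustration:</p>
+    <pre><code class="sql">
+      CREATE TABLE S (
+        B int,
+        -- Deleting a row of R also deletes the rows of S that point to it.
+        -- Changing R.B is rejected while rows of S still reference the old value.
+        FOREIGN KEY (B) REFERENCES R(B)
+          ON DELETE CASCADE
+          ON UPDATE NO ACTION
+      );
+    </code></pre>
+  </section>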
+
+  <section>
+    <p style="font-weight: bold;">
+      <code>CASCADE</code> and <code>NO ACTION</code> ensure that the data never has broken pointers, so
+    </p>
+    $$\pi_{attrs(S)}\left(S \bowtie_B R\right) = S$$
+  </section>
+
+  <section>
+    <h3>Functional Dependencies</h3>
+
+    <p style="margin-top: 50px;"><b>A generalization of keys:</b> One set of attributes uniquely determines another.</p>
+
+    <ul>
+      <li>SS# uniquely identifies Name.</li>
+      <li>Employee uniquely identifies Manager.</li>
+      <li>Order number uniquely identifies Customer Address.</li>
+    </ul>
+
+    <p class="fragment">Two rows with the same As must have the same Bs.</p>
+    <p class="fragment" style="font-size: 80%">(but can still have identical Bs for two different As)</p>
+  </section>
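+
+  <section>
+    <h3>Functional Dependencies</h3>
+    <p style="font-size: 70%">A functional dependency cannot be declared as a SQL constraint, but a query can check whether one holds on the current data. A minimal sketch, assuming a table <code>R</code> with columns <code>A</code> and <code>B</code> and no NULLs in <code>B</code>:</p>
+    <pre><code class="sql">
+      -- Lists every A that maps to more than one B.
+      -- An empty result means A -&gt; B holds on the current data.
+      SELECT A
+      FROM R
+      GROUP BY A
+      HAVING COUNT(DISTINCT B) &gt; 1;
+    </code></pre>
+  </section>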
+
+  <section>
+    <h3>Normal Forms</h3>
+    <p style="margin-top: 50px;">"All functional dependencies should be keys."</p>
+    <p class="fragment">(Otherwise you want two separate relations)</p>
+    <p class="fragment">(for more details, see CSE 560)</p>
+  </section>
+  
+  <section>
+    
+    <p style="font-size: 70%">
+      $$P(A = B) = \min\left(\frac{1}{\texttt{COUNT}(\texttt{DISTINCT } A)}, \frac{1}{\texttt{COUNT}(\texttt{DISTINCT } B)}\right)$$
+    </p>
+
+  </section>
+  <section>
+
+    <p>
+      $$R \bowtie_{R.A = S.B} S = \sigma_{R.A = S.B}(R \times S)$$
+      (and $S.B$ is a foreign key referencing $R.A$)
+    </p>
+
+    <p class="fragment" style="margin-top: 30px; font-size: 80%">
+      The (foreign) key constraint gives us two things...
+      $$\texttt{COUNT}(\texttt{DISTINCT } A) \approx \texttt{COUNT}(\texttt{DISTINCT } B)$$
+      <span style="font-size: 60%; font-weight: bold; margin: 0px;">and</span>
+      $$\texttt{COUNT}(\texttt{DISTINCT } A) = |R|$$
+    </p>
+
+    <p class="fragment" style="margin-top: 30px; font-size: 80%">
+      Based on the first property, the total number of rows is roughly...
+      $$|R| \times |S| \times \frac{1}{\texttt{COUNT}(\texttt{DISTINCT } A)}$$
+    </p>
+
+    <p class="fragment" style="margin-top: 30px; font-size: 80%">
+      Then based on the second property...
+      $$ = |R| \times |S| \times \frac{1}{|R|} = |S|$$
+    </p>
+
+    <p class="fragment" style="margin-top: 30px; font-size: 50%">(Statistics/Histograms will give you the same outcome... but constraints can be easier to propagate)</p>
+  </section>
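+
+  <section>
+    <p style="font-size: 80%">
+      A worked instance of the estimate above, using made-up sizes $|R| = 1{,}000$ and $|S| = 50{,}000$ purely for illustration:
+      $$|R \bowtie_{R.A = S.B} S| \approx 1{,}000 \times 50{,}000 \times \frac{1}{1{,}000} = 50{,}000 = |S|$$
+      Assuming no broken references, each row of $S$ finds exactly one partner in $R$, so the join is as large as $S$.
+    </p>
+  </section>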
+</section>
+
--- a/src/teaching/cse-562/2021sp/slide/2021-03-11/JoinIssue.svg
+++ b/src/teaching/cse-562/2021sp/slide/2021-03-11/JoinIssue.svg
@ -0,0 +1,279 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+   xmlns:dc="http://purl.org/dc/elements/1.1/"
+   xmlns:cc="http://creativecommons.org/ns#"
+   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+   xmlns:svg="http://www.w3.org/2000/svg"
+   xmlns="http://www.w3.org/2000/svg"
+   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+   width="113.48098mm"
+   height="129.51984mm"
+   viewBox="0 0 113.48098 129.51984"
+   version="1.1"
+   id="svg8"
+   inkscape:version="0.92.2 5c3e80d, 2017-08-06"
+   sodipodi:docname="2018-03-05-JoinIssue.svg">
+  <defs
+     id="defs2" />
+  <sodipodi:namedview
+     id="base"
+     pagecolor="#ffffff"
+     bordercolor="#666666"
+     borderopacity="1.0"
+     inkscape:pageopacity="0.0"
+     inkscape:pageshadow="2"
+     inkscape:zoom="0.64"
+     inkscape:cx="214.95617"
+     inkscape:cy="143.89299"
+     inkscape:document-units="mm"
+     inkscape:current-layer="layer2"
+     showgrid="false"
+     fit-margin-top="0"
+     fit-margin-left="0"
+     fit-margin-right="0"
+     fit-margin-bottom="0"
+     inkscape:window-width="1440"
+     inkscape:window-height="852"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1" />
+  <metadata
+     id="metadata5">
+    <rdf:RDF>
+      <cc:Work
+         rdf:about="">
+        <dc:format>image/svg+xml</dc:format>
+        <dc:type
+           rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
+        <dc:title></dc:title>
+      </cc:Work>
+    </rdf:RDF>
+  </metadata>
+  <g
+     inkscape:label="Layer 1"
+     inkscape:groupmode="layer"
+     id="layer1"
+     transform="translate(-11.141665,-21.581365)">
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:22.57777786px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+       x="82.398811"
+       y="34.684521"
+       id="text12"><tspan
+         sodipodi:role="line"
+         x="82.398811"
+         y="34.684521"
+         style="font-size:22.57777786px;stroke-width:0.26458332"
+         id="tspan16">⋈</tspan></text>
+    <text
+       id="text24"
+       y="70.970238"
+       x="58.208336"
+       style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:22.57777786px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+       xml:space="preserve"><tspan
+         id="tspan22"
+         style="font-size:22.57777786px;stroke-width:0.26458332"
+         y="70.970238"
+         x="58.208336"
+         sodipodi:role="line">⋈</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:22.57777786px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+       x="30.994051"
+       y="113.30357"
+       id="text28"><tspan
+         sodipodi:role="line"
+         x="30.994051"
+         y="113.30357"
+         style="font-size:22.57777786px;stroke-width:0.26458332"
+         id="tspan32">σ</tspan></text>
+    <flowRoot
+       xml:space="preserve"
+       id="flowRoot38"
+       style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:48px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none"
+       transform="scale(0.26458333)"><flowRegion
+         id="flowRegion40"><rect
+           id="rect42"
+           width="108.57143"
+           height="608.57141"
+           x="114.28571"
+           y="525.37683" /></flowRegion><flowPara
+         id="flowPara44"></flowPara></flowRoot>    <text
+       xml:space="preserve"
+       style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:19.75555611px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+       x="32.203571"
+       y="151.1012"
+       id="text57"><tspan
+         sodipodi:role="line"
+         id="tspan55"
+         x="32.203571"
+         y="151.1012"
+         style="font-size:19.75555611px;stroke-width:0.26458332">R</tspan></text>
+    <text
+       id="text61"
+       y="114.05953"
+       x="79.072617"
+       style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:19.75555611px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+       xml:space="preserve"><tspan
+         style="font-size:19.75555611px;stroke-width:0.26458332"
+         y="114.05953"
+         x="79.072617"
+         id="tspan59"
+         sodipodi:role="line">S</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:19.75555611px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+       x="111.57857"
+       y="72.482147"
+       id="text65"><tspan
+         sodipodi:role="line"
+         id="tspan63"
+         x="111.57857"
+         y="72.482147"
+         style="font-size:19.75555611px;stroke-width:0.26458332">T</tspan></text>
+    <path
+       style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       d="M 66.145832,56.807941 89.296874,36.550779 114.51497,54.740885"
+       id="path69"
+       inkscape:connector-curvature="0" />
+    <path
+       style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       d="M 84.335936,96.49544 65.732421,71.690753 37.207031,98.149088"
+       id="path71"
+       inkscape:connector-curvature="0" />
+    <path
+       style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       d="m 37.207031,115.09896 v 18.60351"
+       id="path73"
+       inkscape:connector-curvature="0" />
+  </g>
+  <g
+     inkscape:groupmode="layer"
+     id="layer2"
+     inkscape:label="Layer 2"
+     transform="translate(-11.141665,-21.581365)">
+    <g
+       id="g893"
+       transform="translate(-5.374349,-3.3072916)"
+       class="fragment">
+      <rect
+         ry="2.4804688"
+         y="138.24998"
+         x="17.016014"
+         height="14.882812"
+         width="53.330074"
+         id="rect884"
+         style="fill:#0000ff;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none" />
+      <text
+         id="text888"
+         y="148.02716"
+         x="43.424736"
+         style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:12.69999981px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#cccccc;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+         xml:space="preserve"><tspan
+           style="font-size:8.46666622px;text-align:center;text-anchor:middle;fill:#cccccc;stroke-width:0.26458332"
+           y="148.02716"
+           x="43.424736"
+           id="tspan886"
+           sodipodi:role="line">100 Tuples</tspan></text>
+    </g>
+    <g
+       transform="translate(-5.374349,-40.927734)"
+       id="g901"
+       class="fragment">
+      <rect
+         style="fill:#0000ff;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none"
+         id="rect895"
+         width="53.330074"
+         height="14.882812"
+         x="17.016014"
+         y="138.24998"
+         ry="2.4804688" />
+      <text
+         xml:space="preserve"
+         style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:12.69999981px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#cccccc;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+         x="43.424736"
+         y="148.02716"
+         id="text899"><tspan
+           sodipodi:role="line"
+           id="tspan897"
+           x="43.424736"
+           y="148.02716"
+           style="font-size:8.46666622px;text-align:center;text-anchor:middle;fill:#cccccc;stroke-width:0.26458332">10 Tuples</tspan></text>
+    </g>
+    <g
+       class="fragment"
+       transform="translate(53.776558,-40.927734)"
+       id="g949">
+      <rect
+         style="fill:#0000ff;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none"
+         id="rect943"
+         width="53.330074"
+         height="14.882812"
+         x="17.016014"
+         y="138.24998"
+         ry="2.4804688" />
+      <text
+         xml:space="preserve"
+         style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:12.69999981px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#cccccc;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+         x="43.424736"
+         y="148.02716"
+         id="text947"><tspan
+           sodipodi:role="line"
+           id="tspan945"
+           x="43.424736"
+           y="148.02716"
+           style="font-size:8.46666622px;text-align:center;text-anchor:middle;fill:#cccccc;stroke-width:0.26458332">100 Tuples</tspan></text>
+    </g>
+    <g
+       class="fragment"
+       id="g941"
+       transform="translate(20.257161,-80.615234)">
+      <rect
+         ry="2.4804688"
+         y="138.24998"
+         x="17.016014"
+         height="14.882812"
+         width="53.330074"
+         id="rect935"
+         style="fill:#0000ff;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none" />
+      <text
+         id="text939"
+         y="148.02716"
+         x="43.424736"
+         style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:12.69999981px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#cccccc;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+         xml:space="preserve"><tspan
+           style="font-size:8.46666622px;text-align:center;text-anchor:middle;fill:#cccccc;stroke-width:0.26458332"
+           y="148.02716"
+           x="43.424736"
+           id="tspan937"
+           sodipodi:role="line">0 Tuples</tspan></text>
+    </g>
+    <g
+       transform="translate(46.302083,-116.16862)"
+       id="g925"
+       class="fragment">
+      <rect
+         style="fill:#0000ff;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none"
+         id="rect919"
+         width="53.330074"
+         height="14.882812"
+         x="17.016014"
+         y="138.24998"
+         ry="2.4804688" />
+      <text
+         xml:space="preserve"
+         style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:12.69999981px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#cccccc;fill-opacity:1;stroke:none;stroke-width:0.26458332"
+         x="43.424736"
+         y="148.02716"
+         id="text923"><tspan
+           sodipodi:role="line"
+           id="tspan921"
+           x="43.424736"
+           y="148.02716"
+           style="font-size:8.46666622px;text-align:center;text-anchor:middle;fill:#cccccc;stroke-width:0.26458332">0 Tuples</tspan></text>
+    </g>
+  </g>
+</svg>