This commit is contained in:
Oliver Kennedy 2019-02-22 13:00:24 -05:00
parent 86ecc3de75
commit fbd8da12f1
2 changed files with 478 additions and 186 deletions

View file

@@ -1,7 +1,7 @@
---
template: templates/cse4562_2019_slides.erb
title: "Indexing (Part 2)"
date: February 2, 2019
date: February 20, 2019
textbook: "Ch. 14.3"
---
@@ -120,188 +120,3 @@ textbook: "Ch. 14.3"
</section>
</section>
<section>
<!--
Key challenges:
- Update-heavy workloads. Can't afford to keep updating the index.
Things to talk about:
- Tiered vs Leveled
- Fence Pointers
-->
<section>
<h2>Log-Structured Merge Trees</h2>
</section>
<section>
<p>Some storage systems (HDFS, S3, SSDs) don't like in-place updates</p>
<p class="fragment">You don't update data, you rewrite the entire file (or a large fragment of it).</p>
</section>
<section>
<p><b>Idea 1:</b> Buffer updates, periodically write out new blocks to a "log".</p>
<ul>
<li class="fragment">Not organized! Slooooow access</li>
<li class="fragment">Grows eternally! Old values get duplicated</li>
</ul>
</section>
<section>
<p><b>Idea 2:</b> Keep data on disk sorted. Buffer updates. Periodically merge-sort buffer into the data.</p>
<ul>
<li class="fragment">$O(N)$ IOs to merge-sort</li>
<li class="fragment">"Write amplification" (each record gets read/written on all buffer merges).</li>
</ul>
</section>
<section>
<p><b>Idea 3:</b> Keep data on disk sorted, and in multiple "levels". Buffer updates.</p>
<ol>
<li class="fragment">When buffer full, write to disk as Level 1.</li>
<li class="fragment">If Level 1 exists, merge buffer into Level 1 to create Level 2.</li>
<li class="fragment">If old Level 2 exists, merge new and old to create Level 3.</li>
<li class="fragment">etc...</li>
</ol>
<p class="fragment"><b>Key observation: </b> Level $i$ is $2^{i-1}$ times the size of the buffer (the size of the level doubles with each merge).</p>
<p class="fragment"><b>Result: </b> Each record copied <i>at most</i> $\log(N)$ times.</p>
</section>
<section>
<h3>Other design choices</h3>
<dl>
<div class="fragment">
<dt>Fanout</dt>
<dd>Instead of doubling the size of each level, have each level grow by a factor of $K$. Level $i$ is merged into level $i+1$ when its size grows above $K^{i-1}$ times the size of the buffer.</dd>
</div>
<div class="fragment">
<dt>"Tiered" (instead of "Leveled")</dt>
<dd>Store each level as $K$ sorted runs instead of proactively merging them. Merge the runs together when escalating them to the next level.</dd>
</div>
</dl>
</section>
<section>
<h3>Other design choices</h3>
<dl>
<div class="fragment">
<dt>Fence Pointers</dt>
<dd>Separate each sorted run into blocks, and store the start/end keys for each block (makes it easier to evaluate selection predicates)</dd>
</div>
<div class="fragment">
<dt>Bloom Filters</dt>
<dd>A small, probabilistic summary of the keys in each sorted run that can answer "is this key definitely absent?", letting lookups skip runs that cannot contain the key.</dd>
</div>
</dl>
</section>
<section>
<h3>References</h3>
<dl style="font-size: 80%;">
<dt><a href="https://link-springer-com.gate.lib.buffalo.edu/article/10.1007/s002360050048">"The log-structured merge-tree (LSM-tree)"</a> by O'Neil et. al.</dt>
<dd>The original LSM tree paper</dd>
<dt><a href="https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?id=2213862">"bLSM: a general purpose log structured merge tree"</a> by Sears et. al.</dt>
<dd>LSM Trees with background compaction. Also a clear summary of LSM trees</dd>
<dt><a href="http://daslab.seas.harvard.edu/monkey/">Monkey: Optimal Navigable Key-Value Store</a></dt>
<dd>A comprehensive overview of the LSM Tree design space.</dd>
</dl>
</section>
</section>
<section>
<section>
<h3>CDF-Based Indexing</h3>
<p class="fragment" style="margin-top: 100px;"><b>"The Case for Learned Index Structures"</b><br/>by Kraska, Beutel, Chi, Dean, Polyzotis</p>
</section>
<section>
<svg data-src="graphics/2018-02-23-CDF-Linear.svg"/>
</section>
<section>
<svg data-src="graphics/2018-02-23-CDF-LinearApprox.svg"/>
</section>
<section>
<h3>Cumulative Distribution Function (CDF)</h3>
<img src="graphics/2018-02-23-CDF-Plot.png" />
<p>$f(key) \mapsto position$</p>
<p class="fragment" style="font-size: 50%">(not exactly true, but close enough for today)</p>
</section>
<section>
<h3>Using CDFs to find records</h3>
<dl>
<dt>Ideal: $f(k) = position$</dt>
<dd>$f$ encodes the <b>exact</b> location of a record</dd>
<dt class="fragment">Ok: $f(k) \approx position$<br/> <span class="fragment">($\left|f(k) - position\right| < \epsilon$)</span></dt>
<dd class="fragment">$f$ gets you to within $\epsilon$ of the key</dd>
<dd class="fragment">Only need local search on one (or so) leaf pages.</dd>
</dl>
<p class="fragment"><b>Simplified Use Case:</b> Static data with "infinite" prep time.</p>
</section>
<section>
<h3>How to define $f$?</h3>
<ul>
<li class="fragment">Linear ($f(k) = a\cdot k + b$)</li>
<li class="fragment">Polynomial ($f(k) = a\cdot k + b \cdot k^2 + \ldots$)</li>
<li class="fragment">Neural Network ($f(k) = $<img src="graphics/Clipart/magic-wand.png" height="100px" style="vertical-align: middle;">)</li>
</ul>
</section>
<section>
<p>We have infinite prep time, so fit a (tiny) neural network to the CDF.</p>
</section>
<section>
<h3>Neural Networks</h3>
<dl>
<dt class="fragment" data-fragment-index="1">Extremely Generalized Regression</dt>
<dd class="fragment" data-fragment-index="1">Essentially a really really really complex, fittable function with a lot of parameters.</dd>
<dt class="fragment" data-fragment-index="2">Captures Nonlinearities</dt>
<dd class="fragment" data-fragment-index="2">Most regressions can't handle discontinuous functions, which many key spaces have.</dd>
<dt class="fragment" data-fragment-index="3">No Branching</dt>
<dd class="fragment" data-fragment-index="3"><code>if</code> statements are <b>really</b> expensive on modern processors.</dd>
<dd class="fragment" data-fragment-index="4">(Compare to B+Trees with $\log_2 N$ if statements)</dd>
</dl>
</section>
<section>
<h3>Summary</h3>
<dl style="font-size: 80%;">
<dt>Tree Indexes</dt>
<dd>$O(\log N)$ access, supports range queries, easy size changes.</dd>
<dt>Hash Indexes</dt>
<dd>$O(1)$ access, doesn't change size efficiently, only equality tests.</dd>
<dt>LSM Trees</dt>
<dd>$O(K\log(\frac{N}{B}))$ access. Good for update-unfriendly filesystems.</dd>
<dt>CDF Indexes</dt>
<dd>$O(1)$ access, supports range queries, static data only.</dd>
</dl>
</section>
</section>
<section>
<p><b>Next Class:</b> Using Indexes</p>
</section>

View file

@@ -0,0 +1,477 @@
---
template: templates/cse4562_2019_slides.erb
title: "Indexing (Part 3) and Views"
date: February 22, 2019
textbook: "Papers and Ch. 8.1-8.2"
---
<section>
<!--
Key challenges:
- Update-heavy workloads. Can't afford to keep updating the index.
Things to talk about:
- Tiered vs Leveled
- Fence Pointers
-->
<section>
<h2>Log-Structured Merge Trees</h2>
</section>
<section>
<p>Some storage systems (HDFS, S3, SSDs) don't like in-place updates</p>
<p class="fragment">You don't update data, you rewrite the entire file (or a large fragment of it).</p>
</section>
<section>
<p><b>Idea 1:</b> Buffer updates, periodically write out new blocks to a "log".</p>
<ul>
<li class="fragment">Not organized! Slooooow access</li>
<li class="fragment">Grows eternally! Old values get duplicated</li>
</ul>
</section>
<section>
<p><b>Idea 2:</b> Keep data on disk sorted. Buffer updates. Periodically merge-sort buffer into the data.</p>
<ul>
<li class="fragment">$O(N)$ IOs to merge-sort</li>
<li class="fragment">"Write amplification" (each record gets read/written on all buffer merges).</li>
</ul>
</section>
<section>
<p><b>Idea 3:</b> Keep data on disk sorted, and in multiple "levels". Buffer updates.</p>
<ol>
<li class="fragment">When buffer full, write to disk as Level 1.</li>
<li class="fragment">If Level 1 exists, merge buffer into Level 1 to create Level 2.</li>
<li class="fragment">If old Level 2 exists, merge new and old to create Level 3.</li>
<li class="fragment">etc...</li>
</ol>
<p class="fragment"><b>Key observation: </b> Level $i$ is $2^{i-1}$ times the size of the buffer (the size of the level doubles with each merge).</p>
<p class="fragment"><b>Result: </b> Each record copied <i>at most</i> $\log(N)$ times.</p>
</section>
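<section>
<p style="font-size: 80%;">A minimal Python sketch of the leveling idea above (not from the original paper; the class and method names are illustrative, and the "levels" are sorted in-memory lists standing in for on-disk runs): the buffer absorbs updates, and a full buffer is merge-sorted into successively larger levels.</p>
<pre><code class="python">
import bisect

class LeveledLSM:
    def __init__(self, buffer_size=4):
        self.buffer_size = buffer_size
        self.buffer = {}      # in-memory updates: key -> value
        self.levels = []      # levels[i] is a sorted run of (key, value)

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_size:
            self._flush()

    def _flush(self):
        # Sort the buffer, then merge it down the levels.  Each merge roughly
        # doubles the run, so a record is copied at most O(log N) times.
        run = sorted(self.buffer.items())
        self.buffer = {}
        i = 0
        while i < len(self.levels) and self.levels[i]:
            run = self._merge(self.levels[i], run)
            self.levels[i] = []
            i += 1
        if i == len(self.levels):
            self.levels.append([])
        self.levels[i] = run

    @staticmethod
    def _merge(old, new):
        # Merge two sorted runs; entries from `new` win on duplicate keys.
        out, i, j = [], 0, 0
        while i < len(old) and j < len(new):
            if old[i][0] < new[j][0]:
                out.append(old[i]); i += 1
            elif old[i][0] > new[j][0]:
                out.append(new[j]); j += 1
            else:
                out.append(new[j]); i += 1; j += 1
        return out + old[i:] + new[j:]

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for run in self.levels:   # smaller (non-empty) levels hold newer data
            pos = bisect.bisect_left(run, (key,))
            if pos < len(run) and run[pos][0] == key:
                return run[pos][1]
        return None
</code></pre>
</section>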
<section>
<h3>Other design choices</h3>
<dl>
<div class="fragment">
<dt>Fanout</dt>
<dd>Instead of doubling the size of each level, have each level grow by a factor of $K$. Level $i$ is merged into level $i+1$ when its size grows above $K^{i-1}$ times the size of the buffer.</dd>
</div>
<div class="fragment">
<dt>"Tiered" (instead of "Leveled")</dt>
<dd>Store each level as $K$ sorted runs instead of proactively merging them. Merge the runs together when escalating them to the next level.</dd>
</div>
</dl>
</section>
<section>
<h3>Other design choices</h3>
<dl>
<div class="fragment">
<dt>Fence Pointers</dt>
<dd>Separate each sorted run into blocks, and store the start/end keys for each block (makes it easier to evaluate selection predicates)</dd>
</div>
<div class="fragment">
<dt>Bloom Filters</dt>
<dd>A small, probabilistic summary of the keys in each sorted run that can answer "is this key definitely absent?", letting lookups skip runs that cannot contain the key.</dd>
</div>
</dl>
</section>
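<section>
<p style="font-size: 80%;">A hedged sketch of how a lookup might use both structures (the layout and names are illustrative, not any specific system): the Bloom filter can say a key is <i>definitely absent</i> from a run, and the fence pointers narrow a positive answer to a single block.</p>
<pre><code class="python">
import bisect, hashlib

class BloomFilter:
    def __init__(self, nbits=1024, nhashes=3):
        self.nbits, self.nhashes, self.bits = nbits, nhashes, 0
    def _positions(self, key):
        for i in range(self.nhashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.nbits
    def add(self, key):
        for p in self._positions(key):
            self.bits |= (1 << p)
    def might_contain(self, key):   # False means "definitely not in this run"
        return all(self.bits & (1 << p) for p in self._positions(key))

class SortedRun:
    def __init__(self, records, block_size=64):
        # records: (key, value) pairs, already sorted by key.
        self.blocks = [records[i:i + block_size]
                       for i in range(0, len(records), block_size)]
        self.fences = [block[0][0] for block in self.blocks]  # first key per block
        self.bloom = BloomFilter()
        for key, _ in records:
            self.bloom.add(key)

    def lookup(self, key):
        if not self.bloom.might_contain(key):
            return None                                  # skip the run entirely
        b = bisect.bisect_right(self.fences, key) - 1    # candidate block
        if b < 0:
            return None
        for k, v in self.blocks[b]:                      # read at most one block
            if k == key:
                return v
        return None
</code></pre>
</section>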
<section>
<h3>References</h3>
<dl style="font-size: 80%;">
<dt><a href="https://link-springer-com.gate.lib.buffalo.edu/article/10.1007/s002360050048">"The log-structured merge-tree (LSM-tree)"</a> by O'Neil et. al.</dt>
<dd>The original LSM tree paper</dd>
<dt><a href="https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?id=2213862">"bLSM: a general purpose log structured merge tree"</a> by Sears et. al.</dt>
<dd>LSM Trees with background compaction. Also a clear summary of LSM trees</dd>
<dt><a href="http://daslab.seas.harvard.edu/monkey/">Monkey: Optimal Navigable Key-Value Store</a></dt>
<dd>A comprehensive overview of the LSM Tree design space.</dd>
</dl>
</section>
</section>
<section>
<section>
<h3>CDF-Based Indexing</h3>
<p class="fragment" style="margin-top: 100px;"><b>"The Case for Learned Index Structures"</b><br/>by Kraska, Beutel, Chi, Dean, Polyzotis</p>
</section>
<section>
<svg data-src="graphics/2018-02-23-CDF-Linear.svg"/>
</section>
<section>
<svg data-src="graphics/2018-02-23-CDF-LinearApprox.svg"/>
</section>
<section>
<h3>Cumulative Distribution Function (CDF)</h3>
<img src="graphics/2018-02-23-CDF-Plot.png" />
<p>$f(key) \mapsto position$</p>
<p class="fragment" style="font-size: 50%">(not exactly true, but close enough for today)</p>
</section>
<section>
<h3>Using CDFs to find records</h3>
<dl>
<dt>Ideal: $f(k) = position$</dt>
<dd>$f$ encodes the <b>exact</b> location of a record</dd>
<dt class="fragment">Ok: $f(k) \approx position$<br/> <span class="fragment">$\left|f(k) - position\right| < \epsilon$</span></dt>
<dd class="fragment">$f$ gets you to within $\epsilon$ of the key</dd>
<dd class="fragment">Only need local search on one (or so) leaf pages.</dd>
</dl>
<p class="fragment"><b>Simplified Use Case:</b> Static data with "infinite" prep time.</p>
</section>
<section>
<h3>How to define $f$?</h3>
<ul>
<li class="fragment">Linear ($f(k) = a\cdot k + b$)</li>
<li class="fragment">Polynomial ($f(k) = a\cdot k + b \cdot k^2 + \ldots$)</li>
<li class="fragment">Neural Network ($f(k) = $<img src="graphics/Clipart/magic-wand.png" height="100px" style="vertical-align: middle;">)</li>
</ul>
</section>
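<section>
<p style="font-size: 80%;">A minimal sketch of the simplest choice of $f$: a linear least-squares fit to the empirical CDF (using numpy; the data is assumed static and sorted, and the model's worst observed error plays the role of $\epsilon$). Names are illustrative.</p>
<pre><code class="python">
import numpy as np

class LinearCDFIndex:
    def __init__(self, sorted_keys):
        self.keys = np.asarray(sorted_keys, dtype=float)
        positions = np.arange(len(self.keys))
        # Fit position ~ a * key + b  (a scaled approximation of the CDF).
        self.a, self.b = np.polyfit(self.keys, positions, deg=1)
        predicted = self.a * self.keys + self.b
        # epsilon = the worst prediction error over the indexed keys.
        self.eps = int(np.ceil(np.max(np.abs(predicted - positions))))

    def lookup(self, key):
        guess = int(np.rint(self.a * key + self.b))
        lo = max(0, guess - self.eps)
        hi = min(len(self.keys), guess + self.eps + 1)
        # Local search: binary search only inside the +/- epsilon window.
        window = self.keys[lo:hi]
        pos = int(np.searchsorted(window, key))
        if pos < len(window) and window[pos] == key:
            return lo + pos          # position of the record in the file
        return None

keys = sorted(np.random.default_rng(0).integers(0, 10**7, size=100_000).tolist())
index = LinearCDFIndex(keys)
assert index.lookup(keys[12345]) == keys.index(keys[12345])
</code></pre>
</section>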
<section>
<p>We have infinite prep time, so fit a (tiny) neural network to the CDF.</p>
</section>
<section>
<h3>Neural Networks</h3>
<dl>
<dt class="fragment" data-fragment-index="1">Extremely Generalized Regression</dt>
<dd class="fragment" data-fragment-index="1">Essentially a really really really complex, fittable function with a lot of parameters.</dd>
<dt class="fragment" data-fragment-index="2">Captures Nonlinearities</dt>
<dd class="fragment" data-fragment-index="2">Most regressions can't handle discontinuous functions, which many key spaces have.</dd>
<dt class="fragment" data-fragment-index="3">No Branching</dt>
<dd class="fragment" data-fragment-index="3"><code>if</code> statements are <b>really</b> expensive on modern processors.</dd>
<dd class="fragment" data-fragment-index="4">(Compare to B+Trees with $\log_2 N$ if statements)</dd>
</dl>
</section>
<section>
<h3>Summary</h3>
<dl style="font-size: 80%;">
<dt>Tree Indexes</dt>
<dd>$O(\log N)$ access, supports range queries, easy size changes.</dd>
<dt>Hash Indexes</dt>
<dd>$O(1)$ access, doesn't change size efficiently, only equality tests.</dd>
<dt>LSM Trees</dt>
<dd>$O(K\log(\frac{N}{B}))$ access. Good for update-unfriendly filesystems.</dd>
<dt>CDF Indexes</dt>
<dd>$O(1)$ access, supports range queries, static data only.</dd>
</dl>
</section>
</section>
<section>
<section>
<p style="margin: 100px;">
$\sigma_C(R)$ <span style="margin: 50px">and</span> $(\ldots \bowtie_C R)$
</p>
</section>
<section>
<p>Original Query: $\pi_A\left(\sigma_{B = 1 \wedge C < 3}(R)\right)$</p>
<p>Possible Implementations:</p>
<dl>
<div>
<dt>$\pi_A\left(\sigma_{B = 1 \wedge C < 3}(R)\right)$</dt>
<dd class="fragment">Always works... but slow</dd>
</div>
<div class="fragment">
<dt>$\pi_A\left(\sigma_{B = 1}( IndexScan(R,\;C < 3) ) \right)$</dt>
<dd class="fragment">Requires a non-hash index on $C$</dd>
</div>
<div class="fragment">
<dt>$\pi_A\left(\sigma_{C < 3}( IndexScan(R,\;B=1) ) \right)$</dt>
<dd class="fragment">Requires a any index on $B$</dd>
</div>
<div class="fragment">
<dt>$\pi_A\left( IndexScan(R,\;B = 1, C < 3) \right)$</dt>
<dd class="fragment">Requires any index on $(B, C)$</dd>
</div>
</dl>
</section>
<section>
<h3>Lexical Sort (Non-Hash Only)</h3>
<p>Sort data on $(A, B, C, \ldots)$</p>
<p>First sort on $A$, $B$ is a tiebreaker for $A$,<br/> $C$ is a tiebreaker for $B$, etc...</p>
<dl>
<div class="fragment">
<dt>All of the $A$ values are adjacent.</dt>
<dd>Supports $\sigma_{A = a}$ or $\sigma_{A \geq a}$</dd>
</div>
<div class="fragment">
<dt>For a specific $A$, all of the $B$ values are adjacent</dt>
<dd>Supports $\sigma_{A = a \wedge B = b}$ or $\sigma_{A = a \wedge B \geq b}$</dd>
</div>
<div class="fragment">
<dt>For a specific $(A,B)$, all of the $C$ values are adjacent</dt>
<dd>Supports $\sigma_{A = a \wedge B = b \wedge C = c}$ or $\sigma_{A = a \wedge B = b \wedge C \geq c}$</dd>
</div>
<dt class="fragment">...</dt>
</dl>
</section>
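<section>
<p style="font-size: 80%;">A concrete illustration (not from the slides): Python's tuple comparison is exactly this lexical order, so a sorted list of $(A, B, C)$ tuples answers prefix-equality-plus-range predicates with two binary searches. The helper names are made up, and $A$ is assumed to be integer-valued so that $a + 1$ is a convenient upper bound.</p>
<pre><code class="python">
import bisect

rows = sorted([
    (1, 1, 5), (1, 2, 3), (1, 2, 9), (2, 1, 1), (2, 3, 4), (3, 0, 0),
])

def select_a_eq(rows, a):
    """sigma_{A = a}: all matching rows are adjacent."""
    lo = bisect.bisect_left(rows, (a,))       # first row with A >= a
    hi = bisect.bisect_left(rows, (a + 1,))   # first row with A > a
    return rows[lo:hi]

def select_a_eq_b_ge(rows, a, b):
    """sigma_{A = a AND B >= b}: adjacent within the A = a region."""
    lo = bisect.bisect_left(rows, (a, b))
    hi = bisect.bisect_left(rows, (a + 1,))
    return rows[lo:hi]

print(select_a_eq(rows, 1))          # [(1, 1, 5), (1, 2, 3), (1, 2, 9)]
print(select_a_eq_b_ge(rows, 1, 2))  # [(1, 2, 3), (1, 2, 9)]
</code></pre>
</section>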
<section>
<h3>For a query $\sigma_{c_1 \wedge \ldots \wedge c_N}(R)$</h3>
<ol>
<li class="fragment">For every $c_i \equiv (A = a)$: Do you have any index on $A$?</li>
<li class="fragment">For every $c_i \in \{\; (A \geq a), (A > a), (A \leq a), (A < a)\;\}$: Do you have a tree index on $A$?</li>
<li class="fragment">For every $c_i, c_j$, do you have an appropriate index?</li>
<li class="fragment">etc...</li>
<li class="fragment">A simple table scan is also an option</li>
</ol>
<p class="fragment">Which one do we pick?</p>
<p class="fragment">(You need to know the cost of each plan)</p>
</section>
<section>
<p>These are called "Access Paths"</p>
</section>
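<section>
<p style="font-size: 80%;">A toy sketch of the checklist two slides back (the representation is invented for this example: each conjunct is an (attribute, operator, value) triple and the catalog records which index types exist on each attribute). It only enumerates legal access paths; costing them comes next class.</p>
<pre><code class="python">
def candidate_access_paths(conjuncts, indexes):
    """conjuncts: [(attr, op, value)]; indexes: {attr: {"hash", "tree"}}."""
    paths = [("table scan", None)]                       # always an option
    for attr, op, _ in conjuncts:
        available = indexes.get(attr, set())
        if op == "=" and available:
            paths.append(("index scan", (attr, op)))     # any index works
        elif op in {"<", "<=", ">", ">="} and "tree" in available:
            paths.append(("index scan", (attr, op)))     # needs a tree index
    return paths

print(candidate_access_paths(
    conjuncts=[("B", "=", 1), ("C", "<", 3)],
    indexes={"B": {"hash"}, "C": {"tree"}},
))
# -> a table scan, an index scan on B = 1, and an index scan on C < 3
</code></pre>
</section>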
<section>
<h3>Strategies for Implementing $(\ldots \bowtie_{c} S)$</h3>
<dl>
<dt>Sort/Merge Join</dt>
<dd>Sort all of the data upfront, then scan over both sides.</dd>
<dt>In-Memory Index Join (1-pass Hash; Hash Join)</dt>
<dd>Build an in-memory index on one table, scan the other.</dd>
<dt>Partition Join (2-pass Hash; External Hash Join)</dt>
<dd>Partition both sides so that tuples don't join across partitions.</dd>
<dt class="fragment" data-fragment-index="1">Index Nested Loop Join</dt>
<dd class="fragment" data-fragment-index="1">Use an <i>existing</i> index instead of building one.</dd>
</dl>
</section>
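<section>
<p style="font-size: 80%;">A minimal sketch of the in-memory index join (1-pass hash join) from the previous slide, assuming an equality predicate $R.A = S.B$ and that $S$ fits in memory. The row format and names are illustrative.</p>
<pre><code class="python">
from collections import defaultdict

def hash_join(r_rows, s_rows, r_key, s_key):
    # Build phase: an in-memory hash index on S.
    index = defaultdict(list)
    for s in s_rows:
        index[s[s_key]].append(s)
    # Probe phase: stream over R and emit every matching pair.
    for r in r_rows:
        for s in index.get(r[r_key], []):
            yield {**r, **s}

R = [{"A": 1, "x": "r1"}, {"A": 2, "x": "r2"}]
S = [{"B": 1, "y": "s1"}, {"B": 1, "y": "s2"}, {"B": 3, "y": "s3"}]
print(list(hash_join(R, S, r_key="A", s_key="B")))
# r1 joins with s1 and s2; r2 joins with nothing.
</code></pre>
</section>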
<section>
<h3>Index Nested Loop Join</h3>
To compute $R \bowtie_{S.B > R.A} S$ with an index on $S.B$
<ol>
<li>Read one row of $R$</li>
<li>Get the value of $a = R.A$</li>
<li>Start index scan on $S.B > a$</li>
<li>Return all rows from the index scan</li>
<li>Read the next row of $R$ and repeat</li>
</ol>
</section>
<section>
<h3>Index Nested Loop Join</h3>
To compute $R \bowtie_{S.B\;[\theta]\;R.A} S$ with an index on $S.B$
<ol>
<li>Read one row of $R$</li>
<li>Get the value of $a = R.A$</li>
<li>Start index scan on $S.B\;[\theta]\;a$</li>
<li>Return all rows from the index scan</li>
<li>Read the next row of $R$ and repeat</li>
</ol>
</section>
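<section>
<p style="font-size: 80%;">A sketch of the loop above, assuming the existing "index" on $S.B$ is a list sorted on $B$ that we probe with binary search (so it supports $S.B > a$, not just equality). Names and the row format are illustrative.</p>
<pre><code class="python">
import bisect

def index_nested_loop_join(r_rows, s_index):
    """s_index: list of (b_value, s_row) pairs, sorted on b_value."""
    b_values = [b for b, _ in s_index]
    for r in r_rows:                                  # 1. read one row of R
        a = r["A"]                                    # 2. a = R.A
        start = bisect.bisect_right(b_values, a)      # 3. index scan on S.B > a
        for _, s in s_index[start:]:                  # 4. return every matching row
            yield r, s                                # 5. ...then repeat

R = [{"A": 5}, {"A": 9}]
S_index = sorted([(3, {"B": 3}), (7, {"B": 7}), (12, {"B": 12})],
                 key=lambda pair: pair[0])
print(list(index_nested_loop_join(R, S_index)))
# (A=5) pairs with B=7 and B=12; (A=9) pairs with B=12.
</code></pre>
</section>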
</section>
<section>
<section>
<h2>Views</h2>
</section>
<section>
<pre><code class="sql">
SELECT partkey
FROM lineitem l, orders o
WHERE l.orderkey = o.orderkey
AND o.orderdate > DATE(NOW() - '1 Month')
ORDER BY shipdate DESC LIMIT 10;
</code></pre>
<pre><code class="sql">
SELECT suppkey, COUNT(*)
FROM lineitem l, orders o
WHERE l.orderkey = o.orderkey
AND o.orderdate > DATE(NOW() - '1 Month')
GROUP BY suppkey;
</code></pre>
<pre><code class="sql">
SELECT partkey, COUNT(*)
FROM lineitem l, orders o
WHERE l.orderkey = o.orderkey
AND o.orderdate > DATE(NOW() - '1 Month')
GROUP BY partkey;
</code></pre>
<p class="fragment">All of these views share the same business logic!</p>
</section>
<section>
<p>Started as a convenience</p>
<pre><code class="sql">
CREATE VIEW salesSinceLastMonth AS
SELECT l.*
FROM lineitem l, orders o
WHERE l.orderkey = o.orderkey
AND o.orderdate > DATE(NOW() - '1 Month')
</code></pre>
<div class="fragment" style="font-size: 70%;">
<pre><code class="sql">
SELECT partkey FROM salesSinceLastMonth
ORDER BY shipdate DESC LIMIT 10;
</code></pre>
<pre><code class="sql">
SELECT suppkey, COUNT(*)
FROM salesSinceLastMonth
GROUP BY suppkey;
</code></pre>
<pre><code class="sql">
SELECT partkey, COUNT(*)
FROM salesSinceLastMonth
GROUP BY partkey;
</code></pre>
</div>
</section>
<section>
<p>But also useful for performance</p>
<pre><code class="sql">
CREATE MATERIALIZED VIEW salesSinceLastMonth AS
SELECT l.*
FROM lineitem l, orders o
WHERE l.orderkey = o.orderkey
AND o.orderdate > DATE(NOW() - '1 Month')
</code></pre>
<p><i>Materializing</i> the view (pre-computing and saving its contents) lets us answer all of the queries over the view faster!</p>
</section>
<section>
<p>What if the query doesn't use the view?</p>
<pre><code class="sql">
SELECT l.partkey
FROM lineitem l, orders o
WHERE l.orderkey = o.orderkey
AND o.orderdate > DATE('2015-03-31')
ORDER BY l.shipdate DESC
LIMIT 10;
</code></pre>
<p class="fragment">Can we detect that a query could be answered with a view?</p>
</section>
<section>
<p>(sometimes)</p>
</section>
<section>
<table>
<tr><th>View Query</th><td style="width: 100px;">&nbsp;</td><th>User Query</th></tr>
<tr><td>
<code>SELECT $L_V$</code><br/>
<code>FROM $R_V$</code><br/>
<code>WHERE $C_V$</code>
</td><td></td><td>
<code>SELECT $L_Q$</code><br/>
<code>FROM $R_Q$</code><br/>
<code>WHERE $C_Q$</code>
</td></tr>
</table>
<p>When are we allowed to rewrite the user query to use the view?</p>
</section>
<section>
<table>
<tr><th>View Query</th><td style="width: 100px;">&nbsp;</td><th>User Query</th></tr>
<tr><td>
<code>SELECT $L_V$</code><br/>
<code>FROM $R_V$</code><br/>
<code>WHERE $C_V$</code>
</td><td></td><td>
<code>SELECT $L_Q$</code><br/>
<code>FROM $R_Q$</code><br/>
<code>WHERE $C_Q$</code>
</td></tr>
</table>
<dl>
<dt>$R_V \subseteq R_Q$</dt>
<dd>All relations in the view are part of the query join</dd>
<dt>$C_Q = C_V \wedge C'$</dt>
<dd>The view condition is 'weaker' than the query condition</dd>
<dt>$attrs(C') \cap attrs(R_V) \subseteq L_V$ &nbsp;&nbsp;&nbsp; $L_Q \cap attrs(R_V) \subseteq L_V$</dt>
<dd>The view doesn't project away needed attributes</dd>
</dl>
</section>
<section>
<table>
<tr><th>View Query</th><td style="width: 100px;">&nbsp;</td><th>User Query</th></tr>
<tr><td>
<code>SELECT $L_V$</code><br/>
<code>FROM $R_V$</code><br/>
<code>WHERE $C_V$</code>
</td><td></td><td>
<code>SELECT $L_Q$</code><br/>
<code>FROM $R_Q$</code><br/>
<code>WHERE $C_Q$</code>
</td></tr>
</table>
<center>
<div style="padding-top: 100px; text-align: left; width: 500px">
<code>SELECT $L_Q$</code><br/>
<code>FROM $(R_Q - R_V)$, view</code><br/>
<code>WHERE $C_Q$</code>
</div>
</center>
</section>
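<section>
<p style="font-size: 80%;">A hedged sketch of those three checks over a toy query representation (sets of relation names, conjunct strings, and attribute names, all invented for this example; ORDER BY and LIMIT are ignored). If the checks pass, it emits the rewritten skeleton from the previous slide, here against the salesSinceLastMonth view.</p>
<pre><code class="python">
def rewrite_with_view(query, view, view_attrs, conjunct_attrs, view_name):
    """query/view: (L, R, C) as sets; view_attrs: attrs(R_V);
       conjunct_attrs: attrs(c) for each conjunct c."""
    L_Q, R_Q, C_Q = query
    L_V, R_V, C_V = view
    if not R_V <= R_Q:                   # all view relations appear in the query
        return None
    if not C_V <= C_Q:                   # C_Q = C_V and C' (conjunct containment)
        return None
    C_rest = C_Q - C_V
    needed = set().union(*[conjunct_attrs.get(c, set()) for c in C_rest])
    if not (needed & view_attrs) <= L_V: # attrs(C') inter attrs(R_V) within L_V
        return None
    if not (L_Q & view_attrs) <= L_V:    # L_Q inter attrs(R_V) within L_V
        return None
    return (L_Q, (R_Q - R_V) | {view_name}, C_rest)

view  = ({"partkey", "suppkey", "shipdate", "orderkey"},          # L_V (l.*)
         {"lineitem", "orders"},                                  # R_V
         {"l.orderkey = o.orderkey", "o.orderdate > last month"}) # C_V
query = ({"partkey"},                                             # L_Q
         {"lineitem", "orders"},                                  # R_Q
         {"l.orderkey = o.orderkey", "o.orderdate > last month"}) # C_Q
print(rewrite_with_view(query, view,
                        view_attrs={"partkey", "suppkey", "shipdate",
                                    "orderkey", "orderdate"},
                        conjunct_attrs={},
                        view_name="salesSinceLastMonth"))
# -> ({'partkey'}, {'salesSinceLastMonth'}, set())
</code></pre>
</section>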
</section>
<section>
<h2>Summary</h2>
<ul>
<li>For each relation, identify candidate indexes</li>
<li>For each join, identify candidate indexes</li>
<li>Identify candidate views</li>
<li>Identify available join, aggregate, sort algorithms</li>
</ul>
<p>Enumerate <b>all possible</b> plans</p>
<p class="fragment">... then how do you pick? (more next class)</p>
</section>