---
template: templates/cse4562_2019_slides.erb
title: Checkpoint 4
date: May 3, 2019
textbook:
dependencies:
- lib/slide_utils.rb
---
<%
require "slide_utils.rb"
%>
<section>
<section>
<h3>A few things first...</h3>
</section>
<section>
<img src="graphics/2019-05-03-DemoDay.png" class="stretch" />
</section>
<section>
<h3>4/562 Databake Off @ 3:00</h3>
<p>RSVP (limited space available) to participate</p>
</section>
<section>
<h3>A note on optimization...</h3>
<p>Lots of interesting strategies used in Checkpoint 3</p>
<ul>
<li>Pre-parsing</li>
<li>Column Stores</li>
<li>Cost-based Optimization</li>
<li class="fragment">Hyper-optimize the slowest query</li>
</ul>
</section>
</section>
<section>
<section>
<h2>Checkpoint 4</h2>
<h3>Implement Updates</h3>
<p class="fragment">(lambda-architecture edition)</p>
<p class="fragment">Due May 20</p>
</section>
<section>
<ul>
<li>A stream of inserts, deletes, updates, and queries.</li>
<li>No restarts.</li>
<li>Answer queries as fast as possible.</li>
<li>Make sure query results account for DML effects (prior inserts, deletes, and updates).</li>
</ul>
</section>
<section>
<dl>
<dt>Stage 0</dt>
<dd>10 minutes of prep</dd>
<dt>Stage 1</dt>
<dd>Inserts only</dd>
<dt>Stage 2</dt>
<dd>Inserts + Deletes</dd>
<dt>Stage 3</dt>
<dd>Inserts + Deletes + Updates</dd>
</dl>
<p class="fragment">No restarts.</p>
</section>
</section>
<section>
<section>
<h3>Do I need to implement block-based storage?</h3>
<p class="fragment">No (although you can).</p>
<p class="fragment">Ok... so what else can I do?</p>
</section>
<section>
<h3>Classical Databases</h3>
<img src="graphics/2018-02-19-PrimaryVsSecondary.png" />
</section>
<section>
<p><b>Problem 1:</b> More indexes = Slower writes (bad for OLTP)</p>
<p><b>Problem 2:</b> Fewer indexes = Slower reads (bad for OLAP)</p>
</section>
<section>
<p>What if you have both OLAP and OLTP workloads?</p>
</section>
<section>
<p><b>Idea:</b> Weekly / Nightly / Hourly dump<br/>from OLTP system to OLAP system.</p>
<p class="fragment">(Index the data while dumping)</p>
</section>
<section>
<p><b>Problem:</b> Not seeing the freshest data!</p>
</section>
<section>
<p><b>Better Idea:</b> OLTP DB + OLAP DB.</p>
<p class="fragment">OLTP DB has few indexes, but only stores recent updates.</p>
<p class="fragment">OLAP DB has many indexes, and stores everything except recent updates.</p>
<p class="fragment">Periodically migrate updates into OLAP DB.</p>
<p class="fragment">(Lambda Architecture)</p>
</section>
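<section>
<p>A minimal sketch of the two-store idea, not a required design. The class <code>LambdaLiteTable</code> and its methods are hypothetical names, and a Python dict stands in for whatever index you actually build.</p>
<pre><code class="python">
class LambdaLiteTable:
    """Hypothetical table: an indexed base plus an unindexed buffer."""

    def __init__(self, rows, key):
        self.key = key
        self.base = list(rows)   # everything except recent updates
        self.recent = []         # recent updates only; no index to maintain
        self.index = {}          # key value -> positions in self.base
        self._reindex()

    def _reindex(self):
        self.index = {}
        for pos, row in enumerate(self.base):
            self.index.setdefault(row[self.key], []).append(pos)

    def insert(self, row):
        # Writes stay cheap: append only, no index maintenance.
        self.recent.append(row)

    def lookup(self, value):
        # Reads use the index for old data and scan the small buffer.
        hits = [self.base[pos] for pos in self.index.get(value, [])]
        return hits + [r for r in self.recent if r[self.key] == value]

    def migrate(self):
        # Periodically fold the buffer into the base and rebuild the index.
        self.base.extend(self.recent)
        self.recent = []
        self._reindex()
</code></pre>
<p>Calling <code>migrate()</code> more often keeps the buffer scan short, at the cost of more time spent re-indexing.</p>
</section>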
<section>
<h2>Checkpoint 4</h2>
<h3>Suggested Approach: Lambda-Lite</h3>
</section>
</section>
<section>
<section>
<h3>Handling Inserts</h3>
</section>
<section>
<pre><code class="sql">
INSERT INTO FOO(A, B, C) VALUES (1, 2, 3);
</code></pre>
</section>
<section>
<%=
relational_algebra() do
ra_table("Orig")
end
%>
</section>
<section>
<%=
relational_algebra(debug: false) do
ra_union(
ra_table("Orig"),
ra_table("New")
)
end
%>
</section>
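<section>
<p>The union here has to be a bag union (like <code>UNION ALL</code>): inserting a row that already exists must produce a second copy. A small sketch, with rows as tuples and <code>ra_union_all</code> as a hypothetical helper name:</p>
<pre><code class="python">
from itertools import chain

def ra_union_all(*inputs):
    """Bag union: concatenate the inputs and keep duplicates."""
    return chain.from_iterable(inputs)

orig = [(7, 8, 9), (1, 2, 3)]   # rows loaded before any updates
new = [(1, 2, 3)]               # INSERT INTO FOO(A, B, C) VALUES (1, 2, 3)

foo = list(ra_union_all(orig, new))
print(foo)   # [(7, 8, 9), (1, 2, 3), (1, 2, 3)]
</code></pre>
</section>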
</section>
<section>
<section>
<h3>Example</h3>
</section>
<section>
<pre><code class="sql">
SELECT COUNT(*) FROM lineitem WHERE mktsegment = 'BUILDING';
</code></pre>
</section>
<section>
<%=
relational_algebra do
ra_aggregate(nil, "COUNT(*)",
ra_select("mktsegment = 'BUILDING'",
ra_table("lineitem")
)
)
end
%>
</section>
<section>
<%=
relational_algebra do
ra_aggregate(nil, "COUNT(*)",
ra_select("mktsegment = 'BUILDING'",
ra_union(
ra_table("lineitem"),
ra_table("inserts")
)
)
)
end
%>
</section>
</section>
<section>
<section>
<h3>Handling Deletes</h3>
</section>
<section>
<pre><code class="sql">
DELETE FROM FOO WHERE A > 5;
</code></pre>
</section>
<section>
<%=
relational_algebra do
ra_table("Orig")
end
%>
</section>
<section>
<%=
relational_algebra do
ra_diff(
ra_table("Orig"),
ra_table("New")
)
end
%>
<p class="fragment">... but that's not quite how SQL <code>DELETE</code> works.</p>
</section>
<section>
<pre><code class="sql">
DELETE FROM FOO WHERE A > 5;
</code></pre>
<div class="fragment">
<%=
relational_algebra do
ra_select("A ≤ 5",
ra_table("FOO")
)
end
%>
</div>
</section>
<section>
<pre><code class="sql">
DELETE FROM Orig WHERE Something;
</code></pre>
<%=
relational_algebra do
ra_select("NOT Something",
ra_table("Orig")
)
end
%>
</section>
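<section>
<p>One caveat worth sketching: <code>DELETE</code> removes only rows where the condition is true, so a row where it evaluates to unknown (a <code>NULL</code> input) survives. The example below assumes rows are dicts and uses <code>None</code> for <code>NULL</code>; both function names are hypothetical.</p>
<pre><code class="python">
def a_greater_than_5(row):
    """SQL-style predicate: returns None (unknown) when A is NULL."""
    if row["A"] is None:
        return None
    return row["A"] > 5

def apply_delete(rows, predicate):
    # DELETE removes only rows where the predicate is TRUE,
    # so rows where it is FALSE or unknown are kept.
    return [row for row in rows if predicate(row) is not True]

foo = [{"A": 3}, {"A": 9}, {"A": None}]
print(apply_delete(foo, a_greater_than_5))   # [{'A': 3}, {'A': None}]
</code></pre>
<p>If the column can never be <code>NULL</code>, the simpler filter <code>NOT Something</code> gives the same result.</p>
</section>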
</section>
<section>
<section>
<h3>Example</h3>
</section>
<section>
<pre><code class="sql">
INSERT INTO lineitem(...) VALUES (...);
INSERT INTO lineitem(...) VALUES (...);
DELETE FROM lineitem WHERE shipdate BETWEEN DATE('1997-10-01')
AND DATE('1997-10-30');
SELECT COUNT(*) FROM lineitem WHERE mktsegment = 'BUILDING';
</code></pre>
</section>
<section>
<%=
relational_algebra do
ra_aggregate(nil, "COUNT(*)",
ra_select("mktsegment = 'BUILDING'",
ra_table("lineitem")
)
)
end
%>
</section>
<section>
<%=
relational_algebra do
ra_aggregate(nil, "COUNT(*)",
ra_select("mktsegment = 'BUILDING'",
ra_union(
ra_table("lineitem"),
ra_table("inserts")
)
)
)
end
%>
</section>
<section>
<%=
relational_algebra do
ra_aggregate(nil, "COUNT(*)",
ra_select("mktsegment = 'BUILDING'",
ra_select("shipdate NOT BETWEEN ...",
ra_union(
ra_table("lineitem"),
ra_table("inserts")
)
)
)
)
end
%>
</section>
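<section>
<p>Order matters when stacking these rewrites: a delete should only filter rows that existed when it ran. A rough sketch that replays a log in order, assuming no <code>NULL</code> dates; the log format, <code>visible_rows</code>, and the sample rows are all made up for illustration.</p>
<pre><code class="python">
def visible_rows(base, log):
    """Replay ('insert', row) and ('delete', predicate) entries in order."""
    rows = list(base)
    for kind, arg in log:
        if kind == "insert":
            rows = rows + [arg]                      # union with one new row
        elif kind == "delete":
            rows = [r for r in rows if not arg(r)]   # keep non-matching rows
        else:
            raise ValueError(kind)
    return rows

def in_october(row):
    # ISO date strings compare correctly as plain strings.
    return "1997-10-30" >= row["shipdate"] >= "1997-10-01"

base = [{"shipdate": "1997-10-05"}, {"shipdate": "1997-11-02"}]
log = [
    ("insert", {"shipdate": "1997-10-12"}),   # before the delete: removed
    ("delete", in_october),
    ("insert", {"shipdate": "1997-10-20"}),   # after the delete: kept
]
print(visible_rows(base, log))
# [{'shipdate': '1997-11-02'}, {'shipdate': '1997-10-20'}]
</code></pre>
</section>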
</section>
<section>
<section>
<h3>Handling Updates</h3>
</section>
<section>
<pre><code class="sql">
UPDATE Foo SET A = 1, B = 2 WHERE C = 3;
</code></pre>
</section>
<section>
<pre><code class="sql">
UPDATE Foo SET A = 1, B = 2 WHERE C = 3;
</code></pre>
<%=
relational_algebra do
ra_union(
ra_select( "C = 3",
ra_project( { A: "1", B: "2", C: "C" },
ra_table("Foo")
)
),
ra_select( "C ≠ 3",
ra_table("Foo")
)
)
end
%>
</section>
<section>
<pre><code class="sql">
UPDATE Foo SET A = 1, B = 2 WHERE C = 3;
</code></pre>
<%=
relational_algebra do
ra_project( { A: "CASE WHEN C = 3 THEN 1 ELSE A END", B: "CASE ...", C: "C"},
ra_table("Foo")
)
end
%>
<pre class="fragment"><code class="sql">
SELECT CASE WHEN C = 3 THEN 1 ELSE A END AS A,
CASE WHEN C = 3 THEN 2 ELSE B END AS B,
C AS C
FROM Foo;
</code></pre>
</section>
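<section>
<p>The same projection, sketched over rows stored as dicts. This assumes the <code>SET</code> values are constants; <code>apply_update</code> is a hypothetical helper, and expressions like <code>SET A = A + 1</code> would need a function per column instead.</p>
<pre><code class="python">
def apply_update(rows, predicate, assignments):
    """UPDATE as a projection: new values for matches, row as-is otherwise."""
    return [
        {**row, **assignments} if predicate(row) else row
        for row in rows
    ]

foo = [{"A": 7, "B": 8, "C": 3}, {"A": 4, "B": 5, "C": 6}]

# UPDATE Foo SET A = 1, B = 2 WHERE C = 3;
updated = apply_update(foo, lambda r: r["C"] == 3, {"A": 1, "B": 2})
print(updated)
# [{'A': 1, 'B': 2, 'C': 3}, {'A': 4, 'B': 5, 'C': 6}]
</code></pre>
</section>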
</section>
<section>
<h3>Final Advice</h3>
<ul>
<li class="fragment">This isn't the only way to implement updates.</li>
<li class="fragment">Optimizer performance is crucial!</li>
<li class="fragment">Consider periodically pausing to collapse accumulated updates into the base data.</li>
</ul>
</section>