---
template: templates/cse4562_2019_slides.erb
title: Checkpoint 4
date: May 3, 2019
textbook:
dependencies:
  - lib/slide_utils.rb
---
<% require "slide_utils.rb" %>

A few things first...

4/562 Databake Off @ 3:00

RSVP (limited space available) to participate

A note on optimization...

Lots of interesting strategies used in Checkpoint 3

Checkpoint 4

Implement Updates

(lambda-architecture edition)

Due May 20

Stage 0
10 minutes of prep
Stage 1
Inserts only
Stage 2
Inserts + Deletes
Stage 3
Inserts + Deletes + Updates

No restarts.

Do I need to implement block-based storage?

No (although you can).

Ok... so what else can I do?

Classical Databases

Problem 1: More indexes = Slower writes (bad for OLTP)

Problem 2: Fewer indexes = Slower reads (bad for OLAP)

What if you have both OLAP and OLTP workloads?

Idea: Weekly / Nightly / Hourly dump
from OLTP System to OLAP system.

(Index the data while dumping)

Problem: Not seeing the freshest data!

Better Idea: OLTP DB + OLAP DB.

OLTP DB has few indexes, but only stores recent updates.

OLAP DB has many indexes, and stores everything except recent updates.

Periodically migrate updates into OLAP DB.

(Lambda Architecture)
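
The two-tier idea above can be sketched in a few lines. This is only a toy illustration, not the checkpoint's required design: the class name `LambdaStore`, its methods, and the single-column index are all hypothetical. Writes land in an unindexed list, reads combine an index probe with a scan of the recent rows, and `migrate` folds recent rows into the indexed side.

```python
from collections import defaultdict


class LambdaStore:
    """Toy two-tier store: an indexed base plus an unindexed list of recent rows."""

    def __init__(self, index_column):
        self.index_column = index_column
        self.base_index = defaultdict(list)  # "OLAP" side: indexed, read-optimized
        self.recent = []                     # "OLTP" side: append-only, no index

    def insert(self, row):
        # Writes stay cheap: no index maintenance on the write path.
        self.recent.append(row)

    def lookup(self, value):
        # Reads see fresh data: index probe on the base, then a scan of recent rows.
        hits = list(self.base_index.get(value, []))
        hits += [row for row in self.recent if row[self.index_column] == value]
        return hits

    def migrate(self):
        # Periodically fold the recent rows into the indexed base.
        for row in self.recent:
            self.base_index[row[self.index_column]].append(row)
        self.recent = []
```

The scan over `recent` stays fast only while that list is small, which is why the migration step runs periodically.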

Checkpoint 4

Suggested Approach: Lambda-Lite

Handling Inserts


              INSERT INTO FOO(A, B, C) VALUES (1, 2, 3);
    
<%= relational_algebra() do ra_table("Orig") end %>
<%= relational_algebra(debug: false) do ra_union( ra_table("Orig"), ra_table("New") ) end %>

Example


      SELECT COUNT(*) FROM lineitem WHERE mktsegment = 'BUILDING';
    
<%= relational_algebra do ra_aggregate(nil, "COUNT(*)", ra_select("mktsegment = 'BUILDING'", ra_table("lineitem") ) ) end %>
<%= relational_algebra do ra_aggregate(nil, "COUNT(*)", ra_select("mktsegment = 'BUILDING'", ra_union( ra_table("lineitem"), ra_table("inserts") ) ) ) end %>
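
A minimal sketch of the rewrite above, assuming rows are plain dictionaries held in memory. The helper names `scan_with_inserts` and `count_where` are made up for illustration, and the sample rows are invented.

```python
from itertools import chain


def scan_with_inserts(base_rows, inserted_rows):
    """Union (bag semantics): original rows followed by rows inserted since load."""
    return chain(base_rows, inserted_rows)


def count_where(rows, predicate):
    """COUNT(*) over the rows that satisfy the predicate."""
    return sum(1 for row in rows if predicate(row))


lineitem = [{"mktsegment": "BUILDING"}, {"mktsegment": "AUTOMOBILE"}]
inserts = [{"mktsegment": "BUILDING"}]

building_count = count_where(
    scan_with_inserts(lineitem, inserts),
    lambda row: row["mktsegment"] == "BUILDING",
)
```

The original `lineitem` rows are never touched; the query simply reads one extra source.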

Handling Deletes


                  DELETE FROM FOO WHERE A > 5;
    
<%= relational_algebra do ra_table("Orig") end %>
<%= relational_algebra do ra_diff( ra_table("Orig"), ra_table("New") ) end %>

... but that's not quite how SQL Delete works.


                      DELETE FROM FOO WHERE A > 5;
    
<%= relational_algebra do ra_select("A ≤ 5", ra_table("FOO") ) end %>

                      DELETE FROM Orig WHERE Something;
    
<%= relational_algebra do ra_select("NOT Something", ra_table("Orig") ) end %>
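
As a rough sketch, a delete becomes a filter that keeps every row for which the delete condition does not hold. The helper name `apply_delete` is hypothetical, and the example assumes no NULL values (SQL's three-valued logic would need extra care).

```python
def apply_delete(rows, delete_condition):
    """Return the rows that survive DELETE ... WHERE delete_condition."""
    return [row for row in rows if not delete_condition(row)]


foo = [{"A": 3}, {"A": 5}, {"A": 8}]

# DELETE FROM FOO WHERE A > 5  is read as  SELECT * FROM FOO WHERE NOT (A > 5)
survivors = apply_delete(foo, lambda row: row["A"] > 5)
```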

Example


    INSERT INTO lineitem(...) VALUES (...);
    INSERT INTO lineitem(...) VALUES (...);
    DELETE FROM lineitem WHERE shipdate BETWEEN date('1997-10-01') 
                                            AND date('1997-10-30');
    SELECT COUNT(*) FROM lineitem WHERE mktsegment = 'BUILDING';
    
<%= relational_algebra do ra_aggregate(nil, "COUNT(*)", ra_select("mktsegment = 'BUILDING'", ra_table("lineitem") ) ) end %>
<%= relational_algebra do ra_aggregate(nil, "COUNT(*)", ra_select("mktsegment = 'BUILDING'", ra_union( ra_table("lineitem"), ra_table("inserts") ) ) ) end %>
<%= relational_algebra do ra_aggregate(nil, "COUNT(*)", ra_select("mktsegment = 'BUILDING'", ra_select("shipdate NOT BETWEEN ...", ra_union( ra_table("lineitem"), ra_table("inserts") ) ) ) ) end %>
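
One possible way to track the statements in this example, sketched under assumptions: the class `TableView` and its methods are hypothetical, rows are dictionaries, and the sample data is invented. The point it illustrates is ordering: a delete applies to the original rows and to rows inserted before it, but not to rows inserted afterwards.

```python
from datetime import date


class TableView:
    """Original rows plus pending inserts and deletes, combined at scan time."""

    def __init__(self, base_rows):
        self.base_rows = base_rows    # never modified
        self.inserts = []             # rows added since load
        self.delete_conditions = []   # predicates filtered out of base rows on scan

    def insert(self, row):
        self.inserts.append(row)

    def delete(self, condition):
        # Earlier inserts are filtered right away; later inserts are unaffected.
        self.inserts = [row for row in self.inserts if not condition(row)]
        self.delete_conditions.append(condition)

    def scan(self):
        for row in self.base_rows:
            if not any(cond(row) for cond in self.delete_conditions):
                yield row
        yield from self.inserts


def in_october(row):
    return date(1997, 10, 1) <= row["shipdate"] <= date(1997, 10, 30)


def count_building(view):
    return sum(1 for row in view.scan() if row["mktsegment"] == "BUILDING")


view = TableView([
    {"mktsegment": "BUILDING", "shipdate": date(1997, 10, 15)},
    {"mktsegment": "BUILDING", "shipdate": date(1997, 1, 1)},
    {"mktsegment": "MACHINERY", "shipdate": date(1997, 1, 1)},
])
view.insert({"mktsegment": "BUILDING", "shipdate": date(1997, 10, 20)})
view.insert({"mktsegment": "BUILDING", "shipdate": date(1998, 1, 1)})
view.delete(in_october)
count_after_delete = count_building(view)
```

Filtering the pending inserts eagerly keeps the scan simple: only the original rows need the delete predicates applied at read time.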

Handling Updates


            UPDATE Foo SET A = 1, B = 2 WHERE C = 3;
    

<%= relational_algebra do ra_union( ra_select( "C = 3", ra_project( { A: "1", B: "2", C: "C" }, ra_table("Foo") ) ), ra_select( "C ≠ 3", ra_table("Foo") ) ) end %>
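
The union form can be sketched as follows, assuming dictionary rows and no NULLs in `C`. The helper name `apply_update_as_union` is hypothetical. Note that the input is read twice, once per branch, and that the output order differs from the input order (fine under bag semantics).

```python
def apply_update_as_union(rows, condition, assignments):
    """UPDATE as a union: rewritten matching rows plus untouched non-matching rows."""
    updated = [{**row, **assignments} for row in rows if condition(row)]
    untouched = [row for row in rows if not condition(row)]
    return updated + untouched


foo = [
    {"A": 9, "B": 9, "C": 4},
    {"A": 7, "B": 8, "C": 3},
]

# UPDATE Foo SET A = 1, B = 2 WHERE C = 3
result = apply_update_as_union(foo, lambda row: row["C"] == 3, {"A": 1, "B": 2})
```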

            UPDATE Foo SET A = 1, B = 2 WHERE C = 3;
    
<%= relational_algebra do ra_project( { A: "CASE WHEN C = 3 THEN 1 ELSE A END", B: "CASE ...", C: "C"}, ra_table("Foo") ) end %>

      SELECT CASE WHEN C = 3 THEN 1 ELSE A END AS A,
             CASE WHEN C = 3 THEN 2 ELSE B END AS B,
             C AS C
      FROM Foo;
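
To check that the `CASE` rewrite agrees with a real `UPDATE`, one option is to run both against an in-memory SQLite database using Python's standard `sqlite3` module. SQLite stands in for the project's engine here, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (A INTEGER, B INTEGER, C INTEGER)")
conn.executemany(
    "INSERT INTO Foo VALUES (?, ?, ?)",
    [(7, 8, 3), (9, 9, 4), (5, 6, 3)],
)

# The update expressed as a single-pass projection over the unmodified table.
rewritten = conn.execute("""
    SELECT CASE WHEN C = 3 THEN 1 ELSE A END AS A,
           CASE WHEN C = 3 THEN 2 ELSE B END AS B,
           C AS C
    FROM Foo
""").fetchall()

# The same change applied destructively, for comparison.
conn.execute("UPDATE Foo SET A = 1, B = 2 WHERE C = 3")
updated = conn.execute("SELECT A, B, C FROM Foo").fetchall()

same_result = sorted(rewritten) == sorted(updated)
conn.close()
```

Unlike the union form, the projection reads `Foo` only once.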
    

Final Advice