From e96830bad65485b90622789b185da589882c6381 Mon Sep 17 00:00:00 2001 From: Oliver Kennedy Date: Wed, 15 Mar 2017 23:24:26 -0400 Subject: [PATCH] Checkpoint 2 --- .../slides/2017-03-16-Checkpoint 2.html | 231 ++++++++++++++++++ 1 file changed, 231 insertions(+) create mode 100755 src/teaching/cse-562/2017sp/slides/2017-03-16-Checkpoint 2.html diff --git a/src/teaching/cse-562/2017sp/slides/2017-03-16-Checkpoint 2.html b/src/teaching/cse-562/2017sp/slides/2017-03-16-Checkpoint 2.html new file mode 100755 index 00000000..5e64f231 --- /dev/null +++ b/src/teaching/cse-562/2017sp/slides/2017-03-16-Checkpoint 2.html @@ -0,0 +1,231 @@ + + + + + + + CSE 562 - Database Systems + + + + + + + + + + + + + + + + + + + + + +
+ + + + + +
+
+

CSE 562

+

March 16 - Checkpoint 2

+
+ +
+

Checkpoint 2

+ + + + + + sif$ + + javac -cp build:*.jar -D build {all .java files} + sif$ + + + + java -cp build:jsqlparser.jar:evallib.jar dubstep.Main
 - --preload --in-mem + $> + + + CREATE TABLE R( + A int, B int, C int, + PRIMARY KEY (A), + INDEX B_INDEX (B) + ); + $> + + + SELECT A, SUM(C) FROM R WHERE B < 5 GROUP BY A; + 1|5 + ... + $> + + +
+ +
+

Checkpoint 2

+
    +
  • Two parts, one in-memory, one with more data than memory.
  • +
  • Preprocessing Step (You get 2 or 5 minutes for each CREATE TABLE)
  • +
  • New SQL features (GROUP-BY, LIMIT, SORT, From-Nesting)
  • +
  • Create table now includes "hints" (PRIMARY KEY, INDEX)
  • +
  • Tighter Constraints (Bigger data and less time/query)
  • +
+
+
+ +
+
+
+
--in-mem
+
Java will be allowed to use 1GB of heap space (-Xmx1g), and the CSV representation of all input data data will be under 50MB (similar size to checkpoint 1's 'Big Data').
+
--on-disk
+
Java will be allowed to use 150MB of heap space (-Xmx150m), and the textual representation of your data will be over 100MB.
+
+

As a general guideline, Java becomes incredibly slow once you hit 50% memory usage as you spend more time in the garbage collector than in your code.

+
+ +
+
+
--preload
+
Signals that CREATE TABLE statements will be allowed to take 2 minutes in in-memory mode, or 5 minutes in on-disk mode. This is a good opportunity to sort your data, and/or create the index structures suggested in the statement body.
+
+

Professor's note: For an illustration of the potential benefits of a longer CREATE TABLE step, have a look at the leaderboards for Checkpoint 1.

+
+
+ +
+
+

New SQL Features: Sort

+
+
ORDER BY col1 asc/desc, col2 asc/desc, ...
+
Sort the data on col1 (using col2, col3, ... as tiebreakers) in ascending or descending order.
+
+
    +
  • You will likely need 2 implementations:
      +
    • In-Memory (Reference Impl Uses Java's Collections.sort)
    • +
    • On-Disk (Reference Impl Uses 2-Pass Sort)
    • +
  • +
+
+ +
+

New SQL Features: Limit

+
+
LIMIT N
+
Return only the first N rows. If the data is sorted, return the first rows according to the sort order. If not, return an arbitrary N rows.
+
+

JSQLParser also supports more expressive limit clauses (e.g., including offsets). You will not be required to support anything more complex than LIMIT N.

+
+ +
+

New SQL Features: Group-By

+
+
SELECT A, B, SUM(C), ... FROM ... GROUP BY A, B
+
Group the data by the A and B columns and apply the aggregate functions to each group.
+
+
+ +
+

New SQL Features: From Nesting

+
+
SELECT * FROM (SELECT A, B FROM R) q
+
You're still not going to get more than one FromItem, but now it can be a SELECT query instead of a table.
+
+

If you used iterators for checkpoint 1, you should just be able to replace a Table Scan iterator with the iterator you construct for the nested query

+
+
+ +
+
+

Handling Tight Constraints

+
+
Index Scans
+
You'll need to detect when opportunities to use them arise. Remember that your WHERE clause may be an AND of different options, and you may have multiple indices that you can use.
+
Selection Pushdown
+
Selection Pushdown is ALWAYS good. This is most likely to arise in the context of nested subqueries, but if you implement it in a general way you'll benefit in Checkpoint 3 as well.
+
+
+ +
+

Pattern Matching

+ Remember pattern matching on Statement and Expression objects? Try it on Iterators! +

+Iterator optimize(Iterator query) {
+  if(query instanceof Filter){
+    Filter f = (Filter)query;
+    if(f.input instanceof Project){
+      Project p (Project)f.input;
+      // replace query with a new pair of iterators where 
+      // the Project uses the Filter as an input. 
+    }
+  }
+  // Recur
+  query.input = optimize(query.input);
+  return query;
+}
+		
+
+
+ +
+

Questions?

+
+ + + + + + + + + +
+ + + + + + + +