Checkpoint 2

Oliver Kennedy 2017-03-17 18:34:02 -04:00
parent f3e68a9464
commit e5977f88cd
2 changed files with 32 additions and 3 deletions


@@ -1,6 +1,7 @@
---
title: CSE-562; Project 1
---
<h1>Checkpoint 1</h1>
<ul>
<li><strong>Overview</strong>: Answer Select/Project/Aggregate Queries
<li><strong>Deadline</strong>: Friday, March 10</li>


@@ -1,9 +1,10 @@
---
title: CSE-562; Project 2
---
<h1>Checkpoint 2</h1>
<ul>
<li><strong>Overview</strong>: Add Sort/Limit/Join
<li><strong>Deadline</strong>: TBD</li>
<li><strong>Overview</strong>: New SQL features, Limited Memory, Faster Performance
<li><strong>Deadline</strong>: April 13</li>
<li><strong>Grade</strong>: 15% of Project Component
<ul>
<li>5% Correctness</li>
@@ -13,4 +14,31 @@ title: CSE-562; Project 2
</li>
</ul>
<div style="color: red">In Progress</div>
<p>This project follows the same outline as Checkpoint 1: your code receives SQL queries and is expected to answer them. There are a few key differences:
<ul>
<li>Queries may now include an <tt>ORDER BY</tt> clause. Because <b>sort</b> is a blocking (2-pass) operator, you will need to handle both the case where everything fits into memory and the case where it does not.</li>
<li>Queries may now include a <tt>LIMIT</tt> clause, a <tt>GROUP BY</tt> clause, and/or a <tt>FROM</tt>-nested subquery.</li>
<li>You get more time to process <tt>CREATE TABLE</tt> statements.</li>
<li><tt>CREATE TABLE</tt> statements may now include <tt>INDEX</tt> and/or <tt>PRIMARY KEY</tt> directives.</li>
<li>You will be expected to process queries faster.</li>
</ul>
</p>
<h2>Sorting and Grouping Data</h2>
<p>Sort is a blocking operator. Before it emits even one row, it needs to see the entire dataset. If you have enough memory to hold the entire input to be sorted, then you can just use Java's built-in <a href="http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#sort(java.util.List,%20java.util.Comparator)">Collections.sort</a> method. However, for at least a few queries you will likely not have enough memory to keep everything available. In that case, a good option is to use the 2-pass sort algorithm that we discussed in class.</p>
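<p>The 2-pass idea can be sketched roughly as follows. This is a minimal illustration, not the reference implementation: the class <tt>TwoPassSort</tt>, its method names, the <tt>maxRows</tt> memory budget, and the assumption that each row is a single line of text are all hypothetical. Pass 1 sorts memory-sized chunks and writes each one to a temporary file; pass 2 merges those files while holding only one row per file in memory.</p>

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical helper; names and the one-row-per-line format are assumptions.
class TwoPassSort {

    // One sorted run on disk, plus the row currently at its head.
    private static final class Run {
        final BufferedReader reader;
        String head;

        Run(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }

        void advance() throws IOException {
            head = reader.readLine();
        }
    }

    // Pass 1: cut the input into sorted runs of at most maxRows rows each.
    static List<Path> writeRuns(Iterator<String> rows, Comparator<String> order, int maxRows)
            throws IOException {
        List<Path> runs = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        while (rows.hasNext()) {
            buffer.add(rows.next());
            if (buffer.size() == maxRows || !rows.hasNext()) {
                Collections.sort(buffer, order);
                Path run = Files.createTempFile("sort-run-", ".tmp");
                run.toFile().deleteOnExit();
                try (BufferedWriter out = Files.newBufferedWriter(run, StandardCharsets.UTF_8)) {
                    for (String row : buffer) {
                        out.write(row);
                        out.newLine();
                    }
                }
                runs.add(run);
                buffer.clear();
            }
        }
        return runs;
    }

    // Pass 2: merge the runs, keeping only the head row of each run in memory.
    static void merge(List<Path> runs, Comparator<String> order, Consumer<String> emit)
            throws IOException {
        PriorityQueue<Run> heap = new PriorityQueue<>((a, b) -> order.compare(a.head, b.head));
        for (Path path : runs) {
            Run run = new Run(Files.newBufferedReader(path, StandardCharsets.UTF_8));
            if (run.head != null) heap.add(run); else run.reader.close();
        }
        while (!heap.isEmpty()) {
            Run run = heap.poll();
            emit.accept(run.head);   // smallest remaining row across all runs
            run.advance();
            if (run.head != null) heap.add(run); else run.reader.close();
        }
    }
}
```

<p>Passing the output to a callback rather than returning a list keeps the merged result from being materialized in memory, which is the point of the second pass.</p>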
<p>Group-by aggregation is also a blocking operator. If you run out of memory for the groups, you will need to implement a memory-aware grouping operator. One idea is to re-use the sort operator to bring the rows of each group together, then apply the sorted grouping technique that we discussed in class.</p>
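<p>As a rough sketch of sorted grouping, assume the input has already been sorted on the grouping key and the aggregate is a <tt>SUM</tt>. The class <tt>SortedGroupSum</tt> and its signature are hypothetical. Because equal keys are adjacent, only the running total for one group needs to be held at a time.</p>

```java
import java.util.Iterator;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper; assumes the input iterator is sorted by key.
class SortedGroupSum {

    // Emits one (key, sum) pair per group, in key order.
    static void sumByKey(Iterator<Map.Entry<String, Long>> sorted,
                         BiConsumer<String, Long> emit) {
        String currentKey = null;
        long sum = 0;
        boolean open = false;            // true once at least one row has been seen
        while (sorted.hasNext()) {
            Map.Entry<String, Long> row = sorted.next();
            if (open && !row.getKey().equals(currentKey)) {
                emit.accept(currentKey, sum);   // key changed: the previous group is complete
                sum = 0;
            }
            currentKey = row.getKey();
            sum += row.getValue();
            open = true;
        }
        if (open) {
            emit.accept(currentKey, sum);       // flush the final group
        }
    }
}
```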
<h2>Preprocessing</h2>
<p>Your code will be tested in 2 phases. In the first phase, you will have 1GB of memory and 2 minutes with each <tt>CREATE TABLE</tt> statement. In the second phase, you will have 150MB of memory and 5 minutes with each <tt>CREATE TABLE</tt> statement. The reference implementation uses this time to build indexes over the data in memory and/or on disk, depending on the phase. Students in prior years have come up with other creative ways to use this time.</p>
<p><tt>CREATE TABLE</tt> statements will include index suggestions, both via unique <tt>PRIMARY KEY</tt> attributes and non-unique <tt>INDEX</tt> fields. You can get access to both through the <a href="http://doc.odin.cse.buffalo.edu/jsqlparser/net/sf/jsqlparser/statement/create/table/CreateTable.html#getIndexes--">getIndexes()</a> method of <a href="http://doc.odin.cse.buffalo.edu/jsqlparser/net/sf/jsqlparser/statement/create/table/CreateTable.html">CreateTable</a>.</p>
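<p>One way the preprocessing time might be spent is sketched below: a simple in-memory index over a single column. Everything here is an assumption made for illustration, including the class name <tt>ColumnIndex</tt>, the <tt>|</tt>-delimited row format, and the integer-valued key column. It does not describe how the reference implementation builds its indexes. In the low-memory phase, the stored row positions could instead be byte offsets into a file on disk.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper; class and method names are illustrative only.
class ColumnIndex {
    // Column value -> positions of the rows that carry it, kept in key order.
    private final TreeMap<Long, List<Integer>> positions = new TreeMap<>();

    // Assumes '|'-delimited rows and an integer-valued indexed column.
    static ColumnIndex build(List<String> rows, int column) {
        ColumnIndex index = new ColumnIndex();
        for (int i = 0; i < rows.size(); i++) {
            long key = Long.parseLong(rows.get(i).split("\\|")[column].trim());
            // A list per key also covers non-unique INDEX columns.
            index.positions.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    // Equality lookup, e.g. WHERE A = 1.
    List<Integer> lookup(long key) {
        return positions.getOrDefault(key, Collections.emptyList());
    }

    // Range lookup over low <= key <= high (requires low <= high).
    List<Integer> range(long low, long high) {
        List<Integer> hits = new ArrayList<>();
        for (List<Integer> bucket : positions.subMap(low, true, high, true).values()) {
            hits.addAll(bucket);
        }
        return hits;
    }
}
```

<p>A sorted map is used here rather than a hash map so that the same structure can answer both equality and range predicates.</p>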
<h2>Grading Workflow</h2>
<p>As before, all .java files in the src directory at the root of your repository will be compiled (and linked against JSQLParser). Also as before, the class <tt>dubstep.Main</tt> will be invoked with no arguments, and a stream of <b>semicolon-delimited</b> queries will be written to System.in (after you print out a prompt).</p>
<pre>
bash&gt; <span style="color: red">java -cp build:jsqlparser.jar dubstep.Main -</span>
$&gt; <span style="color: red">CREATE TABLE R(A int, B int, C int);</span>
$&gt; <span style="color: red">SELECT B, C FROM R WHERE A = 1;</span>
</pre>
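<p>The interaction above could be driven by a loop along these lines. This is only a sketch: the class <tt>PromptLoop</tt> and the <tt>handler</tt> callback are hypothetical, the <tt>$&gt; </tt> prompt string is taken from the transcript, and in a real submission the statement text would be handed to JSQLParser rather than echoed. Semicolons inside single-quoted string literals are not treated as delimiters.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.util.function.Consumer;

// Hypothetical driver; not the required structure of dubstep.Main.
class PromptLoop {
    static final String PROMPT = "$> ";

    // Prints a prompt, then passes each semicolon-delimited statement to handler.
    static void run(Reader in, PrintStream out, Consumer<String> handler) throws IOException {
        StringBuilder statement = new StringBuilder();
        boolean inString = false;        // true while inside a '...' literal
        out.print(PROMPT);
        out.flush();
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (ch == '\'') {
                inString = !inString;
            }
            if (ch == ';' && !inString) {
                handler.accept(statement.toString().trim());
                statement.setLength(0);
                out.print(PROMPT);       // ready for the next statement
                out.flush();
            } else {
                statement.append(ch);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Reader stdin = new BufferedReader(new InputStreamReader(System.in));
        run(stdin, System.out, sql -> System.out.println("received: " + sql));
    }
}
```

<p>Flushing after each prompt matters when the output is piped, since a buffered prompt may never reach the process on the other end.</p>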