From 058d21f24f9640870d204e0ffedeb8ffd1190771 Mon Sep 17 00:00:00 2001
From: Gokhan Kul
Date: Tue, 20 Feb 2018 22:45:26 -0500
Subject: [PATCH] Checkpoint 1 page is updated with the build instructions

---
 src/teaching/cse-562/2018sp/checkpoint1.erb | 29 ++++++++++++---------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/src/teaching/cse-562/2018sp/checkpoint1.erb b/src/teaching/cse-562/2018sp/checkpoint1.erb
index 57cb6e54..1ed1c1a8 100644
--- a/src/teaching/cse-562/2018sp/checkpoint1.erb
+++ b/src/teaching/cse-562/2018sp/checkpoint1.erb
@@ -34,7 +34,7 @@ Your task is to answer these queries as they arrive.

Volcano-Style Computation (Iterators)

Let's take a look at the script we've used as an example in class.
-with open('data.csv', 'r') as f:
+with open('data.dat', 'r') as f:
  for line in f:
    fields = split(",", line)
    if(fields[2] != "Ensign" and int(fields[3]) > 25):
@@ -42,7 +42,7 @@ Let's take a look at the script we've used as an example in class.

This script is basically a form of pattern 3 above

-SELECT fields[1] FROM 'data.csv'
+SELECT fields[1] FROM 'data.dat'
 WHERE fields[2] != "Ensign" AND CAST(fields[3] AS int) > 25
 

@@ -58,11 +58,11 @@ WHERE fields[2] != "Ensign" AND CAST(fields[3] AS int) > 25

This is nice and simple, but the code is very specific to pattern 3. That's something that will lead us into trouble. To see a simple example of the sort of problems we're going to run into, let's come up with an example of pattern 5:

-SELECT height + weight FROM 'data.csv' WHERE rank != 'Ensign'
+SELECT height + weight FROM 'data.dat' WHERE rank != 'Ensign'
That is, we're asking for the sum of height and weight of each non-ensign in our example table. An equivalent script would be...

total = 0

-with open('data.csv', 'r') as f:
+with open('data.dat', 'r') as f:
  for line in f:
    fields = split(",", line)
    if fields[2] != 'Ensign':
@@ -207,7 +207,7 @@ Eval eval = new Eval(){
-In addition to the schema, you will find a corresponding [tablename].csv file in the data directory. The name of the table corresponds to the table names given in the CREATE TABLE statements your code receives. For example, let's say that you see the following statement in your query file:
+In addition to the schema, you will find a corresponding [tablename].dat file in the data directory. The name of the table corresponds to the table names given in the CREATE TABLE statements your code receives. For example, let's say that you see the following statement in your query file:

CREATE TABLE R(A int, B int, C int);

That means that the data directory contains a data file called 'R.dat' that might look like this:

1|1|5
@@ -216,18 +216,20 @@ Eval eval = new Eval(){
 

Each line of text (see BufferedReader.readLine()) corresponds to one row of data. Fields within a row are delimited by a vertical pipe '|' character.  Integers and floats are stored in a form recognized by Java’s Long.parseLong() and Double.parseDouble() methods. Dates are stored in YYYY-MM-DD form, where YYYY is the 4-digit year, MM is the 2-digit month number, and DD is the 2-digit day of the month. Strings are stored unescaped and unquoted and are guaranteed to contain no vertical pipe characters.

Grading Workflow

-All .java files in the src directory at the root of your repository will be compiled (and linked against JSQLParser). As before, the class edu.buffalo.www.cse4562.Main will be invoked with no arguments, and a stream of semicolon-delimited queries will be printed to System.in (after you print out a prompt)
+All .java files in the src directory at the root of your repository will be compiled (and linked against JSQLParser). A main file that you can take as an example is given here. As before, the class edu.buffalo.www.cse4562.Main will be invoked with no arguments, and a stream of semicolon-delimited queries will be printed to System.in (after you print out a prompt).

For example (red text is entered by the user/grader):

bash> ls data
-R.csv
-S.csv
-T.csv
-bash> cat data/R.csv
+R.dat
+S.dat
+T.dat
+bash> cat data/R.dat
 1|1|5
 1|2|6
 2|3|7
-bash> java -cp build:jsqlparser.jar edu.buffalo.www.cse4562.Main -
+bash> find {code root directory} -name \*.java -print > compile.list
+bash> javac -cp {libs location}/commons-csv-1.5.jar:{libs location}/evallib-1.0.jar:{libs location}/jsqlparser-1.0.0.jar -d {compiled directory name} @compile.list
+bash> java -cp {compiled directory name}/src/:{libs location}/commons-csv-1.5.jar:{libs location}/evallib-1.0.jar:{libs location}/jsqlparser-1.0.0.jar edu.buffalo.www.cse4562.Main --data data/
 $> CREATE TABLE R(A int, B int, C int);
 $> SELECT B, C FROM R WHERE A = 1;
 1|5
@@ -238,7 +240,7 @@ $> SELECT A + B AS Q FROM R;
 5
 
-For this project, we will issue 5 queries to your program excluding CREATE TABLE queries. 3 of these queries will NOT be timed, and they will evaluated based on the correctness of the query results. Answering each query successfully will bring you 1 point each. An example file you will read the data from is given here. This file is the same size and has the same structure with what we will use to evaluate these three queries. The remaining two queries will be timed, and they will run on a file that has 4000 tuples (~200 KB). You will get 2 point for each query if you can return correct results. You will receive additional 1.5 points for each query for matching or beating the reference implementation timewise. Also keep in mind that for ALL queries, the grader will time out and exit after 5 minutes.
+For this project, we will issue 5 queries to your program, excluding CREATE TABLE queries. 3 of these queries will NOT be timed, and they will be evaluated based on the correctness of the query results. Answering each of these queries successfully will earn you 1 point. An example file you will read the data from is given here. This file is the same size and has the same structure as the file we will use to evaluate these three queries. The remaining two queries will be timed, and they will run on a file that has 4000 tuples (~200 KB). You will get 2 points for each query if you return correct results. You will receive an additional 1.5 points for each query for matching or beating the reference implementation timewise. Also keep in mind that for ALL queries, the grader will time out and exit after 5 minutes.
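
Note on the data format described in the patch above: the page says each line is one row, fields are separated by '|', and values are readable by Long.parseLong(), Double.parseDouble(), and a YYYY-MM-DD date form. A minimal sketch of a reader under those rules might look like the following. The class name DatRowParser, the ColType enum, and the helpers parseRow and readAll are hypothetical names chosen for illustration; they are not part of the course skeleton or of JSQLParser, and using java.time.LocalDate for dates is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the course skeleton.
public class DatRowParser {
    // Column types mentioned in the format description (illustrative enum).
    enum ColType { INT, FLOAT, DATE, STRING }

    // Split one line on '|' and convert each field according to the schema.
    static Object[] parseRow(String line, ColType[] schema) {
        // The -1 limit keeps trailing empty fields instead of dropping them.
        String[] raw = line.split("\\|", -1);
        if (raw.length != schema.length) {
            throw new IllegalArgumentException(
                "expected " + schema.length + " fields, got " + raw.length);
        }
        Object[] row = new Object[schema.length];
        for (int i = 0; i < schema.length; i++) {
            switch (schema[i]) {
                case INT:   row[i] = Long.parseLong(raw[i]); break;
                case FLOAT: row[i] = Double.parseDouble(raw[i]); break;
                case DATE:  row[i] = LocalDate.parse(raw[i]); break; // ISO YYYY-MM-DD
                default:    row[i] = raw[i]; // strings are unescaped and unquoted
            }
        }
        return row;
    }

    // Read every non-empty line from the reader, one row per line.
    static List<Object[]> readAll(BufferedReader in, ColType[] schema) throws IOException {
        List<Object[]> rows = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) continue;
            rows.add(parseRow(line, schema));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Schema for CREATE TABLE R(A int, B int, C int); data copied from the example R.dat.
        ColType[] schema = { ColType.INT, ColType.INT, ColType.INT };
        BufferedReader in = new BufferedReader(new StringReader("1|1|5\n1|2|6\n2|3|7\n"));
        // Equivalent of SELECT B, C FROM R WHERE A = 1
        for (Object[] row : readAll(in, schema)) {
            if ((Long) row[0] == 1L) {
                System.out.println(row[1] + "|" + row[2]);
            }
        }
    }
}
```

Run on the three example rows, main prints 1|5 and 2|6. A real submission would read from the [tablename].dat file in the data directory rather than a StringReader, and for the timed queries it would likely stream rows through an iterator instead of collecting them all in a list.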