diff --git a/Rakefile b/Rakefile index 39879df6..200630f9 100644 --- a/Rakefile +++ b/Rakefile @@ -8,6 +8,7 @@ require "cv.rb" require "nsfcp.rb" require "nsfconflicts.rb" require "bootstrap_markdown.rb" +require "disqus.rb" include GemSmith $db = JDB.new("db") diff --git a/lib/disqus.rb b/lib/disqus.rb new file mode 100644 index 00000000..9fa858b9 --- /dev/null +++ b/lib/disqus.rb @@ -0,0 +1,22 @@ + +module Disqus + + def Disqus::embed(url, identifier) + return "
+ + + " + end + +end \ No newline at end of file diff --git a/src/teaching/cse-662/2017fa/feedback/01-cracking.erb b/src/teaching/cse-662/2017fa/feedback/01-cracking.erb new file mode 100644 index 00000000..3346b3da --- /dev/null +++ b/src/teaching/cse-662/2017fa/feedback/01-cracking.erb @@ -0,0 +1,13 @@ +

Reading Assignment 1: Adaptive Indexing

+ +
+
Paper
+
Database Cracking
+
+ +

All students should provide one short paragraph identifying at least one strength and at least one weakness of the approach described in the week's reading.

+ +<%= Disqus::embed( + "http://odin.cse.buffalo.edu/teaching/cse-662/2017fa/feedback/01-cracking.html", + "cse662.2017fa.feedback.01-cracking" +) %> \ No newline at end of file diff --git a/src/teaching/cse-662/2017fa/group_formation.erb b/src/teaching/cse-662/2017fa/group_formation.erb new file mode 100644 index 00000000..8808f827 --- /dev/null +++ b/src/teaching/cse-662/2017fa/group_formation.erb @@ -0,0 +1,6 @@ +

Group Formation Thread

+ +<%= Disqus::embed( + "http://odin.cse.buffalo.edu/teaching/cse-662/2017fa/group_formation.html", + "cse662.2017fa.group_formation" +) %> \ No newline at end of file diff --git a/src/teaching/cse-662/2017fa/index.md b/src/teaching/cse-662/2017fa/index.md index 3bb5c6d7..137daa32 100644 --- a/src/teaching/cse-662/2017fa/index.md +++ b/src/teaching/cse-662/2017fa/index.md @@ -73,12 +73,12 @@ After the taking the course, students should be able to: ## Course Schedule -* **Aug. 28** : Introduction ([overview](2017-08-28-Introduction.html)) +* **Aug. 28** : Introduction ( [group formation](group_formation.html) | slides ) * **Aug. 30** : Project Seeds - Mimir * **Sept. 01** : Project Seeds - JITDs & PocketData * **Sept. 04** : Database Cracking ( [Cracking](http://stratos.seas.harvard.edu/files/IKM_CIDR07.pdf) ) * **Sept. 06** : Functional Data Structures -* **Sept. 12** : Just-in-Time Data Structures ( [JITDs])(http://odin.cse.buffalo.edu/papers/2015/CIDR-jitd-final.pdf) ) +* **Sept. 12** : Just-in-Time Data Structures ( [JITDs](http://odin.cse.buffalo.edu/papers/2015/CIDR-jitd-final.pdf) ) * **Sept. 8** : Incomplete Databases 1 * **Sept. 11** : Incomplete Databases 2 * **Sept. 13** : Incomplete Databases 3 @@ -120,7 +120,7 @@ There are a number of reasons that data might go bad: sensor errors, data entry 3. Using Mimir to warning users when a query result depends on a tuple that participates in a violation 4. Suggesting and ranking modifications that repair violations -###### Background material: +###### Background Material: * [Sampling from Repairs](https://cs.uwaterloo.ca/~ilyas/papers/BeskalesVLDBJ2014.pdf) * [Qualitative Data Cleaning](http://dl.acm.org/citation.cfm?id=3007320) @@ -138,7 +138,7 @@ Most probabilistic database systems aim to produce all possible results. 
A few, Perhaps counterintuitively, our preliminary implementations of the [Interleave](https://github.com/UBOdin/mimir/blob/master/src/main/scala/mimir/exec/mode/SampleRows.scala) and [Tuple Bundle](https://github.com/UBOdin/mimir/blob/master/src/main/scala/mimir/exec/mode/TupleBundle.scala) algorithms suggest that none of these approaches will be the best in all cases. For example, in a simple select-aggregate query, tuple-bundles are the most efficient. Conversely, if you're joining on an attribute with different values in each possible world, interleave will be faster. We suspect that there are some cases where Naive will win out as well. The aim of this project is to implement a query optimizer for sampling-based probabilistic database queries. If I hand you a query, you tell me which strategy is fastest for that query. As an optional extension, you may be able to interleave different strategies, each evaluating a different part of the query. -###### Background material: +###### Background Material: * [MCDB](http://dl.acm.org/citation.cfm?id=1376686) * [BlinkDB](http://blinkdb.org/) @@ -162,9 +162,11 @@ If you ask "Why is this result so low", the system can look at the above constra ``` SELECT COUNT(*) FROM Publications WHERE author = 'Alice' AND venue = 'ICDE' AND year = 2017; ``` -The aim of this project would be to implement a simple frontend to an existing database system (Spark, SQLite, or Oracle) that accepts a set of constrants and answers questions like this. This project is part of ongoing joint work with Boris Glavic and Sudeepa Roy. +The aim of this project would be to implement a simple frontend to an existing database system (Spark, SQLite, or Oracle) that accepts a set of constraints and answers questions like this. 
-###### Background material: +(This project is part of ongoing joint work with Boris Glavic and Sudeepa Roy.) + +###### Background Material: * [Causality and Explanations in Databases](https://users.cs.duke.edu/~sudeepa/vldb2014-Tutorial-causality-explanations.pdf) * [DBExplain](https://cudbg.github.io/lab/dbexplain) @@ -173,7 +175,16 @@ The aim of this project would be to implement a simple frontend to an existing d #### Adaptive Multidimensional Indexing -(Summary In Progress) +Indexes work by reducing the effort required to locate specific data records. For example, in a tree index, if the range of records in a given subtree doesn't overlap with the query, the entire subtree can be ruled out (and if the query fully contains that range, the entire subtree can be ruled in). Not surprisingly, this means that data partitioning plays a large role in how effective the index is. The fewer partitions that straddle query boundaries, the less work is required to respond to those queries. + +Partitioning is especially hard in 2-dimensional (and 3-, 4-, and higher-dimensional) indexes, where there are always at least two entirely orthogonal dimensions to partition on. Accordingly, there's a wide range of techniques for organizing multidimensional data, including a family of indexes based on R-Trees. The aim of this project is to develop a "dynamic" R-Tree-like structure that adaptively partitions its contents, and (if time permits) that adapts its partition boundaries to changing workloads. + +###### Background Material: + +* [Database Cracking](http://stratos.seas.harvard.edu/files/IKM_CIDR07.pdf) +* [The R*-tree: an efficient and robust access method for points and rectangles](http://dl.acm.org/citation.cfm?id=98741) +* [The Big Red Data Spatial Indexing Project](http://www.cs.cornell.edu/database/spatial-indexing/) + #### Mimir on SparkSQL