Feedback and slides

Oliver Kennedy 2017-08-30 16:50:53 -04:00
parent c2bcd76f34
commit 03b15131dc
4 changed files with 31 additions and 18 deletions


@ -8,6 +8,6 @@
<p>All students should provide one short paragraph identifying at least one strength and at least one weakness of the approach described in the week's reading.</p>
<%= Disqus::embed(
"http://odin.cse.buffalo.edu/teaching/cse-662/2017fa/group_formation.html",
"cse662.2017fa.group_formation"
"http://odin.cse.buffalo.edu/teaching/cse-662/2017fa/feedback/01-cracking.html",
"cse662.2017fa.feedback.01"
) %>


@ -0,0 +1,13 @@
<h2>Reading Assignment 1: Adaptive Indexing</h2>
<dl>
<dt>Paper</dt>
<dd><a href="http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1376686">MCDB</a></dd>
</dl>
<p>All students should provide one short paragraph identifying at least one strength and at least one weakness of the approach described in the week's reading.</p>
<%= Disqus::embed(
"http://odin.cse.buffalo.edu/teaching/cse-662/2017fa/feedback/02-mcdb.html",
"cse662.2017fa.feedback.02"
) %>


@ -76,26 +76,26 @@ After the taking the course, students should be able to:
* **Aug. 28** : Introduction [ [slides](slides/2017-08-28-Intro.pdf) | [form groups](group_formation.html) ]
* **Aug. 30** : Project Seeds [ [slides](slides/2017-08-30-Seeds.pdf) ]
* **Sept. 01** : Functional Data Structures
* **Sept. 01** : Functional Data Structures [ [slides](slides/2017-09-01-FunctionalDataStructures.pdf) ]
* **Sept. 04** : **No Class, Labor Day**
* **Sept. 06** : Database Cracking [ [paper](http://stratos.seas.harvard.edu/files/IKM_CIDR07.pdf) | [feedback](feedback/01-cracking.html) ]
* **Sept. 08** : Just-in-Time Data Structures [ [paper](http://odin.cse.buffalo.edu/papers/2015/CIDR-jitd-final.pdf) ]
* **Sept. 11** : Incomplete Databases 1
* **Sept. 11** : Incomplete Databases 1 [ [paper](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1376686) | [feedback](feedback/02-mcdb.html) ]
* **Sept. 13** : Incomplete Databases 2
* **Sept. 15** : Mimir [ [paper](http://odin.cse.buffalo.edu/papers/2015/VLDB-lenses-final.pdf) ]
* **Sept. 18** : MayBMS [ [paper](http://maybms.sourceforge.net/download/INFOSYS-TR-2007-2.pdf) ]
* **Sept. 20** : Sampling From Probabilistic Queries [ [paper](http://dl.acm.org/citation.cfm?id=1376686) ]
* **Sept. 20** : Sampling From Probabilistic Queries [ [paper](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1376686) ]
* **Sept. 22** : Probabilistic Constraint Repair [ [paper](https://cs.uwaterloo.ca/~ilyas/papers/BeskalesVLDBJ2014.pdf) ]
* **Sept. 25** : R-Trees and Multidimensional Indexing [ [paper](http://dl.acm.org/citation.cfm?id=98741) ]
* **Sept. 25** : R-Trees and Multidimensional Indexing [ [paper](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=98741) ]
* **Checkpoint 1 report due by 11:59 PM Sept. 26**
* **Sept. 27 - Sept. 29** : Student Project Presentations
* **Oct. 02** : BloomL [ [paper-1](http://cidrdb.org/cidr2011/Papers/CIDR11_Paper35.pdf), [paper-2](http://dl.acm.org/citation.cfm?id=2391230) ]
* **Oct. 02** : BloomL [ [paper-1](http://cidrdb.org/cidr2011/Papers/CIDR11_Paper35.pdf), [paper-2](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=2391230) ]
* **Oct. 04 - Oct. 06** : *Oliver Away* (Content TBD)
* **Oct. 09** : NoDB [ [paper](http://www.vldb.org/pvldb/vol7/p1119-karpathiotakis.pdf) ]
* **Oct. 11 - Oct. 13** : Student Project Presentations
* **Oct. 16** : Lazy Transactions [ [paper](http://dl.acm.org/citation.cfm?id=2610529) ]
* **Oct. 16** : Lazy Transactions [ [paper](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=2610529) ]
* **Oct. 18** : Streaming [ [paper](http://www.cs.cornell.edu/johannes/papers/2007/2007-CIDR-Cayuga.pdf) ]
* **Oct. 20** : Scan Sharing [ [paper](http://dl.acm.org/citation.cfm?id=1807326) ]
* **Oct. 20** : Scan Sharing [ [paper](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1807326) ]
* **Checkpoint 2 report due by 11:59 PM Oct. 22**
* **Oct. 23 - Oct. 27** : Checkpoint 2 Reviews
* **Oct. 30** : Declarative Games [ [paper](https://infoscience.epfl.ch/record/166858/files/31-sigmod2007_games.pdf) ]
@ -127,24 +127,24 @@ There are a number of reasons that data might go bad: sensor errors, data entry
###### Background Material:
* [Sampling from Repairs](https://cs.uwaterloo.ca/~ilyas/papers/BeskalesVLDBJ2014.pdf)
* [Qualitative Data Cleaning](http://dl.acm.org/citation.cfm?id=3007320)
* [Qualitative Data Cleaning](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=3007320)
* [Mimir Website](http://mimirdb.info)
* [Mimir on GitHub](https://github.com/UBOdin/mimir)
* [Mimir Concepts](https://github.com/ubodin/mimir/wiki/Concepts)
#### Query Sampling Optimizer
Most probabilistic database systems aim to produce all possible results. A few, most notably [MCDB](http://dl.acm.org/citation.cfm?id=1376686), instead generate samples of possible results. The basic idea is to split the database into a fixed number (N) of _possible worlds_, and run the query on all N possible worlds in parallel. There are actually a few different ways to do this. Three relatively common examples include:
Most probabilistic database systems aim to produce all possible results. A few, most notably [MCDB](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1376686), instead generate samples of possible results. The basic idea is to split the database into a fixed number (N) of _possible worlds_, and run the query on all N possible worlds in parallel. There are actually a few different ways to do this. Three relatively common examples include:
* **Naive**: Literally run N copies of the query and union the results at the end.
* **Interleave**: Tag each tuple with the possible world that it comes from, and then just run one query. The query must ensure that tuples from different possible worlds can't interact (i.e., joins always happen between tuples from the same world, and the world becomes another group-by column).
* **Tuple Bundle**: Create mega-tuples that represent alternative versions of the same tuple in different possible worlds. If an attribute value is the same in all possible worlds, store only one copy of it. (See [MCDB](http://dl.acm.org/citation.cfm?id=1376686))
* **Tuple Bundle**: Create mega-tuples that represent alternative versions of the same tuple in different possible worlds. If an attribute value is the same in all possible worlds, store only one copy of it. (See [MCDB](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1376686))
Perhaps counterintuitively, our preliminary implementations of the [Interleave](https://github.com/UBOdin/mimir/blob/master/src/main/scala/mimir/exec/mode/SampleRows.scala) and [Tuple Bundle](https://github.com/UBOdin/mimir/blob/master/src/main/scala/mimir/exec/mode/TupleBundle.scala) algorithms suggest that none of these approaches will be the best in all cases. For example, in a simple select-aggregate query, tuple-bundles are the most efficient. Conversely, if you're joining on an attribute with different values in each possible world, interleave will be faster. We suspect that there are some cases where Naive will win out as well. The aim of this project is to implement a query optimizer for sampling-based probabilistic database queries. If I hand you a query, you tell me which strategy is fastest for that query. As an optional extension, you may be able to interleave different strategies, each evaluating a different part of the query.
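To make the three strategies concrete, here is a minimal Python sketch that evaluates the same aggregate (a sum over an uncertain `price` column) in each style and checks that they agree. It is a toy illustration, not Mimir's or MCDB's implementation: the relation, the function names, and `N_WORLDS` are all hypothetical, and it assumes every possible world contains the same tuples in the same order, with only attribute values differing.

```python
import random
from collections import defaultdict

N_WORLDS = 4  # hypothetical number of sampled possible worlds (the N above)


def sample_worlds(n_worlds, seed=0):
    """Build a toy uncertain relation: one list of (item, price) rows per world.

    Assumption: every world holds the same items in the same order; only the
    prices of 'b' and 'c' vary between worlds.
    """
    rng = random.Random(seed)
    return [
        [("a", 10), ("b", rng.randint(1, 5)), ("c", rng.randint(1, 5))]
        for _ in range(n_worlds)
    ]


def naive(worlds):
    # Naive: run the query once per world and collect the per-world results.
    return {w: sum(price for _, price in rows) for w, rows in enumerate(worlds)}


def interleave(worlds):
    # Interleave: tag every tuple with its world id, run a single pass,
    # and treat the world id as an extra group-by column.
    tagged = [(w, item, price)
              for w, rows in enumerate(worlds)
              for item, price in rows]
    totals = defaultdict(int)
    for w, _item, price in tagged:
        totals[w] += price
    return dict(totals)


def tuple_bundle(worlds):
    # Tuple bundle: one mega-tuple per item. A value shared by all worlds is
    # stored once (a plain int); otherwise a per-world list is kept.
    n = len(worlds)
    bundles = {}
    for i, (item, _price) in enumerate(worlds[0]):
        values = [rows[i][1] for rows in worlds]
        bundles[item] = values[0] if len(set(values)) == 1 else values
    totals = [0] * n
    for value in bundles.values():
        for w in range(n):
            totals[w] += value[w] if isinstance(value, list) else value
    return dict(enumerate(totals))


if __name__ == "__main__":
    worlds = sample_worlds(N_WORLDS)
    assert naive(worlds) == interleave(worlds) == tuple_bundle(worlds)
    print(naive(worlds))
```

All three functions return the same mapping from world id to total; what differs is how much work and storage each one spends getting there, which is the trade-off the optimizer would have to estimate.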
###### Background Material:
* [MCDB](http://dl.acm.org/citation.cfm?id=1376686)
* [MCDB](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1376686)
* [BlinkDB](http://blinkdb.org/)
* [Mimir Website](http://mimirdb.info)
* [Mimir on GitHub](https://github.com/UBOdin/mimir)
@ -184,10 +184,10 @@ Fundamentally, the aim of this project is to outline a range of different workfl
###### Background Material:
* [C-Tables](http://dl.acm.org/citation.cfm?id=1886)
* [Data Polygamy](http://dl.acm.org/citation.cfm?id=2915245)
* [MauveDB](http://dl.acm.org/citation.cfm?id=1142483)
* [Indexing Uncertain Data](http://dl.acm.org/citation.cfm?id=1559816)
* [C-Tables](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1886)
* [Data Polygamy](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=2915245)
* [MauveDB](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1142483)
* [Indexing Uncertain Data](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1559816)
#### Garbage Collection in Embedded Databases
@ -210,7 +210,7 @@ Partitioning is especially a problem in 2-dimensional (and 3-, 4-, etc... dimens
###### Background Material:
* [Database Cracking](http://stratos.seas.harvard.edu/files/IKM_CIDR07.pdf)
* [The R*-tree: an efficient and robust access method for points and rectangles](http://dl.acm.org/citation.cfm?id=98741)
* [The R*-tree: an efficient and robust access method for points and rectangles](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=98741)
* [The Big Red Data Spatial Indexing Project](http://www.cs.cornell.edu/database/spatial-indexing/)