Feedback and slides
parent
c2bcd76f34
commit
03b15131dc
|
@ -8,6 +8,6 @@
|
|||
<p>All students should provide one short paragraph identifying at least one strength and at least one weakness of the approach described in the week's reading.</p>
|
||||
|
||||
<%= Disqus::embed(
|
||||
"http://odin.cse.buffalo.edu/teaching/cse-662/2017fa/group_formation.html",
|
||||
"cse662.2017fa.group_formation"
|
||||
"http://odin.cse.buffalo.edu/teaching/cse-662/2017fa/feedback/01-cracking.html",
|
||||
"cse662.2017fa.feedback.01"
|
||||
) %>
|
13
src/teaching/cse-662/2017fa/feedback/02-mcdb.erb
Normal file
|
@ -0,0 +1,13 @@
|
|||
<h2>Reading Assignment 1: Adaptive Indexing</h2>
|
||||
|
||||
<dl>
|
||||
<dt>Paper</dt>
|
||||
<dd><a href="http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1376686">MCDB</a></dd>
|
||||
</dl>
|
||||
|
||||
<p>All students should provide one short paragraph identifying at least one strength and at least one weakness of the approach described in the week's reading.</p>
|
||||
|
||||
<%= Disqus::embed(
|
||||
"http://odin.cse.buffalo.edu/teaching/cse-662/2017fa/feedback/02-mcdb.html",
|
||||
"cse662.2017fa.feedback.02"
|
||||
) %>
|
|
@ -76,26 +76,26 @@ After taking the course, students should be able to:
|
|||
|
||||
* **Aug. 28** : Introduction [ [slides](slides/2017-08-28-Intro.pdf) | [form groups](group_formation.html) ]
|
||||
* **Aug. 30** : Project Seeds [ [slides](slides/2017-08-30-Seeds.pdf) ]
|
||||
* **Sept. 01** : Functional Data Structures
|
||||
* **Sept. 01** : Functional Data Structures [ [slides](slides/2017-09-01-FunctionalDataStructures.pdf) ]
|
||||
* **Sept. 04** : **No Class, Labor Day**
|
||||
* **Sept. 06** : Database Cracking [ [paper](http://stratos.seas.harvard.edu/files/IKM_CIDR07.pdf) | [feedback](feedback/01-cracking.html) ]
|
||||
* **Sept. 08** : Just-in-Time Data Structures [ [paper](http://odin.cse.buffalo.edu/papers/2015/CIDR-jitd-final.pdf) ]
|
||||
* **Sept. 11** : Incomplete Databases 1
|
||||
* **Sept. 11** : Incomplete Databases 1 [ [paper](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1376686) | [feedback](feedback/02-mcdb.html) ]
|
||||
* **Sept. 13** : Incomplete Databases 2
|
||||
* **Sept. 15** : Mimir [ [paper](http://odin.cse.buffalo.edu/papers/2015/VLDB-lenses-final.pdf) ]
|
||||
* **Sept. 18** : MayBMS [ [paper](http://maybms.sourceforge.net/download/INFOSYS-TR-2007-2.pdf) ]
|
||||
* **Sept. 20** : Sampling From Probabilistic Queries [ [paper](http://dl.acm.org/citation.cfm?id=1376686) ]
|
||||
* **Sept. 20** : Sampling From Probabilistic Queries [ [paper](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1376686) ]
|
||||
* **Sept. 22** : Probabilistic Constraint Repair [ [paper](https://cs.uwaterloo.ca/~ilyas/papers/BeskalesVLDBJ2014.pdf) ]
|
||||
* **Sept. 25** : R-Trees and Multidimensional Indexing [ [paper](http://dl.acm.org/citation.cfm?id=98741) ]
|
||||
* **Sept. 25** : R-Trees and Multidimensional Indexing [ [paper](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=98741) ]
|
||||
* **Checkpoint 1 report due by 11:59 PM Sept. 26**
|
||||
* **Sept. 27 - Sept. 29** : Student Project Presentations
|
||||
* **Oct. 02** : BloomL [ [paper-1](http://cidrdb.org/cidr2011/Papers/CIDR11_Paper35.pdf), [paper-2](http://dl.acm.org/citation.cfm?id=2391230) ]
|
||||
* **Oct. 02** : BloomL [ [paper-1](http://cidrdb.org/cidr2011/Papers/CIDR11_Paper35.pdf), [paper-2](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=2391230) ]
|
||||
* **Oct. 04 - Oct. 06** : *Oliver Away* (Content TBD)
|
||||
* **Oct. 09** : NoDB [ [paper](http://www.vldb.org/pvldb/vol7/p1119-karpathiotakis.pdf) ]
|
||||
* **Oct. 11 - Oct. 13** : Student Project Presentations
|
||||
* **Oct. 16** : Lazy Transactions [ [paper](http://dl.acm.org/citation.cfm?id=2610529) ]
|
||||
* **Oct. 16** : Lazy Transactions [ [paper](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=2610529) ]
|
||||
* **Oct. 18** : Streaming [ [paper](http://www.cs.cornell.edu/johannes/papers/2007/2007-CIDR-Cayuga.pdf) ]
|
||||
* **Oct. 20** : Scan Sharing [ [paper](http://dl.acm.org/citation.cfm?id=1807326) ]
|
||||
* **Oct. 20** : Scan Sharing [ [paper](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1807326) ]
|
||||
* **Checkpoint 2 report due by 11:59 PM Oct. 22**
|
||||
* **Oct. 23 - Oct. 27** : Checkpoint 2 Reviews
|
||||
* **Oct. 30** : Declarative Games [ [paper](https://infoscience.epfl.ch/record/166858/files/31-sigmod2007_games.pdf) ]
|
||||
|
@ -127,24 +127,24 @@ There are a number of reasons that data might go bad: sensor errors, data entry
|
|||
###### Background Material:
|
||||
|
||||
* [Sampling from Repairs](https://cs.uwaterloo.ca/~ilyas/papers/BeskalesVLDBJ2014.pdf)
|
||||
* [Qualitative Data Cleaning](http://dl.acm.org/citation.cfm?id=3007320)
|
||||
* [Qualitative Data Cleaning](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=3007320)
|
||||
* [Mimir Website](http://mimirdb.info)
|
||||
* [Mimir on GitHub](https://github.com/UBOdin/mimir)
|
||||
* [Mimir Concepts](https://github.com/ubodin/mimir/wiki/Concepts)
|
||||
|
||||
#### Query Sampling Optimizer
|
||||
|
||||
Most probabilistic database systems aim to produce all possible results. A few, most notably [MCDB](http://dl.acm.org/citation.cfm?id=1376686), instead generate samples of possible results. The basic idea is to split the database into a fixed number (N) of _possible worlds_, and run the query on all N possible worlds in parallel. There are actually a few different ways to do this. Three relatively common examples include:
|
||||
Most probabilistic database systems aim to produce all possible results. A few, most notably [MCDB](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1376686), instead generate samples of possible results. The basic idea is to split the database into a fixed number (N) of _possible worlds_, and run the query on all N possible worlds in parallel. There are actually a few different ways to do this. Three relatively common examples include:
|
||||
|
||||
* **Naive**: Literally run N copies of the query and union the results at the end.
|
||||
* **Interleave**: Tag each tuple with the possible world that it comes from, and then run just one query. The query must ensure that tuples from different possible worlds can't interact (i.e., joins always happen between tuples from the same world, and the world becomes another group-by column).
|
||||
* **Tuple Bundle**: Create mega-tuples that represent alternative versions of the same tuple in different possible worlds. If an attribute value is the same in all possible worlds, store only one copy of it. (See [MCDB](http://dl.acm.org/citation.cfm?id=1376686))
|
||||
* **Tuple Bundle**: Create mega-tuples that represent alternative versions of the same tuple in different possible worlds. If an attribute value is the same in all possible worlds, store only one copy of it. (See [MCDB](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1376686))
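
The three strategies above can be illustrated with a toy sketch. This is not how Mimir or MCDB implement them; it is a minimal Python illustration under stated assumptions: each possible world is a plain list of `(grp, val)` tuples, the i-th tuple of every world is a version of the same logical tuple, `grp` is the same in every world, and the query is fixed to `SELECT grp, SUM(val) ... GROUP BY grp`. All function names and the sample data in `WORLDS` are hypothetical.

```python
from collections import defaultdict


def run_query(rows):
    """Toy query: SELECT grp, SUM(val) FROM rows GROUP BY grp."""
    totals = defaultdict(int)
    for grp, val in rows:
        totals[grp] += val
    return dict(totals)


def naive(worlds):
    """Run one copy of the query per world and union the tagged results."""
    result = {}
    for world_id, rows in enumerate(worlds):
        for grp, total in run_query(rows).items():
            result[(world_id, grp)] = total
    return result


def interleave(worlds):
    """Tag each tuple with its world, then run a single query in which
    the world id acts as an extra group-by column."""
    tagged = [(world_id, grp, val)
              for world_id, rows in enumerate(worlds)
              for grp, val in rows]
    totals = defaultdict(int)
    for world_id, grp, val in tagged:
        totals[(world_id, grp)] += val
    return dict(totals)


def make_bundles(worlds):
    """Zip aligned tuples into bundles, storing a single value
    when every world agrees on it."""
    bundles = []
    for versions in zip(*worlds):
        grp = versions[0][0]  # assumption: grp is identical in all worlds
        vals = [val for _, val in versions]
        bundles.append((grp, vals[0] if len(set(vals)) == 1 else vals))
    return bundles


def tuple_bundle(worlds):
    """Aggregate over bundles, keeping one running sum per world."""
    n = len(worlds)
    totals = defaultdict(lambda: [0] * n)
    for grp, vals in make_bundles(worlds):
        expanded = vals if isinstance(vals, list) else [vals] * n
        for world_id, val in enumerate(expanded):
            totals[grp][world_id] += val
    return {(world_id, grp): total
            for grp, sums in totals.items()
            for world_id, total in enumerate(sums)}


# Hypothetical sample: three possible worlds over the same three tuples.
WORLDS = [
    [("a", 1), ("a", 5), ("b", 2)],
    [("a", 1), ("a", 7), ("b", 2)],
    [("a", 1), ("a", 6), ("b", 3)],
]
```

All three functions return the same mapping from `(world_id, grp)` to the per-world sum; they differ only in how much work is shared across worlds, which is the cost difference an optimizer would need to estimate.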
|
||||
|
||||
Perhaps counterintuitively, our preliminary implementations of the [Interleave](https://github.com/UBOdin/mimir/blob/master/src/main/scala/mimir/exec/mode/SampleRows.scala) and [Tuple Bundle](https://github.com/UBOdin/mimir/blob/master/src/main/scala/mimir/exec/mode/TupleBundle.scala) algorithms suggest that none of these approaches will be the best in all cases. For example, in a simple select-aggregate query, tuple-bundles are the most efficient. Conversely, if you're joining on an attribute with different values in each possible world, interleave will be faster. We suspect that there are some cases where Naive will win out as well. The aim of this project is to implement a query optimizer for sampling-based probabilistic database queries. If I hand you a query, you tell me which strategy is fastest for that query. As an optional extension, you may be able to interleave different strategies, each evaluating a different part of the query.
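
As a starting point, the two observations above can be encoded as a rule-based chooser. The sketch below is only an illustration: `QueryFeatures`, its fields, and `choose_strategy` are hypothetical names, and the fallback to Naive is a placeholder rather than a measured result. A real optimizer would likely replace these rules with cost estimates.

```python
from dataclasses import dataclass


@dataclass
class QueryFeatures:
    """Hypothetical summary of a query plan, as the optimizer might see it."""
    has_join: bool
    join_on_uncertain_attr: bool  # join key differs across possible worlds
    has_aggregate: bool


def choose_strategy(q: QueryFeatures) -> str:
    """Pick an evaluation strategy using the heuristics from the text."""
    # Joining on an attribute that varies per world favors Interleave.
    if q.has_join and q.join_on_uncertain_attr:
        return "interleave"
    # Simple select-aggregate queries favor Tuple Bundles.
    if q.has_aggregate and not q.has_join:
        return "tuple_bundle"
    # Placeholder default: the text only suspects Naive wins in some cases.
    return "naive"
```

For example, `choose_strategy(QueryFeatures(has_join=False, join_on_uncertain_attr=False, has_aggregate=True))` selects the tuple-bundle strategy.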
|
||||
|
||||
###### Background Material:
|
||||
|
||||
* [MCDB](http://dl.acm.org/citation.cfm?id=1376686)
|
||||
* [MCDB](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1376686)
|
||||
* [BlinkDB](http://blinkdb.org/)
|
||||
* [Mimir Website](http://mimirdb.info)
|
||||
* [Mimir on GitHub](https://github.com/UBOdin/mimir)
|
||||
|
@ -184,10 +184,10 @@ Fundamentally, the aim of this project is to outline a range of different workfl
|
|||
|
||||
###### Background Material:
|
||||
|
||||
* [C-Tables](http://dl.acm.org/citation.cfm?id=1886)
|
||||
* [Data Polygamy](http://dl.acm.org/citation.cfm?id=2915245)
|
||||
* [MauveDB](http://dl.acm.org/citation.cfm?id=1142483)
|
||||
* [Indexing Uncertain Data](http://dl.acm.org/citation.cfm?id=1559816)
|
||||
* [C-Tables](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1886)
|
||||
* [Data Polygamy](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=2915245)
|
||||
* [MauveDB](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1142483)
|
||||
* [Indexing Uncertain Data](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=1559816)
|
||||
|
||||
#### Garbage Collection in Embedded Databases
|
||||
|
||||
|
@ -210,7 +210,7 @@ Partitioning is especially a problem in 2-dimensional (and 3-, 4-, etc... dimens
|
|||
###### Background Material:
|
||||
|
||||
* [Database Cracking](http://stratos.seas.harvard.edu/files/IKM_CIDR07.pdf)
|
||||
* [The R*-tree: an efficient and robust access method for points and rectangles](http://dl.acm.org/citation.cfm?id=98741)
|
||||
* [The R*-tree: an efficient and robust access method for points and rectangles](http://dl.acm.org.gate.lib.buffalo.edu/citation.cfm?id=98741)
|
||||
* [The Big Red Data Spatial Indexing Project](http://www.cs.cornell.edu/database/spatial-indexing/)
|
||||
|
||||
|
||||
|
|
Binary file not shown.