Matei Zaharia
fd1d255821
Stop objectifying various trackers, caches, etc.
2011-05-17 12:41:13 -07:00
Matei Zaharia
4db50e26c7
Fixed unit tests by making them clean up the SparkContext after use and
thus clean up the various singletons (RDDCache, MapOutputTracker, etc).
This isn't perfect yet (ideally we shouldn't use singleton objects at
all) but we can fix that later.
2011-05-13 12:03:58 -07:00
Matei Zaharia
aca8150c52
Ensure that AddedToCache messages make it home before tasks finish
2011-05-13 11:43:52 -07:00
Matei Zaharia
16c886a581
Optimization for count()
2011-05-13 10:41:34 -07:00
Matei Zaharia
4b1f0f1ce4
Merge pull request #48 from ankurdave/bagel-new
Bagel: Large-scale graph processing on Spark
2011-05-12 21:34:38 -07:00
Ankur Dave
f40a0898a7
Rename bagel to spark.bagel and Pregel to Bagel
2011-05-09 15:23:21 -07:00
Matei Zaharia
7e20648914
Upgraded to SBT 0.7.5
2011-05-09 14:48:39 -07:00
Matei Zaharia
4bedf5b13a
Merge pull request #47 from ankurdave/cache-to-disk
Merging in Ankur's code for a cache that spills to disk
2011-05-09 14:22:56 -07:00
Ankur Dave
c1104058c6
Move shortest path and PageRank to bagel.examples
2011-05-03 18:53:58 -07:00
Ankur Dave
563c5e717c
Refactor and add aggregator support
Refactored out the agg() and comp() methods from Pregel.run.
Defined an implicit conversion to allow applications that don't use
aggregators to avoid including a null argument for the result of the
aggregator in the compute function.
2011-05-03 15:40:45 -07:00
Ankur Dave
c18fa3ebc6
Package combiner functions into a trait
2011-05-03 15:40:41 -07:00
Ankur Dave
1c8ca0ebe1
Add Bagel test suite
Note: This test suite currently fails for the same reason that the
Spark Core test suite fails: Spark currently seems to have a bug where
any test after the first one fails.
2011-05-03 15:40:31 -07:00
Ankur Dave
c5b3ea755f
Clean up Bagel source and interface
2011-05-03 15:40:01 -07:00
Ankur Dave
19122af787
Update ShortestPath to work with controllable partitioning
2011-05-03 15:39:39 -07:00
Ankur Dave
45ec9db8af
Add Bagel classpath to run script
2011-05-03 15:39:21 -07:00
Ankur Dave
62ef620354
Clean up Pregel.run, add logging
2011-05-03 15:38:01 -07:00
Ankur Dave
c0736f6f68
Add Bagel, an implementation of Pregel on Spark
2011-05-03 15:37:08 -07:00
Ankur Dave
a4c04f3f6f
Error handling for disk I/O in DiskSpillingCache
Also renamed the property spark.DiskSpillingCache.cacheDir to spark.diskSpillingCache.cacheDir in order to follow conventions.
2011-04-27 23:23:29 -07:00
Ankur Dave
12ff0d2dc3
Bring an entry back into memory after fetching it from disk
2011-04-27 22:59:05 -07:00
Ankur Dave
e30313aa2c
Added DiskSpillingCache
DiskSpillingCache is a BoundedMemoryCache that spills entries to disk
when it runs out of space. Currently the implementation is very
simple. In particular, it's missing the following features:
- Error handling for disk I/O, including checking of disk space levels
- Bringing an entry back into memory after fetching it from disk
In addition, here are some features that aren't critical but should be
implemented soon:
- Spilling based on a user-set priority in addition to LRU
- Caching into a subdirectory of spark.DiskSpillingCache.cacheDir
  rather than the root directory
2011-04-27 22:32:35 -07:00
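The spill-to-disk behaviour described in the commit message above can be illustrated with a minimal sketch. This is not Spark's Scala implementation: the class name `DiskSpillingCacheSketch`, its methods, the entry-count bound, and the use of `pickle` files named after the key are all assumptions made for illustration. The sketch also includes the "bring an entry back into memory" step that the message lists as missing at this point in the history.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class DiskSpillingCacheSketch:
    """Hypothetical LRU cache that spills evicted entries to disk.

    Assumptions: capacity is counted in entries (not bytes), and keys are
    strings that are safe to use as file names.
    """

    def __init__(self, capacity, cache_dir=None):
        self.capacity = capacity                      # max entries kept in memory
        self.cache_dir = cache_dir or tempfile.mkdtemp()
        self.memory = OrderedDict()                   # least recently used first

    def _path(self, key):
        return os.path.join(self.cache_dir, f"{key}.pkl")

    def put(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)                  # mark as most recently used
        # Spill least recently used entries until we are within capacity.
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                pickle.dump(old_value, f)

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        path = self._path(key)
        if not os.path.exists(path):
            return None                               # never cached
        with open(path, "rb") as f:
            value = pickle.load(f)
        os.remove(path)
        self.put(key, value)                          # bring the entry back into memory
        return value
```

With a capacity of two, inserting a third key spills the oldest one to a file under `cache_dir`; a later `get` on that key reloads it and may in turn spill another entry. Error handling for disk I/O is deliberately left out of the sketch.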
Matei Zaharia
94ba95bcb2
Added flatMapValues
2011-04-12 19:51:58 -07:00
Matei Zaharia
d840fa8d0c
Merge remote branch 'origin/custom-serialization' into new-rdds
2011-03-09 00:40:07 -08:00
root
ff5b13799a
Some tweaks to make Kryo cache work better
2011-03-09 03:31:50 -05:00
Matei Zaharia
7febdfbe29
Better reuse of buffers in Kryo serialization
2011-03-08 12:36:36 -08:00
Matei Zaharia
8ee3ec29ee
Merge remote branch 'origin/custom-serialization' into new-rdds
2011-03-08 11:58:19 -08:00
Matei Zaharia
7408230bfa
Updated modified Kryo to use objenesis
2011-03-08 11:58:08 -08:00
Matei Zaharia
ab1216cb14
Register None and Nil properly
2011-03-08 11:52:58 -08:00
Matei Zaharia
d39f5dd15e
Merge remote branch 'origin/custom-serialization' into new-rdds
2011-03-08 10:28:50 -08:00
Matei Zaharia
4f0d0a7b73
stuff
2011-03-08 10:28:26 -08:00
Matei Zaharia
8b6f3db415
Merge remote branch 'origin/custom-serialization' into new-rdds
2011-03-07 19:20:28 -08:00
Matei Zaharia
38f6bce33d
Added SerializingCache
2011-03-07 19:16:24 -08:00
Matei Zaharia
6316c7979d
Remove some logging
2011-03-07 18:56:36 -08:00
Matei Zaharia
e7b4b047a6
Added pluggable serializers and Kryo serialization
2011-03-07 18:41:53 -08:00
Matei Zaharia
467f056e29
Remove commented code
2011-03-06 23:38:41 -08:00
Matei Zaharia
bce95b8458
Finished cogroup stuff
2011-03-06 23:38:16 -08:00
Matei Zaharia
04c2d6a60c
stuff
2011-03-06 19:27:03 -08:00
Matei Zaharia
0fb691dd28
Various fixes to get MesosScheduler working with new RDDs
2011-03-06 16:16:38 -08:00
Matei Zaharia
1df5a65a01
Pass cache locations correctly to DAGScheduler.
2011-03-06 12:16:38 -08:00
Matei Zaharia
e1436f1eaa
Merge remote branch 'origin/master' into new-rdds
2011-03-06 11:11:47 -08:00
Matei Zaharia
370b95816f
Added sampling for large arrays in SizeEstimator
2011-03-06 11:11:20 -08:00
Matei Zaharia
a789e9aaea
Merge remote branch 'origin/master' into new-rdds
2011-03-01 10:33:37 -08:00
Matei Zaharia
021c50a8d4
Remove unnecessary lock which was there to work around a bug in
Configuration in Hadoop 0.20.0
2011-03-01 10:28:38 -08:00
Matei Zaharia
adaba4d550
Removed old slf4j jars that came with Hadoop
2011-03-01 10:28:21 -08:00
Matei Zaharia
447debb771
Updated Hadoop to 0.20.2 to include some bug fixes
2011-03-01 10:27:48 -08:00
Matei Zaharia
9e59afd710
More work on new RDD design
2011-02-27 19:15:52 -08:00
Matei Zaharia
f38f86d59e
More stuff
2011-02-27 14:27:12 -08:00
Matei Zaharia
2e6023f2bf
stuff
2011-02-26 23:41:44 -08:00
Matei Zaharia
309367c477
Initial work towards new RDD design
2011-02-26 23:15:33 -08:00
Matei Zaharia
dc24aecd8f
Close record readers in HadoopFile after finishing a split
2011-02-10 12:07:48 -08:00
Matei Zaharia
62f1c6f5a8
Remove build.properties from version control
2011-02-09 11:52:56 -08:00