
20235 commits

Author SHA1 Message Date
Matei Zaharia bea3a33012 doc tweak 2011-05-22 16:03:41 -07:00
Matei Zaharia 9bde5a54cb class loader fix 2011-05-22 16:00:41 -07:00
Matei Zaharia 91c07a33d9 Various fixes to serialization 2011-05-21 22:50:08 -07:00
Matei Zaharia f61b61c4ac Merge branch 'master' into new-rdds 2011-05-21 21:25:58 -07:00
Matei Zaharia 24a1e7f838 Scheduler can now recover from lost map outputs 2011-05-20 00:19:53 -07:00
Matei Zaharia 82329b0b28 Updated scheduler to support running on just some partitions of final RDD 2011-05-19 12:47:09 -07:00
Matei Zaharia 328e51b693 Various minor fixes 2011-05-19 11:19:25 -07:00
Matei Zaharia fd1d255821 Stop objectifying various trackers, caches, etc. 2011-05-17 12:41:13 -07:00
Matei Zaharia 4db50e26c7 Fixed unit tests by making them clean up the SparkContext after use and
thus clean up the various singletons (RDDCache, MapOutputTracker, etc).
This isn't perfect yet (ideally we shouldn't use singleton objects at
all) but we can fix that later.
2011-05-13 12:03:58 -07:00
Matei Zaharia aca8150c52 Ensure that AddedToCache messages make it home before tasks finish 2011-05-13 11:43:52 -07:00
Matei Zaharia 16c886a581 Optimization for count() 2011-05-13 10:41:34 -07:00
Matei Zaharia 4b1f0f1ce4 Merge pull request #48 from ankurdave/bagel-new
Bagel: Large-scale graph processing on Spark
2011-05-12 21:34:38 -07:00
Ankur Dave f40a0898a7 Rename bagel to spark.bagel and Pregel to Bagel 2011-05-09 15:23:21 -07:00
Matei Zaharia 7e20648914 Upgraded to SBT 0.7.5 2011-05-09 14:48:39 -07:00
Matei Zaharia 4bedf5b13a Merge pull request #47 from ankurdave/cache-to-disk
Merging in Ankur's code for a cache that spills to disk
2011-05-09 14:22:56 -07:00
Ankur Dave c1104058c6 Move shortest path and PageRank to bagel.examples 2011-05-03 18:53:58 -07:00
Ankur Dave 563c5e717c Refactor and add aggregator support
Refactored out the agg() and comp() methods from Pregel.run.

Defined an implicit conversion to allow applications that don't use
aggregators to avoid including a null argument for the result of the
aggregator in the compute function.
2011-05-03 15:40:45 -07:00
Ankur Dave c18fa3ebc6 Package combiner functions into a trait 2011-05-03 15:40:41 -07:00
Ankur Dave 1c8ca0ebe1 Add Bagel test suite
Note: This test suite currently fails for the same reason that the
Spark Core test suite fails: Spark currently seems to have a bug where
any test after the first one fails.
2011-05-03 15:40:31 -07:00
Ankur Dave c5b3ea755f Clean up Bagel source and interface 2011-05-03 15:40:01 -07:00
Ankur Dave 19122af787 Update ShortestPath to work with controllable partitioning 2011-05-03 15:39:39 -07:00
Ankur Dave 45ec9db8af Add Bagel classpath to run script 2011-05-03 15:39:21 -07:00
Ankur Dave 62ef620354 Clean up Pregel.run, add logging 2011-05-03 15:38:01 -07:00
Ankur Dave c0736f6f68 Add Bagel, an implementation of Pregel on Spark 2011-05-03 15:37:08 -07:00
Mosharaf Chowdhury db7a2c4897 Issue #42 fixed. 2011-04-28 14:30:48 -07:00
Ankur Dave a4c04f3f6f Error handling for disk I/O in DiskSpillingCache
Also renamed the property spark.DiskSpillingCache.cacheDir to spark.diskSpillingCache.cacheDir in order to follow conventions.
2011-04-27 23:23:29 -07:00
Ankur Dave 12ff0d2dc3 Bring an entry back into memory after fetching it from disk 2011-04-27 22:59:05 -07:00
Ankur Dave e30313aa2c Added DiskSpillingCache
DiskSpillingCache is a BoundedMemoryCache that spills entries to disk
when it runs out of space. Currently the implementation is very
simple. In particular, it's missing the following features:

- Error handling for disk I/O, including checking of disk space levels
- Bringing an entry back into memory after fetching it from disk

In addition, here are some features that aren't critical but should be
implemented soon:

- Spilling based on a user-set priority in addition to LRU
- Caching into a subdirectory of spark.DiskSpillingCache.cacheDir
  rather than the root directory
2011-04-27 22:32:35 -07:00
Mosharaf Chowdhury 60d1121343 Refactoring: daemonThreadFactories have all been moved to the Utils
object instead of having multiple copies in Broadcast and Shuffle
objects.
2011-04-27 22:13:01 -07:00
Mosharaf Chowdhury e898e108a3 Cleanup + refactoring... 2011-04-27 22:00:24 -07:00
Mosharaf Chowdhury 0567646180 Shuffle is also working from its own subpackage. 2011-04-27 21:11:41 -07:00
Mosharaf Chowdhury 2742de707a Removed some shuffle implementations. Remaining ones all use local files
to write map outputs.
2011-04-27 20:53:43 -07:00
Mosharaf Chowdhury 9d78779257 Merge branch 'mos-shuffle-tracked' into mos-bt
Conflicts:
	core/src/main/scala/spark/Broadcast.scala
2011-04-27 20:47:07 -07:00
Mosharaf Chowdhury ac7e066383 Merge branch 'master' into mos-shuffle-tracked
Conflicts:
	.gitignore
	core/src/main/scala/spark/LocalFileShuffle.scala
	src/scala/spark/BasicLocalFileShuffle.scala
	src/scala/spark/Broadcast.scala
	src/scala/spark/LocalFileShuffle.scala
2011-04-27 14:35:03 -07:00
Mosharaf Chowdhury 4e4c41026c Added support for custom classes. (from 49ea48) 2011-04-27 12:30:16 -07:00
Mosharaf Chowdhury 65848da8df Refactoring... 2011-04-26 17:41:31 -07:00
Mosharaf Chowdhury b8ab7862b8 Moved broadcast-related code to separate directory under spark.broadcast
package.
2011-04-26 17:22:52 -07:00
Mosharaf Chowdhury e31007248c Merge branch 'master' into mos-bt 2011-04-26 12:04:14 -07:00
Mosharaf Chowdhury 9257a55e3a Refactoring... 2011-04-26 11:45:36 -07:00
Mosharaf Chowdhury 9d2d533493 Temporary fix for issue #42. 2011-04-21 17:40:26 -07:00
Timothy Hunter 5c9535228a fixed small bug when classpath has some strange formatting 2011-04-18 17:12:29 -07:00
Mosharaf Chowdhury a8f47a62b9 Renamed MaxRxPeers to MaxTxPeers to MaxTxSlots and MaxRxSlots
respectively for clarity (most probably they were misunderstood and
misused)
2011-04-13 16:24:19 -07:00
Matei Zaharia 94ba95bcb2 Added flatMapValues 2011-04-12 19:51:58 -07:00
Mosharaf Chowdhury b67a968b5d hasBlocks is now AtomicInteger (even though it was ok) 2011-04-02 22:03:18 -07:00
Mosharaf Chowdhury 5bf3c83b13 BroadcastSuperTracker (right now for BT) is contacted over TCP instead
of direct procedure call.

Need to do the same for others and consolidate all broadcast mechanisms.
2011-04-01 19:31:28 -07:00
Mosharaf Chowdhury 733a130108 Formatting... 2011-04-01 14:51:24 -07:00
Mosharaf Chowdhury 4636aea598 Formatting... 2011-04-01 14:49:59 -07:00
Mosharaf Chowdhury addd569e52 Each broadcasted variable can have different blockSize. Corresponding
logic to adapt blockSize based on network condition is not yet
implemented.

Formatting + consolidation.
2011-03-31 14:51:46 -07:00
Mosharaf Chowdhury 815f3411ec Consolidated Broadcast config params. 2011-03-30 16:45:51 -07:00
Mosharaf Chowdhury a18a28b08e Removed gossip-related code that was already commented out.
More formatting.
2011-03-30 14:22:09 -07:00