7072 commits

Author SHA1 Message Date
Matei Zaharia 328e51b693 Various minor fixes 2011-05-19 11:19:25 -07:00
Matei Zaharia fd1d255821 Stop objectifying various trackers, caches, etc. 2011-05-17 12:41:13 -07:00
Matei Zaharia 4db50e26c7 Fixed unit tests by making them clean up the SparkContext after use and
thus clean up the various singletons (RDDCache, MapOutputTracker, etc).
This isn't perfect yet (ideally we shouldn't use singleton objects at
all) but we can fix that later.
2011-05-13 12:03:58 -07:00
Matei Zaharia aca8150c52 Ensure that AddedToCache messages make it home before tasks finish 2011-05-13 11:43:52 -07:00
Matei Zaharia 16c886a581 Optimization for count() 2011-05-13 10:41:34 -07:00
Mosharaf Chowdhury db7a2c4897 Issue #42 fixed. 2011-04-28 14:30:48 -07:00
Ankur Dave a4c04f3f6f Error handling for disk I/O in DiskSpillingCache
Also renamed the property spark.DiskSpillingCache.cacheDir to spark.diskSpillingCache.cacheDir in order to follow conventions.
2011-04-27 23:23:29 -07:00
Ankur Dave 12ff0d2dc3 Bring an entry back into memory after fetching it from disk 2011-04-27 22:59:05 -07:00
Ankur Dave e30313aa2c Added DiskSpillingCache
DiskSpillingCache is a BoundedMemoryCache that spills entries to disk
when it runs out of space. Currently the implementation is very
simple. In particular, it's missing the following features:

- Error handling for disk I/O, including checking of disk space levels
- Bringing an entry back into memory after fetching it from disk

In addition, here are some features that aren't critical but should be
implemented soon:

- Spilling based on a user-set priority in addition to LRU
- Caching into a subdirectory of spark.DiskSpillingCache.cacheDir
  rather than the root directory
2011-04-27 22:32:35 -07:00
Mosharaf Chowdhury 60d1121343 Refactoring: daemonThreadFactories have all been moved to the Utils
object instead of having multiple copies in Broadcast and Shuffle
objects.
2011-04-27 22:13:01 -07:00
Mosharaf Chowdhury e898e108a3 Cleanup + refactoring... 2011-04-27 22:00:24 -07:00
Mosharaf Chowdhury 0567646180 Shuffle is also working from its own subpackage. 2011-04-27 21:11:41 -07:00
Mosharaf Chowdhury 2742de707a Removed some shuffle implementations. Remaining ones all use local files
to write map outputs.
2011-04-27 20:53:43 -07:00
Mosharaf Chowdhury 9d78779257 Merge branch 'mos-shuffle-tracked' into mos-bt
Conflicts:
	core/src/main/scala/spark/Broadcast.scala
2011-04-27 20:47:07 -07:00
Mosharaf Chowdhury ac7e066383 Merge branch 'master' into mos-shuffle-tracked
Conflicts:
	.gitignore
	core/src/main/scala/spark/LocalFileShuffle.scala
	src/scala/spark/BasicLocalFileShuffle.scala
	src/scala/spark/Broadcast.scala
	src/scala/spark/LocalFileShuffle.scala
2011-04-27 14:35:03 -07:00
Mosharaf Chowdhury 4e4c41026c Added support for custom classes. (from 49ea48) 2011-04-27 12:30:16 -07:00
Mosharaf Chowdhury 65848da8df Refactoring... 2011-04-26 17:41:31 -07:00
Mosharaf Chowdhury b8ab7862b8 Moved broadcast-related code to separate directory under spark.broadcast
package.
2011-04-26 17:22:52 -07:00
Mosharaf Chowdhury e31007248c Merge branch 'master' into mos-bt 2011-04-26 12:04:14 -07:00
Mosharaf Chowdhury 9257a55e3a Refactoring... 2011-04-26 11:45:36 -07:00
Mosharaf Chowdhury 9d2d533493 Temporary fix for issue #42. 2011-04-21 17:40:26 -07:00
Timothy Hunter 5c9535228a fixed small bug when classpath has some strange formatting 2011-04-18 17:12:29 -07:00
Mosharaf Chowdhury a8f47a62b9 Renamed MaxRxPeers and MaxTxPeers to MaxTxSlots and MaxRxSlots
respectively for clarity (most probably they were misunderstood and
misused)
2011-04-13 16:24:19 -07:00
Matei Zaharia 94ba95bcb2 Added flatMapValues 2011-04-12 19:51:58 -07:00
Mosharaf Chowdhury b67a968b5d hasBlocks is now AtomicInteger (even though it was ok) 2011-04-02 22:03:18 -07:00
Mosharaf Chowdhury 5bf3c83b13 BroadcastSuperTracker (right now for BT) is contacted over TCP instead
of direct procedure call.

Need to do the same for others and consolidate all broadcast mechanisms.
2011-04-01 19:31:28 -07:00
Mosharaf Chowdhury 733a130108 Formatting... 2011-04-01 14:51:24 -07:00
Mosharaf Chowdhury 4636aea598 Formatting... 2011-04-01 14:49:59 -07:00
Mosharaf Chowdhury addd569e52 Each broadcasted variable can have different blockSize. Corresponding
logic to adapt blockSize based on network condition is not yet
implemented.

Formatting + consolidation.
2011-03-31 14:51:46 -07:00
Mosharaf Chowdhury 815f3411ec Consolidated Broadcast config params. 2011-03-30 16:45:51 -07:00
Mosharaf Chowdhury a18a28b08e Removed gossip-related code that was already commented out.
More formatting.
2011-03-30 14:22:09 -07:00
Mosharaf Chowdhury 43aceafd70 Formatting... 2011-03-30 12:18:50 -07:00
Mosharaf Chowdhury 73b165220d Random is the default choice; rarestFirst didn't work well in
experiments.
2011-03-29 13:06:43 -07:00
Matei Zaharia d840fa8d0c Merge remote branch 'origin/custom-serialization' into new-rdds 2011-03-09 00:40:07 -08:00
root ff5b13799a Some tweaks to make Kryo cache work better 2011-03-09 03:31:50 -05:00
Matei Zaharia 7febdfbe29 Better reuse of buffers in Kryo serialization 2011-03-08 12:36:36 -08:00
Matei Zaharia 8ee3ec29ee Merge remote branch 'origin/custom-serialization' into new-rdds 2011-03-08 11:58:19 -08:00
Matei Zaharia 7408230bfa Updated modified Kryo to use objenesis 2011-03-08 11:58:08 -08:00
Matei Zaharia ab1216cb14 Register None and Nil properly 2011-03-08 11:52:58 -08:00
Matei Zaharia d39f5dd15e Merge remote branch 'origin/custom-serialization' into new-rdds 2011-03-08 10:28:50 -08:00
Matei Zaharia 4f0d0a7b73 stuff 2011-03-08 10:28:26 -08:00
Matei Zaharia 8b6f3db415 Merge remote branch 'origin/custom-serialization' into new-rdds 2011-03-07 19:20:28 -08:00
Matei Zaharia 38f6bce33d Added SerializingCache 2011-03-07 19:16:24 -08:00
Matei Zaharia 6316c7979d Remove some logging 2011-03-07 18:56:36 -08:00
Matei Zaharia e7b4b047a6 Added pluggable serializers and Kryo serialization 2011-03-07 18:41:53 -08:00
Matei Zaharia 467f056e29 Remove commented code 2011-03-06 23:38:41 -08:00
Matei Zaharia bce95b8458 Finished cogroup stuff 2011-03-06 23:38:16 -08:00
Matei Zaharia 04c2d6a60c stuff 2011-03-06 19:27:03 -08:00
Matei Zaharia 0fb691dd28 Various fixes to get MesosScheduler working with new RDDs 2011-03-06 16:16:38 -08:00
Matei Zaharia 1df5a65a01 Pass cache locations correctly to DAGScheduler. 2011-03-06 12:16:38 -08:00
Matei Zaharia e1436f1eaa Merge remote branch 'origin/master' into new-rdds 2011-03-06 11:11:47 -08:00
Matei Zaharia 370b95816f Added sampling for large arrays in SizeEstimator 2011-03-06 11:11:20 -08:00
Matei Zaharia a789e9aaea Merge remote branch 'origin/master' into new-rdds 2011-03-01 10:33:37 -08:00
Matei Zaharia 021c50a8d4 Remove unnecessary lock which was there to work around a bug in
Configuration in Hadoop 0.20.0
2011-03-01 10:28:38 -08:00
Matei Zaharia adaba4d550 Removed old slf4j jars that came with Hadoop 2011-03-01 10:28:21 -08:00
Matei Zaharia 447debb771 Updated Hadoop to 0.20.2 to include some bug fixes 2011-03-01 10:27:48 -08:00
Matei Zaharia 9e59afd710 More work on new RDD design 2011-02-27 19:15:52 -08:00
Matei Zaharia f38f86d59e More stuff 2011-02-27 14:27:12 -08:00
Matei Zaharia 2e6023f2bf stuff 2011-02-26 23:41:44 -08:00
Matei Zaharia 309367c477 Initial work towards new RDD design 2011-02-26 23:15:33 -08:00
Mosharaf Chowdhury 0416cc22d2 Picking peers weighted by the number of rare blocks they have. A block is rare if there are at most 2 copies in the neighborhood. A better number could be used (some function of neighborhood size). 2011-02-15 16:27:44 -08:00
Mosharaf Chowdhury cf81da9485 Optimization: Master sends out at least one copy of each block first regardless of whatever a client is asking for. Once one copy of each block is out, Master then responds to specific blocks from individual receivers. 2011-02-14 15:08:33 -08:00
Mosharaf Chowdhury 2b946fb2d1 pickBlockRarestFirst and gossips commented OUT for now.
Problem with the rarestFirst implementation is that we are picking peers randomly first and then picking blocks from the random peer using rarestFirst. NOT the right way to do it. It should be the other way around.
Problem with gossip is that peers might end up overwriting newer information with older information. To fix that we either have to have timestamps or must match the bitVectors before overwriting.
2011-02-13 13:53:15 -08:00
Mosharaf Chowdhury ca2895ebb0 Fix in rarestFirst implementation.
If there is more than one rarest block, pick randomly among them (was deterministic before)
2011-02-10 20:37:44 -08:00
Mosharaf Chowdhury 520bbdc7e3 Peers now gossip about their neighbors when they talk. 2011-02-10 20:15:30 -08:00
Matei Zaharia dc24aecd8f Close record readers in HadoopFile after finishing a split 2011-02-10 12:07:48 -08:00
Mosharaf Chowdhury 441462bc7f Fixed some warnings during compilation. 2011-02-09 12:11:43 -08:00
Mosharaf Chowdhury 1a73c0d265 Merged with master. Using sbt. 2011-02-09 10:48:48 -08:00
Mosharaf Chowdhury 495b38658e Merge branch 'master' into mos-bt 2011-02-09 10:40:23 -08:00
Matei Zaharia 99f3f23efa Changed default shuffle to LocalFileShuffle because it's way faster for small files 2011-02-08 17:03:03 -08:00
Matei Zaharia ec28b607fd Merge branch 'master' into sbt
Conflicts:
	Makefile
	core/src/main/java/spark/compress/lzf/LZF.java
	core/src/main/java/spark/compress/lzf/LZFInputStream.java
	core/src/main/java/spark/compress/lzf/LZFOutputStream.java
	core/src/main/native/spark_compress_lzf_LZF.c
	run
2011-02-02 00:25:54 -08:00
Matei Zaharia e5c4cd8a5e Made examples and core subprojects 2011-02-01 15:11:08 -08:00