Commit graph

29896 commits

Author SHA1 Message Date
Mosharaf Chowdhury a8411eb59e - Moving common stuff to a separate Shuffle object.
- Moved ShuffleTrackerStrategy to a separate file.
2010-12-28 12:19:07 -08:00
Mosharaf Chowdhury 1bc10ba64c Formatting. 2010-12-28 00:47:08 -08:00
Mosharaf Chowdhury 95ebd58d0f - Made sure only one Leaving notification goes to the tracker per ShuffleClient.
- Why ShuffleClient and ShuffleServerThread are crashing is still unknown.
2010-12-27 03:25:50 -08:00
Mosharaf Chowdhury 7ac3463ab8 Bug fix: tracker (running in Spark master) wasn't initializing Shuffle object and was using inconsistent config values. 2010-12-27 02:55:45 -08:00
Mosharaf Chowdhury 44c8fb0873 Fixed closing order of some of the i/o streams. Bugs remain. 2010-12-27 02:08:03 -08:00
Mosharaf Chowdhury f859941062 Merge branch 'mos-shuffle-parallel' into mos-shuffle-tracked
Conflicts:
	conf/java-opts
2010-12-25 23:12:59 -08:00
Mosharaf Chowdhury 6c8e9cb2f9 Consolidated shuffle options. 2010-12-25 23:08:53 -08:00
Mosharaf Chowdhury 20eec59f04 Bug fix + formatting. 2010-12-25 22:54:46 -08:00
Mosharaf Chowdhury 90e467206d Tracker framework is in place that supports pluggable tracker strategy. There are several bugs along with performance problems.
- For larger data shuffles, ShuffleServerThread gets "Broken Pipe" and ShuffleClient gets "Connection Reset"
 - There is a bug in the accounting counters of BalanceConnectionsShuffleTrackerStrategy. Some of them go below zero while decrementing which is not supposed to happen.
2010-12-25 22:45:50 -08:00
Mosharaf Chowdhury ba71b61e40 Reading masterHostAddress from config file until #42 has been resolved. 2010-12-25 10:22:22 -08:00
Mosharaf Chowdhury c1ff210387 Fixed some comments. 2010-12-24 20:05:00 -08:00
Mosharaf Chowdhury 8dc44bfa96 CustomBlockedInMemoryShuffle is an in-memory implementation of CustomBlockedLFS 2010-12-22 21:06:03 -08:00
Mosharaf Chowdhury a064835808 CustomBlockedLocalFileShuffle has been added. This is essentially ManualBlockedLocalFileShuffle with our servers. 2010-12-22 19:02:20 -08:00
Mosharaf Chowdhury 3447f903da Renamed CustomBlockedLocalFileShuffle to ManualBlockedLocalFileShuffle.
There will be a new CustomBlockedLocalFileShuffle where 'Custom' will mean ManualBlockedLocalFileShuffle with custom server instead of jetty.
2010-12-22 17:17:33 -08:00
Mosharaf Chowdhury c484b735bb Bug squashed. CustomParallelInMemoryShuffle is rocking!
We were serializing one (the wrong) thing, trying to deserialize another (the right thing).
2010-12-22 17:03:31 -08:00
Mosharaf Chowdhury 23586d3bef Added an in-memory implementation of CustomParallelLFS. There is a serialization/deserialization bug in the implementation. 2010-12-22 16:45:26 -08:00
Mosharaf Chowdhury c4c8f72e98 Fixed an indexing bug in HttpBlockedLocalFileShuffle. It still doesn't work on EC2 clusters with >5 nodes. 2010-12-22 12:48:11 -08:00
Mosharaf Chowdhury a5a8b7048d CustomBlockedLocalFileShuffle has separate consumer thread. 2010-12-22 12:04:12 -08:00
Mosharaf Chowdhury 92d2a9a13a Removed unnecessary stuff from HttpParallelLocalFileShuffle 2010-12-22 11:28:50 -08:00
Mosharaf Chowdhury 4ab268ee36 HttpParallelLocalFileShuffle also has a consuming thread. It works on EC2. 2010-12-21 23:50:02 -08:00
Mosharaf Chowdhury 5f7bfbc70e HttpBlockedLocalFileShuffle has also been converted to have per-reducer consumption thread. Works in local mesos, but NOT on EC2 :| 2010-12-21 23:05:32 -08:00
Mosharaf Chowdhury 5f0cdabd40 Added a separate thread to deserialize (1 thread per reducer) in CustomParallelLocalFileShuffle
Upside: No synchronized blocking on "combiners" variable. 3x faster :)
Downside: Inefficient implementation. Requiring too much temporary data. Approx. 2x increase in memory requirement :( Should be fixed at some point.
2010-12-21 21:52:37 -08:00
Mosharaf Chowdhury f4d0e917a2 Added all the options to the java-opts file. Tired of writing them for separate runs :| 2010-12-21 18:59:51 -08:00
Mosharaf Chowdhury 6ef17e918b Fixed logging. Again. 2010-12-21 18:49:35 -08:00
Mosharaf Chowdhury f47fb44479 - Divided maxConnections to max[Rx|Tx]Connections.
- Fixed config param loading bug in CustomParallelLFS
2010-12-21 17:34:51 -08:00
Mosharaf Chowdhury d92b067350 Fixed log message in CustomParallelLocalFileShuffle that was giving some problem in log processing. 2010-12-21 13:12:15 -08:00
Mosharaf Chowdhury 3b21a5fb26 Code formatting... 2010-12-19 18:03:20 -08:00
Mosharaf Chowdhury 81f78282e1 All shuffle implementations are now in the same place. Time to work on new things. 2010-12-19 14:32:40 -08:00
Mosharaf Chowdhury 272c72b405 Merge branch 'mos-shuffle' into mos-shuffle-parallel
Conflicts:
	conf/java-opts
	src/scala/spark/BasicLocalFileShuffle.scala
2010-12-19 14:25:13 -08:00
Mosharaf Chowdhury ca37e7b33d Renamed CustomParallelLocalFileShuffle 2010-12-19 14:22:05 -08:00
Mosharaf Chowdhury 864d202cda Merge branch 'mos-shuffle-parallel-http' into mos-shuffle
Conflicts:
	conf/java-opts
	src/scala/spark/BlockedLocalFileShuffle.scala
	src/scala/spark/CustomBlockedLocalFileShuffle.scala
	src/scala/spark/HttpBlockedLocalFileShuffle.scala
2010-12-19 14:08:39 -08:00
Mosharaf Chowdhury 89172fcd69 Renamed this version of BlockedLocalFileShuffle to CustomBlockedLocalFileShuffle. 2010-12-19 14:05:35 -08:00
Mosharaf Chowdhury a83a722256 Renamed BlockedLocalFileShuffle to HttpBlockedLocalFileShuffle for merging with the mos-shuffle branch. 2010-12-19 14:02:19 -08:00
Mosharaf Chowdhury 62d61ed928 - Reimplemented BlockedLocalFileShuffle without creating too many files.
- Clients now request byte ranges from the server using an INDEX file.
2010-12-18 14:03:49 -08:00
Mosharaf Chowdhury 5c5d767bc1 Modified MultiBroadcastTest. 2010-12-18 10:40:00 -08:00
Mosharaf Chowdhury d18d08ec9d Added a new BroadcastTest in the examples where 2 broadcasts are required. Should be used to experiment with how multiple broadcasts work. 2010-12-17 10:43:49 -08:00
Mosharaf Chowdhury e30fdeb025 Updated GroupByKey example. 2010-12-16 20:30:18 -08:00
Mosharaf Chowdhury a40cbc1904 Code formatting. 2010-12-16 16:54:02 -08:00
Mosharaf Chowdhury ce96d8a7d3 First version of BlockedLocalFileShuffle is in. It works! 2010-12-16 15:15:51 -08:00
Mosharaf Chowdhury fddcdf87c9 Added a small description of how ParallelLFS works. 2010-12-16 11:58:00 -08:00
Mosharaf Chowdhury 77a4017585 Fixed config param naming in ParallelLocalFileShuffle 2010-12-16 11:42:37 -08:00
Mosharaf Chowdhury c5483e39f9 - ParallelLocalFileShuffle does NOT use HttpPipelining at all.
- Config option related to pipelining has been removed.
 - Summary: Basic -> Pipelining / Parallel -> NO pipelining
2010-12-15 22:08:34 -08:00
Mosharaf Chowdhury 56d8a2afa1 - Updated java-opts file of this branch.
- Renamed some ParallelLocalFileShuffle config options for clarity.
2010-12-15 20:56:22 -08:00
Mosharaf Chowdhury 25fb3c4cf6 - Brought back Matei's LocalFileShuffle implementation as BasicLocalFileShuffle
- Renamed parallel-pull version to ParallelLocalFileShuffle
 - Note that setting max-concurrent connections to 1 in ParallelLocalFileShuffle should essentially be the same as BasicLocalFileShuffle
2010-12-15 20:33:28 -08:00
Matei Zaharia 817e722321 Merge branch 'master' of github.com:mesos/spark 2010-12-15 19:40:35 -08:00
Matei Zaharia 14c29c1b14 Fixed import 2010-12-15 19:40:27 -08:00
Mosharaf Chowdhury 5cafdd7ba2 Removed some unused imports from Broadcast.scala 2010-12-15 19:11:23 -08:00
Mosharaf Chowdhury be0ce57de2 - Fixed a compilation error due to wrong 'import' of legacy lzf libraries in DfsBroadcast.scala
- Updated to use ning libraries.
 - Passes all unit tests
2010-12-15 18:34:27 -08:00
Matei Zaharia 5c222dbe28 Merge branch 'master' into mos-bt
Conflicts:
	src/scala/spark/Broadcast.scala
2010-12-15 10:57:39 -08:00
Mosharaf Chowdhury 0a5c24ae3d - Default broadcast mechanism is set to DfsBroadcast
- Configuration parameters are renamed to follow our convention
 - Master now automatically supplies its hostAddress instead of reading from config file
 - sendBroadcast has been removed from the Broadcast trait
2010-12-13 14:36:39 -08:00