Commit graph

151 commits

Author SHA1 Message Date
Mosharaf Chowdhury 92d2a9a13a Removed unnecessary stuff from HttpParallelLocalFileShuffle 2010-12-22 11:28:50 -08:00
Mosharaf Chowdhury 4ab268ee36 HttpParallelLocalFileShuffle also has a consuming thread. It works on EC2. 2010-12-21 23:50:02 -08:00
Mosharaf Chowdhury 5f7bfbc70e HttpBlockedLocalFileShuffle has also been converted to have per-reducer consumption thread. Works in local mesos, but NOT on EC2 :| 2010-12-21 23:05:32 -08:00
Mosharaf Chowdhury 5f0cdabd40 Added a separate thread to deserialize (1 thread per reducer) in CustomParallelLocalFileShuffle
Upside: No synchronized blocking on "combiners" variable. 3x faster :)
Downside: Inefficient implementation. Requiring too much temporary data. Approx. 2x increase in memory requirement :( Should be fixed at some point.
2010-12-21 21:52:37 -08:00
Mosharaf Chowdhury f4d0e917a2 Added all the options to the java-opts file. Tired of writing them for separate runs :| 2010-12-21 18:59:51 -08:00
Mosharaf Chowdhury 6ef17e918b Fixed logging. Again. 2010-12-21 18:49:35 -08:00
Mosharaf Chowdhury f47fb44479 - Divided maxConnections to max[Rx|Tx]Connections.
- Fixed config param loading bug in CustomParallelLFS
2010-12-21 17:34:51 -08:00
Mosharaf Chowdhury d92b067350 Fixed log message in CustomParallelLocalFileShuffle that was giving some problem in log processing. 2010-12-21 13:12:15 -08:00
Mosharaf Chowdhury 3b21a5fb26 Code formatting... 2010-12-19 18:03:20 -08:00
Mosharaf Chowdhury 81f78282e1 All shuffle implementations are now in the same place. Time to work on new things. 2010-12-19 14:32:40 -08:00
Mosharaf Chowdhury 272c72b405 Merge branch 'mos-shuffle' into mos-shuffle-parallel
Conflicts:
	conf/java-opts
	src/scala/spark/BasicLocalFileShuffle.scala
2010-12-19 14:25:13 -08:00
Mosharaf Chowdhury ca37e7b33d Renamed CustomParallelLocalFileShuffle 2010-12-19 14:22:05 -08:00
Mosharaf Chowdhury 864d202cda Merge branch 'mos-shuffle-parallel-http' into mos-shuffle
Conflicts:
	conf/java-opts
	src/scala/spark/BlockedLocalFileShuffle.scala
	src/scala/spark/CustomBlockedLocalFileShuffle.scala
	src/scala/spark/HttpBlockedLocalFileShuffle.scala
2010-12-19 14:08:39 -08:00
Mosharaf Chowdhury 89172fcd69 Renamed this version of BlockedLocalFileShuffle to CustomBlockedLocalFileShuffle. 2010-12-19 14:05:35 -08:00
Mosharaf Chowdhury a83a722256 Renamed BlockedLocalFileShuffle to HttpBlockedLocalFileShuffle for merging with the mos-shuffle branch. 2010-12-19 14:02:19 -08:00
Mosharaf Chowdhury 62d61ed928 - Reimplemented BlockedLocalFileShuffle without creating too many files.
- Clients now request byte ranges from the server using an INDEX file.
2010-12-18 14:03:49 -08:00
Mosharaf Chowdhury e30fdeb025 Updated GroupByKey example. 2010-12-16 20:30:18 -08:00
Mosharaf Chowdhury a40cbc1904 Code formatting. 2010-12-16 16:54:02 -08:00
Mosharaf Chowdhury ce96d8a7d3 First version of BlockedLocalFileShuffle is in. It works! 2010-12-16 15:15:51 -08:00
Mosharaf Chowdhury fddcdf87c9 Added a small description of how ParallelLFS works. 2010-12-16 11:58:00 -08:00
Mosharaf Chowdhury 77a4017585 Fixed config param naming in ParallelLocalFileShuffle 2010-12-16 11:42:37 -08:00
Mosharaf Chowdhury c5483e39f9 - ParallelLocalFileShuffle does NOT use HttpPipelining at all.
- Config option related to pipelining has been removed.
 - Summary: Basic -> Pipelining / Parallel -> NO pipelining
2010-12-15 22:08:34 -08:00
Mosharaf Chowdhury 56d8a2afa1 - Updated java-opts file of this branch.
- Renamed some ParallelLocalFileShuffle config options for clarity.
2010-12-15 20:56:22 -08:00
Mosharaf Chowdhury 25fb3c4cf6 - Brought back Matei's LocalFileShuffle implementation as BasicLocalFileShuffle
- Renamed parallel-pull version to ParallelLocalFileShuffle
 - Note that setting max-concurrent connections to 1 in ParallelLocalFileShuffle should essentially be the same as BasicLocalFileShuffle
2010-12-15 20:33:28 -08:00
Mosharaf Chowdhury f82cc17bc5 UseHttpPipelining option is brought back in. It works! 2010-12-07 10:07:30 -08:00
Mosharaf Chowdhury 7e2d72c328 Multiple connections created at a time. No upper limit on the server side though. 2010-12-04 18:55:55 -08:00
Mosharaf Chowdhury c6df327dd7 Updated logging format. 2010-12-04 16:41:13 -08:00
Mosharaf Chowdhury 7df20d681a Combined MaxRxPeers and MaxTxPeers to a single config parameter MaxConnections 2010-12-04 14:37:16 -08:00
Mosharaf Chowdhury b1745b3103 Removed an unnecessary byte array in the middle. Probably will have to bring it back if we do block level data movement. 2010-12-04 13:55:25 -08:00
Mosharaf Chowdhury 3a671ce989 Config parameters are in place. Good to go (I think) 2010-12-04 10:59:06 -08:00
Mosharaf Chowdhury 476a216d9d Parallel is working. Need to fix/finalize some config parameters. 2010-12-04 02:05:41 -08:00
Mosharaf Chowdhury c546c299bc Combining is happening inside the thread. It's still synchronized though. 2010-12-04 00:59:25 -08:00
Mosharaf Chowdhury 0d7ca7751e Bug fixes. Not yet parallel. 2010-12-04 00:06:47 -08:00
Mosharaf Chowdhury 52086cef32 Building blocks are in place. Still not pulling in parallel though. 2010-12-03 20:29:39 -08:00
Mosharaf Chowdhury 540a41163f UseHttpPipelining is 'true' by default. 2010-12-02 19:56:17 -08:00
Mosharaf Chowdhury 0de859fbe2 Enabling/disabling HTTP pipelining is a config option now. Performance tradeoffs are not obvious yet. 2010-12-02 02:32:44 -08:00
Mosharaf Chowdhury 8494b3a4f9 - Added log messages for benchmarking.
- Added GroupByTest.scala for benchmarking.
2010-11-27 23:51:43 -08:00
Matei Zaharia f8ea98d989 Remove -unchecked compiler parameter 2010-11-13 18:39:07 -08:00
Matei Zaharia f8966ffc11 Added a shuffle test with negative hash codes for some keys (this was a bug earlier) 2010-11-12 16:18:45 -08:00
Matei Zaharia d0a9966555 Unit tests for shuffle operations. Fixes #33. 2010-11-12 16:12:14 -08:00
Matei Zaharia 7b25ab87af Added options for using an external HTTP server with LocalFileShuffle 2010-11-09 13:46:30 -08:00
Matei Zaharia 504f839c65 Removed unnecessary collectAsMap 2010-11-08 08:49:42 -08:00
Matei Zaharia 9d3f05a990 Made shuffle algorithm pluggable and added LocalFileShuffle. 2010-11-08 00:46:12 -08:00
Matei Zaharia d9ea6d69a5 Create output files one by one instead of at the same time in the map
phase of DfsShuffle.
2010-11-06 10:53:57 -07:00
Matei Zaharia 16ff4dc0be Merge branch 'matei-shuffle' of github.com:mesos/spark into matei-shuffle 2010-11-04 14:40:36 -07:00
Matei Zaharia d984b8ab23 Properly set the number of output splits in DFS shuffle 2010-11-04 14:39:55 -07:00
root 4cc0984b43 Fixed a small bug in DFS shuffle -- the number of reduce tasks was not being set based on numOutputSplits 2010-11-04 21:34:55 +00:00
Matei Zaharia 96f0be935a Added groupBy function in RDD 2010-11-03 23:58:53 -07:00
Matei Zaharia 72ec298cd4 Added reduceByKey, groupByKey and join operations based on combine, as
well as versions of the shuffle operations that set the number of splits
automatically.
2010-11-03 23:51:11 -07:00
Matei Zaharia d947cb9778 Fixed a bug with negative hashcodes 2010-11-03 22:52:41 -07:00