Mosharaf Chowdhury
b566be47d7
Bug fix/update: All the shuffle implementations are using consistent config parameters.
2010-12-30 17:52:01 -08:00
Mosharaf Chowdhury
4545df67cf
Consumption is delayed until everything has been received. Otherwise it interferes with network performance.
2010-12-30 17:10:20 -08:00
Mosharaf Chowdhury
1e26fb3953
CustomBlockedLocalFileShuffle: reducers are reading multiple blocks per connection instead of just one.
...
Sometimes ShuffleServer fails to start for small shuffle data with a small block size in a local VM. No problem with a large block size.
2010-12-30 13:33:34 -08:00
Mosharaf Chowdhury
fb51df0b5b
Added a skewed shuffle test example.
...
Output per mapper ranges from 1/numMappers of numKVPairs up to all of numKVPairs.
2010-12-29 13:50:43 -08:00
Mosharaf Chowdhury
2fefbe17e4
TreeBroadcast is an extended version of ChainedBroadcast with customizable maxDegree per node. maxDegree = 1 is ChainedBroadcast.
2010-12-28 20:18:49 -08:00
Mosharaf Chowdhury
5540a99ab7
ChainedBroadcast is also reading masterHostAddress from config file until #42 is resolved.
2010-12-28 18:53:59 -08:00
Mosharaf Chowdhury
b23d337c92
Updating reception stats before consuming. This can cause trouble if an exception occurs during consumption (less likely), but it frees up splits that threads can connect to instead of idling.
2010-12-28 16:08:40 -08:00
Mosharaf Chowdhury
5074e8500a
- Implemented TrackedCustomBlockedLocalFileShuffle.
...
- Fixed several bugs. (Copy-paste is the bane of coding :|)
2010-12-28 15:28:43 -08:00
Mosharaf Chowdhury
a8411eb59e
- Moving common stuff to a separate Shuffle object.
...
- Moved ShuffleTrackerStrategy to a separate file.
2010-12-28 12:19:07 -08:00
Mosharaf Chowdhury
1bc10ba64c
Formatting.
2010-12-28 00:47:08 -08:00
Mosharaf Chowdhury
95ebd58d0f
- Made sure only one Leaving notification goes to the tracker per ShuffleClient.
...
- Why ShuffleClient and ShuffleServerThread are crashing is still unknown.
2010-12-27 03:25:50 -08:00
Mosharaf Chowdhury
7ac3463ab8
Bug fix: tracker (running in Spark master) wasn't initializing Shuffle object and was using inconsistent config values.
2010-12-27 02:55:45 -08:00
Mosharaf Chowdhury
44c8fb0873
Fixed closing order of some of the I/O streams. Bugs remain.
2010-12-27 02:08:03 -08:00
Mosharaf Chowdhury
f859941062
Merge branch 'mos-shuffle-parallel' into mos-shuffle-tracked
...
Conflicts:
conf/java-opts
2010-12-25 23:12:59 -08:00
Mosharaf Chowdhury
6c8e9cb2f9
Consolidated shuffle options.
2010-12-25 23:08:53 -08:00
Mosharaf Chowdhury
20eec59f04
Bug fix + formatting.
2010-12-25 22:54:46 -08:00
Mosharaf Chowdhury
90e467206d
Tracker framework is in place that supports pluggable tracker strategy. There are several bugs along with performance problems.
...
- For larger shuffles, ShuffleServerThread gets "Broken Pipe" and ShuffleClient gets "Connection Reset".
- There is a bug in the accounting counters of BalanceConnectionsShuffleTrackerStrategy. Some of them go below zero while decrementing, which is not supposed to happen.
2010-12-25 22:45:50 -08:00
Mosharaf Chowdhury
ba71b61e40
Reading masterHostAddress from config file until #42 has been resolved.
2010-12-25 10:22:22 -08:00
Mosharaf Chowdhury
c1ff210387
Fixed some comments.
2010-12-24 20:05:00 -08:00
Mosharaf Chowdhury
8dc44bfa96
CustomBlockedInMemoryShuffle is an in-memory implementation of CustomBlockedLFS.
2010-12-22 21:06:03 -08:00
Mosharaf Chowdhury
a064835808
CustomBlockedLocalFileShuffle has been added. This is essentially ManualBlockedLocalFileShuffle with our servers.
2010-12-22 19:02:20 -08:00
Mosharaf Chowdhury
3447f903da
Renamed CustomBlockedLocalFileShuffle to ManualBlockedLocalFileShuffle.
...
There will be a new CustomBlockedLocalFileShuffle where 'Custom' will mean ManualBlockedLocalFileShuffle with a custom server instead of Jetty.
2010-12-22 17:17:33 -08:00
Mosharaf Chowdhury
c484b735bb
Bug squashed. CustomParallelInMemoryShuffle is rocking!
...
We were serializing one (the wrong) thing, trying to deserialize another (the right thing).
2010-12-22 17:03:31 -08:00
Mosharaf Chowdhury
23586d3bef
Added an in-memory implementation of CustomParallelLFS. There is a serialization/deserialization bug in the implementation.
2010-12-22 16:45:26 -08:00
Mosharaf Chowdhury
c4c8f72e98
Fixed an indexing bug in HttpBlockedLocalFileShuffle. It still doesn't work on EC2 clusters with more than 5 nodes.
2010-12-22 12:48:11 -08:00
Mosharaf Chowdhury
a5a8b7048d
CustomBlockedLocalFileShuffle has separate consumer thread.
2010-12-22 12:04:12 -08:00
Mosharaf Chowdhury
92d2a9a13a
Removed unnecessary stuff from HttpParallelLocalFileShuffle.
2010-12-22 11:28:50 -08:00
Mosharaf Chowdhury
4ab268ee36
HttpParallelLocalFileShuffle also has a consuming thread. It works on EC2.
2010-12-21 23:50:02 -08:00
Mosharaf Chowdhury
5f7bfbc70e
HttpBlockedLocalFileShuffle has also been converted to have per-reducer consumption thread. Works in local mesos, but NOT on EC2 :|
2010-12-21 23:05:32 -08:00
Mosharaf Chowdhury
5f0cdabd40
Added a separate thread to deserialize (1 thread per reducer) in CustomParallelLocalFileShuffle
...
Upside: No synchronized blocking on "combiners" variable. 3x faster :)
Downside: Inefficient implementation. Requires too much temporary data. Approx. 2x increase in memory requirement :( Should be fixed at some point.
2010-12-21 21:52:37 -08:00
Mosharaf Chowdhury
f4d0e917a2
Added all the options to the java-opts file. Tired of writing them for separate runs :|
2010-12-21 18:59:51 -08:00
Mosharaf Chowdhury
6ef17e918b
Fixed logging. Again.
2010-12-21 18:49:35 -08:00
Mosharaf Chowdhury
f47fb44479
- Divided maxConnections to max[Rx|Tx]Connections.
...
- Fixed config param loading bug in CustomParallelLFS
2010-12-21 17:34:51 -08:00
Mosharaf Chowdhury
d92b067350
Fixed a log message in CustomParallelLocalFileShuffle that was causing problems in log processing.
2010-12-21 13:12:15 -08:00
Mosharaf Chowdhury
3b21a5fb26
Code formatting...
2010-12-19 18:03:20 -08:00
Mosharaf Chowdhury
81f78282e1
All shuffle implementations are now in the same place. Time to work on new things.
2010-12-19 14:32:40 -08:00
Mosharaf Chowdhury
272c72b405
Merge branch 'mos-shuffle' into mos-shuffle-parallel
...
Conflicts:
conf/java-opts
src/scala/spark/BasicLocalFileShuffle.scala
2010-12-19 14:25:13 -08:00
Mosharaf Chowdhury
ca37e7b33d
Renamed CustomParallelLocalFileShuffle
2010-12-19 14:22:05 -08:00
Mosharaf Chowdhury
864d202cda
Merge branch 'mos-shuffle-parallel-http' into mos-shuffle
...
Conflicts:
conf/java-opts
src/scala/spark/BlockedLocalFileShuffle.scala
src/scala/spark/CustomBlockedLocalFileShuffle.scala
src/scala/spark/HttpBlockedLocalFileShuffle.scala
2010-12-19 14:08:39 -08:00
Mosharaf Chowdhury
89172fcd69
Renamed this version of BlockedLocalFileShuffle to CustomBlockedLocalFileShuffle.
2010-12-19 14:05:35 -08:00
Mosharaf Chowdhury
a83a722256
Renamed BlockedLocalFileShuffle to HttpBlockedLocalFileShuffle for merging with the mos-shuffle branch.
2010-12-19 14:02:19 -08:00
Mosharaf Chowdhury
62d61ed928
- Reimplemented BlockedLocalFileShuffle without creating too many files.
...
- Clients now request byte ranges from the server using an INDEX file.
2010-12-18 14:03:49 -08:00
Mosharaf Chowdhury
5c5d767bc1
Modified MultiBroadcastTest.
2010-12-18 10:40:00 -08:00
Mosharaf Chowdhury
d18d08ec9d
Added a new BroadcastTest in the examples where 2 broadcasts are required. Should be used to experiment with how multiple broadcasts work.
2010-12-17 10:43:49 -08:00
Mosharaf Chowdhury
e30fdeb025
Updated GroupByKey example.
2010-12-16 20:30:18 -08:00
Mosharaf Chowdhury
a40cbc1904
Code formatting.
2010-12-16 16:54:02 -08:00
Mosharaf Chowdhury
ce96d8a7d3
First version of BlockedLocalFileShuffle is in. It works!
2010-12-16 15:15:51 -08:00
Mosharaf Chowdhury
fddcdf87c9
Added a small description of how ParallelLFS works.
2010-12-16 11:58:00 -08:00
Mosharaf Chowdhury
77a4017585
Fixed config param naming in ParallelLocalFileShuffle
2010-12-16 11:42:37 -08:00
Mosharaf Chowdhury
c5483e39f9
- ParallelLocalFileShuffle does NOT use HttpPipelining at all.
...
- Config option related to pipelining has been removed.
- Summary: Basic -> Pipelining / Parallel -> NO pipelining
2010-12-15 22:08:34 -08:00