spark-instrumented-optimizer/common/network-shuffle
Li Yichao d107b3b910 [SPARK-20640][CORE] Make rpc timeout and retry for shuffle registration configurable.
## What changes were proposed in this pull request?

Currently the shuffle service registration timeout and retry has been hardcoded. This works well for small workloads but under heavy workload when the shuffle service is busy transferring large amount of data we see significant delay in responding to the registration request, as a result we often see the executors fail to register with the shuffle service, eventually failing the job. We need to make these two parameters configurable.

## How was this patch tested?

* Updated `BlockManagerSuite` to test registration timeout and max attempts configuration actually works.

cc sitalkedia

Author: Li Yichao <lyc@zhihu.com>

Closes #18092 from liyichao/SPARK-20640.
2017-06-21 21:54:29 +08:00
..
src [SPARK-20640][CORE] Make rpc timeout and retry for shuffle registration configurable. 2017-06-21 21:54:29 +08:00
pom.xml [SPARK-20453] Bump master branch version to 2.3.0-SNAPSHOT 2017-04-24 21:48:04 -07:00