spark-instrumented-optimizer/common
yi.wu e6fec33f18 [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
### What changes were proposed in this pull request?

This PR adds support for reading host-local shuffle data directly from disk when the external shuffle service is disabled.

Similar to #25299, we first try to get the local disk directories for shuffle data located on the same host as the current executor. The only difference is that #25299 gets the directories from the external shuffle service, while this PR gets them from the executors.

To implement the feature, this PR extends `HostLocalDirManager` for both `ExternalBlockStoreClient` and `NettyBlockTransferService`. It also adds `getHostLocalDirs` to `NettyBlockTransferService`, as `ExternalBlockStoreClient` already has, in order to send the get-dir request to the corresponding executor. For simplicity, this PR reuses the request message `GetLocalDirsForExecutors`.

### Why are the changes needed?

After SPARK-27651 / #25299, Spark can read host-local shuffle data directly from disk when the external shuffle service is enabled. To extend the feature, we can also support it when the external shuffle service is disabled.

### Does this PR introduce _any_ user-facing change?

Yes. Before this PR, to use the host-local shuffle reading feature, users had to enable both `spark.shuffle.readHostLocalDisk` and `spark.shuffle.service.enabled`. After this PR, enabling `spark.shuffle.readHostLocalDisk` should be enough, and the external shuffle service is no longer a prerequisite.
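
As a rough illustration of the user-facing change, the fragment below sketches what a `spark-defaults.conf` might look like after this PR. Only the two property names come from the description above. The choice of `spark-defaults.conf` as the place to set them, and the explicit `false` for the shuffle service, are assumptions made for the example.

```properties
# Sketch only: enable host-local shuffle reads from disk.
spark.shuffle.readHostLocalDisk   true

# Assumed for illustration: the external shuffle service stays off.
# Before this PR, this property also had to be true for the feature to apply.
spark.shuffle.service.enabled     false
```

The same properties could presumably be passed per job instead, for example with `--conf` on `spark-submit`.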

### How was this patch tested?

Added test and tested manually.

Closes #28911 from Ngone51/support_node_local_shuffle.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-09-02 13:03:44 -07:00
..
kvstore [SPARK-32350][CORE] Add batch-write on LevelDB to improve performance of HybridStore 2020-07-22 13:27:34 +09:00
network-common [MINOR][DOCS] fix typo for docs,log message and comments 2020-08-22 06:45:35 +09:00
network-shuffle [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled 2020-09-02 13:03:44 -07:00
network-yarn [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system 2020-05-08 15:50:19 -07:00
sketch [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+ 2020-07-23 16:20:17 -07:00
tags [SPARK-32245][INFRA] Run Spark tests in Github Actions 2020-07-11 13:09:06 -07:00
unsafe [SPARK-32559][SQL] Fix the trim logic in UTF8String.toInt/toLong did't handle non-ASCII characters correctly 2020-08-07 05:00:33 +00:00