From 0d4e4df06105cf2985dde17c1af76093b3ae8c13 Mon Sep 17 00:00:00 2001
From: "yi.wu"
Date: Wed, 15 Apr 2020 11:29:55 -0700
Subject: [PATCH] [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone

### What changes were proposed in this pull request?

Update the documentation and shell script to warn users that support for multiple workers on the same host is deprecated.

### Why are the changes needed?

This is a sub-task of [SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans to remove support for multiple workers entirely in Spark 3.1. This PR takes the first step by deprecating it in Spark 3.0.

### Does this PR introduce any user-facing change?

Yes, users will see a warning when they run the start worker script.

### How was this patch tested?

Tested manually.

Closes #27768 from Ngone51/deprecate_spark_worker_instances.

Authored-by: yi.wu
Signed-off-by: Xingbo Jiang
---
 docs/core-migration-guide.md  | 2 ++
 docs/hardware-provisioning.md | 8 ++++----
 sbin/start-slave.sh           | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 66a489bcc8..cde6e070c5 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -38,3 +38,5 @@ license: |
 - Event log file will be written as UTF-8 encoding, and Spark History Server will replay event log files as UTF-8 encoding. Previously Spark wrote the event log file as default charset of driver JVM process, so Spark History Server of Spark 2.x is needed to read the old event log files in case of incompatible encoding.
 
 - A new protocol for fetching shuffle blocks is used. It's recommended that external shuffle services be upgraded when running Spark 3.0 apps. You can still use old external shuffle services by setting the configuration `spark.shuffle.useOldFetchProtocol` to `true`. Otherwise, Spark may run into errors with messages like `IllegalArgumentException: Unexpected message type: <number>`.
+
+- `SPARK_WORKER_INSTANCES` is deprecated in Standalone mode. It's recommended to launch multiple executors in one worker and launch one worker per node instead of launching multiple workers per node and launching one executor per worker.
diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md
index 4e5d681962..fc87995f98 100644
--- a/docs/hardware-provisioning.md
+++ b/docs/hardware-provisioning.md
@@ -63,10 +63,10 @@ Note that memory usage is greatly affected by storage level and serialization fo
 the [tuning guide](tuning.html) for tips on how to reduce it.
 
 Finally, note that the Java VM does not always behave well with more than 200 GiB of RAM. If you
-purchase machines with more RAM than this, you can run _multiple worker JVMs per node_. In
-Spark's [standalone mode](spark-standalone.html), you can set the number of workers per node
-with the `SPARK_WORKER_INSTANCES` variable in `conf/spark-env.sh`, and the number of cores
-per worker with `SPARK_WORKER_CORES`.
+purchase machines with more RAM than this, you can launch multiple executors in a single node. In
+Spark's [standalone mode](spark-standalone.html), a worker is responsible for launching multiple
+executors according to its available memory and cores, and each executor will be launched in a
+separate Java VM.
 
 # Network
 
diff --git a/sbin/start-slave.sh b/sbin/start-slave.sh
index 2cb17a04f6..9b3b26b078 100755
--- a/sbin/start-slave.sh
+++ b/sbin/start-slave.sh
@@ -22,7 +22,7 @@
 # Environment Variables
 #
 #   SPARK_WORKER_INSTANCES  The number of worker instances to run on this
-#                           slave. Default is 1.
+#                           slave. Default is 1. Note it has been deprecated since Spark 3.0.
 #   SPARK_WORKER_PORT       The base port number for the first worker. If set,
 #                           subsequent workers will increment this number. If
 #                           unset, Spark will find a valid port number, but
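For users migrating away from `SPARK_WORKER_INSTANCES`, here is a minimal sketch of the configuration the deprecation note points toward: one worker per node that owns the whole machine, with executor sizing chosen so that the single worker launches several executor JVMs. The core and memory figures, the master host placeholder, and the application class/jar are illustrative assumptions, not values taken from this patch.

```sh
# Sketch only: assumes a 48-core node with 256 GiB of RAM; adjust to your hardware.

# conf/spark-env.sh on each node -- run ONE worker that owns the whole machine,
# instead of setting the deprecated SPARK_WORKER_INSTANCES.
SPARK_WORKER_CORES=48      # all cores offered by this single worker
SPARK_WORKER_MEMORY=240g   # leave some RAM for the OS and buffer cache

# Submit the application with small executors so the single worker launches
# several executor JVMs per node (here roughly 48 / 8 = 6), keeping each JVM
# well below the ~200 GiB heap size the hardware-provisioning guide warns about.
./bin/spark-submit \
  --master spark://<master-host>:7077 \
  --executor-cores 8 \
  --executor-memory 32g \
  --class org.example.MyApp \
  my-app.jar
```

With this layout, the several-JVMs-per-host behavior that `SPARK_WORKER_INSTANCES` used to provide is handled by the single worker launching multiple executors, which is the setup described in the updated hardware-provisioning text above.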