[SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone

### What changes were proposed in this pull request?

Update the documentation and shell scripts to warn users about the deprecation of support for multiple workers on the same host.

### Why are the changes needed?

This is a sub-task of [SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans to remove support for multiple workers entirely in Spark 3.1. This PR takes the first step by deprecating it in Spark 3.0.

### Does this PR introduce any user-facing change?

Yes. Users now see a deprecation warning when they run the worker start scripts.

### How was this patch tested?

Tested manually.

Closes #27768 from Ngone51/deprecate_spark_worker_instances.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Date: 2020-04-15 11:29:55 -07:00
parent 2b10d70bad
commit 0d4e4df061
3 changed files with 7 additions and 5 deletions


@@ -38,3 +38,5 @@ license: |
 - Event log file will be written as UTF-8 encoding, and Spark History Server will replay event log files as UTF-8 encoding. Previously Spark wrote the event log file as default charset of driver JVM process, so Spark History Server of Spark 2.x is needed to read the old event log files in case of incompatible encoding.
 - A new protocol for fetching shuffle blocks is used. It's recommended that external shuffle services be upgraded when running Spark 3.0 apps. You can still use old external shuffle services by setting the configuration `spark.shuffle.useOldFetchProtocol` to `true`. Otherwise, Spark may run into errors with messages like `IllegalArgumentException: Unexpected message type: <number>`.
+- `SPARK_WORKER_INSTANCES` is deprecated in Standalone mode. It's recommended to launch multiple executors in one worker and launch one worker per node instead of launching multiple workers per node and launching one executor per worker.
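The recommended migration can be sketched as two alternative `conf/spark-env.sh` setups. This is an illustrative sketch, not part of the commit; the resource sizes are hypothetical, though the variable names are real Spark standalone settings:

```shell
# conf/spark-env.sh -- hypothetical resource sizes, for illustration only

# Deprecated layout: several workers per node, one small executor each
# SPARK_WORKER_INSTANCES=4
# SPARK_WORKER_CORES=4
# SPARK_WORKER_MEMORY=32g

# Recommended layout: a single worker per node owning all resources;
# the application then requests several executors within this worker.
SPARK_WORKER_CORES=16
SPARK_WORKER_MEMORY=128g
```

With the single-worker layout, executor sizing moves to the application side (e.g. `spark.executor.cores` / `spark.executor.memory`), so the same node-level capacity yields multiple executor JVMs without multiple workers.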


@@ -63,10 +63,10 @@ Note that memory usage is greatly affected by storage level and serialization fo
 the [tuning guide](tuning.html) for tips on how to reduce it.
 Finally, note that the Java VM does not always behave well with more than 200 GiB of RAM. If you
-purchase machines with more RAM than this, you can run _multiple worker JVMs per node_. In
-Spark's [standalone mode](spark-standalone.html), you can set the number of workers per node
-with the `SPARK_WORKER_INSTANCES` variable in `conf/spark-env.sh`, and the number of cores
-per worker with `SPARK_WORKER_CORES`.
+purchase machines with more RAM than this, you can launch multiple executors in a single node. In
+Spark's [standalone mode](spark-standalone.html), a worker is responsible for launching multiple
+executors according to its available memory and cores, and each executor will be launched in a
+separate Java VM.
 # Network
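The multiple-executors-per-worker approach described in the new docs text can be sketched as a submission command. This is an assumed example, not from the commit; the master URL and resource sizes are hypothetical, though the flags are real `spark-submit` options:

```shell
# Hypothetical standalone-mode submission. With one worker per node,
# executor count and size are controlled by the application: here each
# executor gets 4 cores and 30g, and the 16-core total yields up to 4
# executor JVMs spread across the cluster's workers.
spark-submit \
  --master spark://master-host:7077 \
  --executor-cores 4 \
  --executor-memory 30g \
  --total-executor-cores 16 \
  myapp.py
```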


@@ -22,7 +22,7 @@
 # Environment Variables
 #
 # SPARK_WORKER_INSTANCES   The number of worker instances to run on this
-#                          slave. Default is 1.
+#                          slave. Default is 1. Note it has been deprecated since Spark 3.0.
 # SPARK_WORKER_PORT        The base port number for the first worker. If set,
 #                          subsequent workers will increment this number. If
 #                          unset, Spark will find a valid port number, but