spark-instrumented-optimizer/sbin
Holden Karau d273a2bb0f [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
This PR is based on a previous PR: https://github.com/apache/spark/pull/19045

### What changes were proposed in this pull request?

This change adds a decommissioning state that we can enter when the cloud provider/scheduler lets us know we aren't going to be removed immediately but will be removed soon. This concept fits nicely with K8s, as well as with spot instances on AWS and preemptible instances, in all of which we can receive notice that our host is going away. For now we simply stop scheduling jobs on decommissioning hosts; in the future we could perform some kind of data migration during scale-down, or at least stop accepting new blocks to cache.
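To make the new state concrete, here is a minimal Python sketch of the scheduling rule described above, assuming a master that tracks one state per worker. The names `WorkerState`, `Scheduler`, `decommission`, and `schedulable_workers` are hypothetical illustrations, not Spark's actual (Scala) classes or methods. A worker that receives a notice moves from ALIVE to DECOMMISSIONING and is simply skipped when new work is handed out.

```python
from enum import Enum, auto


class WorkerState(Enum):
    ALIVE = auto()
    DECOMMISSIONING = auto()  # notice received: the host will go away soon
    DEAD = auto()


class Scheduler:
    """Toy master that tracks worker states (hypothetical, not Spark's API)."""

    def __init__(self):
        self.workers = {}  # worker id -> WorkerState

    def register(self, worker_id):
        self.workers[worker_id] = WorkerState.ALIVE

    def decommission(self, worker_id):
        # Only a live worker can enter the decommissioning state.
        if self.workers.get(worker_id) is WorkerState.ALIVE:
            self.workers[worker_id] = WorkerState.DECOMMISSIONING

    def remove(self, worker_id):
        # The host has actually gone away.
        self.workers[worker_id] = WorkerState.DEAD

    def schedulable_workers(self):
        # Only ALIVE workers are offered new work; DECOMMISSIONING ones are skipped.
        return sorted(w for w, s in self.workers.items() if s is WorkerState.ALIVE)


if __name__ == "__main__":
    sched = Scheduler()
    for w in ("w1", "w2", "w3"):
        sched.register(w)
    sched.decommission("w2")
    print(sched.schedulable_workers())  # ['w1', 'w3']
```

Keeping decommissioning as a distinct state, rather than treating the worker as already dead, is what leaves room for the future work mentioned above (migrating data or refusing new cached blocks) before the host disappears.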

There is a design document at https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE/edit?usp=sharing

### Why are the changes needed?

With the growing move to preemptible multi-tenancy, serverless environments, and spot instances, better handling of node scale-down is required.

### Does this PR introduce any user-facing change?

There is no API change; however, an additional configuration flag is added to enable or disable this behaviour.
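The following sketch shows how such an opt-in flag could gate the behaviour on the worker side. It is a hypothetical Python illustration, not Spark's implementation. We assume the flag is `spark.worker.decommission.enabled` and defaults to off, but please check the configuration documentation for your Spark version. `decommission_enabled` and `Worker.on_decommission_notice` are made-up names.

```python
# Assumed property name; verify against the Spark configuration docs.
FLAG = "spark.worker.decommission.enabled"


def decommission_enabled(conf):
    """Parse the (assumed) flag from a dict of Spark-style string properties."""
    value = conf.get(FLAG, "false")  # assumed default: disabled
    return value.strip().lower() == "true"


class Worker:
    """Hypothetical worker that reacts to a preemption notice only when enabled."""

    def __init__(self, conf):
        self.enabled = decommission_enabled(conf)
        self.decommissioning = False

    def on_decommission_notice(self):
        # Flag off: the notice is ignored and the worker's state is unchanged.
        # Flag on: the worker enters the decommissioning state.
        if self.enabled:
            self.decommissioning = True
        return self.decommissioning


if __name__ == "__main__":
    print(Worker({}).on_decommission_notice())            # False
    print(Worker({FLAG: "true"}).on_decommission_notice())  # True
```

Making the feature opt-in means existing deployments see no behaviour change unless they set the flag.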

### How was this patch tested?

New integration tests in the Spark K8s integration test suite, and an extension of AppClientSuite to test decommissioning separately from K8s.

Closes #26440 from holdenk/SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r4.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-02-14 12:36:52 -08:00
| File | Last commit | Date |
|---|---|---|
| decommission-slave.sh | [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support | 2020-02-14 12:36:52 -08:00 |
| slaves.sh | [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) | 2015-11-04 10:49:34 +00:00 |
| spark-config.sh | [SPARK-25891][PYTHON] Upgrade to Py4J 0.10.8.1 | 2018-10-31 09:55:03 -07:00 |
| spark-daemon.sh | [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support | 2020-02-14 12:36:52 -08:00 |
| spark-daemons.sh | [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) | 2015-11-04 10:49:34 +00:00 |
| start-all.sh | [SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts | 2016-02-26 22:35:12 -08:00 |
| start-history-server.sh | [SPARK-25711][CORE] Improve start-history-server.sh: show usage User-Friendly and remove deprecated options | 2018-10-13 13:34:31 -07:00 |
| start-master.sh | [SPARK-25712][CORE][MINOR] Improve usage message of start-master.sh and start-slave.sh | 2018-10-12 12:42:34 -05:00 |
| start-mesos-dispatcher.sh | [SPARK-17944][DEPLOY] sbin/start-* scripts use of hostname -f fail with Solaris | 2016-10-22 09:37:53 +01:00 |
| start-mesos-shuffle-service.sh | [SPARK-27056][MESOS] Remove start-shuffle-service.sh | 2019-03-08 18:51:38 -06:00 |
| start-slave.sh | [SPARK-28164] Fix usage description of start-slave.sh | 2019-06-26 12:42:33 -05:00 |
| start-slaves.sh | [SPARK-17944][DEPLOY] sbin/start-* scripts use of hostname -f fail with Solaris | 2016-10-22 09:37:53 +01:00 |
| start-thriftserver.sh | [SPARK-25735][CORE][MINOR] Improve start-thriftserver.sh: print clean usage and exit with code 1 | 2018-10-17 09:56:17 -05:00 |
| stop-all.sh | [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) | 2015-11-04 10:49:34 +00:00 |
| stop-history-server.sh | [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) | 2015-11-04 10:49:34 +00:00 |
| stop-master.sh | [SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts | 2016-02-26 22:35:12 -08:00 |
| stop-mesos-dispatcher.sh | [SPARK-13414][MESOS] Allow multiple dispatchers to be launched. | 2016-02-20 12:58:47 -08:00 |
| stop-mesos-shuffle-service.sh | [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) | 2015-11-04 10:49:34 +00:00 |
| stop-slave.sh | [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) | 2015-11-04 10:49:34 +00:00 |
| stop-slaves.sh | [SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts | 2016-02-26 22:35:12 -08:00 |
| stop-thriftserver.sh | [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) | 2015-11-04 10:49:34 +00:00 |