
“attilapiros” 0ec95bb7df [SPARK-22577][CORE] executor page blacklist status should update with TaskSet level blacklisting

## What changes were proposed in this pull request?

In this PR, stage-level blacklisting is propagated to the UI by introducing a new Spark listener event (SparkListenerExecutorBlacklistedForStage), which indicates that an executor is blacklisted for a stage. This happens either because the number of failures exceeds the limit set for an executor (spark.blacklist.stage.maxFailedTasksPerExecutor) or because the whole node is blacklisted for a stage (spark.blacklist.stage.maxFailedExecutorsPerNode). When a node is blacklisted, all of its executors are listed as blacklisted for the stage.

The blacklisting state for a selected stage is shown in the blacklisting column of the "Aggregated Metrics by Executor" table, where after this change three labels are possible:

- "for application": the executor is blacklisted for the application (see the configuration spark.blacklist.application.maxFailedTasksPerExecutor for details)
- "for stage": the executor is blacklisted only for the stage
- "false": the executor is not blacklisted at all

## How was this patch tested?

It was tested both manually and with unit tests.

#### Unit tests

- HistoryServerSuite
- TaskSetBlacklistSuite
- AppStatusListenerSuite

#### Manual test for executor blacklisting

Running Spark as a local cluster:

```
$ bin/spark-shell --master "local-cluster[2,1,1024]" \
    --conf "spark.blacklist.enabled=true" \
    --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" \
    --conf "spark.blacklist.application.maxFailedTasksPerExecutor=10" \
    --conf "spark.eventLog.enabled=true"
```

Executing:

```scala
import org.apache.spark.SparkEnv

sc.parallelize(1 to 10, 10).map { x =>
  if (SparkEnv.get.executorId == "0") throw new RuntimeException("Bad executor")
  else (x % 3, x)
}.reduceByKey((a, b) => a + b).collect()
```

To see the result, check the "Aggregated Metrics by Executor" section at the bottom of the picture:

![UI screenshot for stage level blacklisting executor](https://issues.apache.org/jira/secure/attachment/12905283/stage_blacklisting.png)

#### Manual test for node blacklisting

Running Spark on a cluster:

```bash
./bin/spark-shell --master yarn --deploy-mode client \
    --executor-memory=2G --num-executors=8 \
    --conf "spark.blacklist.enabled=true" \
    --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" \
    --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" \
    --conf "spark.blacklist.application.maxFailedTasksPerExecutor=10" \
    --conf "spark.eventLog.enabled=true"
```

And the job was:

```scala
import org.apache.spark.SparkEnv

sc.parallelize(1 to 10000, 10).map { x =>
  if (SparkEnv.get.executorId.toInt >= 4) throw new RuntimeException("Bad executor")
  else (x % 3, x)
}.reduceByKey((a, b) => a + b).collect()
```

The result is:

![UI screenshot for stage level node blacklisting](https://issues.apache.org/jira/secure/attachment/12906833/node_blacklisting_for_stage.png)

Here you can see that the node apiros3.gce.test.com was blacklisted for the stage because of failures on executors 4 and 5. As expected, executor 3 is also blacklisted: it has no failures itself, but it shares the node with executors 4 and 5.

Author: “attilapiros” <piros.attila.zsolt@gmail.com>
Author: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>

Closes #20203 from attilapiros/SPARK-22577.

2018-01-24 11:34:59 -06:00
create-release	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
deps	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
sparktestsupport	[SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs in SQLContext and Catalog in PySpark	2018-01-18 14:51:05 +09:00
tests	[SPARK-10359] Enumerate dependencies in a file and diff against it for new pull requests	2015-12-30 12:47:42 -08:00
.gitignore	[SPARK-6219] Reuse pep8.py	2015-04-18 16:46:28 -07:00
.rat-excludes	[SPARK-22577][CORE] executor page blacklist status should update with TaskSet level blacklisting	2018-01-24 11:34:59 -06:00
appveyor-guide.md	[SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate building and testing on Windows (currently SparkR only)	2016-09-08 08:26:59 -07:00
appveyor-install-dependencies.ps1	[MINOR][BUILD] Download RAT and R version info over HTTPS; use RAT 0.12	2017-08-12 14:31:05 +09:00
change-scala-version.sh	[SPARK-19810][BUILD][CORE] Remove support for Scala 2.10	2017-07-13 17:06:24 +08:00
check-license	[SPARK-22511][BUILD] Update maven central repo address	2017-11-14 17:58:07 -06:00
checkstyle-suppressions.xml	[HOTFIX][BUILD] Fix finalizer checkstyle error and re-disable checkstyle	2017-09-27 13:40:21 -07:00
checkstyle.xml	[HOTFIX][BUILD] Fix finalizer checkstyle error and re-disable checkstyle	2017-09-27 13:40:21 -07:00
github_jira_sync.py	[MINOR] Fix a bunch of typos	2018-01-02 07:10:19 +09:00
lint-java	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
lint-python	[SPARK-23174][BUILD][PYTHON] python code style checker update	2018-01-24 21:13:47 +09:00
lint-r	[SPARK-10328] [SPARKR] Fix generic for na.omit	2015-08-28 00:37:50 -07:00
lint-r.R	[SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r	2017-10-01 18:42:45 +09:00
lint-scala	[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically	2014-08-06 12:58:24 -07:00
make-distribution.sh	[SPARK-22777][SCHEDULER] Kubernetes mode dockerfile permission and distribution	2017-12-18 15:31:47 -08:00
merge_spark_pr.py	[SPARK-23044] Error handling for jira assignment	2018-01-16 16:25:10 -08:00
mima	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
pip-sanity-check.py	[SPARK-19064][PYSPARK] Fix pip installing of sub components	2017-01-25 14:43:39 -08:00
README.md	Merge pull request #565 from pwendell/dev-scripts. Closes #565 .	2014-02-08 23:13:34 -08:00
requirements.txt	[SPARK-19064][PYSPARK] Fix pip installing of sub components	2017-01-25 14:43:39 -08:00
run-pip-tests	Revert "[SPARK-13534][PYSPARK] Using Apache Arrow to increase performance of DataFrame.toPandas"	2017-06-28 14:28:40 +08:00
run-tests	[SPARK-22302][INFRA] Remove manual backports for subprocess and print explicit message for < Python 2.7	2017-10-22 02:22:35 +09:00
run-tests-jenkins	[SPARK-22302][INFRA] Remove manual backports for subprocess and print explicit message for < Python 2.7	2017-10-22 02:22:35 +09:00
run-tests-jenkins.py	[SPARK-23028] Bump master branch version to 2.4.0-SNAPSHOT	2018-01-13 00:37:59 +08:00
run-tests.py	[SPARK-23174][BUILD][PYTHON] python code style checker update	2018-01-24 21:13:47 +09:00
scalastyle	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
test-dependencies.sh	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
tox.ini	[SPARK-23174][BUILD][PYTHON] python code style checker update	2018-01-24 21:13:47 +09:00

README.md

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.