From 31ef268baec98e5f9ce7e37ad03dd87a7476938e Mon Sep 17 00:00:00 2001 From: Kousuke Saruta Date: Sun, 11 Aug 2019 08:13:19 -0500 Subject: [PATCH] [SPARK-28639][CORE][DOC] Configuration doc for Barrier Execution Mode ## What changes were proposed in this pull request? SPARK-24817 and SPARK-24819 introduced new 3 non-internal properties for barrier-execution mode but they are not documented. So I've added a section into configuration.md for barrier-mode execution. ## How was this patch tested? Built using jekyll and confirm the layout by browser. Closes #25370 from sarutak/barrier-exec-mode-conf-doc. Authored-by: Kousuke Saruta Signed-off-by: Sean Owen --- .../spark/internal/config/package.scala | 2 +- docs/configuration.md | 44 +++++++++++++++++++ 2 files changed, 45 insertions(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index 214675b6cf..b898413ac8 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -1058,7 +1058,7 @@ package object config { ConfigBuilder("spark.barrier.sync.timeout") .doc("The timeout in seconds for each barrier() call from a barrier task. If the " + "coordinator didn't receive all the sync messages from barrier tasks within the " + - "configed time, throw a SparkException to fail all the tasks. The default value is set " + + "configured time, throw a SparkException to fail all the tasks. The default value is set " + "to 31536000(3600 * 24 * 365) so the barrier() call shall wait for one year.") .timeConf(TimeUnit.SECONDS) .checkValue(v => v > 0, "The value should be a positive time value.") diff --git a/docs/configuration.md b/docs/configuration.md index 84545475ae..aad496dc0a 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -2039,6 +2039,50 @@ Apart from these, the following properties are also available, and may be useful +### Barrier Execution Mode + + + + + + + + + + + + + + + + + + +
Property NameDefaultMeaning
spark.barrier.sync.timeout365d + The timeout in seconds for each barrier() call from a barrier task. If the + coordinator didn't receive all the sync messages from barrier tasks within the + configured time, throw a SparkException to fail all the tasks. The default value is set + to 31536000(3600 * 24 * 365) so the barrier() call shall wait for one year. +
spark.scheduler.barrier.maxConcurrentTasksCheck.interval15s + Time in seconds to wait between a max concurrent tasks check failure and the next + check. A max concurrent tasks check ensures the cluster can launch more concurrent + tasks than required by a barrier stage on job submitted. The check can fail in case + a cluster has just started and not enough executors have registered, so we wait for a + little while and try to perform the check again. If the check fails more than a + configured max failure times for a job then fail current job submission. Note this + config only applies to jobs that contain one or more barrier stages, we won't perform + the check on non-barrier jobs. +
spark.scheduler.barrier.maxConcurrentTasksCheck.maxFailures40 + Number of max concurrent tasks check failures allowed before fail a job submission. + A max concurrent tasks check ensures the cluster can launch more concurrent tasks than + required by a barrier stage on job submitted. The check can fail in case a cluster + has just started and not enough executors have registered, so we wait for a little + while and try to perform the check again. If the check fails more than a configured + max failure times for a job then fail current job submission. Note this config only + applies to jobs that contain one or more barrier stages, we won't perform the check on + non-barrier jobs. +
+ ### Dynamic Allocation