[SPARK-5724] fix the misconfiguration in AkkaUtils

https://issues.apache.org/jira/browse/SPARK-5724

In AkkaUtils, we set several failure-detector-related parameters as follows:

```
val akkaConf = ConfigFactory.parseMap(conf.getAkkaConf.toMap[String, String])
      .withFallback(akkaSslConfig).withFallback(ConfigFactory.parseString(
      s"""
      |akka.daemonic = on
      |akka.loggers = [""akka.event.slf4j.Slf4jLogger""]
      |akka.stdout-loglevel = "ERROR"
      |akka.jvm-exit-on-fatal-error = off
      |akka.remote.require-cookie = "$requireCookie"
      |akka.remote.secure-cookie = "$secureCookie"
      |akka.remote.transport-failure-detector.heartbeat-interval = $akkaHeartBeatInterval s
      |akka.remote.transport-failure-detector.acceptable-heartbeat-pause = $akkaHeartBeatPauses s
      |akka.remote.transport-failure-detector.threshold = $akkaFailureDetector
      |akka.actor.provider = "akka.remote.RemoteActorRefProvider"
      |akka.remote.netty.tcp.transport-class = "akka.remote.transport.netty.NettyTransport"
      |akka.remote.netty.tcp.hostname = "$host"
      |akka.remote.netty.tcp.port = $port
      |akka.remote.netty.tcp.tcp-nodelay = on
      |akka.remote.netty.tcp.connection-timeout = $akkaTimeout s
      |akka.remote.netty.tcp.maximum-frame-size = ${akkaFrameSize}B
      |akka.remote.netty.tcp.execution-pool-size = $akkaThreads
      |akka.actor.default-dispatcher.throughput = $akkaBatchSize
      |akka.log-config-on-start = $logAkkaConfig
      |akka.remote.log-remote-lifecycle-events = $lifecycleEvents
      |akka.log-dead-letters = $lifecycleEvents
      |akka.log-dead-letters-during-shutdown = $lifecycleEvents
      """.stripMargin))

```

Actually, there is no parameter named `akka.remote.transport-failure-detector.threshold`
(see: http://doc.akka.io/docs/akka/2.3.4/general/configuration.html).
What Akka does define is `akka.remote.watch-failure-detector.threshold`.
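For reference, in Akka 2.3 the phi-accrual `threshold` belongs only to the watch (remote death-watch) failure detector; the transport failure detector is deadline-based and exposes only interval/pause settings. A sketch of the relevant keys as a HOCON fragment (values are the Akka 2.3 documented defaults; check `reference.conf` for your exact version):

```
# Phi-accrual detector used by remote death watch -- this one has a threshold:
akka.remote.watch-failure-detector.threshold = 10.0
akka.remote.watch-failure-detector.heartbeat-interval = 1 s
akka.remote.watch-failure-detector.acceptable-heartbeat-pause = 10 s

# Deadline-based transport detector -- interval and pause only, no threshold key:
akka.remote.transport-failure-detector.heartbeat-interval = 4 s
akka.remote.transport-failure-detector.acceptable-heartbeat-pause = 10 s
```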

Author: CodingCat <zhunansjtu@gmail.com>

Closes #4512 from CodingCat/SPARK-5724 and squashes the following commits:

bafe56e [CodingCat] fix the grammar in configuration doc
338296e [CodingCat] remove failure-detector related info
8bfcfd4 [CodingCat] fix the misconfiguration in AkkaUtils
Committed by: CodingCat 2015-02-23 11:29:25 +00:00 (committed by Sean Owen)
parent 757b14b862
commit 242d49584c
2 changed files with 12 additions and 27 deletions


```diff
@@ -79,8 +79,6 @@ private[spark] object AkkaUtils extends Logging {
     val logAkkaConfig = if (conf.getBoolean("spark.akka.logAkkaConfig", false)) "on" else "off"
     val akkaHeartBeatPauses = conf.getInt("spark.akka.heartbeat.pauses", 6000)
-    val akkaFailureDetector =
-      conf.getDouble("spark.akka.failure-detector.threshold", 300.0)
     val akkaHeartBeatInterval = conf.getInt("spark.akka.heartbeat.interval", 1000)
     val secretKey = securityManager.getSecretKey()
```
```diff
@@ -106,7 +104,6 @@ private[spark] object AkkaUtils extends Logging {
       |akka.remote.secure-cookie = "$secureCookie"
       |akka.remote.transport-failure-detector.heartbeat-interval = $akkaHeartBeatInterval s
       |akka.remote.transport-failure-detector.acceptable-heartbeat-pause = $akkaHeartBeatPauses s
-      |akka.remote.transport-failure-detector.threshold = $akkaFailureDetector
       |akka.actor.provider = "akka.remote.RemoteActorRefProvider"
       |akka.remote.netty.tcp.transport-class = "akka.remote.transport.netty.NettyTransport"
       |akka.remote.netty.tcp.hostname = "$host"
```
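A minimal, self-contained sketch of the `s`-interpolator + `stripMargin` pattern AkkaUtils uses to build this HOCON fragment, reduced to the two transport-failure-detector keys that survive the fix (the values here are hypothetical stand-ins, not read from a real `SparkConf`):

```scala
// Sketch only: hard-coded values stand in for SparkConf lookups.
object FailureDetectorConfigSketch {
  def fragment(heartBeatInterval: Int, heartBeatPauses: Int): String =
    s"""
    |akka.remote.transport-failure-detector.heartbeat-interval = $heartBeatInterval s
    |akka.remote.transport-failure-detector.acceptable-heartbeat-pause = $heartBeatPauses s
    """.stripMargin

  def main(args: Array[String]): Unit = {
    val conf = fragment(1000, 6000)
    // The non-existent key removed by this patch must not reappear.
    assert(!conf.contains("transport-failure-detector.threshold"))
    println(conf.trim)
  }
}
```

`stripMargin` drops everything up to and including the leading `|` on each line, which is why the generated fragment parses as clean HOCON despite the Scala-side indentation.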


```diff
@@ -903,36 +903,24 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.akka.heartbeat.pauses</code></td>
   <td>6000</td>
   <td>
-    This is set to a larger value to disable failure detector that comes inbuilt akka. It can be
-    enabled again, if you plan to use this feature (Not recommended). Acceptable heart beat pause
-    in seconds for akka. This can be used to control sensitivity to gc pauses. Tune this in
-    combination of `spark.akka.heartbeat.interval` and `spark.akka.failure-detector.threshold`
-    if you need to.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.akka.failure-detector.threshold</code></td>
-  <td>300.0</td>
-  <td>
-    This is set to a larger value to disable failure detector that comes inbuilt akka. It can be
-    enabled again, if you plan to use this feature (Not recommended). This maps to akka's
-    `akka.remote.transport-failure-detector.threshold`. Tune this in combination of
-    `spark.akka.heartbeat.pauses` and `spark.akka.heartbeat.interval` if you need to.
+    This is set to a larger value to disable the transport failure detector that comes built in to Akka.
+    It can be enabled again, if you plan to use this feature (Not recommended). Acceptable heart
+    beat pause in seconds for Akka. This can be used to control sensitivity to GC pauses. Tune
+    this along with `spark.akka.heartbeat.interval` if you need to.
   </td>
 </tr>
 <tr>
   <td><code>spark.akka.heartbeat.interval</code></td>
   <td>1000</td>
   <td>
-    This is set to a larger value to disable failure detector that comes inbuilt akka. It can be
-    enabled again, if you plan to use this feature (Not recommended). A larger interval value in
-    seconds reduces network overhead and a smaller value ( ~ 1 s) might be more informative for
-    akka's failure detector. Tune this in combination of `spark.akka.heartbeat.pauses` and
-    `spark.akka.failure-detector.threshold` if you need to. Only positive use case for using
-    failure detector can be, a sensistive failure detector can help evict rogue executors really
-    quick. However this is usually not the case as gc pauses and network lags are expected in a
-    real Spark cluster. Apart from that enabling this leads to a lot of exchanges of heart beats
-    between nodes leading to flooding the network with those.
+    This is set to a larger value to disable the transport failure detector that comes built in to Akka.
+    It can be enabled again, if you plan to use this feature (Not recommended). A larger interval
+    value in seconds reduces network overhead and a smaller value ( ~ 1 s) might be more informative
+    for Akka's failure detector. Tune this in combination of `spark.akka.heartbeat.pauses` if you need
+    to. A likely positive use case for using failure detector would be: a sensistive failure detector
+    can help evict rogue executors quickly. However this is usually not the case as GC pauses
+    and network lags are expected in a real Spark cluster. Apart from that enabling this leads to
+    a lot of exchanges of heart beats between nodes leading to flooding the network with those.
   </td>
 </tr>
 <tr>
```