[SPARK-14180][CORE] Fix a deadlock in CoarseGrainedExecutorBackend Shutdown

## What changes were proposed in this pull request?

Call `executor.stop` in a new thread to eliminate deadlock.

## How was this patch tested?

Existing unit tests

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #12012 from zsxwing/SPARK-14180.
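
The deadlock pattern described above can be illustrated outside Spark with a minimal, hypothetical sketch. It is not Spark code: a single-threaded `java.util.concurrent` pool stands in for the RpcEnv thread, and the helper `stopAndWait` stands in for `executor.stop()`. The object and method names are invented for illustration, and a 500 ms timeout replaces the indefinite wait so the example terminates in both cases.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch; none of these names come from Spark.
object StopInNewThreadSketch {

  /** Returns true if the pool was observed to terminate within the timeout. */
  def stopFromPoolThread(useNewThread: Boolean): Boolean = {
    // Single worker thread, standing in for the thread that handles `Shutdown`.
    val pool = Executors.newSingleThreadExecutor()
    val done = new CountDownLatch(1)
    val terminated = new AtomicBoolean(false)

    // Stands in for executor.stop(): shut the pool down, then wait for it to finish.
    def stopAndWait(): Unit = {
      pool.shutdown()
      terminated.set(pool.awaitTermination(500, TimeUnit.MILLISECONDS))
      done.countDown()
    }

    pool.execute(new Runnable {
      override def run(): Unit = {
        if (useNewThread) {
          // Same shape as the fix: hand the blocking stop to a fresh thread
          // so the pool's own worker can return and the pool can terminate.
          new Thread("stop-helper") {
            override def run(): Unit = stopAndWait()
          }.start()
        } else {
          // Problem case: the worker waits for a pool that cannot terminate
          // until this very task returns. Only the timeout ends the wait.
          stopAndWait()
        }
      }
    })

    done.await()
    terminated.get()
  }
}
```

Under these assumptions, `StopInNewThreadSketch.stopFromPoolThread(useNewThread = false)` returns `false` because the wait times out, while `useNewThread = true` returns `true`. Without the timeout, the first case would block forever.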
2016-03-28 16:23:29 -07:00 · 2016-03-28 16:23:29 -07:00 · 34c0638ee6
parent 328c71161b
commit 34c0638ee6
1 changed file with 9 additions and 3 deletions
--- a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
+++ b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
@@ -113,9 +113,15 @@ private[spark] class CoarseGrainedExecutorBackend(

    case Shutdown =>
      stopping.set(true)
-      executor.stop()
-      stop()
-      rpcEnv.shutdown()
+      new Thread("CoarseGrainedExecutorBackend-stop-executor") {
+        override def run(): Unit = {
+          // executor.stop() will call `SparkEnv.stop()` which waits until RpcEnv stops totally.
+          // However, if `executor.stop()` runs in some thread of RpcEnv, RpcEnv won't be able to
+          // stop until `executor.stop()` returns, which becomes a dead-lock (See SPARK-14180).
+          // Therefore, we put this line in a new thread.
+          executor.stop()
+        }
+      }.start()
  }

  override def onDisconnected(remoteAddress: RpcAddress): Unit = {