From 90c4cf3cf3b574e593e163951935196cf58a5d4e Mon Sep 17 00:00:00 2001
From: yangjie01
Date: Sun, 12 Sep 2021 09:57:06 -0500
Subject: [PATCH] [SPARK-36636][CORE][TEST] LocalSparkCluster change to use tmp
 workdir in test to avoid directory name collision
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### What changes were proposed in this pull request?

As described in SPARK-36636, if test cases with the config `local-cluster[n, c, m]` run consecutively within one second, a workdir name collision occurs: the appid uses the format `app-yyyyMMddHHmmss-0000`, which has only second-level resolution, and in tests the workdir name is currently derived from the appid. The related logs are as follows:

```
java.io.IOException: Failed to create directory /spark-mine/work/app-20210908074432-0000/1
	at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:578)
	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/09/08 22:44:32.266 dispatcher-event-loop-0 INFO Worker: Asked to launch executor app-20210908074432-0000/0 for test
21/09/08 22:44:32.266 dispatcher-event-loop-0 ERROR Worker: Failed to launch executor app-20210908074432-0000/0 for test.
java.io.IOException: Failed to create directory /spark-mine/work/app-20210908074432-0000/0
	at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:578)
	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

Since the default value of `spark.deploy.maxExecutorRetries` is 10, a test failure occurs when 5 consecutive cases with `local-cluster[3, 1, 1024]` complete within one second:

1. case 1: uses worker directories `/app-202109102324-0000/0`, `/app-202109102324-0000/1`, `/app-202109102324-0000/2`
2. case 2: retries 3 times, then uses worker directories `/app-202109102324-0000/3`, `/app-202109102324-0000/4`, `/app-202109102324-0000/5`
3. case 3: retries 6 times, then uses worker directories `/app-202109102324-0000/6`, `/app-202109102324-0000/7`, `/app-202109102324-0000/8`
4. case 4: retries 9 times, then uses worker directories `/app-202109102324-0000/9`, `/app-202109102324-0000/10`, `/app-202109102324-0000/11`
5. case 5: retries more than **10** times and then **fails**

To avoid this issue, this PR changes tests that use the config `local-cluster[n, c, m]` to use a temporary workdir.

### Why are the changes needed?

Avoid UT failures caused by consecutive workdir name collisions.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Pass GA or Jenkins Tests.
- Manual test: `build/mvn clean install -Pscala-2.13 -pl core -am` or `build/mvn clean install -pl core -am`; it is easier to reproduce this problem with Scala 2.13.

**Before**

The tests failed with error logs like the following, and the failures occurred at random:

```
- SPARK-33084: Add jar support Ivy URI -- test exclude param when transitive=true *** FAILED ***
  org.apache.spark.SparkException: Only one SparkContext should be running in this JVM (see SPARK-2243).The currently running SparkContext was created at:
  org.apache.spark.SparkContextSuite.$anonfun$new$134(SparkContextSuite.scala:1101)
  org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  org.scalatest.Transformer.apply(Transformer.scala:22)
  org.scalatest.Transformer.apply(Transformer.scala:20)
  org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
  org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
  org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
  org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
  org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
  org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
  org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
  org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
  org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
  org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
  org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
  org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
  org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
  scala.collection.immutable.List.foreach(List.scala:333)
  at org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2647)
  at scala.Option.foreach(Option.scala:437)
  at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2644)
  at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2734)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:95)
  at org.apache.spark.SparkContextSuite.$anonfun$new$138(SparkContextSuite.scala:1109)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  ...
- SPARK-33084: Add jar support Ivy URI -- test different version *** FAILED ***
  org.apache.spark.SparkException: Only one SparkContext should be running in this JVM (see SPARK-2243).The currently running SparkContext was created at:
  org.apache.spark.SparkContextSuite.$anonfun$new$134(SparkContextSuite.scala:1101)
  org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  org.scalatest.Transformer.apply(Transformer.scala:22)
  org.scalatest.Transformer.apply(Transformer.scala:20)
  org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
  org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
  org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
  org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
  org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
  org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
  org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
  org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
  org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
  org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
  org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
  org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
  org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
  scala.collection.immutable.List.foreach(List.scala:333)
  at org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2647)
  at scala.Option.foreach(Option.scala:437)
  at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2644)
  at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2734)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:95)
  at org.apache.spark.SparkContextSuite.$anonfun$new$142(SparkContextSuite.scala:1118)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  ...
- SPARK-33084: Add jar support Ivy URI -- test invalid param *** FAILED ***
  org.apache.spark.SparkException: Only one SparkContext should be running in this JVM (see SPARK-2243).The currently running SparkContext was created at:
  org.apache.spark.SparkContextSuite.$anonfun$new$134(SparkContextSuite.scala:1101)
  org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  org.scalatest.Transformer.apply(Transformer.scala:22)
  org.scalatest.Transformer.apply(Transformer.scala:20)
  org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
  org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
  org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
  org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
  org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
  org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
  org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
  org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
  org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
  org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
  org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
  org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
  org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
  scala.collection.immutable.List.foreach(List.scala:333)
  at org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2647)
  at scala.Option.foreach(Option.scala:437)
  at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2644)
  at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2734)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:95)
  at org.apache.spark.SparkContextSuite.$anonfun$new$146(SparkContextSuite.scala:1129)
  at org.apache.spark.SparkFunSuite.withLogAppender(SparkFunSuite.scala:235)
  at org.apache.spark.SparkContextSuite.$anonfun$new$145(SparkContextSuite.scala:1127)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  ...
- SPARK-33084: Add jar support Ivy URI -- test multiple transitive params *** FAILED ***
  org.apache.spark.SparkException: Only one SparkContext should be running in this JVM (see SPARK-2243).The currently running SparkContext was created at:
  org.apache.spark.SparkContextSuite.$anonfun$new$134(SparkContextSuite.scala:1101)
  org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  org.scalatest.Transformer.apply(Transformer.scala:22)
  org.scalatest.Transformer.apply(Transformer.scala:20)
  org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
  org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
  org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
  org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
  org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
  org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
  org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
  org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
  org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
  org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
  org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
  org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
  org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
  scala.collection.immutable.List.foreach(List.scala:333)
  at org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2647)
  at scala.Option.foreach(Option.scala:437)
  at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2644)
  at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2734)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:95)
  at org.apache.spark.SparkContextSuite.$anonfun$new$149(SparkContextSuite.scala:1140)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  ...
- SPARK-33084: Add jar support Ivy URI -- test param key case sensitive *** FAILED ***
  org.apache.spark.SparkException: Only one SparkContext should be running in this JVM (see SPARK-2243).The currently running SparkContext was created at:
  org.apache.spark.SparkContextSuite.$anonfun$new$134(SparkContextSuite.scala:1101)
  org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  org.scalatest.Transformer.apply(Transformer.scala:22)
  org.scalatest.Transformer.apply(Transformer.scala:20)
  org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
  org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
  org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
  org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
  org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
  org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
  org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
  org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
  org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
  org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
  org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
  org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
  org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
  scala.collection.immutable.List.foreach(List.scala:333)
  at org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2647)
  at scala.Option.foreach(Option.scala:437)
  at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2644)
  at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2734)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:95)
  at org.apache.spark.SparkContextSuite.$anonfun$new$154(SparkContextSuite.scala:1155)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  ...
- SPARK-33084: Add jar support Ivy URI -- test transitive value case insensitive *** FAILED ***
  org.apache.spark.SparkException: Only one SparkContext should be running in this JVM (see SPARK-2243).The currently running SparkContext was created at:
  org.apache.spark.SparkContextSuite.$anonfun$new$134(SparkContextSuite.scala:1101)
  org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  org.scalatest.Transformer.apply(Transformer.scala:22)
  org.scalatest.Transformer.apply(Transformer.scala:20)
  org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
  org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
  org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
  org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
  org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
  org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
  org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
  org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
  org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
  org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
  org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
  org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
  org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
  scala.collection.immutable.List.foreach(List.scala:333)
  at org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2647)
  at scala.Option.foreach(Option.scala:437)
  at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2644)
  at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2734)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:95)
  at org.apache.spark.SparkContextSuite.$anonfun$new$159(SparkContextSuite.scala:1166)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
```

**After**

```
Run completed in 26 minutes, 38 seconds.
Total number of tests run: 2863
Suites: completed 276, aborted 0
Tests: succeeded 2863, failed 0, canceled 4, ignored 8, pending 0
All tests passed.
```

Closes #33963 from LuciferYang/SPARK-36636.
Authored-by: yangjie01
Signed-off-by: Sean Owen
(cherry picked from commit 0e1157df06ba6364ca57be846194996327801ded)
Signed-off-by: Sean Owen
---
 .../scala/org/apache/spark/deploy/LocalSparkCluster.scala | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala b/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala
index 33851d9145..b2ed91256f 100644
--- a/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala
@@ -63,8 +63,11 @@ class LocalSparkCluster(
 
     /* Start the Workers */
     for (workerNum <- 1 to numWorkers) {
+      val workDir = if (Utils.isTesting) {
+        Utils.createTempDir(namePrefix = "worker").getAbsolutePath
+      } else null
       val workerEnv = Worker.startRpcEnvAndEndpoint(localHostname, 0, 0, coresPerWorker,
-        memoryPerWorker, masters, null, Some(workerNum), _conf,
+        memoryPerWorker, masters, workDir, Some(workerNum), _conf,
         conf.get(config.Worker.SPARK_WORKER_RESOURCE_FILE))
       workerRpcEnvs += workerEnv
     }
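
The root cause boils down to the second-level resolution of the timestamp-based appid. The following standalone Scala sketch illustrates the idea; `timestampAppId` and `tempWorkDir` are hypothetical helpers written only for this example (they mimic the `app-yyyyMMddHHmmss-0000` naming and the temp-dir fix, and are not Spark's actual code):

```scala
import java.io.File
import java.nio.file.Files
import java.text.SimpleDateFormat
import java.util.Date

object WorkDirSketch {
  // Hypothetical helper: an app id derived from a timestamp with only
  // second-level resolution, mimicking app-yyyyMMddHHmmss-0000.
  def timestampAppId(now: Date): String =
    "app-" + new SimpleDateFormat("yyyyMMddHHmmss").format(now) + "-0000"

  // Hypothetical stand-in for the fix: a unique temporary directory per
  // worker, analogous to Utils.createTempDir(namePrefix = "worker").
  def tempWorkDir(): File = {
    val dir = Files.createTempDirectory("worker").toFile
    dir.deleteOnExit()
    dir
  }

  def main(args: Array[String]): Unit = {
    val now = new Date()
    // Two clusters started within the same second derive the same name,
    // so workdirs built from the appid collide...
    assert(timestampAppId(now) == timestampAppId(now))
    // ...while temp dirs are distinct even within the same second.
    assert(tempWorkDir().getAbsolutePath != tempWorkDir().getAbsolutePath)
  }
}
```

Because the temp directory name no longer depends on the wall clock, back-to-back test runs cannot exhaust `spark.deploy.maxExecutorRetries` through name collisions.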