[SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite

### What changes were proposed in this pull request?

This PR proposes to show the actual stack trace when the "handle large number of containers and tasks (SPARK-18750)" test fails in `LocalityPlacementStrategySuite`.

**It does not fully resolve the JIRA SPARK-31746 yet**. I tried to reproduce the failure locally by controlling the factors in the test, but I could not. I also double-checked that the changes made in SPARK-18750 are still valid.

### Why are the changes needed?

This test is flaky for an unknown reason (see https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122768/testReport/org.apache.spark.deploy.yarn/LocalityPlacementStrategySuite/handle_large_number_of_containers_and_tasks__SPARK_18750_/):

```
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: java.lang.StackOverflowError did not equal null
	at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
	at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
	at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
	at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
```

After this PR, the failure message includes the stack trace of the `StackOverflowError` itself, which should help to investigate the root cause:

**Before**:

```
[info] - handle large number of containers and tasks (SPARK-18750) *** FAILED *** (824 milliseconds)
[info]   java.lang.StackOverflowError did not equal null (LocalityPlacementStrategySuite.scala:49)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
[info]   at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
[info]   at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite.$anonfun$new$1(LocalityPlacementStrategySuite.scala:49)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:157)
[info]   at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
[info]   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
[info]   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
[info]   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
...
```

**After**:

```
[info] - handle large number of containers and tasks (SPARK-18750) *** FAILED *** (825 milliseconds)
[info]   StackOverflowError should not be thrown; however, got:
[info]
[info]    java.lang.StackOverflowError
[info]   	at scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:256)
[info]   	at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   	at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   	at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   	at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   	at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   	at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   	at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
[info]   	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
[info]   	at scala.collection.MapLike$MappedValues.$anonfun$foreach$3(MapLike.scala:256)
...

```
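The "After" output relies on rendering the caught error's stack trace into a string before failing the test. A minimal standalone sketch of that idea follows; `StackTraceDemo` and `stackTraceOf` are hypothetical names used only for illustration (they are not part of Spark or ScalaTest), and the error is constructed directly rather than provoked by deep recursion.

```scala
import java.io.{PrintWriter, StringWriter}

object StackTraceDemo {
  // Hypothetical helper: render a Throwable's full stack trace as a String.
  def stackTraceOf(t: Throwable): String = {
    val buffer = new StringWriter()
    val writer = new PrintWriter(buffer)
    t.printStackTrace(writer) // writes the exception name followed by "\tat ..." frames
    writer.flush()
    buffer.toString
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the error captured by the test thread (assumption: built directly here).
    val error: Throwable = new StackOverflowError()
    if (error != null) {
      // In a ScalaTest suite this message would be passed to fail(...) instead of printed.
      println(s"StackOverflowError should not be thrown; however, got:\n\n${stackTraceOf(error)}")
    }
  }
}
```

Comparing the error to `null` with a plain assertion only reports that the values differ, whereas embedding the rendered trace in the failure message exposes the frames where the overflow occurred.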

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested by reverting 76db394f2b locally.

Closes #28566 from HyukjinKwon/SPARK-31746.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon 2020-05-18 14:35:02 +09:00
parent dc01b7556f
commit 3bf7bf99e9


@@ -17,7 +17,8 @@
 package org.apache.spark.deploy.yarn
-import scala.collection.JavaConverters._
+import java.io.{PrintWriter, StringWriter}
 import scala.collection.mutable.{HashMap, HashSet, Set}
 import org.apache.hadoop.yarn.api.records._
@@ -46,7 +47,11 @@ class LocalityPlacementStrategySuite extends SparkFunSuite {
     thread.start()
     thread.join()
-    assert(error === null)
+    if (error != null) {
+      val errors = new StringWriter()
+      error.printStackTrace(new PrintWriter(errors))
+      fail(s"StackOverflowError should not be thrown; however, got:\n\n$errors")
+    }
   }
   private def runTest(): Unit = {
@@ -57,7 +62,6 @@ class LocalityPlacementStrategySuite extends SparkFunSuite {
     // goal is to create enough requests for localized containers (so there should be many
     // tasks on several hosts that have no allocated containers).
     val resource = Resource.newInstance(8 * 1024, 4)
     val strategy = new LocalityPreferredContainerPlacementStrategy(new SparkConf(),
       yarnConf, new MockResolver())