[SPARK-31379][CORE][TEST] Fix flaky o.a.s.scheduler.CoarseGrainedSchedulerBackendSuite.extra resources from executor

### What changes were proposed in this pull request?

This PR (SPARK-31379) adds one line `when(ts.resourceOffers(any[IndexedSeq[WorkerOffer]])).thenReturn(Seq.empty)` to avoid allocating resources.

### Why are the changes needed?

The test is flaky and here's part of error stack:

```
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException:
The code passed to eventually never returned normally. Attempted 325 times over 5.01070979 seconds.
Last failure message: ArrayBuffer("1", "3") did not equal Array("0", "1", "3").
...
org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite.eventually(CoarseGrainedSchedulerBackendSuite.scala:45)
```

You can check [here](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120786/testReport/org.apache.spark.scheduler/CoarseGrainedSchedulerBackendSuite/extra_resources_from_executor/) for details.

And it is flaky because:  after sending `StatusUpdate` to `CoarseGrainedSchedulerBackend`, `CoarseGrainedSchedulerBackend` will call `makeOffer` immediately once releasing the resources. So, it's possible that `availableAddrs` has allocated again before we assert `execResources(GPU).availableAddrs.sorted === Array("0", "1", "3")`.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

The issue can be stably reproduced by inserting `Thread.sleep(3000)` after the line of sending `StatusUpdate`. After applying this fix, the issue is gone.

Closes #28145 from Ngone51/fix_flaky.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
This commit is contained in:
yi.wu 2020-04-08 17:54:28 +09:00 committed by HyukjinKwon
parent a3d83948b8
commit a2789c2a51

View file

@ -258,6 +258,9 @@ class CoarseGrainedSchedulerBackendSuite extends SparkFunSuite with LocalSparkCo
assert(execResources(GPU).assignedAddrs === Array("0"))
}
// To avoid allocating any resources immediately after releasing the resource from the task to
// make sure that `availableAddrs` below won't change
when(ts.resourceOffers(any[IndexedSeq[WorkerOffer]])).thenReturn(Seq.empty)
backend.driverEndpoint.send(
StatusUpdate("1", 1, TaskState.FINISHED, buffer, taskResources))