spark-instrumented-optimizer/project
sychen 38263f6d15 [SPARK-27630][CORE] Properly handle task end events from completed stages
## What changes were proposed in this pull request?
Track tasks separately for each stage attempt (instead of tracking by stage), and do NOT reset the numRunningTasks to 0 on StageCompleted.

In the case of stage retry, the `taskEnd` event from the zombie stage sometimes makes the number of `totalRunningTasks` negative, which will causes the job to get stuck.
Similar problem also exists with `stageIdToTaskIndices` & `stageIdToSpeculativeTaskIndices`.
If it is a failed `taskEnd` event of the zombie stage, this will cause `stageIdToTaskIndices` or `stageIdToSpeculativeTaskIndices` to remove the task index of the active stage, and the number of `totalPendingTasks` will increase unexpectedly.
## How was this patch tested?
unit test properly handle task end events from completed stages

Closes #24497 from cxzl25/fix_stuck_job_follow_up.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2019-06-25 14:30:13 -05:00
..
build.properties [SPARK-26317][BUILD] Upgrade SBT to 0.13.18 2018-12-10 12:04:44 -08:00
MimaBuild.scala [SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0 2018-11-14 16:22:23 -08:00
MimaExcludes.scala [SPARK-27630][CORE] Properly handle task end events from completed stages 2019-06-25 14:30:13 -05:00
plugins.sbt [SPARK-27493][BUILD][FOLLOWUP] Upgrade ASM to 7.1 in plugins.sbt 2019-04-23 18:18:02 +00:00
SparkBuild.scala [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies 2019-06-09 00:26:26 -07:00