spark-instrumented-optimizer/resource-managers
“attilapiros” 76e5d75e36 [SPARK-33763] Add metrics for better tracking of dynamic allocation
### What changes were proposed in this pull request?

This PR adds the following metrics to track executor remove reasons during dynamic allocation:
-  `numberExecutorsGracefullyDecommissioned`: number of executors which reached the finished decommissioning state and shut itself down cleanly
- `numberExecutorsDecommissionUnfinished`: executors which requested to decommission but they stopped without reaching the finished decommissioning state
- `numberExecutorsKilledByDriver`: executors killed by the driver (requested to stop)
-  `numberExecutorsExitedUnexpectedly`: executors exited without driver request

### Why are the changes needed?

For supporting monitoring of dynamic allocation better with these metrics.

### Does this PR introduce _any_ user-facing change?

Yes. The new metrics will be available for monitoring.

### How was this patch tested?

With unit and integration tests.

Finally manually checked the new metrics in jconsole:
<img width="1054" alt="jmx" src="https://user-images.githubusercontent.com/2017933/107458686-de8adf00-6b54-11eb-86f7-41faf2fb638f.png">

Closes #31450 from attilapiros/SPARK-33763-final.

Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2021-02-17 13:44:36 -08:00
..
kubernetes [SPARK-33763] Add metrics for better tracking of dynamic allocation 2021-02-17 13:44:36 -08:00
mesos [SPARK-34059][SQL][CORE] Use for/foreach rather than map to make sure execute it eagerly 2021-01-10 15:22:24 -08:00
yarn [SPARK-34154][YARN][FOLLOWUP] Fix flaky LocalityPlacementStrategySuite test 2021-01-29 23:54:40 +09:00