spark-instrumented-optimizer/core/src/main
Imran Rashid 93cdb8a7d0 [SPARK-8425][CORE] Application Level Blacklisting
## What changes were proposed in this pull request?

This builds upon the blacklisting introduced in SPARK-17675 to add blacklisting of executors and nodes for an entire Spark application. Resources are blacklisted based on task failures in tasksets that eventually complete successfully; they are automatically returned to the pool of active resources after a timeout. Full details are available in a design doc attached to the JIRA.
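
As a rough illustration of the behavior described above (failures counted only from tasksets that eventually succeed, and blacklisted resources returned after a timeout), here is a minimal sketch. The class `SimpleBlacklistTracker`, its method names, the failure threshold, and the timeout value are all hypothetical and are not taken from this patch; the actual `BlacklistTracker` may differ substantially, and node-level blacklisting is omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; this is NOT Spark's BlacklistTracker. */
public class SimpleBlacklistTracker {
    private final int maxFailuresPerExecutor; // assumed threshold, not from the patch
    private final long timeoutMs;             // assumed blacklist duration

    // Failures accumulated per executor across successful task sets.
    private final Map<String, Integer> failureCounts = new HashMap<>();
    // Time (ms) at which each blacklisted executor becomes usable again.
    private final Map<String, Long> blacklistExpiry = new HashMap<>();

    public SimpleBlacklistTracker(int maxFailuresPerExecutor, long timeoutMs) {
        this.maxFailuresPerExecutor = maxFailuresPerExecutor;
        this.timeoutMs = timeoutMs;
    }

    /**
     * Call only after a task set has completed successfully, passing the
     * number of task failures that task set saw on the given executor.
     */
    public void recordFailuresFromSuccessfulTaskSet(String executorId, int failedTasks, long nowMs) {
        int total = failureCounts.merge(executorId, failedTasks, Integer::sum);
        if (total >= maxFailuresPerExecutor) {
            // Blacklist the executor until the timeout elapses and reset its count.
            blacklistExpiry.put(executorId, nowMs + timeoutMs);
            failureCounts.remove(executorId);
        }
    }

    /** Returns true while the executor is blacklisted; expired entries are dropped. */
    public boolean isBlacklisted(String executorId, long nowMs) {
        Long expiry = blacklistExpiry.get(executorId);
        if (expiry == null) {
            return false;
        }
        if (nowMs >= expiry) {
            blacklistExpiry.remove(executorId); // timeout reached: return to the active pool
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBlacklistTracker tracker = new SimpleBlacklistTracker(2, 1000L);
        tracker.recordFailuresFromSuccessfulTaskSet("exec-1", 1, 0L);
        System.out.println(tracker.isBlacklisted("exec-1", 0L));    // one failure: below threshold
        tracker.recordFailuresFromSuccessfulTaskSet("exec-1", 1, 100L);
        System.out.println(tracker.isBlacklisted("exec-1", 500L));  // threshold reached at t=100
        System.out.println(tracker.isBlacklisted("exec-1", 1100L)); // expiry is 100 + 1000
    }
}
```

Passing the clock value in explicitly, rather than reading the system time, keeps the sketch deterministic and easy to unit test.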
## How was this patch tested?

Added unit tests and ran them via Jenkins; also ran a handful of them in a loop to check for flakiness.

The added tests include:
- verifying BlacklistTracker works correctly
- verifying TaskSchedulerImpl interacts with BlacklistTracker correctly (via a mock BlacklistTracker)
- an integration test for the entire scheduler with blacklisting in a few different scenarios

Author: Imran Rashid <irashid@cloudera.com>
Author: mwws <wei.mao@intel.com>

Closes #14079 from squito/blacklist-SPARK-8425.
2016-12-15 08:29:56 -06:00
java/org/apache/spark [SPARK-18208][SHUFFLE] Executor OOM due to a growing LongArray in BytesToBytesMap 2016-12-07 04:33:30 -08:00
resources/org/apache/spark [SPARK-18816][WEB UI] Executors Logs column only ran visibility check on initial table load 2016-12-13 21:37:46 +00:00
scala/org/apache/spark [SPARK-8425][CORE] Application Level Blacklisting 2016-12-15 08:29:56 -06:00