spark-instrumented-optimizer

Dhruve Ashar fdd3bace1d [SPARK-22148][SPARK-15815][SCHEDULER] Acquire new executors to avoid hang because of blacklisting	2018-11-06 08:25:32 -06:00

## What changes were proposed in this pull request?

Every time a task is unschedulable because of the condition where the number of task failures is less than the number of executors available, we currently abort the taskSet, failing the job. This change tries to acquire new executors so that we can complete the job successfully. We try to acquire a new executor only when we can kill an existing idle executor. If we cannot find an idle executor, we fall back to the older implementation and abort the job.

## How was this patch tested?

I performed some manual tests to check and validate the behavior.

```scala
val rdd = sc.parallelize(Seq(1 to 10), 3)

import org.apache.spark.TaskContext

val mapped = rdd.mapPartitionsWithIndex((index, iterator) => {
  // Only the task for partition 2 is delayed and made to fail.
  if (index == 2) {
    Thread.sleep(30 * 1000)
    val attemptNum = TaskContext.get.attemptNumber
    // Fail the first three attempts so that executors get blacklisted.
    if (attemptNum < 3) throw new Exception("Fail for blacklisting")
  }
  iterator.toList.map(x => x + " -> " + index).iterator
})

mapped.collect
```

Closes #22288 from dhruve/bug/SPARK-22148.

Lead-authored-by: Dhruve Ashar <dhruveashar@gmail.com>
Co-authored-by: Dhruve Ashar <dhruve@users.noreply.github.com>
Co-authored-by: Tom Graves <tgraves@apache.org>
Signed-off-by: Thomas Graves <tgraves@apache.org>
java	[MINOR] Fix typos and misspellings	2018-11-05 17:34:23 -06:00
resources	[SPARK-23429][CORE] Add executor memory metrics to heartbeat and expose in executors REST API	2018-09-07 10:42:46 -07:00
scala/org/apache	[SPARK-22148][SPARK-15815][SCHEDULER] Acquire new executors to avoid hang because of blacklisting	2018-11-06 08:25:32 -06:00