[SPARK-2403] Catch all errors during serialization in DAGScheduler
https://issues.apache.org/jira/browse/SPARK-2403

Spark hangs for us whenever we forget to register a class with Kryo. This should be a simple fix for that. But let me know if you have a better suggestion.

I did not write a new test for this. It would be pretty complicated and I'm not sure it's worthwhile for such a simple change. Let me know if you disagree.

Author: Daniel Darabos <darabos.daniel@gmail.com>

Closes #1329 from darabos/spark-2403 and squashes the following commits:

3aceaad [Daniel Darabos] Print full stack trace for miscellaneous exceptions during serialization.
52c22ba [Daniel Darabos] Only catch NonFatal exceptions.
361e962 [Daniel Darabos] Catch all errors during serialization in DAGScheduler.
parent cc3e0a14da
commit c8a2313cdf

@@ -26,6 +26,7 @@ import scala.concurrent.Await
 import scala.concurrent.duration._
 import scala.language.postfixOps
 import scala.reflect.ClassTag
+import scala.util.control.NonFatal
 
 import akka.actor._
 import akka.actor.OneForOneStrategy
@@ -768,6 +769,10 @@ class DAGScheduler(
           abortStage(stage, "Task not serializable: " + e.toString)
           runningStages -= stage
           return
+        case NonFatal(e) => // Other exceptions, such as IllegalArgumentException from Kryo.
+          abortStage(stage, s"Task serialization failed: $e\n${e.getStackTraceString}")
+          runningStages -= stage
+          return
       }
 
       logInfo("Submitting " + tasks.size + " missing tasks from " + stage + " (" + stage.rdd + ")")
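
The catch pattern in the hunk above can be tried in isolation. The following is a minimal sketch, not the DAGScheduler code itself: `SerializationGuard` and `check` are hypothetical names introduced only for illustration, and the returned message stands in for the call to `abortStage`. It assumes the serialization step is passed in as a plain function.

```scala
import java.io.NotSerializableException
import scala.util.control.NonFatal

// Hypothetical helper (not part of Spark): runs a serialization attempt and
// reports why it failed instead of letting the exception escape.
object SerializationGuard {
  // Returns None on success, or Some(message) when the caller should abort.
  def check(serialize: () => Unit): Option[String] =
    try {
      serialize()
      None
    } catch {
      // The case that was already handled before this commit.
      case e: NotSerializableException =>
        Some("Task not serializable: " + e.toString)
      // Any other non-fatal failure, e.g. an IllegalArgumentException
      // thrown for an unregistered class.
      case NonFatal(e) =>
        Some(s"Task serialization failed: $e")
    }
}
```

`NonFatal` deliberately does not match throwables such as `OutOfMemoryError` or `InterruptedException`, so those still propagate to the caller. For example, `SerializationGuard.check(() => throw new IllegalArgumentException("Class is not registered"))` yields a message beginning with `Task serialization failed:`, while a thrown `InterruptedException` is not caught.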