spark-instrumented-optimizer/resource-managers
Shanyu Zhao 3c34e45df4 [SPARK-31029] Avoid using global execution context in driver main thread for YarnSchedulerBackend
#31029 # What changes were proposed in this pull request?
In YarnSchedulerBackend, we should avoid using the global execution context for its Future. Otherwise if user's Spark application also uses global execution context for its Future, the user is facing indeterministic behavior in terms of the thread's context class loader.

### Why are the changes needed?
When running tpc-ds test (https://github.com/databricks/spark-sql-perf), occasionally we see error related to class not found:

2020-02-04 20:00:26,673 ERROR yarn.ApplicationMaster: User class threw exception: scala.ScalaReflectionException: class com.databricks.spark.sql.perf.ExperimentRun in JavaMirror with
sun.misc.Launcher$AppClassLoader28ba21f3 of type class sun.misc.Launcher$AppClassLoader with classpath [...]
and parent being sun.misc.Launcher$ExtClassLoader3ff5d147 of type class sun.misc.Launcher$ExtClassLoader with classpath [...]
and parent being primordial classloader with boot classpath [...] not found.

This is the root cause for the problem:

Spark driver starts ApplicationMaster in the main thread, which starts a user thread and set MutableURLClassLoader to that thread's ContextClassLoader.
  userClassThread = startUserApplication()

The main thread then setup YarnSchedulerBackend RPC endpoints, which handles these calls using scala Future with the default global ExecutionContext:
  doRequestTotalExecutors
  doKillExecutors

So for the main thread and user thread, whoever starts the future first get a chance to set ContextClassLoader to the default thread pool:

- If main thread starts a future to handle doKillExecutors() before user thread does then the default thread pool thread's ContextClassLoader would be the default (AppClassLoader).
- If user thread starts a future first then the thread pool thread will have MutableURLClassLoader.

Note that only MutableURLClassLoader can load user provided class for the Spark app, you will see errors related to class not found if the ContextClassLoader is AppClassLoader.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing unit tests and manual tests

Closes #27843 from shanyu/shanyu-31029.

Authored-by: Shanyu Zhao <shzhao@microsoft.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-06-19 09:59:14 -05:00
..
kubernetes [SPARK-31994][K8S] Docker image should use https urls for only deb.debian.org mirrors 2020-06-15 11:26:03 -07:00
mesos [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling 2020-04-09 11:00:29 +00:00
yarn [SPARK-31029] Avoid using global execution context in driver main thread for YarnSchedulerBackend 2020-06-19 09:59:14 -05:00