[SPARK-22151] PYTHONPATH not picked up from the spark.yarn.appMasterEnv properly

Running in YARN cluster mode and trying to set PYTHONPATH via spark.yarn.appMasterEnv.PYTHONPATH doesn't work.

The YARN Client code reads PYTHONPATH from the process environment variables:
val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
But when you set spark.yarn.appMasterEnv, the value is put into the local env map rather than the process environment.

As a result, the PYTHONPATH set via spark.yarn.appMasterEnv isn't applied properly.
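
The override can be reproduced outside Spark with a minimal sketch. It assumes plain Scala collections as stand-ins: `clientEnv` plays the role of `sys.env`, `amEnv` plays the role of the local env map already populated from spark.yarn.appMasterEnv, and `":"` replaces `ApplicationConstants.CLASS_PATH_SEPARATOR` for readability. The object name, method name, and archive names are hypothetical.

```scala
import scala.collection.mutable

object PythonPathOverrideSketch {
  // Stand-in separator (assumption); the real code uses
  // ApplicationConstants.CLASS_PATH_SEPARATOR.
  val Sep = ":"

  /** Mirrors the pre-fix logic: the computed string replaces whatever amEnv already holds. */
  def buildAmEnvBeforeFix(
      clientEnv: Map[String, String],          // stand-in for sys.env on the client
      amEnv: mutable.HashMap[String, String],  // stand-in for the local env map
      pythonPath: Seq[String]): Unit = {
    if (pythonPath.nonEmpty) {
      val pythonPathStr = (clientEnv.get("PYTHONPATH").toSeq ++ pythonPath).mkString(Sep)
      // Plain assignment: any value that came from spark.yarn.appMasterEnv is lost here.
      amEnv("PYTHONPATH") = pythonPathStr
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical value, as if set via spark.yarn.appMasterEnv.PYTHONPATH.
    val amEnv = mutable.HashMap("PYTHONPATH" -> "./addon/python/")
    buildAmEnvBeforeFix(Map.empty, amEnv, Seq("pyspark.zip", "py4j.zip"))
    println(amEnv("PYTHONPATH")) // ./addon/python/ is no longer present
  }
}
```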

You can work around this in cluster mode by setting PYTHONPATH on the client, for example:

PYTHONPATH=./addon/python/ spark-submit

## What changes were proposed in this pull request?
In Client.scala, PYTHONPATH was being overridden. The code now appends values to the existing PYTHONPATH instead of overriding it.
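
A minimal sketch of the append behavior, assuming plain Scala collections in place of the Spark and YARN types. Here `clientEnv` stands in for `sys.env`, `amEnv` for the local env map, `executorEnv` for the executor environment held in `sparkConf`, and `":"` for `ApplicationConstants.CLASS_PATH_SEPARATOR`. The object name, method name, and sample paths are hypothetical.

```scala
import scala.collection.mutable

object PythonPathAppendSketch {
  // Stand-in separator (assumption); the real code uses
  // ApplicationConstants.CLASS_PATH_SEPARATOR.
  val Sep = ":"

  /** Appends the client PYTHONPATH and pythonPath entries to any value already present. */
  def appendPythonPath(
      clientEnv: Map[String, String],
      amEnv: mutable.HashMap[String, String],
      executorEnv: mutable.HashMap[String, String],
      pythonPath: Seq[String]): Unit = {
    if (pythonPath.nonEmpty) {
      val pythonPathList = clientEnv.get("PYTHONPATH").toSeq ++ pythonPath
      // Existing AM value (if any) comes first, then the new entries.
      amEnv("PYTHONPATH") =
        (amEnv.get("PYTHONPATH").toSeq ++ pythonPathList).mkString(Sep)
      // Same treatment for the executor environment.
      executorEnv("PYTHONPATH") =
        (executorEnv.get("PYTHONPATH").toSeq ++ pythonPathList).mkString(Sep)
    }
  }

  def main(args: Array[String]): Unit = {
    val amEnv = mutable.HashMap("PYTHONPATH" -> "./addon/python/")
    val executorEnv = mutable.HashMap.empty[String, String]
    appendPythonPath(Map("PYTHONPATH" -> "/client/py"), amEnv, executorEnv, Seq("pyspark.zip"))
    println(amEnv("PYTHONPATH"))
    println(executorEnv("PYTHONPATH"))
  }
}
```

Putting the existing value first follows the order used in the patch, where `env.get("PYTHONPATH")` precedes `pythonPathList`.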

## How was this patch tested?
Added log statements to ApplicationMaster.scala to check the PYTHONPATH environment variable, ran a Spark job in cluster mode before the change, and verified the issue. Performed the same test after the change and verified the fix.
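
The temporary log statements are not part of this commit. A check of that kind might look like the following sketch, which assumes a plain `println` rather than Spark's logging API. The object and method names are hypothetical.

```scala
object PythonPathCheckSketch {
  /** Builds a log line reporting the PYTHONPATH value found in the given environment. */
  def describePythonPath(env: Map[String, String]): String = {
    val value = env.getOrElse("PYTHONPATH", "<not set>")
    s"PYTHONPATH seen by this process: $value"
  }

  def main(args: Array[String]): Unit = {
    // sys.env is the environment of the running JVM process.
    println(describePythonPath(sys.env))
  }
}
```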

Author: pgandhi <pgandhi@oath.com>

Closes #21468 from pgandhi999/SPARK-22151.
pgandhi 2018-07-18 14:07:03 -05:00 committed by Thomas Graves
parent c8bee932cb
commit 1272b2034d

@@ -811,10 +811,12 @@ private[spark] class Client(
     // Finally, update the Spark config to propagate PYTHONPATH to the AM and executors.
     if (pythonPath.nonEmpty) {
-      val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+      val pythonPathList = (sys.env.get("PYTHONPATH") ++ pythonPath)
+      env("PYTHONPATH") = (env.get("PYTHONPATH") ++ pythonPathList)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val pythonPathExecutorEnv = (sparkConf.getExecutorEnv.toMap.get("PYTHONPATH") ++
+        pythonPathList).mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
+      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathExecutorEnv)
     }
     if (isClusterMode) {