SPARK-1680: use configs for specifying environment variables on YARN
Note that this also documents spark.executorEnv.*, which to me means it's public. If we don't want that, please speak up.

Author: Thomas Graves <tgraves@apache.org>

Closes #1512 from tgravescs/SPARK-1680 and squashes the following commits:

11525df [Thomas Graves] more doc changes
553bad0 [Thomas Graves] fix documentation
152bf7c [Thomas Graves] fix docs
5382326 [Thomas Graves] try fix docs
32f86a4 [Thomas Graves] use configs for specifying environment variables on YARN
This commit is contained in:

parent 74f82c71b0
commit 41e0a21b22
@@ -206,6 +206,14 @@ Apart from these, the following properties are also available, and may be useful
     used during aggregation goes above this amount, it will spill the data into disks.
   </td>
 </tr>
+<tr>
+  <td><code>spark.executorEnv.[EnvironmentVariableName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the environment variable specified by <code>EnvironmentVariableName</code> to the Executor
+    process. The user can specify multiple of these to set multiple environment variables.
+  </td>
+</tr>
 </table>

 #### Shuffle Behavior
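A usage note (illustrative, not part of the diff): a minimal sketch of setting the new property from application code. `SparkConf.set` and `setExecutorEnv` are existing SparkConf methods; `setExecutorEnv` writes the same `spark.executorEnv.*` prefix documented in the table above.

```scala
import org.apache.spark.SparkConf

// Two equivalent ways to put FOO=bar into each executor's environment.
val conf = new SparkConf()
  .setAppName("executor-env-example")
  .set("spark.executorEnv.FOO", "bar") // raw form documented in the table above
  .setExecutorEnv("FOO", "bar")        // convenience setter for the same prefix
```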
@@ -17,10 +17,6 @@ To build Spark yourself, refer to the [building with Maven guide](building-with-

 Most of the configs are the same for Spark on YARN as for other deployment modes. See the [configuration page](configuration.html) for more information on those. These are configs that are specific to Spark on YARN.

-#### Environment Variables
-
-* `SPARK_YARN_USER_ENV`, to add environment variables to the Spark processes launched on YARN. This can be a comma separated list of environment variables, e.g. `SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar"`.
-
 #### Spark Properties

 <table class="table">
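For readers migrating off the removed variable, an illustrative sketch (not from the diff; the property names are the ones this commit documents), split by which process should receive each variable:

```scala
import org.apache.spark.SparkConf

// Before: export SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar"
// After, per this change:
val conf = new SparkConf()
  .set("spark.yarn.appMasterEnv.JAVA_HOME", "/jdk64") // ApplicationMaster
  .set("spark.executorEnv.FOO", "bar")                // executors
```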
@@ -110,7 +106,23 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
   <td><code>spark.yarn.access.namenodes</code></td>
   <td>(none)</td>
   <td>
-    A list of secure HDFS namenodes your Spark application is going to access. For example, `spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032`. The Spark application must have access to the namenodes listed and Kerberos must be properly configured to be able to access them (either in the same realm or in a trusted realm). Spark acquires security tokens for each of the namenodes so that the Spark application can access those remote HDFS clusters.
+    A list of secure HDFS namenodes your Spark application is going to access. For
+    example, `spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032`.
+    The Spark application must have access to the namenodes listed and Kerberos must
+    be properly configured to be able to access them (either in the same realm or in
+    a trusted realm). Spark acquires security tokens for each of the namenodes so that
+    the Spark application can access those remote HDFS clusters.
   </td>
 </tr>
+<tr>
+  <td><code>spark.yarn.appMasterEnv.[EnvironmentVariableName]</code></td>
+  <td>(none)</td>
+  <td>
+     Add the environment variable specified by <code>EnvironmentVariableName</code> to the
+     Application Master process launched on YARN. The user can specify multiple of
+     these to set multiple environment variables. In yarn-cluster mode this controls
+     the environment of the Spark driver and in yarn-client mode it only controls
+     the environment of the executor launcher.
+  </td>
+</tr>
 </table>
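To make the deploy-mode nuance concrete (an illustrative sketch, not from the diff): because the ApplicationMaster hosts the driver only in yarn-cluster mode, the same property reaches different processes depending on mode.

```scala
import org.apache.spark.SparkConf

// yarn-cluster: the driver runs inside the AM, so LANG reaches the driver JVM.
// yarn-client: the driver runs locally; LANG only reaches the executor launcher.
val conf = new SparkConf().set("spark.yarn.appMasterEnv.LANG", "en_US.UTF-8")
```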
@@ -259,6 +259,14 @@ trait ClientBase extends Logging {
     localResources
   }

+  /** Get all application master environment variables set on this SparkConf */
+  def getAppMasterEnv: Seq[(String, String)] = {
+    val prefix = "spark.yarn.appMasterEnv."
+    sparkConf.getAll.filter{case (k, v) => k.startsWith(prefix)}
+      .map{case (k, v) => (k.substring(prefix.length), v)}
+  }
+
+
   def setupLaunchEnv(
       localResources: HashMap[String, LocalResource],
       stagingDir: String): HashMap[String, String] = {
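To illustrate the prefix-stripping in `getAppMasterEnv` outside of Spark, a self-contained sketch (the sample pairs are made up; a plain `Seq` stands in for `sparkConf.getAll`):

```scala
// Keep only keys under the AM-env prefix, then strip the prefix so the
// remainder is the environment variable name.
val prefix = "spark.yarn.appMasterEnv."
val all = Seq(
  "spark.yarn.appMasterEnv.JAVA_HOME" -> "/jdk64",
  "spark.yarn.appMasterEnv.FOO" -> "bar",
  "spark.executor.memory" -> "2g")
val amEnv = all.filter { case (k, _) => k.startsWith(prefix) }
  .map { case (k, v) => (k.stripPrefix(prefix), v) }
// amEnv: Seq(("JAVA_HOME", "/jdk64"), ("FOO", "bar"))
```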
@@ -276,6 +284,11 @@ trait ClientBase extends Logging {
     distCacheMgr.setDistFilesEnv(env)
     distCacheMgr.setDistArchivesEnv(env)

+    getAppMasterEnv.foreach { case (key, value) =>
+      YarnSparkHadoopUtil.addToEnvironment(env, key, value, File.pathSeparator)
+    }
+
+    // Keep this for backwards compatibility but users should move to the config
     sys.env.get("SPARK_YARN_USER_ENV").foreach { userEnvs =>
       // Allow users to specify some environment variables.
       YarnSparkHadoopUtil.setEnvFromInputString(env, userEnvs, File.pathSeparator)
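The body of `YarnSparkHadoopUtil.addToEnvironment` is not shown in this diff; here is a hedged sketch of the contract the calls above appear to rely on (assumed behavior, not the real implementation: merge a value into the env map, joining with the separator on collision).

```scala
import scala.collection.mutable.HashMap

// Assumption: if the key already exists, the new value is appended using
// the given separator (File.pathSeparator at the call sites above).
def addToEnvironment(env: HashMap[String, String], key: String,
                     value: String, separator: String): Unit = {
  env(key) = env.get(key).map(_ + separator + value).getOrElse(value)
}
```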
@@ -171,7 +171,11 @@ trait ExecutorRunnableUtil extends Logging {
     val extraCp = sparkConf.getOption("spark.executor.extraClassPath")
     ClientBase.populateClasspath(null, yarnConf, sparkConf, env, extraCp)

     // Allow users to specify some environment variables
+    sparkConf.getExecutorEnv.foreach { case (key, value) =>
+      YarnSparkHadoopUtil.addToEnvironment(env, key, value, File.pathSeparator)
+    }
+
+    // Keep this for backwards compatibility but users should move to the config
     YarnSparkHadoopUtil.setEnvFromInputString(env, System.getenv("SPARK_YARN_USER_ENV"),
       File.pathSeparator)
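`setEnvFromInputString` is likewise outside this diff; a minimal sketch of parsing the `SPARK_YARN_USER_ENV` format it receives, e.g. "JAVA_HOME=/jdk64,FOO=bar", under the assumption of simple comma/equals splitting:

```scala
import scala.collection.mutable.HashMap

// Simplified assumption only; the real helper may also merge duplicate
// keys with the separator and expand variable references.
def setEnvFromInputString(env: HashMap[String, String], input: String): Unit = {
  if (input != null && input.nonEmpty) {
    input.split(",").foreach { pair =>
      val Array(k, v) = pair.split("=", 2)
      env(k) = v
    }
  }
}
```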