---
layout: global
title: Running Spark on YARN
---
* This will become a table of contents (this text will be scraped).
{:toc}

Support for running on [YARN (Hadoop NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html)
was added to Spark in version 0.6.0, and improved in subsequent releases.
# Launching Spark on YARN
Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory which contains the (client side) configuration files for the Hadoop cluster.
These configs are used to write to HDFS and connect to the YARN ResourceManager. The
configuration contained in this directory will be distributed to the YARN cluster so that all
containers used by the application use the same configuration. If the configuration references
Java system properties or environment variables not managed by YARN, they should also be set in the
Spark application's configuration (driver, executors, and the AM when running in client mode).
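
Before submitting, it can help to confirm that one of these variables actually points at a usable configuration directory. The following is a minimal sketch, not part of Spark: the helper name `check_hadoop_conf` is hypothetical, and it assumes that a directory containing `yarn-site.xml` is a reasonable sign of a valid client-side YARN configuration.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not shipped with Spark).
# Prefers HADOOP_CONF_DIR, falls back to YARN_CONF_DIR, and checks that the
# chosen directory contains yarn-site.xml. Prints the directory on success.
check_hadoop_conf() {
  local dir="${HADOOP_CONF_DIR:-${YARN_CONF_DIR:-}}"
  if [ -z "$dir" ]; then
    echo "error: neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set" >&2
    return 1
  fi
  if [ ! -f "$dir/yarn-site.xml" ]; then
    echo "error: $dir does not contain yarn-site.xml" >&2
    return 1
  fi
  echo "$dir"
}

# Example usage (the path is an assumption; use your cluster's config dir):
#   export HADOOP_CONF_DIR=/etc/hadoop/conf
#   check_hadoop_conf && ./bin/spark-submit --master yarn ...
```

Failing early here gives a clearer message than a ResourceManager connection error later in the submission.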
There are two deploy modes that can be used to launch Spark applications on YARN. In `cluster` mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In `client` mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

Unlike other cluster managers supported by Spark, in which the master's address is specified in the `--master`
parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration.
Thus, the `--master` parameter is simply `yarn`.
To launch a Spark application in `cluster` mode:

```shell
$ ./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]
```

For example:

```shell
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue thequeue \
    examples/jars/spark-examples*.jar \
    10
```

The above starts a YARN client program which starts the default Application Master. SparkPi will then run as a child thread of the Application Master. The client will periodically poll the Application Master for status updates and display them in the console, and will exit once your application has finished running. Refer to the "Debugging your Application" section below for how to see driver and executor logs.

To launch a Spark application in `client` mode, do the same, but replace `cluster` with `client`. The following shows how you can run `spark-shell` in `client` mode:

```shell
$ ./bin/spark-shell --master yarn --deploy-mode client
```

## Adding Other JARs
In `cluster` mode, the driver runs on a different machine than the client, so `SparkContext.addJar` won't work out of the box with files that are local to the client. To make files on the client available to `SparkContext.addJar`, include them with the `--jars` option in the launch command.

```shell
$ ./bin/spark-submit --class my.main.Class \
    --master yarn \
    --deploy-mode cluster \
    --jars my-other-jar.jar,my-other-other-jar.jar \
    my-main-jar.jar \
    app_arg1 app_arg2
```

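
The `--jars` option takes a single comma-separated list. When the extra JARs live in one directory, a small helper can build that list. The sketch below uses a hypothetical `join_jars` function, and `lib/` is an assumed directory name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join its arguments with commas, the separator
# used by spark-submit's --jars option.
join_jars() {
  local IFS=,   # "$*" joins positional parameters with the first char of IFS
  echo "$*"
}

# Example usage (lib/ is an assumed directory of extra JARs):
#   shopt -s nullglob   # expand to nothing rather than a literal 'lib/*.jar'
#   ./bin/spark-submit --class my.main.Class \
#     --master yarn \
#     --deploy-mode cluster \
#     --jars "$(join_jars lib/*.jar)" \
#     my-main-jar.jar \
#     app_arg1 app_arg2
```

Quoting the command substitution keeps paths with spaces in a single `--jars` argument.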
# Preparations
Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
To build Spark yourself, refer to [Building Spark](building-spark.html).

To make Spark runtime jars accessible from the YARN side, you can specify `spark.yarn.archive` or `spark.yarn.jars`. For details, please refer to [Spark Properties](running-on-yarn.html#spark-properties). If neither `spark.yarn.archive` nor `spark.yarn.jars` is specified, Spark will create a zip file with all jars under `$SPARK_HOME/jars` and upload it to the distributed cache.
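
One way to avoid re-uploading the runtime JARs on every submission is to package `$SPARK_HOME/jars` once, place the archive on HDFS, and point `spark.yarn.archive` at it. The sketch below assumes that a gzipped tarball with the JARs at its root is acceptable, because YARN can localize `.tar.gz`/`.tgz` archives as well as `.zip`. The function name, output file name, and HDFS path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: package every file under <spark_home>/jars into a
# gzipped tarball whose entries sit at the archive root (e.g. ./spark-core.jar).
# Prints the output path on success.
build_spark_archive() {
  local spark_home="$1" out="$2"
  if [ -z "$spark_home" ] || [ -z "$out" ]; then
    echo "usage: build_spark_archive <spark_home> <out.tgz>" >&2
    return 2
  fi
  if [ ! -d "$spark_home/jars" ]; then
    echo "error: $spark_home/jars not found" >&2
    return 1
  fi
  # -C switches into the jars directory, so entries carry no parent path.
  tar -czf "$out" -C "$spark_home/jars" .
  echo "$out"
}

# Example usage (the HDFS location is an assumption):
#   build_spark_archive "$SPARK_HOME" /tmp/spark-libs.tgz
#   hdfs dfs -mkdir -p /user/spark/share
#   hdfs dfs -put -f /tmp/spark-libs.tgz /user/spark/share/
#   ./bin/spark-submit --master yarn \
#     --conf spark.yarn.archive=hdfs:///user/spark/share/spark-libs.tgz ...
```

Write the output file outside `$SPARK_HOME/jars`. Otherwise, the archive would try to include itself.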
# Configuration
Most of the configs are the same for Spark on YARN as for other deployment modes. See the [configuration page](configuration.html) for more information on those. The configs that are specific to Spark on YARN are listed under [Spark Properties](#spark-properties) below.
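
Like other Spark properties, these can be passed per application with `--conf key=value` or set once in `spark-defaults.conf`. As a sketch, the hypothetical helper below appends a few of the YARN-specific properties to a given defaults file. The values are illustrative, not recommendations. Per the property table, `spark.yarn.am.*` settings only take effect in client mode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append YARN-specific settings to a spark-defaults.conf
# file (whitespace-separated "key value" lines). Values are illustrative.
append_yarn_defaults() {
  local conf_file="$1"
  if [ -z "$conf_file" ]; then
    echo "usage: append_yarn_defaults <spark-defaults.conf>" >&2
    return 2
  fi
  # Quoted heredoc: the lines are written literally, with no shell expansion.
  cat >> "$conf_file" <<'EOF'
spark.yarn.am.memory                  1g
spark.yarn.am.cores                   2
spark.yarn.submit.file.replication    2
EOF
}

# Example usage:
#   append_yarn_defaults "$SPARK_CONF_DIR/spark-defaults.conf"
# The same settings for a single run, on the command line:
#   ./bin/spark-submit --master yarn --deploy-mode client \
#     --conf spark.yarn.am.memory=1g --conf spark.yarn.am.cores=2 ...
```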
# Debugging your Application
In YARN terminology, executors and application masters run inside "containers". YARN has two modes for handling container logs after an application has completed. If log aggregation is turned on (with the `yarn.log-aggregation-enable` config), container logs are copied to HDFS and deleted on the local machine. These logs can be viewed from anywhere on the cluster with the `yarn logs` command.

```shell
yarn logs -applicationId <app ID>
```

will print out the contents of all log files from all containers from the given application. You can also view the container log files directly in HDFS using the HDFS shell or API. The directory where they are located can be found by looking at your YARN configs (`yarn.nodemanager.remote-app-log-dir` and `yarn.nodemanager.remote-app-log-dir-suffix`). The logs are also available on the Spark Web UI under the Executors tab. For this, you need to have both the Spark history server and the MapReduce history server running, and `yarn.log.server.url` configured properly in `yarn-site.xml`. The log URL on the Spark history server UI will redirect you to the MapReduce history server to show the aggregated logs.
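
With Hadoop's default (TFile) aggregation format, an application's aggregated logs are typically stored under `<remote-app-log-dir>/<user>/<suffix>/<application ID>`. The hypothetical helper below builds that path so it can be listed with the HDFS shell. Treat the layout as an assumption and check it against your Hadoop version, because newer releases can use other aggregation file formats with different directory layouts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the HDFS directory assumed to hold aggregated
# logs for one application:
#   <yarn.nodemanager.remote-app-log-dir>/<user>/<remote-app-log-dir-suffix>/<app id>
aggregated_log_dir() {
  if [ "$#" -ne 4 ]; then
    echo "usage: aggregated_log_dir <remote-app-log-dir> <user> <suffix> <app id>" >&2
    return 2
  fi
  local root="${1%/}"   # drop one trailing slash so the result has no '//'
  echo "$root/$2/$3/$4"
}

# Example usage (/tmp/logs and logs are the usual yarn-default.xml values;
# the application ID is a placeholder):
#   hdfs dfs -ls "$(aggregated_log_dir /tmp/logs "$USER" logs application_1234567890123_0001)"
```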

When log aggregation isn't turned on, logs are retained locally on each machine under `YARN_APP_LOGS_DIR`, which is usually configured to `/tmp/logs` or `$HADOOP_HOME/logs/userlogs` depending on the Hadoop version and installation. Viewing logs for a container requires going to the host that contains them and looking in this directory. Subdirectories organize log files by application ID and container ID. The logs are also available on the Spark Web UI under the Executors tab, and viewing them there doesn't require running the MapReduce history server.
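
On a single host, the per-container files can then be listed from that directory. This is a minimal sketch that assumes the `<log-dir>/<application ID>/<container ID>/<file>` layout described above. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: on one NodeManager host, list the log files YARN kept
# locally for an application, assuming <log-dir>/<app id>/<container id>/<file>.
list_local_container_logs() {
  if [ "$#" -ne 2 ]; then
    echo "usage: list_local_container_logs <log-dir> <app id>" >&2
    return 2
  fi
  local app_dir="$1/$2"
  if [ ! -d "$app_dir" ]; then
    echo "error: $app_dir not found on this host" >&2
    return 1
  fi
  # Only files exactly one level below the app dir (e.g. container_.../stderr).
  find "$app_dir" -mindepth 2 -maxdepth 2 -type f | sort
}

# Example usage (the log directory and ID are placeholders):
#   list_local_container_logs "$HADOOP_HOME/logs/userlogs" application_1234567890123_0001
```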

To review the per-container launch environment, increase `yarn.nodemanager.delete.debug-delay-sec` to a
large value (e.g. `36000`), and then access the application cache through `yarn.nodemanager.local-dirs`
on the nodes on which containers are launched. This directory contains the launch script, JARs, and
all environment variables used for launching each container. This process is useful for debugging
classpath problems in particular. (Note that enabling this requires admin privileges on cluster
settings and a restart of all node managers. Thus, this is not applicable to hosted clusters.)

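
Once the delay is in place, the per-container directories can be searched for the generated launch scripts. This sketch assumes the usual NodeManager layout `<local-dir>/usercache/<user>/appcache/<application ID>/<container ID>/launch_container.sh`. The helper name is hypothetical, and the layout may differ across Hadoop versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list launch_container.sh files for one application under
# a single yarn.nodemanager.local-dirs entry. Run it on each NodeManager host.
find_launch_scripts() {
  if [ "$#" -ne 3 ]; then
    echo "usage: find_launch_scripts <local-dir> <user> <app id>" >&2
    return 2
  fi
  local app_dir="$1/usercache/$2/appcache/$3"
  if [ ! -d "$app_dir" ]; then
    echo "error: $app_dir not found (already cleaned up, or wrong host?)" >&2
    return 1
  fi
  # Expect one script per container on this host; sort for stable output.
  find "$app_dir" -name launch_container.sh -type f | sort
}

# Example usage (local dir and IDs are placeholders):
#   find_launch_scripts /hadoop/yarn/local "$USER" application_1234567890123_0001
```

Each script shows the exact classpath and environment YARN used to start that container, which is where classpath problems tend to become visible.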
To use a custom log4j configuration for the application master or executors, here are the options:

- upload a custom `log4j.properties` using `spark-submit`, by adding it to the `--files` list of files
to be uploaded with the application.
- add `-Dlog4j.configuration=<location of configuration file>` to `spark.driver.extraJavaOptions`
(for the driver) or `spark.executor.extraJavaOptions` (for executors). Note that if using a file,
the `file:` protocol should be explicitly provided, and the file needs to exist locally on all
the nodes.
- update the `$SPARK_CONF_DIR/log4j.properties` file and it will be automatically uploaded along
with the other configurations. Note that the other two options have higher priority than this option if
multiple options are specified.

Note that for the first option, both executors and the application master will share the same
log4j configuration, which may cause issues when they run on the same node (e.g. trying to write
to the same log file).

If you need a reference to the proper location to put log files in YARN so that YARN can properly display and aggregate them, use `spark.yarn.app.container.log.dir` in your `log4j.properties`. For example, `log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log`. For streaming applications, configuring a `RollingFileAppender` and setting the file location to YARN's log directory will avoid disk overflow caused by large log files, and the logs can be accessed using YARN's log utility.
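
The sketch below puts these pieces together. It writes a `log4j.properties` that sends output to a size-bounded `RollingFileAppender` inside YARN's container log directory, and the file can then be shipped with `--files` (the first option above). It assumes log4j 1.x properties syntax. The helper name, appender name, size limits, and conversion pattern are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a log4j 1.x configuration that logs to a rolling
# file in YARN's container log directory. The heredoc is quoted ('EOF') so the
# shell does not try to expand ${spark.yarn.app.container.log.dir}; log4j
# resolves it from the JVM system property Spark sets in each container.
write_yarn_log4j() {
  local out="${1:-log4j.properties}"
  cat > "$out" <<'EOF'
log4j.rootLogger=INFO, file_appender
log4j.appender.file_appender=org.apache.log4j.RollingFileAppender
log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.file_appender.MaxFileSize=50MB
log4j.appender.file_appender.MaxBackupIndex=5
log4j.appender.file_appender.layout=org.apache.log4j.PatternLayout
log4j.appender.file_appender.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
EOF
  echo "$out"
}

# Example usage:
#   write_yarn_log4j log4j.properties
#   ./bin/spark-submit --master yarn --deploy-mode cluster \
#     --files log4j.properties ...
```

`MaxFileSize` and `MaxBackupIndex` together cap each container's log usage at roughly six files of that size.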

To use a custom `metrics.properties` for the application master and executors, update the `$SPARK_CONF_DIR/metrics.properties` file. It will automatically be uploaded with other configurations, so you don't need to specify it manually with `--files`.
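
For instance, enabling Spark's console sink for all instances could look like the sketch below. The sink class and keys follow Spark's `metrics.properties.template`. The helper name and the 10-second period are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a metrics.properties that enables Spark's console
# sink for all instances (driver, executors, ...). The target directory is the
# first argument, or $SPARK_CONF_DIR if no argument is given.
write_console_metrics() {
  local conf_dir="${1:-${SPARK_CONF_DIR:-}}"
  if [ -z "$conf_dir" ]; then
    echo "usage: write_console_metrics <conf-dir> (or set SPARK_CONF_DIR)" >&2
    return 2
  fi
  mkdir -p "$conf_dir"
  cat > "$conf_dir/metrics.properties" <<'EOF'
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
EOF
  echo "$conf_dir/metrics.properties"
}
```

Because the file lives in `$SPARK_CONF_DIR`, the next `spark-submit` on YARN picks it up without any extra flags.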
#### Spark Properties
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><code>spark.yarn.am.memory</code></td>
<td><code>512m</code></td>
<td>
Amount of memory to use for the YARN Application Master in client mode, in the same format as JVM memory strings (e.g. <code>512m</code>, <code>2g</code>).
In cluster mode, use <code>spark.driver.memory</code> instead.
<p/>
Use lower-case suffixes, e.g. <code>k</code>, <code>m</code>, <code>g</code>, <code>t</code>, and <code>p</code>, for kibi-, mebi-, gibi-, tebi-, and pebibytes, respectively.
</td>
</tr>
<tr>
<td><code>spark.yarn.am.cores</code></td>
<td><code>1</code></td>
<td>
Number of cores to use for the YARN Application Master in client mode.
In cluster mode, use <code>spark.driver.cores</code> instead.
</td>
</tr>
<tr>
<td><code>spark.yarn.am.waitTime</code></td>
<td><code>100s</code></td>
<td>
Only used in <code>cluster</code> mode. Time for the YARN Application Master to wait for the
SparkContext to be initialized.
</td>
</tr>
<tr>
<td><code>spark.yarn.submit.file.replication</code></td>
<td>The default HDFS replication (usually <code>3</code>)</td>
<td>
HDFS replication level for the files uploaded into HDFS for the application. These include things like the Spark jar, the app jar, and any distributed cache files/archives.
</td>
</tr>
<tr>
<td><code>spark.yarn.stagingDir</code></td>
<td>Current user's home directory in the filesystem</td>
<td>
Staging directory used while submitting applications.
</td>
</tr>
< tr >
< td > < code > spark.yarn.preserve.staging.files< / code > < / td >
2015-09-21 14:46:39 -04:00
< td > < code > false< / code > < / td >
< td >
Set to < code > true< / code > to preserve the staged files (Spark jar, app jar, distributed cache files) at the end of the job rather than delete them.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.scheduler.heartbeat.interval-ms< / code > < / td >
< td > < code > 3000< / code > < / td >
< td >
The interval in ms at which the Spark Application Master sends heartbeats to the YARN ResourceManager.
The value is capped at half the value of YARN's configuration for the expiry interval, i.e.
< code > yarn.am.liveness-monitor.expiry-interval-ms< / code > .
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.scheduler.initial-allocation.interval< / code > < / td >
< td > < code > 200ms< / code > < / td >
< td >
The initial interval at which the Spark Application Master eagerly sends heartbeats to the YARN ResourceManager
when there are pending container allocation requests. It should be no larger than
< code > spark.yarn.scheduler.heartbeat.interval-ms< / code > . The allocation interval will be doubled on
successive eager heartbeats if pending containers still exist, until
< code > spark.yarn.scheduler.heartbeat.interval-ms< / code > is reached.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.max.executor.failures< / code > < / td >
< td > numExecutors * 2, with minimum of 3< / td >
< td >
The maximum number of executor failures before failing the application.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.historyServer.address< / code > < / td >
< td > (none)< / td >
< td >
The address of the Spark history server, e.g. < code > host.com:18080< / code > . The address should not contain a scheme (< code > http://< / code > ). Defaults to not being set since the history server is an optional service. When the Spark application finishes, this address is given to the YARN ResourceManager so that the ResourceManager UI can link the application to the Spark history server UI.
For this property, YARN properties can be used as variables, and these are substituted by Spark at runtime. For example, if the Spark history server runs on the same node as the YARN ResourceManager, it can be set to < code > ${hadoopconf-yarn.resourcemanager.hostname}:18080< / code > .
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.dist.archives< / code > < / td >
< td > (none)< / td >
< td >
Comma-separated list of archives to be extracted into the working directory of each executor.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.dist.files< / code > < / td >
< td > (none)< / td >
< td >
Comma-separated list of files to be placed in the working directory of each executor.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.dist.jars< / code > < / td >
< td > (none)< / td >
< td >
Comma-separated list of jars to be placed in the working directory of each executor.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.dist.forceDownloadSchemes< / code > < / td >
< td > < code > (none)< / code > < / td >
< td >
Comma-separated list of schemes for which resources will be downloaded to the local disk prior to
being added to YARN's distributed cache. For use in cases where the YARN service does not
support schemes that are supported by Spark, like http, https and ftp, or jars required to be in the
local YARN client's classpath. The wildcard '*' means resources are downloaded for all schemes.
< / td >
< / tr >
< tr >
< td > < code > spark.executor.instances< / code > < / td >
< td > < code > 2< / code > < / td >
< td >
The number of executors for static allocation. If < code > spark.dynamicAllocation.enabled< / code > is set, the initial set of executors will be at least this large.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.am.memoryOverhead< / code > < / td >
< td > AM memory * 0.10, with minimum of 384 < / td >
< td >
Same as < code > spark.driver.memoryOverhead< / code > , but for the YARN Application Master in client mode.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.queue< / code > < / td >
< td > < code > default< / code > < / td >
< td >
The name of the YARN queue to which the application is submitted.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.jars< / code > < / td >
< td > (none)< / td >
< td >
List of libraries containing Spark code to distribute to YARN containers.
By default, Spark on YARN will use Spark jars installed locally, but the Spark jars can also be
in a world-readable location on HDFS. This allows YARN to cache them on nodes so that they don't
need to be distributed each time an application runs. To point to jars on HDFS, for example,
set this configuration to < code > hdfs:///some/path< / code > . Globs are allowed.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.archive< / code > < / td >
< td > (none)< / td >
< td >
An archive containing needed Spark jars for distribution to the YARN cache. If set, this
configuration replaces < code > spark.yarn.jars< / code > and the archive is used in all the
application's containers. The archive should contain jar files in its root directory.
As with < code > spark.yarn.jars< / code > , the archive can also be hosted on HDFS to speed up file
distribution.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.appMasterEnv.[EnvironmentVariableName]< / code > < / td >
< td > (none)< / td >
< td >
Add the environment variable specified by < code > EnvironmentVariableName< / code > to the
Application Master process launched on YARN. The user can specify multiple of
these to set multiple environment variables. In < code > cluster< / code > mode this controls
the environment of the Spark driver and in < code > client< / code > mode it only controls
the environment of the executor launcher.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.containerLauncherMaxThreads< / code > < / td >
< td > < code > 25< / code > < / td >
< td >
The maximum number of threads to use in the YARN Application Master for launching executor containers.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.am.extraJavaOptions< / code > < / td >
< td > (none)< / td >
< td >
A string of extra JVM options to pass to the YARN Application Master in client mode.
In cluster mode, use < code > spark.driver.extraJavaOptions< / code > instead. Note that it is illegal
to set maximum heap size (-Xmx) settings with this option. Maximum heap size settings can be set
with < code > spark.yarn.am.memory< / code > .
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.am.extraLibraryPath< / code > < / td >
< td > (none)< / td >
< td >
Set a special library path to use when launching the YARN Application Master in client mode.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.maxAppAttempts< / code > < / td >
< td > < code > yarn.resourcemanager.am.max-attempts< / code > in YARN< / td >
< td >
The maximum number of attempts that will be made to submit the application.
It should be no larger than the global number of max attempts in the YARN configuration.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.am.attemptFailuresValidityInterval< / code > < / td >
< td > (none)< / td >
< td >
Defines the validity interval for AM failure tracking.
If the AM has been running for at least the defined interval, the AM failure count will be reset.
This feature is disabled if the property is not set.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.executor.failuresValidityInterval< / code > < / td >
< td > (none)< / td >
< td >
Defines the validity interval for executor failure tracking.
Executor failures which are older than the validity interval will be ignored.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.submit.waitAppCompletion< / code > < / td >
< td > < code > true< / code > < / td >
< td >
In YARN cluster mode, controls whether the client waits to exit until the application completes.
If set to < code > true< / code > , the client process will stay alive reporting the application's status.
Otherwise, the client process will exit after submission.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.am.nodeLabelExpression< / code > < / td >
< td > (none)< / td >
< td >
A YARN node label expression that restricts the set of nodes the AM will be scheduled on.
Only versions of YARN greater than or equal to 2.6 support node label expressions, so when
running against earlier versions, this property will be ignored.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.executor.nodeLabelExpression< / code > < / td >
< td > (none)< / td >
< td >
A YARN node label expression that restricts the set of nodes executors will be scheduled on.
Only versions of YARN greater than or equal to 2.6 support node label expressions, so when
running against earlier versions, this property will be ignored.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.tags< / code > < / td >
< td > (none)< / td >
< td >
Comma-separated list of strings to pass through as YARN application tags appearing
in YARN ApplicationReports, which can be used for filtering when querying YARN apps.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.config.gatewayPath< / code > < / td >
< td > (none)< / td >
< td >
A path that is valid on the gateway host (the host where a Spark application is started) but may
differ from the path to the same resource on other nodes in the cluster. Coupled with
< code > spark.yarn.config.replacementPath< / code > , this is used to support clusters with
heterogeneous configurations, so that Spark can correctly launch remote processes.
< p / >
The replacement path normally will contain a reference to some environment variable exported by
YARN (and, thus, visible to Spark containers).
< p / >
For example, if the gateway node has Hadoop libraries installed on < code > /disk1/hadoop< / code > , and
the location of the Hadoop install is exported by YARN as the < code > HADOOP_HOME< / code >
environment variable, setting this value to < code > /disk1/hadoop< / code > and the replacement path to
< code > $HADOOP_HOME< / code > will make sure that paths used to launch remote processes properly
reference the local YARN configuration.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.config.replacementPath< / code > < / td >
< td > (none)< / td >
< td >
See < code > spark.yarn.config.gatewayPath< / code > .
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.rolledLog.includePattern< / code > < / td >
< td > (none)< / td >
< td >
Java regex to filter the log files that match the defined include pattern;
those log files will be aggregated in a rolling fashion.
This is used with YARN's rolling log aggregation. To enable this feature on the YARN side,
< code > yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds< / code > should be
configured in yarn-site.xml.
This feature can only be used with Hadoop 2.6.4+. The Spark log4j appender needs to be changed to use
FileAppender or another appender that can handle the files being removed while it is running. Based
on the file name configured in the log4j configuration (like spark.log), the user should set the
regex (spark*) to include all the log files that need to be aggregated.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.rolledLog.excludePattern< / code > < / td >
< td > (none)< / td >
< td >
Java regex to filter the log files that match the defined exclude pattern;
those log files will not be aggregated in a rolling fashion. If a log file
name matches both the include and the exclude pattern, the file is excluded.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.blacklist.executor.launch.blacklisting.enabled< / code > < / td >
< td > false< / td >
< td >
Flag to enable blacklisting of nodes having YARN resource allocation problems.
The error limit for blacklisting can be configured by
< code > spark.blacklist.application.maxFailedExecutorsPerNode< / code > .
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.metrics.namespace< / code > < / td >
< td > (none)< / td >
< td >
The root namespace for AM metrics reporting.
If it is not set then the YARN application ID is used.
< / td >
< / tr >
< / table >
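Several defaults in the table above are derived from other settings rather than fixed. The bash sketch below restates three of them as the table describes them:

- the `spark.yarn.am.memoryOverhead` default (AM memory * 0.10, with a minimum of 384);
- the `spark.yarn.max.executor.failures` default (numExecutors * 2, with a minimum of 3);
- the eager allocation schedule, which starts at `spark.yarn.scheduler.initial-allocation.interval` and doubles until it reaches `spark.yarn.scheduler.heartbeat.interval-ms`.

This is only an illustration of the documented rules, not Spark's own code. The function names are hypothetical, memory is assumed to be in MiB, and the result is rounded down to an integer. The cap at half of YARN's expiry interval is not modeled.

```bash
#!/usr/bin/env bash
# Illustration of derived defaults described in the table (not Spark's code).
# Function names are hypothetical; memory values are assumed to be in MiB.

# spark.yarn.am.memoryOverhead default: AM memory * 0.10, minimum 384.
am_memory_overhead() {
  local am_memory_mib=$1
  local overhead=$(( am_memory_mib / 10 ))   # 10%, rounded down
  if (( overhead < 384 )); then
    overhead=384
  fi
  echo "$overhead"
}

# spark.yarn.max.executor.failures default: numExecutors * 2, minimum 3.
max_executor_failures() {
  local num_executors=$1
  local failures=$(( num_executors * 2 ))
  if (( failures < 3 )); then
    failures=3
  fi
  echo "$failures"
}

# Eager heartbeat intervals while containers are pending: start at the
# initial allocation interval, double after each heartbeat, and never
# exceed the regular heartbeat interval. Prints the first N intervals in ms.
allocation_intervals() {
  local initial_ms=$1 max_ms=$2 count=$3
  local interval=$initial_ms
  local out=()
  local i
  for (( i = 0; i < count; i++ )); do
    out+=("$interval")
    interval=$(( interval * 2 ))
    if (( interval > max_ms )); then
      interval=$max_ms
    fi
  done
  echo "${out[*]}"
}

am_memory_overhead 1024          # prints 384
am_memory_overhead 8192          # prints 819
max_executor_failures 1          # prints 3
max_executor_failures 10         # prints 20
allocation_intervals 200 3000 6  # prints 200 400 800 1600 3000 3000
```

With the table's defaults (200ms initial, 3000 ms regular), the eager phase reaches the regular interval after four doublings.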
# Important notes
- Whether core requests are honored in scheduling decisions depends on which scheduler is in use and how it is configured.
- In `cluster` mode, the local directories used by the Spark executors and the Spark driver will be the local directories configured for YARN (Hadoop YARN config `yarn.nodemanager.local-dirs`). If the user specifies `spark.local.dir`, it will be ignored. In `client` mode, the Spark executors will use the local directories configured for YARN while the Spark driver will use those defined in `spark.local.dir`. This is because the Spark driver does not run on the YARN cluster in `client` mode, only the Spark executors do.
- The `--files` and `--archives` options support specifying file names with the `#` syntax, similar to Hadoop. For example, you can specify `--files localtest.txt#appSees.txt`: this uploads the file you have locally named `localtest.txt` into HDFS but links it under the name `appSees.txt`, and your application should use the name `appSees.txt` to reference it when running on YARN.
- The `--jars` option allows the `SparkContext.addJar` function to work if you are using it with local files and running in `cluster` mode. It does not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files.
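Each `--files` or `--archives` entry written with `#` has two parts. The part before `#` is the local file name, and the part after it is the name the application sees on YARN. The bash sketch below splits an entry that way using parameter expansion. Spark does this parsing itself, so the helper `split_resource` is hypothetical. For entries without `#`, the sketch assumes the visible name is the base name of the local path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a --files/--archives entry of the form
# "local-path#link-name" into its two parts.
split_resource() {
  local entry=$1
  local local_path link_name
  if [[ $entry == *#* ]]; then
    local_path=${entry%%#*}   # everything before the first '#'
    link_name=${entry#*#}     # everything after the first '#'
  else
    # Assumption: without '#', the link name is the file's base name.
    local_path=$entry
    link_name=$(basename "$entry")
  fi
  echo "$local_path -> $link_name"
}

split_resource "localtest.txt#appSees.txt"   # prints localtest.txt -> appSees.txt
split_resource "/data/lookup.csv"            # prints /data/lookup.csv -> lookup.csv
```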
# Kerberos
Standard Kerberos support in Spark is covered in the [Security](security.html#kerberos) page.
In YARN mode, when accessing Hadoop filesystems, Spark will automatically obtain delegation tokens
for:
- the filesystem hosting the staging directory of the Spark application (which is the default
filesystem if `spark.yarn.stagingDir` is not set);
- if Hadoop federation is enabled, all the federated filesystems in the configuration.
If an application needs to interact with other secure Hadoop filesystems, their URIs need to be
explicitly provided to Spark at launch time. This is done by listing them in the
`spark.yarn.access.hadoopFileSystems` property, described in the configuration section below.
The YARN integration also supports custom delegation token providers using the Java Services
mechanism (see `java.util.ServiceLoader`). Implementations of
`org.apache.spark.deploy.yarn.security.ServiceCredentialProvider` can be made available to Spark
by listing their names in the corresponding file in the jar's `META-INF/services` directory. These
providers can be disabled individually by setting `spark.security.credentials.{service}.enabled` to
`false`, where `{service}` is the name of the credential provider.
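Registration follows the standard `java.util.ServiceLoader` layout. It uses a file under `META-INF/services` whose name is the fully qualified interface name, and that file lists one implementation class per line. The bash sketch below builds that file in a temporary staging directory before packaging. The class `com.example.MyServiceCredentialProvider` and the jar name are hypothetical placeholders.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Temporary staging directory whose contents will be packaged into a jar.
staging_dir=$(mktemp -d)
services_dir="$staging_dir/META-INF/services"
mkdir -p "$services_dir"

# The file name is the fully qualified name of the provider interface.
provider_file="$services_dir/org.apache.spark.deploy.yarn.security.ServiceCredentialProvider"

# One fully qualified implementation class per line (hypothetical class).
printf '%s\n' "com.example.MyServiceCredentialProvider" > "$provider_file"

cat "$provider_file"   # prints com.example.MyServiceCredentialProvider

# Then package alongside the compiled classes, e.g.:
#   jar cf my-provider.jar -C "$staging_dir" .
```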
## YARN-specific Kerberos Configuration
< table class = "table" >
< tr > < th > Property Name< / th > < th > Default< / th > < th > Meaning< / th > < / tr >
< tr >
< td > < code > spark.yarn.keytab< / code > < / td >
< td > (none)< / td >
< td >
The full path to the file that contains the keytab for the principal specified by < code > spark.yarn.principal< / code > . This keytab
will be copied to the node running the YARN Application Master via the YARN Distributed Cache, and
will be used for renewing the login tickets and the delegation tokens periodically. Equivalent to
the < code > --keytab< / code > command line argument.
< br / > (Works also with the "local" master.)
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.principal< / code > < / td >
< td > (none)< / td >
< td >
Principal to be used to log in to the KDC, while running on secure clusters. Equivalent to the
< code > --principal< / code > command line argument.
< br / > (Works also with the "local" master.)
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.access.hadoopFileSystems< / code > < / td >
< td > (none)< / td >
< td >
A comma-separated list of secure Hadoop filesystems your Spark application is going to access. For
example, < code > spark.yarn.access.hadoopFileSystems=hdfs://nn1.com:8032,hdfs://nn2.com:8032,
webhdfs://nn3.com:50070< / code > . The Spark application must have access to the filesystems listed
and Kerberos must be properly configured to be able to access them (either in the same realm
or in a trusted realm). Spark acquires security tokens for each of the filesystems so that
the Spark application can access those remote Hadoop filesystems.
< / td >
< / tr >
< tr >
< td > < code > spark.yarn.kerberos.relogin.period< / code > < / td >
< td > 1m< / td >
< td >
How often to check whether the kerberos TGT should be renewed. This should be set to a value
that is shorter than the TGT renewal period (or the TGT lifetime if TGT renewal is not enabled).
The default value should be enough for most deployments.
< / td >
< / tr >
< / table >
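Here is a minimal sketch of submitting with a keytab, assuming cluster deploy mode. The principal, keytab path, application class `com.example.MyApp` and `my-app.jar` are placeholders. The filesystem list reuses the example from the table. The script only prints the arguments so they can be inspected before anything is submitted.

```shell
# Placeholder values: substitute your own principal, keytab and filesystems.
PRINCIPAL="spark-user@EXAMPLE.COM"
KEYTAB="/etc/security/keytabs/spark-user.keytab"
REMOTE_FS="hdfs://nn1.com:8032,hdfs://nn2.com:8032,webhdfs://nn3.com:50070"

# Keep the arguments in the positional parameters so each stays one word.
set -- spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal "$PRINCIPAL" \
  --keytab "$KEYTAB" \
  --conf "spark.yarn.access.hadoopFileSystems=$REMOTE_FS" \
  --class com.example.MyApp \
  my-app.jar

# Dry run: print the command instead of executing it ("$@" would run it).
printf '%s\n' "$@"
```

The same settings could instead be placed in `spark-defaults.conf` as `spark.yarn.principal`, `spark.yarn.keytab` and `spark.yarn.access.hadoopFileSystems`.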
## Troubleshooting Kerberos
Debugging Hadoop/Kerberos problems can be "difficult". One useful technique is to
enable extra logging of Kerberos operations in Hadoop by setting the `HADOOP_JAAS_DEBUG`
environment variable.
```bash
export HADOOP_JAAS_DEBUG=true
```
The JDK classes can be configured to enable extra logging of their Kerberos and
SPNEGO/REST authentication by setting the system properties `sun.security.krb5.debug`
and `sun.security.spnego.debug` to `true`:
```
-Dsun.security.krb5.debug=true -Dsun.security.spnego.debug=true
```
All these options can be enabled in the Application Master:
```
spark.yarn.appMasterEnv.HADOOP_JAAS_DEBUG true
spark.yarn.am.extraJavaOptions -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug=true
```
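The Application Master settings above can also be passed on the `spark-submit` command line. This sketch assumes client deploy mode, because `spark.yarn.am.extraJavaOptions` only applies to the Application Master in client mode. In cluster mode the driver runs inside the Application Master, and `spark.driver.extraJavaOptions` is used instead. The class and jar names are placeholders.

```shell
# Client mode: spark.yarn.am.extraJavaOptions is honoured only in this mode.
set -- spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.yarn.appMasterEnv.HADOOP_JAAS_DEBUG=true \
  --conf "spark.yarn.am.extraJavaOptions=-Dsun.security.krb5.debug=true -Dsun.security.spnego.debug=true" \
  --class com.example.MyApp \
  my-app.jar

# Dry run: print one argument per line instead of submitting.
printf '%s\n' "$@"
```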
Finally, if the log level for `org.apache.spark.deploy.yarn.Client` is set to `DEBUG`, the log
will include a list of all tokens obtained and their expiry details.
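One way to raise that log level is to append a logger entry for the client class. This assumes the log4j 1.x `log4j.properties` format that Spark 2.x reads from its configuration directory. The temporary directory below stands in for `$SPARK_CONF_DIR`. In practice the line would be added to an existing `log4j.properties` (for example one copied from `log4j.properties.template`) so that the root logger and appenders stay configured.

```shell
# Stand-in for $SPARK_CONF_DIR; point this at the real configuration directory.
CONF_DIR="$(mktemp -d)"
LOG4J_FILE="$CONF_DIR/log4j.properties"

# Assumes the log4j 1.x properties syntax used by Spark 2.x.
printf '%s\n' 'log4j.logger.org.apache.spark.deploy.yarn.Client=DEBUG' >> "$LOG4J_FILE"

cat "$LOG4J_FILE"
```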
# Configuring the External Shuffle Service
To start the Spark Shuffle Service on each `NodeManager` in your YARN cluster, follow these
instructions:
1. Build Spark with the [YARN profile](building-spark.html). Skip this step if you are using a
pre-packaged distribution.
1. Locate the `spark-<version>-yarn-shuffle.jar`. This should be under
`$SPARK_HOME/common/network-yarn/target/scala-<version>` if you are building Spark yourself, and under
`yarn` if you are using a distribution.
1. Add this jar to the classpath of all `NodeManager`s in your cluster.
1. In the `yarn-site.xml` on each node, add `spark_shuffle` to `yarn.nodemanager.aux-services`,
then set `yarn.nodemanager.aux-services.spark_shuffle.class` to
`org.apache.spark.network.yarn.YarnShuffleService`.
1. Increase `NodeManager's` heap size by setting `YARN_HEAPSIZE` (1000 by default) in `etc/hadoop/yarn-env.sh`
to avoid garbage collection issues during shuffle.
1. Restart all `NodeManager`s in your cluster.
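The `yarn-site.xml` changes from the steps above look roughly like the entries this shell function prints. The value `mapreduce_shuffle` is an assumption about services already configured on the cluster. Keep whatever auxiliary services your `NodeManager`s already list and append `spark_shuffle`.

```shell
# Prints the yarn-site.xml <property> entries for the Spark Shuffle Service.
# "mapreduce_shuffle" is an assumed, pre-existing aux service.
spark_shuffle_properties() {
  cat <<'EOF'
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
EOF
}

spark_shuffle_properties
```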
The following extra configuration options are available when the shuffle service is running on YARN:
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
  <td><code>spark.yarn.shuffle.stopOnFailure</code></td>
  <td><code>false</code></td>
  <td>
    Whether to stop the NodeManager when there's a failure in the Spark Shuffle Service's
    initialization. This prevents application failures caused by running containers on
    NodeManagers where the Spark Shuffle Service is not running.
  </td>
</tr>
</table>
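The following sketches how that option might be enabled. It assumes the option is set in `yarn-site.xml` on each NodeManager, next to the auxiliary-service entries, since the shuffle service runs inside the NodeManager rather than in a Spark application.

```shell
# Prints a yarn-site.xml entry asking the NodeManager to stop if the Spark
# Shuffle Service fails to initialize (assumed to live in yarn-site.xml).
stop_on_failure_property() {
  cat <<'EOF'
<property>
  <name>spark.yarn.shuffle.stopOnFailure</name>
  <value>true</value>
</property>
EOF
}

stop_on_failure_property
```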
# Launching your application with Apache Oozie
Apache Oozie can launch Spark applications as part of a workflow.
In a secure cluster, the launched application will need the relevant tokens to access the cluster's
services. If Spark is launched with a keytab, this is automatic.
However, if Spark is to be launched without a keytab, the responsibility for setting up security
must be handed over to Oozie.
The details of configuring Oozie for secure clusters and obtaining
credentials for a job can be found on the [Oozie web site](http://oozie.apache.org/)
in the "Authentication" section of the specific release's documentation.
For Spark applications, the Oozie workflow must be set up for Oozie to request all tokens which
the application needs, including:
- The YARN resource manager.
- The local Hadoop filesystem.
- Any remote Hadoop filesystems used as a source or destination of I/O.
- Hive, if used.
- HBase, if used.
- The YARN timeline server, if the application interacts with this.
To avoid Spark attempting, and then failing, to obtain Hive, HBase and remote HDFS tokens,
the Spark configuration must be set to disable token collection for the services.
The Spark configuration must include the lines:
```
spark.security.credentials.hive.enabled false
spark.security.credentials.hbase.enabled false
```
The configuration option `spark.yarn.access.hadoopFileSystems` must be unset.
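Before handing the configuration to Oozie, it may help to check it mechanically. The function below, `check_oozie_spark_conf`, is a hypothetical helper and not part of Spark or Oozie. It assumes a `spark-defaults.conf`-style file with whitespace-separated `key value` lines, as in the example above. It prints each missing or forbidden setting and returns non-zero if any are found.

```shell
# Hypothetical helper: validate a "key value" Spark configuration file
# against the Oozie requirements described above.
check_oozie_spark_conf() {
  conf_file="$1"
  status=0
  for service in hive hbase; do
    key="spark.security.credentials.$service.enabled"
    # Require a line whose first field is the key and second field is "false".
    if ! awk -v k="$key" '$1 == k && $2 == "false" { found = 1 } END { exit !found }' "$conf_file"; then
      echo "missing: $key false"
      status=1
    fi
  done
  # spark.yarn.access.hadoopFileSystems must not appear at all.
  if awk '$1 == "spark.yarn.access.hadoopFileSystems" { found = 1 } END { exit !found }' "$conf_file"; then
    echo "must be unset: spark.yarn.access.hadoopFileSystems"
    status=1
  fi
  return "$status"
}

# Example: a configuration containing exactly the lines shown above.
CONF="$(mktemp)"
printf '%s\n' 'spark.security.credentials.hive.enabled false' \
  'spark.security.credentials.hbase.enabled false' > "$CONF"
check_oozie_spark_conf "$CONF" && echo "configuration OK for Oozie"
```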
# Using the Spark History Server to replace the Spark Web UI
It is possible to use the Spark History Server application page as the tracking URL for running
applications when the application UI is disabled. This may be desirable on secure clusters, or to
reduce the memory usage of the Spark driver. To set up tracking through the Spark History Server,
do the following:
- On the application side, set <code>spark.yarn.historyServer.allowTracking=true</code> in Spark's
configuration. This will tell Spark to use the history server's URL as the tracking URL if
the application's UI is disabled.
- On the Spark History Server, add <code>org.apache.spark.deploy.yarn.YarnProxyRedirectFilter</code>
to the list of filters in the <code>spark.ui.filters</code> configuration.
Be aware that the history server information may not be up-to-date with the application's state.
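Below is a sketch of both sides of this setup as `spark-defaults.conf` fragments. They are written to temporary files that stand in for the real configuration files. Two settings are assumptions about how the rest of the deployment is configured: disabling the driver UI with `spark.ui.enabled`, and pointing Spark at the history server with `spark.yarn.historyServer.address`. The host name is a placeholder.

```shell
# Stand-ins (hypothetical paths) for the two spark-defaults.conf files.
APP_CONF="$(mktemp)"   # read by spark-submit for the application
SHS_CONF="$(mktemp)"   # read by the Spark History Server

# Application side: no driver UI, tracking URL points at the history server.
cat > "$APP_CONF" <<'EOF'
spark.ui.enabled                        false
spark.yarn.historyServer.allowTracking  true
spark.yarn.historyServer.address        shs.example.com:18080
EOF

# History Server side: redirect requests that arrive through the YARN proxy.
cat > "$SHS_CONF" <<'EOF'
spark.ui.filters  org.apache.spark.deploy.yarn.YarnProxyRedirectFilter
EOF
```

If the History Server already uses other UI filters, the redirect filter would be appended to the existing comma-separated list rather than replacing it.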