---
layout: global
title: Running Spark on YARN
---

Support for running on [YARN (Hadoop
NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html)
was added to Spark in version 0.6.0, and improved in subsequent releases.

# Launching Spark on YARN

Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory which contains the (client side) configuration files for the Hadoop cluster.
These configs are used to write to HDFS and connect to the YARN ResourceManager. The
configuration contained in this directory will be distributed to the YARN cluster so that all
containers used by the application use the same configuration. If the configuration references
Java system properties or environment variables not managed by YARN, they should also be set in the
Spark application's configuration (driver, executors, and the AM when running in client mode).
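
As a minimal pre-flight sketch, the snippet below exports `HADOOP_CONF_DIR` and checks that the directory holds the client configs. Several parts of it are assumptions about a typical Hadoop layout:

- `/etc/hadoop/conf` is a fallback location.
- `check_hadoop_conf` is a hypothetical helper, not part of Spark.
- It only checks for `core-site.xml` and `yarn-site.xml`. Your cluster may need other files as well.

```bash
#!/usr/bin/env bash
# Point Spark at the Hadoop client configs. /etc/hadoop/conf is an assumed
# fallback; any value that is already exported is kept.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"

# Hypothetical helper: report which of the usual client config files are missing.
# core-site.xml normally names the default filesystem (HDFS) and yarn-site.xml
# the ResourceManager address; your cluster may need others (e.g. hdfs-site.xml).
check_hadoop_conf() {
  local dir="$1"
  local missing=0
  local f
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical use is to guard the launch command: `check_hadoop_conf "$HADOOP_CONF_DIR" && ./bin/spark-submit --master yarn-cluster ...`.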

There are two deploy modes that can be used to launch Spark applications on YARN. In `yarn-cluster` mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In `yarn-client` mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

Unlike in Spark standalone and Mesos modes, in which the master's address is specified in the `--master` parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the `--master` parameter is `yarn-client` or `yarn-cluster`.

To launch a Spark application in `yarn-cluster` mode:

```bash
$ ./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]
```

For example:

```bash
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --num-executors 3 \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue thequeue \
    lib/spark-examples*.jar \
    10
```

The above starts a YARN client program, which starts the default Application Master. SparkPi then runs as a child thread of the Application Master. The client periodically polls the Application Master for status updates and displays them in the console. The client exits once your application has finished running. Refer to the "Debugging your Application" section below for how to see driver and executor logs.

To launch a Spark application in `yarn-client` mode, do the same, but replace `yarn-cluster` with `yarn-client`. To run `spark-shell`:

```bash
$ ./bin/spark-shell --master yarn-client
```

## Adding Other JARs

In `yarn-cluster` mode, the driver runs on a different machine than the client, so `SparkContext.addJar` won't work out of the box with files that are local to the client. To make files on the client available to `SparkContext.addJar`, include them with the `--jars` option in the launch command.

```bash
$ ./bin/spark-submit --class my.main.Class \
    --master yarn-cluster \
    --jars my-other-jar.jar,my-other-other-jar.jar \
    my-main-jar.jar \
    app_arg1 app_arg2
```

# Preparations

Running Spark on YARN requires a binary distribution of Spark that is built with YARN support.
Binary distributions can be downloaded from the Spark project website.
To build Spark yourself, refer to [Building Spark](building-spark.html).

# Configuration

Most of the configs are the same for Spark on YARN as for other deployment modes. See the [configuration page](configuration.html) for more information on those. The configs that are specific to Spark on YARN are listed in the Spark Properties table below.
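
YARN-specific properties are set the same way as any other Spark property. The snippet below is a minimal sketch that writes a `spark-defaults.conf` fragment using property names from the table on this page. The values and the output location are illustrative assumptions, not recommendations.

```bash
#!/usr/bin/env bash
# Sketch: write YARN-specific defaults to a spark-defaults.conf fragment.
# spark-defaults.conf holds one whitespace-separated "key value" pair per line.
# All values below are illustrative; tune them for your cluster.
OUT="${OUT:-./spark-defaults.conf}"

cat > "$OUT" <<'EOF'
# AM memory and cores apply only in yarn-client mode
spark.yarn.am.memory               1g
spark.yarn.am.cores                2
# In yarn-cluster mode the driver shares the AM's JVM, so this sizes the AM too
spark.driver.cores                 2
# Time properties take a unit suffix, as in the 100s default
spark.yarn.am.waitTime             200s
spark.yarn.preserve.staging.files  false
EOF
```

There are three ways to use these settings:

- Copy the lines into `conf/spark-defaults.conf`.
- Point `spark-submit` at the file with `--properties-file`.
- Pass individual keys as `--conf key=value`, for example `--conf spark.yarn.am.memory=1g`.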

# Debugging your Application

In YARN terminology, executors and application masters run inside "containers". YARN has two modes for handling container logs after an application has completed. If log aggregation is turned on (with the `yarn.log-aggregation-enable` config), container logs are copied to HDFS and deleted on the local machine. These logs can be viewed from anywhere on the cluster with the `yarn logs` command:

```bash
yarn logs -applicationId <app ID>
```

This prints the contents of all log files from all containers of the given application. You can also view the container log files directly in HDFS using the HDFS shell or API. The directory where they are located can be found by looking at your YARN configs (`yarn.nodemanager.remote-app-log-dir` and `yarn.nodemanager.remote-app-log-dir-suffix`). The logs are also available on the Spark Web UI under the Executors tab. For this, you need to have both the Spark history server and the MapReduce history server running, and `yarn.log.server.url` must be configured properly in `yarn-site.xml`. The log URL on the Spark history server UI will redirect you to the MapReduce history server to show the aggregated logs.
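
To browse aggregated logs with the HDFS shell, you need the concrete directory. The snippet below is a sketch that assumes the Hadoop 2.x layout `<remote-app-log-dir>/<user>/<suffix>/<appId>`. It also assumes YARN's stock defaults for the two configs named above: `/tmp/logs` and `logs`. Override both with the values from your `yarn-site.xml`. `aggregated_log_dir` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the HDFS directory holding one application's
# aggregated logs, assuming <remote-app-log-dir>/<user>/<suffix>/<appId>.
# Defaults mirror stock YARN values; pass your cluster's values if they differ.
aggregated_log_dir() {
  local app_id="$1"
  local user="${2:-${USER:-$(id -un)}}"
  local root="${3:-/tmp/logs}"     # yarn.nodemanager.remote-app-log-dir
  local suffix="${4:-logs}"        # yarn.nodemanager.remote-app-log-dir-suffix
  printf '%s/%s/%s/%s\n' "$root" "$user" "$suffix" "$app_id"
}
```

For example, `hdfs dfs -ls "$(aggregated_log_dir application_1428958293000_0001)"` would list the per-node log files for that (hypothetical) application ID. When you only need to read the logs, `yarn logs` is usually simpler.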

When log aggregation isn't turned on, logs are retained locally on each machine under `YARN_APP_LOGS_DIR`, which is usually configured to `/tmp/logs` or `$HADOOP_HOME/logs/userlogs` depending on the Hadoop version and installation. Viewing logs for a container requires going to the host that contains them and looking in this directory. Subdirectories organize log files by application ID and container ID. The logs are also available on the Spark Web UI under the Executors tab; in this case, running the MapReduce history server is not required.
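
On a single host, the application/container subdirectories can be listed with `find`. The sketch below assumes the layout `<log dir>/<appId>/<containerId>/<file>` described above. `list_local_container_logs` is a hypothetical helper, and the log directory is whatever `YARN_APP_LOGS_DIR` points to on your nodes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: with log aggregation off, list one application's
# container log files on this host. Assumed layout:
#   <log dir>/<appId>/<containerId>/{stdout,stderr,...}
list_local_container_logs() {
  local log_dir="$1" app_id="$2"
  # Depth 2 below the application directory = files inside each container directory.
  find "$log_dir/$app_id" -mindepth 2 -maxdepth 2 -type f | sort
}
```

For example, `list_local_container_logs /tmp/logs application_1428958293000_0001` (a hypothetical application ID) prints each container's log files, such as `stderr` and `stdout`.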

To review the per-container launch environment, increase `yarn.nodemanager.delete.debug-delay-sec` to a
large value (e.g. 36000), and then access the application cache through `yarn.nodemanager.local-dirs`
on the nodes on which containers are launched. This directory contains the launch script, JARs, and
all environment variables used for launching each container. This process is useful for debugging
classpath problems in particular. (Note that enabling this requires admin privileges on cluster
settings and a restart of all node managers. Thus, this is not applicable to hosted clusters.)
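
Once the delay keeps the files around, the launch scripts can be located with `find`. The sketch below assumes the NodeManager layout `<local-dir>/usercache/<user>/appcache/<appId>/<containerId>/launch_container.sh`. Verify that layout against your Hadoop version. `find_launch_scripts` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: find the launch scripts YARN kept for one application
# under a single yarn.nodemanager.local-dirs entry. Assumed layout:
#   <local-dir>/usercache/<user>/appcache/<appId>/<containerId>/launch_container.sh
find_launch_scripts() {
  local local_dir="$1" app_id="$2"
  # The * matches any user; errors for a non-matching glob are silenced.
  find "$local_dir"/usercache/*/appcache/"$app_id" -name launch_container.sh 2>/dev/null | sort
}
```

Opening one of the printed scripts (for example with `less`) typically shows the environment exports, including the classpath, that YARN used to start that container.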

To use a custom log4j configuration for the application master or executors, there are two options:

- Upload a custom `log4j.properties` using `spark-submit`, by adding it to the `--files` list of files
  to be uploaded with the application.
- Add `-Dlog4j.configuration=<location of configuration file>` to `spark.driver.extraJavaOptions`
  (for the driver) or `spark.executor.extraJavaOptions` (for executors). Note that if using a file,
  the `file:` protocol should be explicitly provided, and the file needs to exist locally on all
  the nodes.

Note that for the first option, both executors and the application master will share the same
log4j configuration, which may cause issues when they run on the same node (e.g. trying to write
to the same log file).
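
For the second option, the JVM flag has to be passed through both `extraJavaOptions` properties. The sketch below builds those `--conf` arguments as a bash array so that the quoting survives. `LOG4J_FILE` is a hypothetical path, and the file must already exist there on every node.

```bash
#!/usr/bin/env bash
# Sketch: build spark-submit arguments for a custom log4j file (option two).
# LOG4J_FILE is a hypothetical path; the file must exist on every node.
LOG4J_FILE="${LOG4J_FILE:-/etc/spark/conf/log4j-yarn.properties}"
LOG4J_OPT="-Dlog4j.configuration=file:${LOG4J_FILE}"   # explicit file: protocol

SUBMIT_ARGS=(
  --conf "spark.driver.extraJavaOptions=${LOG4J_OPT}"
  --conf "spark.executor.extraJavaOptions=${LOG4J_OPT}"
)
printf '%s\n' "${SUBMIT_ARGS[@]}"
```

The array can then be spliced into a launch command, e.g. `./bin/spark-submit "${SUBMIT_ARGS[@]}" --class my.main.Class --master yarn-cluster my-main-jar.jar`. Using an array keeps each `key=value` pair a single argument.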

If you need a reference to the proper location for log files so that YARN can properly display and aggregate them, use `spark.yarn.app.container.log.dir` in your `log4j.properties`. For example, `log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log`. For streaming applications, configuring a `RollingFileAppender` and setting the file location to YARN's log directory avoids disk overflow caused by large log files, and the logs can still be accessed using YARN's log utility.
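
The snippet below is a sketch of such a `log4j.properties` for a streaming application. It uses log4j 1.x `RollingFileAppender` settings, and the size limits and pattern are illustrative assumptions. The heredoc delimiter is quoted so the shell leaves `${spark.yarn.app.container.log.dir}` alone; log4j then substitutes the system property when the container starts.

```bash
#!/usr/bin/env bash
# Sketch: write a log4j 1.x config that rolls logs inside the YARN container
# log directory. MaxFileSize/MaxBackupIndex values are illustrative.
# Quoting 'EOF' prevents the shell from expanding ${...}; log4j does it at runtime.
cat > log4j.properties <<'EOF'
log4j.rootLogger=INFO, file_appender
log4j.appender.file_appender=org.apache.log4j.RollingFileAppender
log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.file_appender.MaxFileSize=50MB
log4j.appender.file_appender.MaxBackupIndex=5
log4j.appender.file_appender.layout=org.apache.log4j.PatternLayout
log4j.appender.file_appender.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
EOF
```

The file can be shipped with `--files log4j.properties`, as in the first option above. Because the path resolves inside each container's own log directory, this may also avoid the shared-file problem mentioned earlier.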

#### Spark Properties

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><code>spark.yarn.am.memory</code></td>
<td>512m</td>
<td>
Amount of memory to use for the YARN Application Master in client mode, in the same format as JVM memory strings (e.g. <code>512m</code>, <code>2g</code>).
In cluster mode, use <code>spark.driver.memory</code> instead.
</td>
</tr>
<tr>
<td><code>spark.driver.cores</code></td>
<td>1</td>
<td>
Number of cores used by the driver in YARN cluster mode.
Since the driver is run in the same JVM as the YARN Application Master in cluster mode, this also controls the cores used by the YARN AM.
In client mode, use <code>spark.yarn.am.cores</code> to control the number of cores used by the YARN AM instead.
</td>
</tr>
<tr>
<td><code>spark.yarn.am.cores</code></td>
<td>1</td>
<td>
Number of cores to use for the YARN Application Master in client mode.
In cluster mode, use <code>spark.driver.cores</code> instead.
</td>
</tr>
<tr>
<td><code>spark.yarn.am.waitTime</code></td>
<td>100s</td>
<td>
In <code>yarn-cluster</code> mode, time for the YARN Application Master to wait for the
SparkContext to be initialized. In <code>yarn-client</code> mode, time for the YARN Application Master to wait
for the driver to connect to it.
</td>
</tr>
<tr>
<td><code>spark.yarn.submit.file.replication</code></td>
<td>The default HDFS replication (usually 3)</td>
<td>
HDFS replication level for the files uploaded into HDFS for the application. These include things like the Spark jar, the app jar, and any distributed cache files/archives.
</td>
</tr>
<tr>
<td><code>spark.yarn.preserve.staging.files</code></td>
<td>false</td>
<td>
Set to true to preserve the staged files (Spark jar, app jar, distributed cache files) at the end of the job rather than delete them.
</td>
</tr>
<tr>
<td><code>spark.yarn.scheduler.heartbeat.interval-ms</code></td>
<td>3000</td>
<td>
  The interval in ms at which the Spark application master heartbeats to the YARN ResourceManager.
  The value is capped at half of YARN's configured expiry interval
  (<code>yarn.am.liveness-monitor.expiry-interval-ms</code>).
</td>
</tr>
<tr>
<td><code>spark.yarn.scheduler.initial-allocation.interval</code></td>
<td>200ms</td>
<td>
  The initial interval at which the Spark application master eagerly heartbeats to the YARN ResourceManager
  when there are pending container allocation requests. It should be no larger than
  <code>spark.yarn.scheduler.heartbeat.interval-ms</code>. The allocation interval is doubled on
  successive eager heartbeats while pending containers still exist, until
  <code>spark.yarn.scheduler.heartbeat.interval-ms</code> is reached.
</td>
</tr>
<tr>
<td><code>spark.yarn.max.executor.failures</code></td>
<td>numExecutors * 2, with minimum of 3</td>
<td>
The maximum number of executor failures before failing the application.
</td>
</tr>
<tr>
<td><code>spark.yarn.historyServer.address</code></td>
<td>(none)</td>
<td>
    The address of the Spark history server (e.g. host.com:18080). The address should not contain a scheme (http://). Defaults to not being set since the history server is an optional service. When the Spark application finishes, this address is given to the YARN ResourceManager so that the ResourceManager UI can link the application to the Spark history server UI.
    For this property, YARN properties can be used as variables, and these are substituted by Spark at runtime. For example, if the Spark history server runs on the same node as the YARN ResourceManager, it can be set to `${hadoopconf-yarn.resourcemanager.hostname}:18080`.
</td>
</tr>
<tr>
<td><code>spark.yarn.dist.archives</code></td>
<td>(none)</td>
<td>
    Comma-separated list of archives to be extracted into the working directory of each executor.
</td>
</tr>
<tr>
<td><code>spark.yarn.dist.files</code></td>
<td>(none)</td>
<td>
Comma-separated list of files to be placed in the working directory of each executor.
</td>
</tr>
<tr>
<td><code>spark.executor.instances</code></td>
<td>2</td>
<td>
The number of executors. Note that this property is incompatible with <code>spark.dynamicAllocation.enabled</code>. If both <code>spark.dynamicAllocation.enabled</code> and <code>spark.executor.instances</code> are specified, dynamic allocation is turned off and the specified number of <code>spark.executor.instances</code> is used.
</td>
</tr>
<tr>
<td><code>spark.yarn.executor.memoryOverhead</code></td>
<td>executorMemory * 0.10, with minimum of 384 </td>
<td>
    The amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).
</td>
</tr>
<tr>
<td><code>spark.yarn.driver.memoryOverhead</code></td>
<td>driverMemory * 0.07, with minimum of 384 </td>
<td>
    The amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the container size (typically 6-10%).
</td>
</tr>
<tr>
<td><code>spark.yarn.am.memoryOverhead</code></td>
<td>AM memory * 0.07, with minimum of 384 </td>
<td>
Same as <code>spark.yarn.driver.memoryOverhead</code>, but for the Application Master in client mode.
</td>
</tr>
<tr>
<td><code>spark.yarn.am.port</code></td>
<td>(random)</td>
<td>
Port for the YARN Application Master to listen on. In YARN client mode, this is used to communicate between the Spark driver running on a gateway and the Application Master running on YARN. In YARN cluster mode, this is used for the dynamic executor feature, where it handles the kill from the scheduler backend.
</td>
</tr>
<tr>
<td><code>spark.yarn.queue</code></td>
<td>default</td>
<td>
The name of the YARN queue to which the application is submitted.
</td>
</tr>
<tr>
<td><code>spark.yarn.jar</code></td>
<td>(none)</td>
<td>
The location of the Spark jar file, in case overriding the default location is desired.
By default, Spark on YARN will use a Spark jar installed locally, but the Spark jar can also be
in a world-readable location on HDFS. This allows YARN to cache it on nodes so that it doesn't
need to be distributed each time an application runs. To point to a jar on HDFS, for example,
set this configuration to "hdfs:///some/path".
</td>
</tr>
<tr>
<td><code>spark.yarn.access.namenodes</code></td>
<td>(none)</td>
<td>
A list of secure HDFS namenodes your Spark application is going to access. For
example, `spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032`.
    The Spark application must have access to the namenodes listed and Kerberos must
be properly configured to be able to access them (either in the same realm or in
a trusted realm). Spark acquires security tokens for each of the namenodes so that
the Spark application can access those remote HDFS clusters.
</td>
</tr>
<tr>
<td><code>spark.yarn.appMasterEnv.[EnvironmentVariableName]</code></td>
<td>(none)</td>
<td>
     Add the environment variable specified by <code>EnvironmentVariableName</code> to the
     Application Master process launched on YARN. The user can specify multiple of
     these to set multiple environment variables. In `yarn-cluster` mode this controls
     the environment of the Spark driver, and in `yarn-client` mode it only controls
     the environment of the executor launcher.
</td>
</tr>
<tr>
<td><code>spark.yarn.containerLauncherMaxThreads</code></td>
<td>25</td>
<td>
The maximum number of threads to use in the application master for launching executor containers.
</td>
</tr>
<tr>
<td><code>spark.yarn.am.extraJavaOptions</code></td>
<td>(none)</td>
<td>
A string of extra JVM options to pass to the YARN Application Master in client mode.
In cluster mode, use `spark.driver.extraJavaOptions` instead.
</td>
</tr>
<tr>
<td><code>spark.yarn.am.extraLibraryPath</code></td>
<td>(none)</td>
<td>
Set a special library path to use when launching the YARN Application Master in client mode.
</td>
</tr>
<tr>
<td><code>spark.yarn.maxAppAttempts</code></td>
<td><code>yarn.resourcemanager.am.max-attempts</code> in YARN</td>
<td>
The maximum number of attempts that will be made to submit the application.
It should be no larger than the global number of max attempts in the YARN configuration.
</td>
</tr>
<tr>
<td><code>spark.yarn.submit.waitAppCompletion</code></td>
<td>true</td>
<td>
In YARN cluster mode, controls whether the client waits to exit until the application completes.
If set to true, the client process will stay alive reporting the application's status.
Otherwise, the client process will exit after submission.
</td>
</tr>
<tr>
<td><code>spark.yarn.executor.nodeLabelExpression</code></td>
<td>(none)</td>
<td>
A YARN node label expression that restricts the set of nodes executors will be scheduled on.
Only versions of YARN greater than or equal to 2.6 support node label expressions, so when
running against earlier versions, this property will be ignored.
</td>
</tr>
<tr>
<td><code>spark.yarn.keytab</code></td>
<td>(none)</td>
<td>
The full path to the file that contains the keytab for the principal specified by <code>spark.yarn.principal</code>.
This keytab will be copied to the node running the Application Master via the Secure Distributed Cache,
so that the login tickets and delegation tokens can be renewed periodically.
</td>
</tr>
<tr>
<td><code>spark.yarn.principal</code></td>
<td>(none)</td>
<td>
Principal to be used to log in to the KDC while running on secure HDFS.
</td>
</tr>
<tr>
<td><code>spark.yarn.config.gatewayPath</code></td>
<td>(none)</td>
<td>
A path that is valid on the gateway host (the host where a Spark application is started) but that may
differ from the path to the same resource on other nodes in the cluster. Coupled with
<code>spark.yarn.config.replacementPath</code>, this is used to support clusters with
heterogeneous configurations, so that Spark can correctly launch remote processes.
<p/>
The replacement path normally will contain a reference to some environment variable exported by
YARN (and, thus, visible to Spark containers).
<p/>
For example, if the gateway node has Hadoop libraries installed on <code>/disk1/hadoop</code>, and
the location of the Hadoop install is exported by YARN as the <code>HADOOP_HOME</code>
environment variable, setting this value to <code>/disk1/hadoop</code> and the replacement path to
<code>$HADOOP_HOME</code> will make sure that paths used to launch remote processes properly
reference the local YARN configuration.
</td>
</tr>
<tr>
<td><code>spark.yarn.config.replacementPath</code></td>
<td>(none)</td>
<td>
See <code>spark.yarn.config.gatewayPath</code>.
</td>
</tr>
</table>
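
The path substitution described for `spark.yarn.config.gatewayPath` and `spark.yarn.config.replacementPath` can be sketched in Python as a plain substring rewrite. This is an illustrative sketch, not Spark's implementation: the helper name `to_cluster_path` is hypothetical, and the sketch assumes the rewrite happens only when both properties are set. `$HADOOP_HOME` is deliberately left unexpanded, since it is meant to be resolved on the node where YARN launches the container.

```python
from typing import Optional


def to_cluster_path(path: str,
                    gateway_path: Optional[str],
                    replacement_path: Optional[str]) -> str:
    """Rewrite a gateway-local path so it resolves on cluster nodes.

    Hypothetical helper illustrating spark.yarn.config.gatewayPath and
    spark.yarn.config.replacementPath; this is not Spark's actual code.
    """
    # No rewriting unless both properties are configured.
    if gateway_path is None or replacement_path is None:
        return path
    # Swap every occurrence of the gateway prefix for the replacement,
    # which covers colon-separated classpaths as well as single paths.
    return path.replace(gateway_path, replacement_path)


# Example from the table: Hadoop is installed in /disk1/hadoop on the gateway,
# while YARN exports its location as HADOOP_HOME on every node.
classpath = "/disk1/hadoop/etc/hadoop:/disk1/hadoop/share/hadoop/common/*"
print(to_cluster_path(classpath, "/disk1/hadoop", "$HADOOP_HOME"))
# $HADOOP_HOME/etc/hadoop:$HADOOP_HOME/share/hadoop/common/*
```

Paths that do not start under the gateway prefix pass through unchanged, which is why this substitution is only useful for resources whose location differs between the gateway and the cluster nodes.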
# Important notes
- Whether core requests are honored in scheduling decisions depends on which scheduler is in use and how it is configured.
- In `yarn-cluster` mode, the local directories used by the Spark executors and the Spark driver will be the local directories configured for YARN (Hadoop YARN config `yarn.nodemanager.local-dirs`). If the user specifies `spark.local.dir`, it will be ignored. In `yarn-client` mode, the Spark executors will use the local directories configured for YARN while the Spark driver will use those defined in `spark.local.dir`. This is because the Spark driver does not run on the YARN cluster in `yarn-client` mode, only the Spark executors do.
- The `--files` and `--archives` options support specifying file names with the `#` syntax, similar to Hadoop. For example, `--files localtest.txt#appSees.txt` uploads the file named `localtest.txt` on your local machine into HDFS, where it is linked under the name `appSees.txt`; your application should use the name `appSees.txt` to reference it when running on YARN.
- The `--jars` option allows the `SparkContext.addJar` function to work if you are using it with local files and running in `yarn-cluster` mode. It is not needed if you are using `addJar` with HDFS, HTTP, HTTPS, or FTP files.
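
The `#` naming convention for `--files` and `--archives` can be illustrated with a small Python sketch that splits each comma-separated entry into the source path and the name the application sees. The function `split_resource` is hypothetical and only models the rule described above. It assumes that an entry without a `#` fragment is exposed under its own base name, which is the usual behavior of Hadoop's distributed cache.

```python
from pathlib import PurePosixPath
from typing import Tuple


def split_resource(spec: str) -> Tuple[str, str]:
    """Split a --files/--archives entry into (source path, name the app sees).

    Hypothetical helper modeling the '#' convention; not Spark's actual code.
    """
    # Everything after the first '#' is the link name; the path before it is kept intact.
    source, sep, alias = spec.partition("#")
    if not sep or not alias:
        # Without a fragment, assume the file is visible under its own base name.
        alias = PurePosixPath(source).name
    return source, alias


# --files takes a comma-separated list; each entry may carry its own alias.
for entry in "localtest.txt#appSees.txt,data/lookup.csv".split(","):
    print(split_resource(entry))
# ('localtest.txt', 'appSees.txt')
# ('data/lookup.csv', 'lookup.csv')
```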