...ests.
Without this, SparkHadoopUtil is used by the Client instead of YarnSparkHadoopUtil.
Author: Hari Shreedharan <hshreedharan@apache.org>
Closes #4711 from harishreedharan/SPARK-5937 and squashes the following commits:
d154de6 [Hari Shreedharan] Use System.clearProperty() instead of setting the value of SPARK_YARN_MODE to empty string.
f729f70 [Hari Shreedharan] Fix ClientSuite to set YARN mode, so that the correct class is used in tests.
Sometimes, for various reasons, an exception is thrown while NMClient starts containers. For example, if spark_shuffle is not configured on some machines, the following exception is thrown:
java.lang.Error: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist.
Because YarnAllocator uses a ThreadPoolExecutor to start containers, we cannot tell which container or hostname threw the exception. I think we should catch YarnException in ExecutorRunnable when starting a container, so that if an exception occurs we know the container ID and hostname of the failed container.
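The idea can be sketched as follows. This is a minimal illustration in Python, not the actual ExecutorRunnable code: `ContainerStartError`, `start_container`, and the `launch` callback are hypothetical names standing in for the wrapping exception and the NMClient call.

```python
class ContainerStartError(Exception):
    """Hypothetical wrapper that records which container failed to start."""


def start_container(container_id, hostname, launch):
    # `launch` is a stand-in for the NMClient call that starts the container.
    try:
        return launch()
    except Exception as exc:
        # Re-raise with the container ID and hostname attached, keeping the
        # original exception as the cause so the stack trace is not lost.
        raise ContainerStartError(
            f"Exception while starting container {container_id} on host {hostname}"
        ) from exc
```

With this shape, a failure inside a thread pool still carries enough context to identify the failing container and node.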
Author: lianhuiwang <lianhuiwang09@gmail.com>
Closes #4554 from lianhuiwang/SPARK-5759 and squashes the following commits:
caf5a99 [lianhuiwang] use SparkException to warp exception
c02140f [lianhuiwang] ExecutorRunnable should catch YarnException while NMClient start container
Yarn's config option `spark.yarn.user.classpath.first` does not work the same way as
`spark.files.userClassPathFirst`; Yarn's version is a lot more dangerous, in that it
modifies the system classpath, instead of restricting the changes to the user's class
loader. So this change implements the behavior of the latter for Yarn, and deprecates
the more dangerous choice.
To be able to achieve feature-parity, I also implemented the option for drivers (the existing
option only applies to executors). So now there are two options, each controlling whether
to apply userClassPathFirst to the driver or executors. The old option was deprecated, and
aliased to the new one (`spark.executor.userClassPathFirst`).
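A rough sketch of how a deprecated key can be aliased to its replacement is shown below. It is written in Python for brevity and is not the SparkConf implementation. The `DEPRECATED_ALIASES` table and `get_conf` helper are hypothetical, and the choice of `spark.files.userClassPathFirst` as the old key is my reading of the description above.

```python
import warnings

# Old key -> new key. The mapping reflects one reading of the description
# above and is an assumption, not a copy of Spark's table.
DEPRECATED_ALIASES = {
    "spark.files.userClassPathFirst": "spark.executor.userClassPathFirst",
}


def get_conf(conf, key, default=None):
    # The new key always wins when it is set explicitly.
    if key in conf:
        return conf[key]
    # Otherwise fall back to a deprecated alias, warning the user.
    for old, new in DEPRECATED_ALIASES.items():
        if new == key and old in conf:
            warnings.warn(f"{old} is deprecated; use {new} instead",
                          DeprecationWarning)
            return conf[old]
    return default
```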
The existing "child-first" class loader also had to be fixed. It didn't handle resources, and it
was also doing some things that ended up causing JVM errors depending on how things
were being called.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #3233 from vanzin/SPARK-2996 and squashes the following commits:
9cf9cf1 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a1499e2 [Marcelo Vanzin] Remove SPARK_HOME propagation.
fa7df88 [Marcelo Vanzin] Remove 'test.resource' file, create it dynamically.
a8c69f1 [Marcelo Vanzin] Review feedback.
cabf962 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a1b8d7e [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
3f768e3 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
2ce3c7a [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
0e6d6be [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
70d4044 [Marcelo Vanzin] Fix pyspark/yarn-cluster test.
0fe7777 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
0e6ef19 [Marcelo Vanzin] Move class loaders around and make names more meaninful.
fe970a7 [Marcelo Vanzin] Review feedback.
25d4fed [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
3cb6498 [Marcelo Vanzin] Call the right loadClass() method on the parent.
fbb8ab5 [Marcelo Vanzin] Add locking in loadClass() to avoid deadlocks.
2e6c4b7 [Marcelo Vanzin] Mention new setting in documentation.
b6497f9 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a10f379 [Marcelo Vanzin] Some feedback.
3730151 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
f513871 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
44010b6 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
7b57cba [Marcelo Vanzin] Remove now outdated message.
5304d64 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
35949c8 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
54e1a98 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
d1273b2 [Marcelo Vanzin] Add test file to rat exclude.
fa1aafa [Marcelo Vanzin] Remove write check on user jars.
89d8072 [Marcelo Vanzin] Cleanups.
a963ea3 [Marcelo Vanzin] Implement spark.driver.userClassPathFirst for standalone cluster mode.
50afa5f [Marcelo Vanzin] Fix Yarn executor command line.
7d14397 [Marcelo Vanzin] Register user jars in executor up front.
7f8603c [Marcelo Vanzin] Fix yarn-cluster mode without userClassPathFirst.
20373f5 [Marcelo Vanzin] Fix ClientBaseSuite.
55c88fa [Marcelo Vanzin] Run all Yarn integration tests via spark-submit.
0b64d92 [Marcelo Vanzin] Add deprecation warning to yarn option.
4a84d87 [Marcelo Vanzin] Fix the child-first class loader.
d0394b8 [Marcelo Vanzin] Add "deprecated configs" to SparkConf.
46d8cf2 [Marcelo Vanzin] Update doc with new option, change name to "userClassPathFirst".
a314f2d [Marcelo Vanzin] Enable driver class path isolation in SparkSubmit.
91f7e54 [Marcelo Vanzin] [yarn] Enable executor class path isolation.
a853e74 [Marcelo Vanzin] Re-work CoarseGrainedExecutorBackend command line arguments.
89522ef [Marcelo Vanzin] Add class path isolation support for Yarn cluster mode.
Before passing to YARN, escape arguments in "extraJavaOptions" args, in order to correctly handle cases like -Dfoo="one two three". Also standardize how these args are handled and ensure that individual args are treated as stand-alone args, not one string.
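To illustrate the intent, here is a small Python sketch that splits an option string into stand-alone arguments and escapes each one separately. It uses the standard `shlex` module as a stand-in for whatever escaping Spark applies before handing the command to YARN. `escape_java_options` is a hypothetical name.

```python
import shlex


def escape_java_options(extra_java_options):
    # Split the string the way a POSIX shell would, so that a quoted value
    # such as -Dfoo="one two three" stays a single argument.
    args = shlex.split(extra_java_options)
    # Escape each argument on its own, so every one is passed through as a
    # stand-alone argument rather than as part of one long string.
    return [shlex.quote(arg) for arg in args]
```

Joining the escaped arguments with spaces and splitting them again with `shlex.split` yields the original argument list, which is the property the change is after.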
vanzin andrewor14
Author: Sean Owen <sowen@cloudera.com>
Closes #4452 from srowen/SPARK-4267.2 and squashes the following commits:
c8297d2 [Sean Owen] Before passing to YARN, escape arguments in "extraJavaOptions" args, in order to correctly handle cases like -Dfoo="one two three". Also standardize how these args are handled and ensure that individual args are treated as stand-alone args, not one string.
[SPARK-5613] Added a catch block to catch the ApplicationNotFoundException. Without this catch block, the thread is killed when the exception occurs. The exception occurs when YARN restarts and tries to find the application ID of a Spark job that was interrupted when YARN was stopped.
See the stacktrace in the bug for more details.
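The pattern looks roughly like the Python sketch below. `ApplicationNotFound` stands in for YARN's ApplicationNotFoundException, and `poll_reports` and `get_report` are hypothetical names. The actual change is in Spark's Scala code.

```python
class ApplicationNotFound(Exception):
    """Stand-in for YARN's ApplicationNotFoundException."""


def poll_reports(app_ids, get_report):
    reports = {}
    for app_id in app_ids:
        try:
            reports[app_id] = get_report(app_id)
        except ApplicationNotFound:
            # Record the miss and keep going, instead of letting the
            # exception propagate and kill the polling thread.
            reports[app_id] = None
    return reports
```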
Author: Kashish Jain <kashish.jain@guavus.com>
Closes #4392 from kasjain/branch-1.2 and squashes the following commits:
4831000 [Kashish Jain] SPARK-5613: Catch the ApplicationNotFoundException exception to avoid thread from getting killed on yarn restart.
Adds links to stderr/stdout in the executor tab of the webUI for:
1) Standalone
2) Yarn client
3) Yarn cluster
This adds log URL support in a general way, so that it is easy to add support for all the
cluster managers. The log URLs are passed to the executor through environment variables. The
SPARK_LOG_URL_ prefix is used, so additional logs besides stderr/stdout can also be added.
To propagate this information to the UI we use the onExecutorAdded spark listener event.
Although this commit doesn't add log urls when running on a mesos cluster, it should be possible to add using the same mechanism.
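The environment-variable mechanism can be sketched in Python as below. Only the `SPARK_LOG_URL_` prefix comes from the description. `extract_log_urls`, the lower-casing of the suffix, and the example URL in the usage note are assumptions for illustration.

```python
LOG_URL_PREFIX = "SPARK_LOG_URL_"


def extract_log_urls(env):
    # Keep only variables with the prefix and use the remainder of the name
    # (lower-cased here, as an assumption) as the log name.
    return {
        key[len(LOG_URL_PREFIX):].lower(): value
        for key, value in env.items()
        if key.startswith(LOG_URL_PREFIX)
    }
```

For example, an environment containing `SPARK_LOG_URL_STDERR=http://host:8042/stderr` would yield a single `stderr` entry, and a cluster manager could add further logs by exporting more variables with the same prefix.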
Author: Kostas Sakellis <kostas@cloudera.com>
Author: Josh Rosen <joshrosen@databricks.com>
Closes #3486 from ksakellis/kostas-spark-2450 and squashes the following commits:
d190936 [Josh Rosen] Fix a few minor style / formatting nits. Reset listener after each test Don't null listener out at end of main().
8673fe1 [Kostas Sakellis] CR feedback. Hide the log column if there are no logs available
5bf6952 [Kostas Sakellis] [SPARK-2450] [CORE] Adds exeuctor log links to Web UI
Author: Sandy Ryza <sandy@cloudera.com>
Closes #4141 from sryza/sandy-spark-4337 and squashes the following commits:
a98bd20 [Sandy Ryza] Andrew's comments
cdaab7f [Sandy Ryza] SPARK-4337. Add ability to cancel pending requests to YARN
In ApplicationMaster, rename isDriver to isClusterMode. Client already uses isClusterMode, so ApplicationMaster should stay consistent with it and use isClusterMode as well. isClusterMode is also easier to understand.
andrewor14 sryza
Author: lianhuiwang <lianhuiwang09@gmail.com>
Closes #4430 from lianhuiwang/am-isDriver-rename and squashes the following commits:
f9f3ed0 [lianhuiwang] rename isDriver to isClusterMode
When `SPARK_USE_CONC_INCR_GC` is set, the AM uses ConcurrentMarkSweepGC.
However, when ConcurrentMarkSweepGC is enabled for the JVM, the following JVM options are set implicitly:
* MaxTenuringThreshold=0
* SurvivorRatio=1024
These values are not appropriate for most cases.
See also http://www.oracle.com/technetwork/java/tuning-139912.html
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3956 from sarutak/SPARK-5157 and squashes the following commits:
c15da4e [Kousuke Saruta] Set more JVM options for AM when enabling CMS
These are needed transitively from the other Jetty libraries
we include. It was not picked up by unit tests because we
disable the UI.
Author: Patrick Wendell <patrick@databricks.com>
Closes #4323 from pwendell/jetty and squashes the following commits:
d8669da [Patrick Wendell] SPARK-3996: Add jetty servlet and continuations.
SPARK-3883: SSL support for Akka connections and Jetty based file servers.
This story introduced the following changes:
- Introduced SSLOptions object which holds the SSL configuration and can build the appropriate configuration for Akka or Jetty. SSLOptions can be created by parsing SparkConf entries at a specified namespace.
- SSLOptions is created and kept by SecurityManager
- All Akka actor address creation snippets based on interpolated strings were replaced by dedicated methods in AkkaUtils. Those methods select the proper Akka protocol, either akka.tcp or akka.ssl.tcp
- Added test cases for AkkaUtils, FileServer, SSLOptions and SecurityManager
- Added a way for executors and the driver to use node-local SSL configuration in standalone mode. It can be enabled by specifying spark.ssl.useNodeLocalConf in SparkConf.
- Made CoarseGrainedExecutorBackend not overwrite the settings that are part of the executor startup configuration, since they are passed from the Worker anyway
Refer to https://github.com/apache/spark/pull/3571 for discussion and details
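The protocol selection described in the list above can be sketched as follows. This is an illustrative Python version, not the AkkaUtils code. The function names and parameters are hypothetical, and only the two protocol strings come from the description.

```python
def protocol(ssl_enabled):
    # Pick the Akka transport based on whether SSL is enabled.
    return "akka.ssl.tcp" if ssl_enabled else "akka.tcp"


def address(ssl_enabled, system_name, host, port, actor_name):
    # Build the actor address in one place instead of interpolating the
    # protocol by hand at every call site.
    return f"{protocol(ssl_enabled)}://{system_name}@{host}:{port}/user/{actor_name}"
```

Centralizing the address construction means that turning SSL on or off changes every address consistently.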
Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
Author: Jacek Lewandowski <jacek.lewandowski@datastax.com>
Closes #3571 from jacek-lewandowski/SPARK-3883-master and squashes the following commits:
9ef4ed1 [Jacek Lewandowski] Merge pull request #2 from jacek-lewandowski/SPARK-3883-docs2
fb31b49 [Jacek Lewandowski] SPARK-3883: Added SSL setup documentation
2532668 [Jacek Lewandowski] SPARK-3883: Refactored AkkaUtils.protocol method to not use Try
90a8762 [Jacek Lewandowski] SPARK-3883: Refactored methods to resolve Akka address and made it possible to easily configure multiple communication layers for SSL
72b2541 [Jacek Lewandowski] SPARK-3883: A reference to the fallback SSLOptions can be provided when constructing SSLOptions
93050f4 [Jacek Lewandowski] SPARK-3883: SSL support for HttpServer and Akka
When the killExecutor method is called, it always takes the else branch, because no value is ever put into the variable executorIdToContainer.
Author: Xutingjun <1039320815@qq.com>
Closes #4309 from XuTingjun/dynamicAllocator and squashes the following commits:
c823418 [Xutingjun] fix bugwq
Currently, spark-submit does not support running Python applications in yarn-cluster mode. This change modifies spark-submit and YARN's AM to support it.
By specifying a .py file or primaryResource file via spark-submit, we can run PySpark in yarn-cluster mode.
Example: spark-submit --master yarn-master --num-executors 1 --driver-memory 1g --executor-memory 1g xx.py --primaryResource yy.conf
This configuration is the same as for PySpark in yarn-client mode.
First, we add the local paths of the .py or primaryResource files to YARN's dist.files so that they can be distributed to the slave nodes. Then, in spark-submit, we pass --py-files and --primaryResource to yarn.Client and use "org.apache.spark.deploy.PythonRunner" as the user class, which can run .py files on the ApplicationMaster.
In yarn.Client, we pass --py-files and --primaryResource to the ApplicationMaster.
In the ApplicationMaster, the user class is org.apache.spark.deploy.PythonRunner and the user args are primaryResource and --py-files, so PySpark can run on the ApplicationMaster.
JoshRosen tgravescs sryza
Author: lianhuiwang <lianhuiwang09@gmail.com>
Author: Wang Lianhui <lianhuiwang09@gmail.com>
Closes #3976 from lianhuiwang/SPARK-5173 and squashes the following commits:
28a8a58 [lianhuiwang] fix variable name
67f8cee [lianhuiwang] update with andrewor's comments
0319ae3 [lianhuiwang] address with sryza's comments
2385ef6 [lianhuiwang] address with sryza's comments
03640ab [lianhuiwang] add sparkHome to env
47d2fc3 [lianhuiwang] fix test
2adc8f5 [lianhuiwang] add spark.test.home
d60bc60 [lianhuiwang] fix test
5b30064 [lianhuiwang] add test
097a5ec [lianhuiwang] fix line length exceeds 100
905a106 [lianhuiwang] update with sryza and andrewor 's comments
f1f55b6 [lianhuiwang] when yarn-cluster, all python files can be non-local
172eec1 [Wang Lianhui] fix a min submit's bug
9c941bc [lianhuiwang] support python application running on yarn cluster mode
... initial number
Author: Sandy Ryza <sandy@cloudera.com>
Closes #4051 from sryza/sandy-spark-4585 and squashes the following commits:
d1dd039 [Sandy Ryza] Add spark.dynamicAllocation.initialNumExecutors and make min and max not required
b7c59dc [Sandy Ryza] SPARK-4585. Spark dynamic executor allocation should use minExecutors as initial number
Modified environment strings and path separators to platform-independent style if possible.
Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
Closes #3943 from tsudukim/feature/SPARK-1825 and squashes the following commits:
ec4b865 [Masayoshi TSUZUKI] Rebased and modified as comments.
f8a1d5a [Masayoshi TSUZUKI] Merge branch 'master' of github.com:tsudukim/spark into feature/SPARK-1825
3d03d35 [Masayoshi TSUZUKI] [SPARK-1825] Make Windows Spark client work fine with Linux YARN cluster
Previously I had tried to solve this by adding a line in Spark's log4j-defaults.properties.
The issue with the message in log4j-defaults.properties was that the log4j.properties packaged inside Hadoop was getting picked up instead. While it would be ideal to fix that as well, we still want to quiet this in situations where a user supplies their own custom log4j properties.
Author: Sandy Ryza <sandy@cloudera.com>
Closes #4192 from sryza/sandy-spark-5393 and squashes the following commits:
4d5dedc [Sandy Ryza] Only set log level if unset
46e07c5 [Sandy Ryza] SPARK-5393. Flood of util.RackResolver log messages after SPARK-1714
With dynamic executor scaling enabled, executors should be added or killed in yarn-cluster mode as well. So in yarn-cluster mode, the ApplicationMaster starts an AMActor that adds or kills executors, and the YarnSchedulerActor in YarnSchedulerBackend sends messages to the AM's AMActor.
andrewor14 ChengXiangLi tdas
Author: lianhuiwang <lianhuiwang09@gmail.com>
Closes #3962 from lianhuiwang/SPARK-4955 and squashes the following commits:
48d9ebb [lianhuiwang] update with andrewor14's comments
12426af [lianhuiwang] refactor am's code
45da3b0 [lianhuiwang] remove unrelated code
9318fc1 [lianhuiwang] update with andrewor14's comments
08ba473 [lianhuiwang] address andrewor14's comments
265c36d [lianhuiwang] fix small change
f43bda8 [lianhuiwang] fix address andrewor14's comments
7a7767a [lianhuiwang] fix address andrewor14's comments
bbc4d5a [lianhuiwang] address andrewor14's comments
1b029a4 [lianhuiwang] fix bug
7d33791 [lianhuiwang] in AM create a new actorSystem
2164ea8 [lianhuiwang] fix a min bug
6dfeeec [lianhuiwang] in yarn-cluster mode,executor number can be added or killed.
...tor
Author: Sandy Ryza <sandy@cloudera.com>
Closes #4164 from sryza/sandy-spark-5370 and squashes the following commits:
0c8d736 [Sandy Ryza] SPARK-5370. [YARN] Remove some unnecessary synchronization in YarnAllocator
...llocator
The goal of this PR is to simplify YarnAllocator as much as possible and get it up to the level of code quality we see in the rest of Spark.
In service of this, it does a few things:
* Uses AMRMClient APIs for matching containers to requests.
* Adds calls to AMRMClient.removeContainerRequest so that, when we use a container, we don't end up requesting it again.
* Removes YarnAllocator's host->rack cache. YARN's RackResolver already does this caching, so this is redundant.
* Adds tests for basic YarnAllocator functionality.
* Breaks up the allocateResources method, which was previously nearly 300 lines.
* A little bit of stylistic cleanup.
* Fixes a bug that causes three times the requests to be filed when preferred host locations are given.
The patch is lossy. In particular, it loses the logic for trying to avoid containers bunching up on nodes. As I understand it, the logic that's gone is:
* If, in a single response from the RM, we receive a set of containers on a node, and prefer some number of containers on that node greater than 0 but less than the number we received, give back the delta between what we preferred and what we received.
This seems like a weird way to avoid bunching. E.g., it does nothing to avoid bunching when we don't request containers on particular nodes.
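The removed rule, as described above, amounts to something like the following Python sketch. `containers_to_release` is a hypothetical name, and the function models only the per-node arithmetic, not the allocator itself.

```python
def containers_to_release(received, preferred):
    # Give back the surplus only when some containers were preferred on this
    # node, but fewer than were actually received in this response.
    if 0 < preferred < received:
        return received - preferred
    return 0
```

Note that when `preferred` is 0, that is, when no containers were requested on particular nodes, nothing is ever released, which is the limitation pointed out above.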
Author: Sandy Ryza <sandy@cloudera.com>
Closes #3765 from sryza/sandy-spark-1714 and squashes the following commits:
32a5942 [Sandy Ryza] Muffle RackResolver logs
74f56dd [Sandy Ryza] Fix a couple comments and simplify requestTotalExecutors
60ea4bd [Sandy Ryza] Fix scalastyle
ca35b53 [Sandy Ryza] Simplify further
e9cf8a6 [Sandy Ryza] Fix YarnClusterSuite
257acf3 [Sandy Ryza] Remove locality stuff and more cleanup
59a3c5e [Sandy Ryza] Take out rack stuff
5f72fd5 [Sandy Ryza] Further documentation and cleanup
89edd68 [Sandy Ryza] SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnAllocator
https://issues.apache.org/jira/browse/SPARK-5336
Author: WangTao <barneystinson@aliyun.com>
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Closes #4123 from WangTaoTheTonic/SPARK-5336 and squashes the following commits:
6c9676a [WangTao] Update ClientArguments.scala
9632d3a [WangTaoTheTonic] minor comment fix
d03d6fa [WangTaoTheTonic] import ordering should be alphabetical'
3112af9 [WangTao] spark.executor.cores must not be less than spark.task.cpus
There are two relevant 'skip' configurations in the build, the first
is for "mvn install" and the second is for "mvn deploy". As of 1.2,
we actually use "mvn install" to generate our deployed artifacts,
because we have some customization of the nexus upload due to having
to cross compile for Scala 2.10 and 2.11.
There is no reason to have different settings for these values,
so this patch simply cleans this up for the repl/ and yarn/
projects.
Author: Patrick Wendell <patrick@databricks.com>
Closes #4080 from pwendell/master and squashes the following commits:
e21b78b [Patrick Wendell] [HOTFIX]: Minor clean up regarding skipped artifacts in build files.
Based on top of changes in https://github.com/apache/spark/pull/3806.
https://issues.apache.org/jira/browse/SPARK-1507
Adds `--driver-cores` and `spark.driver.cores` for all cluster modes and `spark.yarn.am.cores` for yarn client mode.
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>
Closes #4018 from WangTaoTheTonic/SPARK-1507 and squashes the following commits:
01419d3 [WangTaoTheTonic] amend the args name
b255795 [WangTaoTheTonic] indet thing
d86557c [WangTaoTheTonic] some comments amend
43c9392 [WangTao] fix compile error
b39a100 [WangTao] specify # cores for ApplicationMaster
I found that some arguments in the yarn module give environment variables precedence over system properties, whereas in the core module system properties override environment variables.
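The precedence this change aims for can be sketched in Python as below. `resolve_setting` is a hypothetical helper. The key names in the usage note are taken from the squashed commit titles.

```python
def resolve_setting(prop_key, env_key, sys_props, env, default=None):
    # A system property, when present, overrides the environment variable.
    if prop_key in sys_props:
        return sys_props[prop_key]
    # The environment variable is only a fallback.
    if env_key in env:
        return env[env_key]
    return default
```

For instance, with both `spark.app.name` and `SPARK_YARN_APP_NAME` set, the value of `spark.app.name` is the one returned.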
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>
Closes #3557 from WangTaoTheTonic/SPARK4697 and squashes the following commits:
836b9ef [WangTaoTheTonic] fix type mismatch
e3e486a [WangTaoTheTonic] remove the comma
1262d57 [WangTaoTheTonic] handle spark.app.name and SPARK_YARN_APP_NAME in SparkSubmitArguments
bee9447 [WangTaoTheTonic] wrong brace
81833bb [WangTaoTheTonic] rebase
40934b4 [WangTaoTheTonic] just switch blocks
5f43f45 [WangTao] System property can override environment variable
https://issues.apache.org/jira/browse/SPARK-5006
I think the issue was introduced in https://github.com/apache/spark/pull/1777.
I have not dug into the Mesos backend yet. It may need the same logic as well.
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>
Closes #3841 from WangTaoTheTonic/SPARK-5006 and squashes the following commits:
8cdf96d [WangTao] indent thing
2d86d65 [WangTaoTheTonic] fix line length
7cdfd98 [WangTaoTheTonic] fit for new HttpServer constructor
61a370d [WangTaoTheTonic] some minor fixes
bc6e1ec [WangTaoTheTonic] rebase
67bcb46 [WangTaoTheTonic] put conf at 3rd position, modify suite class, add comments
f450cd1 [WangTaoTheTonic] startServiceOnPort will use a SparkConf arg
29b751b [WangTaoTheTonic] rebase as ExecutorRunnableUtil changed to ExecutorRunnable
396c226 [WangTaoTheTonic] make the grammar more like scala
191face [WangTaoTheTonic] invalid value name
62ec336 [WangTaoTheTonic] spark.port.maxRetries doesn't work
... size
Ways to set the Application Master's memory in yarn-client mode:
1. `spark.yarn.am.memory` in SparkConf or System Properties
2. default value 512m
Note: this argument is only available in yarn-client mode.
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Closes #3607 from WangTaoTheTonic/SPARK4181 and squashes the following commits:
d5ceb1b [WangTaoTheTonic] spark.driver.memeory is used in both modes
6c1b264 [WangTaoTheTonic] rebase
b8410c0 [WangTaoTheTonic] minor optiminzation
ddcd592 [WangTaoTheTonic] fix the bug produced in rebase and some improvements
3bf70cc [WangTaoTheTonic] rebase and give proper hint
987b99d [WangTaoTheTonic] disable --driver-memory in client mode
2b27928 [WangTaoTheTonic] inaccurate description
b7acbb2 [WangTaoTheTonic] incorrect method invoked
2557c5e [WangTaoTheTonic] missing a single blank
42075b0 [WangTaoTheTonic] arrange the args and warn logging
69c7dba [WangTaoTheTonic] rebase
1960d16 [WangTaoTheTonic] fix wrong comment
7fa9e2e [WangTaoTheTonic] log a warning
f6bee0e [WangTaoTheTonic] docs issue
d619996 [WangTaoTheTonic] Merge branch 'master' into SPARK4181
b09c309 [WangTaoTheTonic] use code format
ab16bb5 [WangTaoTheTonic] fix bug and add comments
44e48c2 [WangTaoTheTonic] minor fix
6fd13e1 [WangTaoTheTonic] add overhead mem and remove some configs
0566bb8 [WangTaoTheTonic] yarn client mode Application Master memory size is same as driver memory size
Sorry for fetching the wrong max attempts in commit 8fdd48959c.
We need to fix it now.
tgravescs
If we set spark.yarn.maxAppAttempts to a value larger than `yarn.resourcemanager.am.max-attempts` on the YARN side, it will be overridden, as described here:
>The maximum number of application attempts. It's a global setting for all application masters. Each application master can specify its individual maximum number of application attempts via the API, but the individual number cannot be more than the global upper bound. If it is, the resourcemanager will override it. The default number is set to 2, to allow at least one retry for AM.
http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
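The override behavior quoted above boils down to taking the smaller of the two values. The Python sketch below illustrates that. `effective_max_attempts` is a hypothetical name, and the default of 2 for the global bound is the one stated in the quoted documentation.

```python
def effective_max_attempts(requested, global_max=2):
    # An application may ask for fewer attempts than the global bound, but a
    # larger request is capped at the bound by the resourcemanager.
    return min(requested, global_max)
```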
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Closes #3942 from WangTaoTheTonic/HOTFIX and squashes the following commits:
9ac16ce [WangTaoTheTonic] fetch the correct max attempts
This change does a few things to make the hadoop-provided profile more useful:
- Create new profiles for other libraries / services that might be provided by the infrastructure
- Simplify and fix the poms so that the profiles are only activated while building assemblies.
- Fix tests so that they're able to run when the profiles are activated
- Add a new env variable to be used by distributions that use these profiles to provide the runtime
classpath for Spark jobs and daemons.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #2982 from vanzin/SPARK-4048 and squashes the following commits:
82eb688 [Marcelo Vanzin] Add a comment.
eb228c0 [Marcelo Vanzin] Fix borked merge.
4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to child processes.
371ebee [Marcelo Vanzin] Review feedback.
52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
322f882 [Marcelo Vanzin] Fix merge fail.
f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
9640503 [Marcelo Vanzin] Cleanup child process log message.
115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with another pom).
e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
d1399ed [Marcelo Vanzin] Restore jetty dependency.
82a54b9 [Marcelo Vanzin] Remove unused profile.
5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided profiles.
1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during testing.
1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.
...xt
https://issues.apache.org/jira/browse/SPARK-2165
I still have 2 questions:
* If this config is not set, should we use YARN's corresponding value or a default value (like 2) on the Spark side?
* Is this the best config name, or would "spark.yarn.am.maxAttempts" be better?
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Closes #3878 from WangTaoTheTonic/SPARK-2165 and squashes the following commits:
1416c83 [WangTaoTheTonic] use the name spark.yarn.maxAppAttempts
202ac85 [WangTaoTheTonic] rephrase some
afdfc99 [WangTaoTheTonic] more detailed description
91562c6 [WangTaoTheTonic] add support for setting maxAppAttempts in the ApplicationSubmissionContext
Currently, yarn-client exits immediately when an HA failover happens, no matter how many times the AM should retry.
The reason may be that the default final status only considered the sys.exit case, so yarn-client cannot benefit from HA.
So we should distinguish the default final status between client and cluster mode, because the SUCCEEDED status may cause HA to fail in client mode, and UNDEFINED may cause an error to be reported in cluster mode when sys.exit is used.
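One reading of the paragraph above is that cluster mode keeps SUCCEEDED as its default final status while client mode defaults to UNDEFINED. The Python sketch below illustrates that reading only. The function name mirrors the 'getDefaultFinalStatus' mentioned in the commit list, but the body is an assumption.

```python
def default_final_status(is_cluster_mode):
    # Cluster mode: a plain sys.exit should not be reported as an error, so
    # the default is SUCCEEDED. Client mode: UNDEFINED leaves room for the
    # AM to be retried after an HA failover.
    return "SUCCEEDED" if is_cluster_mode else "UNDEFINED"
```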
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes #3771 from SaintBacchus/YarnHA and squashes the following commits:
c02bfcc [huangzhaowei] Improve the comment of the funciton 'getDefaultFinalStatus'
0e69924 [huangzhaowei] Bug fix: fix the yarn-client code to support HA
This PR:
- Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar)
- Tells `surefire` to test only Java tests
- Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication.
For me this causes the Scala and Java tests to be run once each, it seems, as desired. It doesn't affect the SBT build but works for Maven. I still need to verify that all of the Scala tests and Java tests are being run.
Author: Sean Owen <sowen@cloudera.com>
Closes #3651 from srowen/SPARK-4159 and squashes the following commits:
2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete
12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit.
e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent
ExecutorRunnableUtil is a parent of ExecutorRunnable because of the yarn-alpha and yarn-stable split. Now that yarn-alpha is gone, this commit squashes the unnecessary hierarchy. The methods from ExecutorRunnableUtil are added as private.
Author: Kostas Sakellis <kostas@cloudera.com>
Closes #3696 from ksakellis/kostas-spark-4843 and squashes the following commits:
486716f [Kostas Sakellis] Moved prepareEnvironment call to after yarnConf declaration
470e22e [Kostas Sakellis] Fixed indentation and renamed sparkConf variable
9b1b1c9 [Kostas Sakellis] SPARK-4843 [YARN] Squash ExecutorRunnableUtil and ExecutorRunnable
Author: meiyoula <1039320815@qq.com>
Closes #3797 from XuTingjun/MemoryOverhead and squashes the following commits:
5a780fc [meiyoula] Update ClientArguments.scala
It's really a minor issue.
In ApplicationMaster, there is code like the following:
val preserveFiles = sparkConf.get("spark.yarn.preserve.staging.files", "false").toBoolean
I think the code can be simplified as follows:
val preserveFiles = sparkConf.getBoolean("spark.yarn.preserve.staging.files", false)
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3733 from sarutak/SPARK-4881 and squashes the following commits:
1771430 [Kousuke Saruta] Modified the code like sparkConf.get(...).toBoolean to sparkConf.getBoolean(...)
c63daa0 [Kousuke Saruta] Simplified code
See https://issues.apache.org/jira/browse/SPARK-4730.
Author: Andrew Or <andrew@databricks.com>
Closes #3590 from andrewor14/yarn-settings and squashes the following commits:
36e0753 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-settings
dcd1316 [Andrew Or] Warn against deprecated YARN settings
... changed to a time period
Author: Sandy Ryza <sandy@cloudera.com>
Closes #3471 from sryza/sandy-spark-3779 and squashes the following commits:
20b9887 [Sandy Ryza] Deprecate old property
42b5df7 [Sandy Ryza] Review feedback
9a959a1 [Sandy Ryza] SPARK-3779. yarn spark.yarn.applicationMaster.waitTries config should be changed to a time period
Currently, there is no way to pass YARN AM-specific Java options. This causes potential issues when reading the classpath from a Hadoop configuration file, because Hadoop configuration replaces variables in its properties with the system properties passed in through Java options. How the value is specified depends on the Hadoop distribution.
The new options are SPARK_YARN_JAVA_OPTS or spark.yarn.extraJavaOptions. I made them global to Spark, because typically we don't want users to specify them on the command line each time they submit a Spark job once they are set up in spark-defaults.conf.
In addition, allowing these extra options to be passed to the AM provides more flexibility.
For example, in the following valid mapred-site.xml file, the class path specifies values using a system property. Hadoop can handle it correctly because it has the Java options passed in.
In this example, Spark currently breaks because hadoop.version is not passed in.
<property>
<name>mapreduce.application.classpath</name>
<value>/etc/hadoop/${hadoop.version}/mapreduce/*</value>
</property>
In the meantime, we cannot rely on mapreduce.admin.map.child.java.opts in mapred-site.xml, because it has its own extra Java options specified, which do not apply to Spark.
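The variable substitution that makes the mapred-site.xml example above work can be approximated with the Python sketch below. It is a simplified stand-in for Hadoop's own substitution logic. `substitute` is a hypothetical name, and the version string in the usage note is purely illustrative.

```python
import re

# Matches ${name} placeholders such as ${hadoop.version}.
_VAR = re.compile(r"\$\{([^}]+)\}")


def substitute(value, system_properties):
    def repl(match):
        name = match.group(1)
        # Leave the placeholder untouched when no matching system property
        # was passed in, which is what breaks the classpath in the example.
        return system_properties.get(name, match.group(0))

    return _VAR.sub(repl, value)
```

With `{"hadoop.version": "2.6.0"}` the classpath value resolves to a real directory. With no such property the literal `${hadoop.version}` remains in the path.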
Author: Zhan Zhang <zhazhan@gmail.com>
Closes #3409 from zhzhan/Spark-4461 and squashes the following commits:
daec3d0 [Zhan Zhang] solve review comments
08f44a7 [Zhan Zhang] add warning in driver mode if spark.yarn.am.extraJavaOptions is configured
5a505d3 [Zhan Zhang] solve review comments
4ed43ad [Zhan Zhang] solve review comments
ad777ed [Zhan Zhang] Merge branch 'master' into Spark-4461
3e9e574 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
e3f9abe [Zhan Zhang] solve review comments
8963552 [Zhan Zhang] rebase
f8f6700 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
dea1692 [Zhan Zhang] change the option key name to client mode specific
90d5dff [Zhan Zhang] rebase
8ac9254 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
092a25f [Zhan Zhang] solve review comments
bc5a9ae [Zhan Zhang] solve review comments
782b014 [Zhan Zhang] add new configuration to docs/running-on-yarn.md and remove it from spark-defaults.conf.template
6faaa97 [Zhan Zhang] solve review comments
369863f [Zhan Zhang] clean up unnecessary var
733de9c [Zhan Zhang] Merge branch 'master' into Spark-4461
a68e7f0 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
864505a [Zhan Zhang] Add extra java options to be passed to Yarn application master
15830fc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
685d911 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
03ebad3 [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark
46d9e3d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
ebb213a [Zhan Zhang] revert
b983ef3 [Zhan Zhang] test
c4efb9b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
779d67b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
4daae6d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
12e1be5 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
ce0ca7b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
93f3081 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
3764505 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
a9d372b [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark
a00f60f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
497b0f4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
4a2e36d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
a72c0d4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
301eb4a [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
cedcc6f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
adf4924 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
d10bf00 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
7e0cc36 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
68deb11 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
3ee3b2b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
2b0d513 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
1ccd7cc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
af9feb9 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
e4c1982 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
921e914 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
789ea21 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
cb53a2c [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
f6a8a40 [Zhan Zhang] revert
ba14f28 [Zhan Zhang] test
Sorry if this is a little premature with 1.2 still not out the door, but it will make other work like SPARK-4136 and SPARK-2089 a lot easier.
Author: Sandy Ryza <sandy@cloudera.com>
Closes #3215 from sryza/sandy-spark-4338 and squashes the following commits:
1c5ac08 [Sandy Ryza] Update building Spark docs and remove unnecessary newline
9c1421c [Sandy Ryza] SPARK-4338. Ditch yarn-alpha.
In yarn-cluster and standalone-cluster modes, we don't know where the driver will run until it is launched. If the `spark.driver.host` property is set on the submitting machine and propagated to the driver through SparkConf, this will lead to errors when the driver launches.
This patch fixes this issue by dropping the `spark.driver.host` property in SparkSubmit when running in a cluster deploy mode.
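The idea can be sketched in a few lines. This is not the SparkSubmit code itself; the class `DriverHostFilterSketch`, the method `forSubmission`, and the sample host and master values are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: drops spark.driver.host when the deploy mode is "cluster". */
class DriverHostFilterSketch {
    /** Returns a copy of props; in cluster deploy mode the spark.driver.host entry is omitted. */
    static Map<String, String> forSubmission(Map<String, String> props, String deployMode) {
        Map<String, String> result = new HashMap<>(props);
        if ("cluster".equals(deployMode)) {
            // The driver's host is unknown until launch, so a value set on the
            // submitting machine would be wrong for the driver.
            result.remove("spark.driver.host");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("spark.driver.host", "submit-machine.example.com"); // hypothetical host
        props.put("spark.master", "yarn-cluster");
        System.out.println(forSubmission(props, "cluster"));
        System.out.println(forSubmission(props, "client"));
    }
}
```

Returning a copy rather than mutating the input keeps the caller's properties intact for client mode, where the property is still meaningful.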
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>
Closes #3112 from WangTaoTheTonic/SPARK4253 and squashes the following commits:
ed1a25c [WangTaoTheTonic] revert unrelated formatting issue
02c4e49 [WangTao] add comment
32a3f3f [WangTaoTheTonic] ingore it in SparkSubmit instead of SparkContext
667cf24 [WangTaoTheTonic] document fix
ff8d5f7 [WangTaoTheTonic] also ignore it in standalone cluster mode
2286e6b [WangTao] ignore spark.driver.host in yarn-cluster mode
This is caused by a975dc3279
Author: Andrew Or <andrew@databricks.com>
Closes #3589 from andrewor14/yarn-hot-fix and squashes the following commits:
a4fad5f [Andrew Or] Check whether lib directory exists before listing its files
If `spark-submit` finds the datanucleus jars, it adds them to the driver's classpath, but does not add them to the container.
This patch modifies the yarn deployment class to copy all `datanucleus-*` jars found in `[spark-home]/libs` to the container.
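A rough sketch of the jar lookup, assuming a plain directory listing filtered by the `datanucleus-` prefix. It is not the code from this patch; the class `DatanucleusJarFinderSketch`, the method `findJars`, and the `lib` path in `main` are hypothetical. The sketch also returns an empty list when the directory is missing instead of failing.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: finds datanucleus-*.jar files in a directory. */
class DatanucleusJarFinderSketch {
    /** Returns the sorted names of datanucleus-*.jar files, or an empty list if libDir is missing. */
    static List<String> findJars(File libDir) {
        List<String> names = new ArrayList<>();
        if (!libDir.isDirectory()) {
            return names; // nothing to copy when the directory does not exist
        }
        File[] files = libDir.listFiles();
        if (files == null) {
            return names; // listFiles() returns null on I/O errors
        }
        for (File f : files) {
            String name = f.getName();
            if (f.isFile() && name.startsWith("datanucleus-") && name.endsWith(".jar")) {
                names.add(name);
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical location; pass the real directory as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "lib");
        System.out.println(findJars(dir));
    }
}
```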
Author: Jim Lim <jim@quixey.com>
Closes #3238 from jimjh/SPARK-2624 and squashes the following commits:
3633071 [Jim Lim] SPARK-2624 update documentation and comments
fe95125 [Jim Lim] SPARK-2624 keep java imports together
6c31fe0 [Jim Lim] SPARK-2624 update documentation
6690fbf [Jim Lim] SPARK-2624 add tests
d28d8e9 [Jim Lim] SPARK-2624 add spark.yarn.datanucleus.dir option
84e6cba [Jim Lim] SPARK-2624 add datanucleus jars to the container in yarn-cluster
The security manager adds a lot of overhead to the runtime of the
app, and causes a severe performance regression. Even stubbing out
all unneeded methods (all except checkExit()) does not help.
So, instead, penalize users who do an explicit System.exit() by leaving
them in "undefined behavior" territory: if they do that, the Yarn
backend won't be able to report the final app status to the RM.
The result is that the final status of the application might not match
the user's expectations.
One side-effect of the change is that users who do an explicit
System.exit() will lose the AM retry functionality. Since there is
no way to know if the exit was because of success or failure, the
AM right now errs on the side of it being a successful exit.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #3484 from vanzin/SPARK-4584 and squashes the following commits:
21f2502 [Marcelo Vanzin] Do not retry apps that use System.exit().
4198b3b [Marcelo Vanzin] [SPARK-4584] [yarn] Remove security manager from Yarn AM.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #3277 from vanzin/version-1.3 and squashes the following commits:
7c3c396 [Marcelo Vanzin] Added temp repo to sbt build.
5f404ff [Marcelo Vanzin] Add another exclusion.
19457e7 [Marcelo Vanzin] Update old version to 1.2, add temporary 1.2 repo.
3c8d705 [Marcelo Vanzin] Workaround for MIMA checks.
e940810 [Marcelo Vanzin] Bumping version to 1.3.0-SNAPSHOT.
In YarnClientSchedulerBackend, the variable "stopping" is used as a flag and is accessed by multiple threads, so it should be volatile.
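The reasoning can be shown with a small Java sketch. It is not the YarnClientSchedulerBackend code (which is Scala); the class `StoppingFlagSketch` and its methods are hypothetical. Without `volatile`, the monitor thread is not guaranteed to observe the write made by another thread.

```java
/** Illustrative only: a stop flag shared between threads. */
class StoppingFlagSketch {
    // volatile guarantees that a write by one thread is visible to reads in others.
    private volatile boolean stopping = false;

    /** Starts a daemon thread that loops until the flag is set. */
    Thread startMonitor() {
        Thread t = new Thread(() -> {
            while (!stopping) {
                Thread.yield(); // stand-in for periodic monitoring work
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Sets the flag, then waits up to timeoutMillis for the monitor thread to exit. */
    boolean stopAndAwait(Thread monitor, long timeoutMillis) {
        stopping = true;
        try {
            monitor.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !monitor.isAlive();
    }

    public static void main(String[] args) {
        StoppingFlagSketch backend = new StoppingFlagSketch();
        Thread monitor = backend.startMonitor();
        System.out.println("monitor stopped: " + backend.stopAndAwait(monitor, 5000));
    }
}
```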
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3143 from sarutak/stopping-flag-volatile and squashes the following commits:
58fdcc9 [Kousuke Saruta] Marked stoppig flag as volatile
I did not realize there was a `network.util.JavaUtils` when I wrote this code. This PR moves the `ByteBuffer` string conversion to the appropriate place. I tested the changes on a stable yarn cluster.
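For readers unfamiliar with this kind of helper, a minimal sketch of a `ByteBuffer`/string round trip follows. It uses only the JDK rather than the netty and guava utilities named in the commits below, and the class `ByteBufferStringsSketch` and its method names are assumptions for illustration, not necessarily those in `JavaUtils`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: UTF-8 conversion between String and ByteBuffer. */
class ByteBufferStringsSketch {
    /** Encodes the string as UTF-8 and wraps the bytes in a ByteBuffer. */
    static ByteBuffer stringToBytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes the buffer's remaining bytes as UTF-8 without moving its position. */
    static String bytesToString(ByteBuffer b) {
        // duplicate() shares the content but has an independent position and limit.
        return StandardCharsets.UTF_8.decode(b.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer buf = stringToBytes("spark_shuffle");
        System.out.println(bytesToString(buf));
    }
}
```

Decoding from a duplicate means the same buffer can be read more than once, which is convenient when the bytes are also passed on elsewhere.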
Author: Andrew Or <andrew@databricks.com>
Closes #3144 from andrewor14/yarn-shuffle-util and squashes the following commits:
b6c08bf [Andrew Or] Remove unused import
94e205c [Andrew Or] Use netty Unpooled
85202a5 [Andrew Or] Use guava Charsets
057135b [Andrew Or] Reword comment
adf186d [Andrew Or] Move byte buffer String conversion logic to JavaUtils
This creates a new module `network/yarn` that depends on `network/shuffle`, recently created in #3001. This PR introduces a custom Yarn auxiliary service that runs the external shuffle service. As of the changes here, this shuffle service is required for using dynamic allocation with Spark.
This is still WIP mainly because it doesn't handle security yet. I have tested this on a stable Yarn cluster.
Author: Andrew Or <andrew@databricks.com>
Closes #3082 from andrewor14/yarn-shuffle-service and squashes the following commits:
ef3ddae [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
0ee67a2 [Andrew Or] Minor wording suggestions
1c66046 [Andrew Or] Remove unused provided dependencies
0eb6233 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
6489db5 [Andrew Or] Try catch at the right places
7b71d8f [Andrew Or] Add detailed java docs + reword a few comments
d1124e4 [Andrew Or] Add security to shuffle service (INCOMPLETE)
5f8a96f [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
9b6e058 [Andrew Or] Address various feedback
f48b20c [Andrew Or] Fix tests again
f39daa6 [Andrew Or] Do not make network-yarn an assembly module
761f58a [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
15a5b37 [Andrew Or] Fix build for Hadoop 1.x
baff916 [Andrew Or] Fix tests
5bf9b7e [Andrew Or] Address a few minor comments
5b419b8 [Andrew Or] Add missing license header
804e7ff [Andrew Or] Include the Yarn shuffle service jar in the distribution
cd076a4 [Andrew Or] Require external shuffle service for dynamic allocation
ea764e0 [Andrew Or] Connect to Yarn shuffle service only if it's enabled
1bf5109 [Andrew Or] Use the shuffle service port specified through hadoop config
b4b1f0c [Andrew Or] 4 tabs -> 2 tabs
43dcb96 [Andrew Or] First cut integration of shuffle service with Yarn aux service
b54a0c4 [Andrew Or] Initial skeleton for Yarn shuffle service