289 commits

Author SHA1 Message Date
Kashish Jain ca66159a4f SPARK-5613: Catch the ApplicationNotFoundException exception to avoid thread from getting killed on yarn restart.
[SPARK-5613] Added a catch block for ApplicationNotFoundException. Without it, the thread is killed when this exception occurs. The exception is thrown when YARN restarts and tries to look up the application ID of a Spark job that was interrupted when YARN stopped.
See the stacktrace in the bug for more details.
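
A minimal sketch of the idea, not the actual patch: the lookup is wrapped so that a missing application becomes a value instead of an uncaught exception that kills the thread. The exception class here is a local stand-in for YARN's ApplicationNotFoundException, and `pollOnce` and `fetchReport` are hypothetical names.

```scala
object AppMonitorSketch {
  // Stand-in for YARN's ApplicationNotFoundException, so the sketch
  // compiles without Hadoop on the classpath (assumption).
  final class ApplicationNotFoundException(message: String) extends Exception(message)

  // Hypothetical helper: performs one status lookup and turns the
  // exception into None, so the calling thread keeps running.
  def pollOnce(fetchReport: () => String): Option[String] =
    try {
      Some(fetchReport())
    } catch {
      case _: ApplicationNotFoundException => None
    }
}
```

Any other exception type still propagates, which keeps unrelated failures visible.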

Author: Kashish Jain <kashish.jain@guavus.com>

Closes #4392 from kasjain/branch-1.2 and squashes the following commits:

4831000 [Kashish Jain] SPARK-5613: Catch the ApplicationNotFoundException exception to avoid thread from getting killed on yarn restart.
2015-02-06 13:59:11 -08:00
Kostas Sakellis 32e964c410 SPARK-2450 Adds executor log links to Web UI
Adds links to stderr/stdout in the executor tab of the webUI for:
1) Standalone
2) Yarn client
3) Yarn cluster

This adds log URL support in a general way so that it is easy to extend to all the
cluster managers. The log URLs are passed to the executor through environment variables. Each
variable uses the SPARK_LOG_URL_ prefix, so additional logs besides stderr/stdout can also be added.
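
As a rough illustration of the prefix convention (a sketch, not the code in this commit), the executor side could collect its log URLs from the environment like this. The object and method names are hypothetical, and lower-casing the log name is an assumption.

```scala
object LogUrlSketch {
  // Prefix named in the commit message.
  val Prefix = "SPARK_LOG_URL_"

  // Keeps only the prefixed variables and maps the log name
  // (e.g. "stderr") to its URL. Lower-casing is an assumption.
  def extractLogUrls(env: Map[String, String]): Map[String, String] =
    env.collect {
      case (key, url) if key.startsWith(Prefix) =>
        key.substring(Prefix.length).toLowerCase -> url
    }
}
```

In a real process the input would be something like `sys.env`; unrelated variables are ignored.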

To propagate this information to the UI we use the onExecutorAdded spark listener event.

Although this commit doesn't add log urls when running on a mesos cluster, it should be possible to add using the same mechanism.

Author: Kostas Sakellis <kostas@cloudera.com>
Author: Josh Rosen <joshrosen@databricks.com>

Closes #3486 from ksakellis/kostas-spark-2450 and squashes the following commits:

d190936 [Josh Rosen] Fix a few minor style / formatting nits. Reset listener after each test Don't null listener out at end of main().
8673fe1 [Kostas Sakellis] CR feedback. Hide the log column if there are no logs available
5bf6952 [Kostas Sakellis] [SPARK-2450] [CORE] Adds exeuctor log links to Web UI
2015-02-06 11:13:00 -08:00
Sandy Ryza 1a88f20de7 SPARK-4337. [YARN] Add ability to cancel pending requests
Author: Sandy Ryza <sandy@cloudera.com>

Closes #4141 from sryza/sandy-spark-4337 and squashes the following commits:

a98bd20 [Sandy Ryza] Andrew's comments
cdaab7f [Sandy Ryza] SPARK-4337. Add ability to cancel pending requests to YARN
2015-02-06 10:53:16 -08:00
lianhuiwang cc6e53119d [SPARK-5653][YARN] In ApplicationMaster rename isDriver to isClusterMode
In ApplicationMaster, rename isDriver to isClusterMode. Client already uses isClusterMode, so ApplicationMaster should stay consistent with it. isClusterMode is also easier to understand.
andrewor14 sryza

Author: lianhuiwang <lianhuiwang09@gmail.com>

Closes #4430 from lianhuiwang/am-isDriver-rename and squashes the following commits:

f9f3ed0 [lianhuiwang] rename isDriver to isClusterMode
2015-02-06 10:48:31 -08:00
Kousuke Saruta 24dbc50b93 [SPARK-5157][YARN] Configure more JVM options properly when we use ConcMarkSweepGC for AM.
When we set `SPARK_USE_CONC_INCR_GC`, the AM runs with ConcurrentMarkSweepGC.
When ConcurrentMarkSweepGC is enabled for the JVM, the following JVM options are set implicitly:

* MaxTenuringThreshold=0
* SurvivorRatio=1024

These values are not appropriate in most cases.
See also http://www.oracle.com/technetwork/java/tuning-139912.html

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3956 from sarutak/SPARK-5157 and squashes the following commits:

c15da4e [Kousuke Saruta] Set more JVM options for AM when enabling CMS
2015-02-06 09:39:12 +00:00
Patrick Wendell 7930d2bef0 SPARK-3996: Add jetty servlet and continuations.
These are needed transitively by the other Jetty libraries
we include. The missing dependencies were not picked up by unit tests because we
disable the UI.

Author: Patrick Wendell <patrick@databricks.com>

Closes #4323 from pwendell/jetty and squashes the following commits:

d8669da [Patrick Wendell] SPARK-3996: Add jetty servlet and continuations.
2015-02-02 21:01:36 -08:00
Jacek Lewandowski cfea30037f Spark 3883: SSL support for HttpServer and Akka
SPARK-3883: SSL support for Akka connections and Jetty based file servers.

This story introduced the following changes:
- Introduced the SSLOptions object, which holds the SSL configuration and can build the appropriate configuration for Akka or Jetty. SSLOptions can be created by parsing SparkConf entries at a specified namespace.
- SSLOptions is created and kept by SecurityManager.
- All Akka actor address creation snippets based on interpolated strings were replaced by dedicated methods in AkkaUtils. Those methods select the proper Akka protocol, either akka.tcp or akka.ssl.tcp.
- Added test cases for AkkaUtils, FileServer, SSLOptions and SecurityManager.
- Added a way for executors and the driver to use node-local SSL configuration in standalone mode, enabled by specifying spark.ssl.useNodeLocalConf in SparkConf.
- Made CoarseGrainedExecutorBackend not overwrite the settings that are part of the executor startup configuration, since they are passed from the Worker anyway.
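
The protocol selection described in the list can be sketched roughly as follows. This is an illustration under assumptions, not the AkkaUtils code itself: the object name, method names and parameter list are hypothetical, and only the two protocol strings come from the description above.

```scala
object AkkaAddressSketch {
  // Picks the Akka protocol depending on whether SSL is enabled.
  def protocol(sslEnabled: Boolean): String =
    if (sslEnabled) "akka.ssl.tcp" else "akka.tcp"

  // Builds an actor address in one place instead of repeating
  // interpolated strings at every call site.
  def address(
      sslEnabled: Boolean,
      systemName: String,
      host: String,
      port: Int,
      actorName: String): String =
    s"${protocol(sslEnabled)}://$systemName@$host:$port/user/$actorName"
}
```

Centralizing the string construction means a new transport only needs a change in `protocol`.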

Refer to https://github.com/apache/spark/pull/3571 for discussion and details

Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
Author: Jacek Lewandowski <jacek.lewandowski@datastax.com>

Closes #3571 from jacek-lewandowski/SPARK-3883-master and squashes the following commits:

9ef4ed1 [Jacek Lewandowski] Merge pull request #2 from jacek-lewandowski/SPARK-3883-docs2
fb31b49 [Jacek Lewandowski] SPARK-3883: Added SSL setup documentation
2532668 [Jacek Lewandowski] SPARK-3883: Refactored AkkaUtils.protocol method to not use Try
90a8762 [Jacek Lewandowski] SPARK-3883: Refactored methods to resolve Akka address and made it possible to easily configure multiple communication layers for SSL
72b2541 [Jacek Lewandowski] SPARK-3883: A reference to the fallback SSLOptions can be provided when constructing SSLOptions
93050f4 [Jacek Lewandowski] SPARK-3883: SSL support for HttpServer and Akka
2015-02-02 17:27:26 -08:00
Patrick Wendell 2321dd1ef9 [HOTFIX] Add jetty references to build for YARN module. 2015-02-02 14:00:49 -08:00
Xutingjun 62a93a1698 [SPARK-5530] Add executor container to executorIdToContainer
When the killExecutor method is called, it always takes the else branch, because no value is ever put into executorIdToContainer.
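
A simplified sketch of the bookkeeping involved, assuming string IDs and hypothetical method names (the real allocator tracks YARN container objects): the map has to be populated when an executor is launched, otherwise the lookup in killExecutor can never succeed.

```scala
import scala.collection.mutable

class ExecutorTrackerSketch {
  // Executor ID -> container ID (strings are a simplification).
  private val executorIdToContainer = mutable.HashMap.empty[String, String]

  // Hypothetical hook: record the container when the executor starts.
  // Without this step the map stays empty forever.
  def onExecutorLaunched(executorId: String, containerId: String): Unit =
    executorIdToContainer(executorId) = containerId

  // Returns the container to release, or None for an unknown executor.
  def killExecutor(executorId: String): Option[String] =
    executorIdToContainer.remove(executorId)
}
```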

Author: Xutingjun <1039320815@qq.com>

Closes #4309 from XuTingjun/dynamicAllocator and squashes the following commits:

c823418 [Xutingjun] fix bugwq
2015-02-02 12:37:51 -08:00
lianhuiwang f5e63751f0 [SPARK-5173]support python application running on yarn cluster mode
Currently, spark-submit does not support running a Python application in yarn-cluster mode, so I modified spark-submit and YARN's AM to support it.
By specifying a .py file or primaryResource file via spark-submit, we can make PySpark run in yarn-cluster mode.
Example: spark-submit --master yarn-cluster --num-executors 1 --driver-memory 1g --executor-memory 1g  xx.py --primaryResource yy.conf
This configuration is the same as for PySpark in yarn-client mode.
First, we add the local path of the .py or primaryResource file to YARN's dist.files so that it can be distributed to the slave nodes. Then spark-submit passes --py-files and --primaryResource to yarn.Client and sets "org.apache.spark.deploy.PythonRunner" as the user class, which can run .py files on the ApplicationMaster.
yarn.Client passes --py-files and --primaryResource on to the ApplicationMaster.
In the ApplicationMaster, the user class is org.apache.spark.deploy.PythonRunner and the user args are primaryResource and --py-files, which makes PySpark run on the ApplicationMaster.
JoshRosen tgravescs sryza

Author: lianhuiwang <lianhuiwang09@gmail.com>
Author: Wang Lianhui <lianhuiwang09@gmail.com>

Closes #3976 from lianhuiwang/SPARK-5173 and squashes the following commits:

28a8a58 [lianhuiwang] fix variable name
67f8cee [lianhuiwang] update with andrewor's comments
0319ae3 [lianhuiwang] address with sryza's comments
2385ef6 [lianhuiwang] address with sryza's comments
03640ab [lianhuiwang] add sparkHome to env
47d2fc3 [lianhuiwang] fix test
2adc8f5 [lianhuiwang] add spark.test.home
d60bc60 [lianhuiwang] fix test
5b30064 [lianhuiwang] add test
097a5ec [lianhuiwang] fix line length exceeds 100
905a106 [lianhuiwang] update with sryza and andrewor 's comments
f1f55b6 [lianhuiwang] when yarn-cluster, all python files can be non-local
172eec1 [Wang Lianhui] fix a min submit's bug
9c941bc [lianhuiwang] support python application running on yarn cluster mode
2015-02-02 12:32:28 -08:00
Sandy Ryza b2047b55c5 SPARK-4585. Spark dynamic executor allocation should use minExecutors as...
... initial number

Author: Sandy Ryza <sandy@cloudera.com>

Closes #4051 from sryza/sandy-spark-4585 and squashes the following commits:

d1dd039 [Sandy Ryza] Add spark.dynamicAllocation.initialNumExecutors and make min and max not required
b7c59dc [Sandy Ryza] SPARK-4585. Spark dynamic executor allocation should use minExecutors as initial number
2015-02-02 12:27:08 -08:00
Masayoshi TSUZUKI 7712ed5b16 [SPARK-1825] Make Windows Spark client work fine with Linux YARN cluster
Modified environment strings and path separators to use a platform-independent style where possible.

Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>

Closes #3943 from tsudukim/feature/SPARK-1825 and squashes the following commits:

ec4b865 [Masayoshi TSUZUKI] Rebased and modified as comments.
f8a1d5a [Masayoshi TSUZUKI] Merge branch 'master' of github.com:tsudukim/spark into feature/SPARK-1825
3d03d35 [Masayoshi TSUZUKI] [SPARK-1825] Make Windows Spark client work fine with Linux YARN cluster
2015-02-01 18:28:55 -08:00
Sandy Ryza 254eaa4d35 SPARK-5393. Flood of util.RackResolver log messages after SPARK-1714
Previously I had tried to solve this by adding a line to Spark's log4j-defaults.properties.

The issue with the message in log4j-defaults.properties was that the log4j.properties packaged inside Hadoop was getting picked up instead. While it would be ideal to fix that as well, we still want to quiet this in situations where a user supplies their own custom log4j properties.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #4192 from sryza/sandy-spark-5393 and squashes the following commits:

4d5dedc [Sandy Ryza] Only set log level if unset
46e07c5 [Sandy Ryza] SPARK-5393. Flood of util.RackResolver log messages after SPARK-1714
2015-01-30 11:31:54 -06:00
lianhuiwang 81f8f34062 [SPARK-4955]With executor dynamic scaling enabled,executor shoude be added or killed in yarn-cluster mode.
With dynamic executor scaling enabled, executors should be added or killed in yarn-cluster mode as well. So in yarn-cluster mode, the ApplicationMaster starts an AMActor that adds or kills executors, and the YarnSchedulerActor in YarnSchedulerBackend sends messages to the AM's AMActor.
andrewor14 ChengXiangLi tdas

Author: lianhuiwang <lianhuiwang09@gmail.com>

Closes #3962 from lianhuiwang/SPARK-4955 and squashes the following commits:

48d9ebb [lianhuiwang] update with andrewor14's comments
12426af [lianhuiwang] refactor am's code
45da3b0 [lianhuiwang] remove unrelated code
9318fc1 [lianhuiwang] update with andrewor14's comments
08ba473 [lianhuiwang] address andrewor14's comments
265c36d [lianhuiwang] fix small change
f43bda8 [lianhuiwang] fix address andrewor14's comments
7a7767a [lianhuiwang] fix address andrewor14's comments
bbc4d5a [lianhuiwang] address andrewor14's comments
1b029a4 [lianhuiwang] fix bug
7d33791 [lianhuiwang] in AM create a new actorSystem
2164ea8 [lianhuiwang] fix a min bug
6dfeeec [lianhuiwang] in yarn-cluster mode,executor number can be added or killed.
2015-01-28 12:51:15 -08:00
Sandy Ryza 820ce03597 SPARK-5370. [YARN] Remove some unnecessary synchronization in YarnAlloca...
...tor

Author: Sandy Ryza <sandy@cloudera.com>

Closes #4164 from sryza/sandy-spark-5370 and squashes the following commits:

0c8d736 [Sandy Ryza] SPARK-5370. [YARN] Remove some unnecessary synchronization in YarnAllocator
2015-01-22 13:49:35 -06:00
Sandy Ryza 2eeada373e SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnA...
...llocator

The goal of this PR is to simplify YarnAllocator as much as possible and get it up to the level of code quality we see in the rest of Spark.

In service of this, it does a few things:
* Uses AMRMClient APIs for matching containers to requests.
* Adds calls to AMRMClient.removeContainerRequest so that, when we use a container, we don't end up requesting it again.
* Removes YarnAllocator's host->rack cache. YARN's RackResolver already does this caching, so this is redundant.
* Adds tests for basic YarnAllocator functionality.
* Breaks up the allocateResources method, which was previously nearly 300 lines.
* A little bit of stylistic cleanup.
* Fixes a bug that causes three times the requests to be filed when preferred host locations are given.

The patch is lossy. In particular, it loses the logic for trying to avoid containers bunching up on nodes. As I understand it, the logic that's gone is:

* If, in a single response from the RM, we receive a set of containers on a node, and prefer some number of containers on that node greater than 0 but less than the number we received, give back the delta between what we preferred and what we received.

This seems like a weird way to avoid bunching. E.g., it does nothing to avoid bunching when we don't request containers on particular nodes.
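
For reference, the removed rule as described above can be written as a small function. This is a reading of the description, not the deleted code, and the names are hypothetical.

```scala
object BunchingSketch {
  // For one node in one RM response: if we preferred some containers
  // on the node (more than 0) but received more than we preferred,
  // give back the difference. Otherwise give back nothing.
  def containersToReturn(received: Int, preferred: Int): Int =
    if (preferred > 0 && preferred < received) received - preferred else 0
}
```

With `preferred = 0` (no node-specific request) the function returns 0 however many containers land on the node, which is the gap pointed out above.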

Author: Sandy Ryza <sandy@cloudera.com>

Closes #3765 from sryza/sandy-spark-1714 and squashes the following commits:

32a5942 [Sandy Ryza] Muffle RackResolver logs
74f56dd [Sandy Ryza] Fix a couple comments and simplify requestTotalExecutors
60ea4bd [Sandy Ryza] Fix scalastyle
ca35b53 [Sandy Ryza] Simplify further
e9cf8a6 [Sandy Ryza] Fix YarnClusterSuite
257acf3 [Sandy Ryza] Remove locality stuff and more cleanup
59a3c5e [Sandy Ryza] Take out rack stuff
5f72fd5 [Sandy Ryza] Further documentation and cleanup
89edd68 [Sandy Ryza] SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnAllocator
2015-01-21 10:31:54 -06:00
WangTao 8c06a5faac [SPARK-5336][YARN]spark.executor.cores must not be less than spark.task.cpus
https://issues.apache.org/jira/browse/SPARK-5336

Author: WangTao <barneystinson@aliyun.com>
Author: WangTaoTheTonic <barneystinson@aliyun.com>

Closes #4123 from WangTaoTheTonic/SPARK-5336 and squashes the following commits:

6c9676a [WangTao] Update ClientArguments.scala
9632d3a [WangTaoTheTonic] minor comment fix
d03d6fa [WangTaoTheTonic] import ordering should be alphabetical'
3112af9 [WangTao] spark.executor.cores must not be less than spark.task.cpus
2015-01-21 09:42:30 -06:00
Patrick Wendell ad16da1bcc [HOTFIX]: Minor clean up regarding skipped artifacts in build files.
There are two relevant 'skip' configurations in the build, the first
is for "mvn install" and the second is for "mvn deploy". As of 1.2,
we actually use "mvn install" to generate our deployed artifacts,
because we have some customization of the nexus upload due to having
to cross compile for Scala 2.10 and 2.11.

There is no reason to have different settings for these values,
this patch simply cleans this up for the repl/ and yarn/
projects.

Author: Patrick Wendell <patrick@databricks.com>

Closes #4080 from pwendell/master and squashes the following commits:

e21b78b [Patrick Wendell] [HOTFIX]: Minor clean up regarding skipped artifacts in build files.
2015-01-17 23:15:12 -08:00
WangTaoTheTonic 2be82b1e66 [SPARK-1507][YARN]specify # cores for ApplicationMaster
Based on top of changes in https://github.com/apache/spark/pull/3806.

https://issues.apache.org/jira/browse/SPARK-1507

Adds `--driver-cores` and `spark.driver.cores` for all cluster modes, and `spark.yarn.am.cores` for yarn-client mode.

Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>

Closes #4018 from WangTaoTheTonic/SPARK-1507 and squashes the following commits:

01419d3 [WangTaoTheTonic] amend the args name
b255795 [WangTaoTheTonic] indet thing
d86557c [WangTaoTheTonic] some comments amend
43c9392 [WangTao] fix compile error
b39a100 [WangTao] specify # cores for ApplicationMaster
2015-01-16 09:16:56 -08:00
WangTaoTheTonic 9dea64e53a [SPARK-4697][YARN]System properties should override environment variables
I found that some arguments in the yarn module take environment variables before system properties, while in the core module the latter override the former.
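
The intended precedence can be sketched as below. This is only an illustration: the helper name is hypothetical and the maps stand in for the real system properties and environment.

```scala
object ConfigPrecedenceSketch {
  // Looks up a setting, preferring the system property and falling
  // back to the environment variable only when the property is unset.
  def resolve(
      sysProps: Map[String, String],
      env: Map[String, String],
      propKey: String,
      envKey: String): Option[String] =
    sysProps.get(propKey).orElse(env.get(envKey))
}
```

For example, with both `spark.app.name` and `SPARK_YARN_APP_NAME` set, the system property wins.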

Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>

Closes #3557 from WangTaoTheTonic/SPARK4697 and squashes the following commits:

836b9ef [WangTaoTheTonic] fix type mismatch
e3e486a [WangTaoTheTonic] remove the comma
1262d57 [WangTaoTheTonic] handle spark.app.name and SPARK_YARN_APP_NAME in SparkSubmitArguments
bee9447 [WangTaoTheTonic] wrong brace
81833bb [WangTaoTheTonic] rebase
40934b4 [WangTaoTheTonic] just switch blocks
5f43f45 [WangTao] System property can override environment variable
2015-01-13 09:50:14 -08:00
WangTaoTheTonic f7741a9a72 [SPARK-5006][Deploy]spark.port.maxRetries doesn't work
https://issues.apache.org/jira/browse/SPARK-5006

I think the issue was introduced in https://github.com/apache/spark/pull/1777.

I have not dug into the Mesos backend yet. The same logic may need to be added there as well.

Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>

Closes #3841 from WangTaoTheTonic/SPARK-5006 and squashes the following commits:

8cdf96d [WangTao] indent thing
2d86d65 [WangTaoTheTonic] fix line length
7cdfd98 [WangTaoTheTonic] fit for new HttpServer constructor
61a370d [WangTaoTheTonic] some minor fixes
bc6e1ec [WangTaoTheTonic] rebase
67bcb46 [WangTaoTheTonic] put conf at 3rd position, modify suite class, add comments
f450cd1 [WangTaoTheTonic] startServiceOnPort will use a SparkConf arg
29b751b [WangTaoTheTonic] rebase as ExecutorRunnableUtil changed to ExecutorRunnable
396c226 [WangTaoTheTonic] make the grammar more like scala
191face [WangTaoTheTonic] invalid value name
62ec336 [WangTaoTheTonic] spark.port.maxRetries doesn't work
2015-01-13 09:29:25 -08:00
WangTaoTheTonic e966452060 [SPARK-1953][YARN]yarn client mode Application Master memory size is same as driver memory...
... size

Ways to set the Application Master's memory in yarn-client mode:
1.  `spark.yarn.am.memory` in SparkConf or System Properties
2.  default value 512m

Note: this argument is only available in yarn-client mode.
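
A minimal sketch of the lookup described above, using a plain Map in place of SparkConf (an assumption made to keep the example self-contained). The object and method names are hypothetical.

```scala
object AmMemorySketch {
  // Default taken from the description above.
  val DefaultAmMemory = "512m"

  // Returns the configured AM memory, or the default when unset.
  def amMemory(conf: Map[String, String]): String =
    conf.getOrElse("spark.yarn.am.memory", DefaultAmMemory)
}
```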

Author: WangTaoTheTonic <barneystinson@aliyun.com>

Closes #3607 from WangTaoTheTonic/SPARK4181 and squashes the following commits:

d5ceb1b [WangTaoTheTonic] spark.driver.memeory is used in both modes
6c1b264 [WangTaoTheTonic] rebase
b8410c0 [WangTaoTheTonic] minor optiminzation
ddcd592 [WangTaoTheTonic] fix the bug produced in rebase and some improvements
3bf70cc [WangTaoTheTonic] rebase and give proper hint
987b99d [WangTaoTheTonic] disable --driver-memory in client mode
2b27928 [WangTaoTheTonic] inaccurate description
b7acbb2 [WangTaoTheTonic] incorrect method invoked
2557c5e [WangTaoTheTonic] missing a single blank
42075b0 [WangTaoTheTonic] arrange the args and warn logging
69c7dba [WangTaoTheTonic] rebase
1960d16 [WangTaoTheTonic] fix wrong comment
7fa9e2e [WangTaoTheTonic] log a warning
f6bee0e [WangTaoTheTonic] docs issue
d619996 [WangTaoTheTonic] Merge branch 'master' into SPARK4181
b09c309 [WangTaoTheTonic] use code format
ab16bb5 [WangTaoTheTonic] fix bug and add comments
44e48c2 [WangTaoTheTonic] minor fix
6fd13e1 [WangTaoTheTonic] add overhead mem and remove some configs
0566bb8 [WangTaoTheTonic] yarn client mode Application Master memory size is same as driver memory size
2015-01-09 13:23:13 -08:00
WangTaoTheTonic f3da4bd728 [SPARK-5169][YARN]fetch the correct max attempts
Sorry for fetching the wrong max attempts in commit 8fdd48959c.
We need to fix it now.

tgravescs

If we set spark.yarn.maxAppAttempts to a value larger than `yarn.resourcemanager.am.max-attempts` on the YARN side, it will be overridden, as described here:
>The maximum number of application attempts. It's a global setting for all application masters. Each application master can specify its individual maximum number of application attempts via the API, but the individual number cannot be more than the global upper bound. If it is, the resourcemanager will override it. The default number is set to 2, to allow at least one retry for AM.

http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
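
The capping behaviour quoted above amounts to taking the smaller of the two values. A sketch, with hypothetical names; falling back to the YARN-side value when the Spark setting is absent is an assumption of this example.

```scala
object MaxAttemptsSketch {
  // sparkSetting: value of spark.yarn.maxAppAttempts, if set.
  // yarnGlobalMax: value of yarn.resourcemanager.am.max-attempts.
  def effectiveMaxAttempts(sparkSetting: Option[Int], yarnGlobalMax: Int): Int =
    sparkSetting.map(math.min(_, yarnGlobalMax)).getOrElse(yarnGlobalMax)
}
```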

Author: WangTaoTheTonic <barneystinson@aliyun.com>

Closes #3942 from WangTaoTheTonic/HOTFIX and squashes the following commits:

9ac16ce [WangTaoTheTonic] fetch the correct max attempts
2015-01-09 08:10:09 -06:00
Marcelo Vanzin 48cecf673c [SPARK-4048] Enhance and extend hadoop-provided profile.
This change does a few things to make the hadoop-provided profile more useful:

- Create new profiles for other libraries / services that might be provided by the infrastructure
- Simplify and fix the poms so that the profiles are only activated while building assemblies.
- Fix tests so that they're able to run when the profiles are activated
- Add a new env variable to be used by distributions that use these profiles to provide the runtime
  classpath for Spark jobs and daemons.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #2982 from vanzin/SPARK-4048 and squashes the following commits:

82eb688 [Marcelo Vanzin] Add a comment.
eb228c0 [Marcelo Vanzin] Fix borked merge.
4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to child processes.
371ebee [Marcelo Vanzin] Review feedback.
52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
322f882 [Marcelo Vanzin] Fix merge fail.
f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
9640503 [Marcelo Vanzin] Cleanup child process log message.
115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with another pom).
e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
d1399ed [Marcelo Vanzin] Restore jetty dependency.
82a54b9 [Marcelo Vanzin] Remove unused profile.
5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided profiles.
1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during testing.
1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.
2015-01-08 17:15:13 -08:00
Sandy Ryza 8d45834deb SPARK-5087. [YARN] Merge yarn.Client and yarn.ClientBase
Author: Sandy Ryza <sandy@cloudera.com>

Closes #3896 from sryza/sandy-spark-5087 and squashes the following commits:

65611d0 [Sandy Ryza] Review feedback
3294176 [Sandy Ryza] SPARK-5087. [YARN] Merge yarn.Client and yarn.ClientBase
2015-01-08 09:25:43 -08:00
WangTaoTheTonic 8fdd48959c [SPARK-2165][YARN]add support for setting maxAppAttempts in the ApplicationSubmissionContext
https://issues.apache.org/jira/browse/SPARK-2165

I still have 2 questions:
* If this config is not set, should we use YARN's corresponding value or a default value (like 2) on the Spark side?
* Is this the best config name? Or should it be "spark.yarn.am.maxAttempts"?

Author: WangTaoTheTonic <barneystinson@aliyun.com>

Closes #3878 from WangTaoTheTonic/SPARK-2165 and squashes the following commits:

1416c83 [WangTaoTheTonic] use the name spark.yarn.maxAppAttempts
202ac85 [WangTaoTheTonic] rephrase some
afdfc99 [WangTaoTheTonic] more detailed description
91562c6 [WangTaoTheTonic] add support for setting maxAppAttempts in the ApplicationSubmissionContext
2015-01-07 08:14:39 -06:00
huangzhaowei 5fde66163f [YARN][SPARK-4929] Bug fix: fix the yarn-client code to support HA
Currently, yarn-client exits immediately when an HA failover happens, no matter how many times the AM should retry.
The reason may be that the default final status only considered sys.exit, so yarn-client cannot benefit from HA.
So we should distinguish the default final status between client and cluster mode, because the SUCCEEDED status may cause HA to fail in client mode, and UNDEFINED may cause an error to be reported in cluster mode when using sys.exit.

Author: huangzhaowei <carlmartinmax@gmail.com>

Closes #3771 from SaintBacchus/YarnHA and squashes the following commits:

c02bfcc [huangzhaowei] Improve the comment of the funciton 'getDefaultFinalStatus'
0e69924 [huangzhaowei] Bug fix: fix the yarn-client code to support HA
2015-01-07 08:10:42 -06:00
Sean Owen 4cba6eb420 SPARK-4159 [CORE] Maven build doesn't run JUnit test suites
This PR:

- Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar)
- Tells `surefire` to test only Java tests
- Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication.

For me this causes the Scala and Java tests to be run once each, it seems, as desired. It doesn't affect the SBT build but works for Maven. I still need to verify that all of the Scala tests and Java tests are being run.

Author: Sean Owen <sowen@cloudera.com>

Closes #3651 from srowen/SPARK-4159 and squashes the following commits:

2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete
12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit.
e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent
2015-01-06 12:02:08 -08:00
Kostas Sakellis 451546aa6d SPARK-4843 [YARN] Squash ExecutorRunnableUtil and ExecutorRunnable
ExecutorRunnableUtil is a parent of ExecutorRunnable because of the yarn-alpha and yarn-stable split. Now that yarn-alpha is gone, this commit squashes the unnecessary hierarchy. The methods from ExecutorRunnableUtil are added as private.

Author: Kostas Sakellis <kostas@cloudera.com>

Closes #3696 from ksakellis/kostas-spark-4843 and squashes the following commits:

486716f [Kostas Sakellis] Moved prepareEnvironment call to after yarnConf declaration
470e22e [Kostas Sakellis] Fixed indentation and renamed sparkConf variable
9b1b1c9 [Kostas Sakellis] SPARK-4843 [YARN] Squash ExecutorRunnableUtil and ExecutorRunnable
2015-01-05 23:26:33 -08:00
meiyoula 14fa87bdf4 [SPARK-4966][YARN]The MemoryOverhead value is setted not correctly
Author: meiyoula <1039320815@qq.com>

Closes #3797 from XuTingjun/MemoryOverhead and squashes the following commits:

5a780fc [meiyoula] Update ClientArguments.scala
2014-12-29 08:20:30 -06:00
Kousuke Saruta 199e59aacd [SPARK-4881][Minor] Use SparkConf#getBoolean instead of get().toBoolean
It's really a minor issue.

In ApplicationMaster, there is code like the following.

    val preserveFiles = sparkConf.get("spark.yarn.preserve.staging.files", "false").toBoolean

I think the code can be simplified as follows.

    val preserveFiles = sparkConf.getBoolean("spark.yarn.preserve.staging.files", false)

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3733 from sarutak/SPARK-4881 and squashes the following commits:

1771430 [Kousuke Saruta] Modified the code like sparkConf.get(...).toBoolean to sparkConf.getBoolean(...)
c63daa0 [Kousuke Saruta] Simplified code
2014-12-23 19:14:34 -08:00
Andrew Or 27c5399f4d [SPARK-4730][YARN] Warn against deprecated YARN settings
See https://issues.apache.org/jira/browse/SPARK-4730.

Author: Andrew Or <andrew@databricks.com>

Closes #3590 from andrewor14/yarn-settings and squashes the following commits:

36e0753 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-settings
dcd1316 [Andrew Or] Warn against deprecated YARN settings
2014-12-23 14:28:36 -08:00
Sandy Ryza d62da642ac SPARK-4447. Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha
Author: Sandy Ryza <sandy@cloudera.com>

Closes #3652 from sryza/sandy-spark-4447 and squashes the following commits:

2791158 [Sandy Ryza] Review feedback
c23507b [Sandy Ryza] Strip margin from client arguments help string
18be7ba [Sandy Ryza] SPARK-4447
2014-12-22 12:23:43 -08:00
Sandy Ryza 253b72b56f SPARK-3779. yarn spark.yarn.applicationMaster.waitTries config should be...
... changed to a time period

Author: Sandy Ryza <sandy@cloudera.com>

Closes #3471 from sryza/sandy-spark-3779 and squashes the following commits:

20b9887 [Sandy Ryza] Deprecate old property
42b5df7 [Sandy Ryza] Review feedback
9a959a1 [Sandy Ryza] SPARK-3779. yarn spark.yarn.applicationMaster.waitTries config should be changed to a time period
2014-12-18 12:19:07 -06:00
Zhan Zhang 3b764699ff [SPARK-4461][YARN] pass extra java options to yarn application master
Currently, there is no way to pass YARN AM-specific Java options. This causes potential issues when reading the classpath from a Hadoop configuration file. Hadoop configuration replaces variables in its properties with the system properties passed in the Java options. How to specify the value depends on the Hadoop distribution.

The new options are SPARK_YARN_JAVA_OPTS or spark.yarn.extraJavaOptions. I made them Spark-global, because typically we don't want users to specify this on the command line each time they submit a Spark job once it is set up in spark-defaults.conf.

In addition, allowing these extra options to be passed to the AM provides more flexibility.

For example, in the following valid mapred-site.xml file, the classpath specifies its value using a system property. Hadoop handles it correctly because the Java options are passed in.

This is the example. Currently Spark breaks because hadoop.version is not passed in.
  <property>
    <name>mapreduce.application.classpath</name>
    <value>/etc/hadoop/${hadoop.version}/mapreduce/*</value>
  </property>
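
To illustrate why the missing system property matters, here is a rough sketch of this kind of `${...}` substitution. It is not Hadoop's implementation, and the object and method names are hypothetical. A variable with no matching property is left unexpanded, which is what produces the broken classpath.

```scala
import scala.util.matching.Regex

object VarSubstitutionSketch {
  // Matches ${name} and captures the name.
  private val Variable: Regex = """\$\{([^}]+)\}""".r

  // Replaces each ${name} with props(name); unknown names are kept
  // verbatim. quoteReplacement guards against '$' in the replacement.
  def expand(value: String, props: Map[String, String]): String =
    Variable.replaceAllIn(
      value,
      m => Regex.quoteReplacement(props.getOrElse(m.group(1), m.matched)))
}
```

When `hadoop.version` is supplied, the path from the example resolves to a real directory; when it is not, the literal `${hadoop.version}` stays in the path.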

In the meantime, we cannot rely on mapreduce.admin.map.child.java.opts in mapred-site.xml, because it has its own extra Java options specified, which do not apply to Spark.

Author: Zhan Zhang <zhazhan@gmail.com>

Closes #3409 from zhzhan/Spark-4461 and squashes the following commits:

daec3d0 [Zhan Zhang] solve review comments
08f44a7 [Zhan Zhang] add warning in driver mode if spark.yarn.am.extraJavaOptions is configured
5a505d3 [Zhan Zhang] solve review comments
4ed43ad [Zhan Zhang] solve review comments
ad777ed [Zhan Zhang] Merge branch 'master' into Spark-4461
3e9e574 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
e3f9abe [Zhan Zhang] solve review comments
8963552 [Zhan Zhang] rebase
f8f6700 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
dea1692 [Zhan Zhang] change the option key name to client mode specific
90d5dff [Zhan Zhang] rebase
8ac9254 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
092a25f [Zhan Zhang] solve review comments
bc5a9ae [Zhan Zhang] solve review comments
782b014 [Zhan Zhang] add new configuration to docs/running-on-yarn.md and remove it from spark-defaults.conf.template
6faaa97 [Zhan Zhang] solve review comments
369863f [Zhan Zhang] clean up unnecessary var
733de9c [Zhan Zhang] Merge branch 'master' into Spark-4461
a68e7f0 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
864505a [Zhan Zhang] Add extra java options to be passed to Yarn application master
15830fc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
685d911 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
03ebad3 [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark
46d9e3d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
ebb213a [Zhan Zhang] revert
b983ef3 [Zhan Zhang] test
c4efb9b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
779d67b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
4daae6d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
12e1be5 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
ce0ca7b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
93f3081 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
3764505 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
a9d372b [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark
a00f60f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
497b0f4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
4a2e36d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
a72c0d4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
301eb4a [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
cedcc6f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
adf4924 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
d10bf00 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
7e0cc36 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
68deb11 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
3ee3b2b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
2b0d513 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
1ccd7cc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
af9feb9 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
e4c1982 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
921e914 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
789ea21 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
cb53a2c [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
f6a8a40 [Zhan Zhang] revert
ba14f28 [Zhan Zhang] test
2014-12-18 10:01:46 -06:00
Sandy Ryza 912563aa35 SPARK-4338. [YARN] Ditch yarn-alpha.
Sorry if this is a little premature with 1.2 still not out the door, but it will make other work like SPARK-4136 and SPARK-2089 a lot easier.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #3215 from sryza/sandy-spark-4338 and squashes the following commits:

1c5ac08 [Sandy Ryza] Update building Spark docs and remove unnecessary newline
9c1421c [Sandy Ryza] SPARK-4338. Ditch yarn-alpha.
2014-12-09 11:02:43 -08:00
Andrew Or fd8525334c Revert "SPARK-2624 add datanucleus jars to the container in yarn-cluster"
This reverts commit a975dc3279.
2014-12-04 21:53:49 -08:00
Andrew Or 87437df036 Revert "[HOT FIX] [YARN] Check whether /lib exists before listing its files"
This reverts commit 90ec643e9a.
2014-12-04 21:53:38 -08:00
WangTaoTheTonic 8106b1e36b [SPARK-4253] Ignore spark.driver.host in yarn-cluster and standalone-cluster modes
In yarn-cluster and standalone-cluster modes, we don't know where the driver will run until it is launched. If the `spark.driver.host` property is set on the submitting machine and propagated to the driver through SparkConf, then this will lead to errors when the driver launches.

This patch fixes this issue by dropping the `spark.driver.host` property in SparkSubmit when running in a cluster deploy mode.
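
The idea can be sketched as follows. This is a simplified illustration, not the SparkSubmit code: the function name `prepare_driver_conf`, the plain dictionary standing in for SparkConf, and the `"client"`/`"cluster"` mode strings are assumptions.

```python
def prepare_driver_conf(conf, deploy_mode):
    """Return a copy of conf that is safe to propagate for the given deploy mode."""
    result = dict(conf)
    if deploy_mode == "cluster":
        # The driver's host is unknown until the driver is launched, so a value
        # set on the submitting machine would be wrong and is dropped.
        result.pop("spark.driver.host", None)
    return result


# Hypothetical settings on the submitting machine.
submitted = {"spark.driver.host": "gateway.example.com", "spark.app.name": "demo"}

print(prepare_driver_conf(submitted, "cluster"))
print(prepare_driver_conf(submitted, "client"))
```

In client mode the property is kept, because the driver runs on the submitting machine and the value is meaningful there.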

Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>

Closes #3112 from WangTaoTheTonic/SPARK4253 and squashes the following commits:

ed1a25c [WangTaoTheTonic] revert unrelated formatting issue
02c4e49 [WangTao] add comment
32a3f3f [WangTaoTheTonic] ignore it in SparkSubmit instead of SparkContext
667cf24 [WangTaoTheTonic] document fix
ff8d5f7 [WangTaoTheTonic] also ignore it in standalone cluster mode
2286e6b [WangTao] ignore spark.driver.host in yarn-cluster mode
2014-12-04 11:53:23 -08:00
Andrew Or 90ec643e9a [HOT FIX] [YARN] Check whether /lib exists before listing its files
This is caused by a975dc3279

Author: Andrew Or <andrew@databricks.com>

Closes #3589 from andrewor14/yarn-hot-fix and squashes the following commits:

a4fad5f [Andrew Or] Check whether lib directory exists before listing its files
2014-12-03 13:56:23 -08:00
Jim Lim a975dc3279 SPARK-2624 add datanucleus jars to the container in yarn-cluster
If `spark-submit` finds the datanucleus jars, it adds them to the driver's classpath, but does not add them to the container.

This patch modifies the yarn deployment class to copy all `datanucleus-*` jars found in `[spark-home]/libs` to the container.
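
A rough sketch of the jar discovery step is shown below. It is not the patch itself: the function name `find_datanucleus_jars` and the `.jar` suffix in the pattern are assumptions, and the check that the directory exists is included because listing a missing directory would otherwise fail.

```python
import glob
import os


def find_datanucleus_jars(lib_dir):
    """Return the sorted datanucleus jar paths in lib_dir, or [] if it is missing."""
    if not os.path.isdir(lib_dir):
        # Nothing to copy when the directory does not exist.
        return []
    return sorted(glob.glob(os.path.join(lib_dir, "datanucleus-*.jar")))


# Hypothetical location; prints [] when the directory is absent.
print(find_datanucleus_jars("/opt/spark/lib"))
```

The returned paths are the files that would then be shipped to the container alongside the application.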

Author: Jim Lim <jim@quixey.com>

Closes #3238 from jimjh/SPARK-2624 and squashes the following commits:

3633071 [Jim Lim] SPARK-2624 update documentation and comments
fe95125 [Jim Lim] SPARK-2624 keep java imports together
6c31fe0 [Jim Lim] SPARK-2624 update documentation
6690fbf [Jim Lim] SPARK-2624 add tests
d28d8e9 [Jim Lim] SPARK-2624 add spark.yarn.datanucleus.dir option
84e6cba [Jim Lim] SPARK-2624 add datanucleus jars to the container in yarn-cluster
2014-12-03 11:16:29 -08:00
Marcelo Vanzin 915f8eeb3a [SPARK-4584] [yarn] Remove security manager from Yarn AM.
The security manager adds a lot of overhead to the runtime of the
app, and causes a severe performance regression. Even stubbing out
all unneeded methods (all except checkExit()) does not help.

So, instead, penalize users who do an explicit System.exit() by leaving
them in "undefined behavior" territory: if they do that, the Yarn
backend won't be able to report the final app status to the RM.
The result is that the final status of the application might not match
the user's expectations.

One side-effect of the change is that users who do an explicit
System.exit() will lose the AM retry functionality. Since there is
no way to know if the exit was because of success or failure, the
AM right now errs on the side of it being a successful exit.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #3484 from vanzin/SPARK-4584 and squashes the following commits:

21f2502 [Marcelo Vanzin] Do not retry apps that use System.exit().
4198b3b [Marcelo Vanzin] [SPARK-4584] [yarn] Remove security manager from Yarn AM.
2014-11-28 15:16:05 -05:00
Marcelo Vanzin 397d3aae5b Bumping version to 1.3.0-SNAPSHOT.
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #3277 from vanzin/version-1.3 and squashes the following commits:

7c3c396 [Marcelo Vanzin] Added temp repo to sbt build.
5f404ff [Marcelo Vanzin] Add another exclusion.
19457e7 [Marcelo Vanzin] Update old version to 1.2, add temporary 1.2 repo.
3c8d705 [Marcelo Vanzin] Workaround for MIMA checks.
e940810 [Marcelo Vanzin] Bumping version to 1.3.0-SNAPSHOT.
2014-11-18 21:24:18 -08:00
Kousuke Saruta 7f3718842c [SPARK-4282][YARN] Stopping flag in YarnClientSchedulerBackend should be volatile
In YarnClientSchedulerBackend, the variable "stopping" is used as a flag and is accessed by multiple threads, so it should be volatile.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3143 from sarutak/stopping-flag-volatile and squashes the following commits:

58fdcc9 [Kousuke Saruta] Marked stopping flag as volatile
2014-11-11 12:33:53 -06:00
Andrew Or 96136f222a [SPARK-3797] Minor addendum to Yarn shuffle service
I did not realize there was a `network.util.JavaUtils` when I wrote this code. This PR moves the `ByteBuffer` string conversion to the appropriate place. I tested the changes on a stable yarn cluster.

Author: Andrew Or <andrew@databricks.com>

Closes #3144 from andrewor14/yarn-shuffle-util and squashes the following commits:

b6c08bf [Andrew Or] Remove unused import
94e205c [Andrew Or] Use netty Unpooled
85202a5 [Andrew Or] Use guava Charsets
057135b [Andrew Or] Reword comment
adf186d [Andrew Or] Move byte buffer String conversion logic to JavaUtils
2014-11-06 17:18:49 -08:00
Andrew Or 61a5cced04 [SPARK-3797] Run external shuffle service in Yarn NM
This creates a new module `network/yarn` that depends on `network/shuffle`, recently created in #3001. This PR introduces a custom Yarn auxiliary service that runs the external shuffle service. As of the changes here, this shuffle service is required for using dynamic allocation with Spark.

This is still WIP mainly because it doesn't handle security yet. I have tested this on a stable Yarn cluster.

Author: Andrew Or <andrew@databricks.com>

Closes #3082 from andrewor14/yarn-shuffle-service and squashes the following commits:

ef3ddae [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
0ee67a2 [Andrew Or] Minor wording suggestions
1c66046 [Andrew Or] Remove unused provided dependencies
0eb6233 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
6489db5 [Andrew Or] Try catch at the right places
7b71d8f [Andrew Or] Add detailed java docs + reword a few comments
d1124e4 [Andrew Or] Add security to shuffle service (INCOMPLETE)
5f8a96f [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
9b6e058 [Andrew Or] Address various feedback
f48b20c [Andrew Or] Fix tests again
f39daa6 [Andrew Or] Do not make network-yarn an assembly module
761f58a [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
15a5b37 [Andrew Or] Fix build for Hadoop 1.x
baff916 [Andrew Or] Fix tests
5bf9b7e [Andrew Or] Address a few minor comments
5b419b8 [Andrew Or] Add missing license header
804e7ff [Andrew Or] Include the Yarn shuffle service jar in the distribution
cd076a4 [Andrew Or] Require external shuffle service for dynamic allocation
ea764e0 [Andrew Or] Connect to Yarn shuffle service only if it's enabled
1bf5109 [Andrew Or] Use the shuffle service port specified through hadoop config
b4b1f0c [Andrew Or] 4 tabs -> 2 tabs
43dcb96 [Andrew Or] First cut integration of shuffle service with Yarn aux service
b54a0c4 [Andrew Or] Initial skeleton for Yarn shuffle service
2014-11-05 15:42:05 -08:00
andrewor14 087e31a7a0 [HOT FIX] Yarn stable tests don't compile
This is caused by this commit: acd4ac7c9a

Author: andrewor14 <andrew@databricks.com>
Author: Andrew Or <andrew@databricks.com>

Closes #3041 from andrewor14/yarn-hot-fix and squashes the following commits:

e5deba1 [andrewor14] Add new line at the end (minor)
aa998e8 [Andrew Or] Compilation hot fix
2014-10-31 14:36:55 -07:00
Sandy Ryza acd4ac7c9a SPARK-3837. Warn when YARN kills containers for exceeding memory limits
I triggered the issue and verified the message gets printed on a pseudo-distributed cluster.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #2744 from sryza/sandy-spark-3837 and squashes the following commits:

858a268 [Sandy Ryza] Review feedback
c937f00 [Sandy Ryza] SPARK-3837. Warn when YARN kills containers for exceeding memory limits
2014-10-31 08:43:06 -05:00
Andrew Or 5231a3f228 [Minor] A few typos in comments and log messages
Author: Andrew Or <andrewor14@gmail.com>
Author: Andrew Or <andrew@databricks.com>

Closes #3021 from andrewor14/typos and squashes the following commits:

daaf417 [Andrew Or] Merge branch 'master' of github.com:apache/spark into typos
4838ae4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into typos
026d426 [Andrew Or] Merge branch 'master' of github.com:andrewor14/spark into typos
a81ae8f [Andrew Or] Some typos
2014-10-30 15:32:11 -07:00
Andrew Or 26f092d4e3 [SPARK-4138][SPARK-4139] Improve dynamic allocation settings
This should be merged after #2746 (SPARK-3795).

**SPARK-4138**. If the user sets both the number of executors and `spark.dynamicAllocation.enabled`, we should throw an exception.

**SPARK-4139**. If the user sets `spark.dynamicAllocation.enabled`, we should use the max number of executors as the starting number of executors because the first job is likely to run immediately after application startup. If the max number of executors is not set, throw an exception.
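
The two rules can be sketched together as a single validation step. This is an illustration under assumptions, not the code in this PR: the function name `starting_executors`, the dictionary standing in for SparkConf, and the use of the keys `spark.executor.instances` and `spark.dynamicAllocation.maxExecutors` for the two executor counts are choices made for the example.

```python
DYNAMIC = "spark.dynamicAllocation.enabled"
NUM_EXECUTORS = "spark.executor.instances"            # assumed key for the fixed count
MAX_EXECUTORS = "spark.dynamicAllocation.maxExecutors"  # assumed key for the max


def starting_executors(conf):
    """Return the starting executor count, or None if it is left to defaults."""
    dynamic = conf.get(DYNAMIC, "false").lower() == "true"
    num = conf.get(NUM_EXECUTORS)
    if not dynamic:
        return int(num) if num is not None else None
    if num is not None:
        # SPARK-4138: a fixed count and dynamic allocation are incompatible.
        raise ValueError("cannot set both %s and %s" % (NUM_EXECUTORS, DYNAMIC))
    max_executors = conf.get(MAX_EXECUTORS)
    if max_executors is None:
        # SPARK-4139: the max is required because it is used as the start.
        raise ValueError("%s must be set when %s is true" % (MAX_EXECUTORS, DYNAMIC))
    return int(max_executors)


print(starting_executors({DYNAMIC: "true", MAX_EXECUTORS: "20"}))
```

Starting at the max favors the first job, which tends to run right after startup; idle executors can be released afterwards by dynamic allocation.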

Author: Andrew Or <andrew@databricks.com>

Closes #3002 from andrewor14/yarn-set-executors and squashes the following commits:

c528fce [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-set-executors
55d4699 [Andrew Or] Bug fix: `isDynamicAllocationEnabled` was always false
2b0ccec [Andrew Or] Start the number of executors at the max
022bfde [Andrew Or] Guard against incompatible settings of number of executors
2014-10-30 15:31:23 -07:00