In SPARK-1946(PR #900), configuration <code>spark.scheduler.minRegisteredExecutorsRatio</code> was introduced. However, in standalone mode, there is a race condition where isReady() can return true because totalExpectedExecutors has not been correctly set.
Because expected executors is uncertain in standalone mode, the PR try to use CPU cores(<code>--total-executor-cores</code>) as expected resources to judge whether SchedulerBackend is ready.
Author: li-zhihui <zhihui.li@intel.com>
Author: Li Zhihui <zhihui.li@intel.com>
Closes#1525 from li-zhihui/fixre4s and squashes the following commits:
e9a630b [Li Zhihui] Rename variable totalExecutors and clean codes
abf4860 [Li Zhihui] Push down variable totalExpectedResources to children classes
ca54bd9 [li-zhihui] Format log with String interpolation
88c7dc6 [li-zhihui] Few codes and docs refactor
41cf47e [li-zhihui] Fix race condition at SchedulerBackend.isReady in standalone mode
Note that this also documents spark.executorEnv.* which to me means its public. If we don't want that please speak up.
Author: Thomas Graves <tgraves@apache.org>
Closes#1512 from tgravescs/SPARK-1680 and squashes the following commits:
11525df [Thomas Graves] more doc changes
553bad0 [Thomas Graves] fix documentation
152bf7c [Thomas Graves] fix docs
5382326 [Thomas Graves] try fix docs
32f86a4 [Thomas Graves] use configs for specifying environment variables on YARN
It was easier to combine these 2 jira since they touch many of the same places. This pr adds the following:
- adds modify acls
- adds admin acls (list of admins/users that get added to both view and modify acls)
- modify Kill button on UI to take modify acls into account
- changes config name of spark.ui.acls.enable to spark.acls.enable since I choose poorly in original name. We keep backwards compatibility so people can still use spark.ui.acls.enable. The acls should apply to any web ui as well as any CLI interfaces.
- send view and modify acls information on to YARN so that YARN interfaces can use (yarn cli for killing applications for example).
Author: Thomas Graves <tgraves@apache.org>
Closes#1196 from tgravescs/SPARK-1890 and squashes the following commits:
8292eb1 [Thomas Graves] review comments
b92ec89 [Thomas Graves] remove unneeded variable from applistener
4c765f4 [Thomas Graves] Add in admin acls
72eb0ac [Thomas Graves] Add modify acls
Add a config (spark.yarn.access.namenodes) to allow applications running on yarn to access other secure HDFS cluster. User just specifies the namenodes of the other clusters and we get Tokens for those and ship them with the spark application.
Author: Thomas Graves <tgraves@apache.org>
Closes#1159 from tgravescs/spark-1528 and squashes the following commits:
ddbcd16 [Thomas Graves] review comments
0ac8501 [Thomas Graves] SPARK-1528 - add support for accessing remote HDFS
"ERROR yarn.Client: Required AM memory (1024) is above the max threshold (1048) of this cluster" appears if this code is not changed. obviously, 1024 is less than 1048, so change this
Author: derek ma <maji3@asiainfo-linkage.com>
Closes#1494 from maji2014/master and squashes the following commits:
b0f6640 [derek ma] Required AM memory is "amMem", not "args.amMemory"
JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
Another try for #1399 & #1600. Those two PR breaks Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module is defined outside the `hive-thriftserver` profile. Thus every time a pull request that doesn't touch SQL code will also execute test suites defined in `hive-thriftserver`, but tests fail because related .class files are not included in the assembly jar.
In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes#1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:
629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
(This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes#1600 from liancheng/jdbc and squashes the following commits:
ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
1083e9d [Cheng Lian] Fixed failed test suites
7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
9cc0f06 [Cheng Lian] Starts beeline with spark-submit
cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
061880f [Cheng Lian] Addressed all comments by @pwendell
7755062 [Cheng Lian] Adapts test suites to spark-submit settings
40bafef [Cheng Lian] Fixed more license header issues
e214aab [Cheng Lian] Added missing license headers
b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
This reverts commit 06dc0d2c6b.
#1399 is making Jenkins fail. We should investigate and put this back after its passing tests.
Author: Michael Armbrust <michael@databricks.com>
Closes#1594 from marmbrus/revertJDBC and squashes the following commits:
59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
JIRA issue:
- Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
- Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
(Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.)
TODO
- [x] Use `spark-submit` to launch the server, the CLI and beeline
- [x] Migration guideline draft for Shark users
----
Hit by a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example:
```bash
$ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help
```
This actually shows usage information of `SparkSubmit` rather than `BeeLine`.
~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~
**UPDATE** The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert changes to this bug since it involves more subtle considerations and worth a separate PR.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes#1399 from liancheng/thriftserver and squashes the following commits:
090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
1083e9d [Cheng Lian] Fixed failed test suites
7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
9cc0f06 [Cheng Lian] Starts beeline with spark-submit
cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
061880f [Cheng Lian] Addressed all comments by @pwendell
7755062 [Cheng Lian] Adapts test suites to spark-submit settings
40bafef [Cheng Lian] Fixed more license header issues
e214aab [Cheng Lian] Added missing license headers
b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
...rce manager UI
Use the event logger directory to provide a direct link to finished
application UI in yarn resourcemanager UI.
Author: Rahul Singhal <rahul.singhal@guavus.com>
Closes#1094 from rahulsinghaliitd/SPARK-2150 and squashes the following commits:
95f230c [Rahul Singhal] SPARK-2150: Provide direct link to finished application UI in yarn resource manager UI
Opting to the option 2 defined in SPARK-2577, i.e., retrieve and pass the correct file system object to addResource.
Author: Gera Shegalov <gera@twitter.com>
Closes#1483 from gerashegalov/master and squashes the following commits:
90c9087 [Gera Shegalov] [YARN] SPARK-2577: File upload to viewfs is broken due to mount point resolution
Author: Sandy Ryza <sandy@cloudera.com>
Closes#634 from sryza/sandy-spark-1707 and squashes the following commits:
2f6e358 [Sandy Ryza] Default min registered executors ratio to .8 for YARN
354c630 [Sandy Ryza] Remove outdated comments
c744ef3 [Sandy Ryza] Take out waitForInitialAllocations
2a4329b [Sandy Ryza] SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler
Because submitting tasks and registering executors are asynchronous, in most situation, early stages' tasks run without preferred locality.
A simple solution is sleeping few seconds in application, so that executors have enough time to register.
The PR add 2 configuration properties to make TaskScheduler submit tasks after a few of executors have been registered.
\# Submit tasks only after (registered executors / total executors) arrived the ratio, default value is 0
spark.scheduler.minRegisteredExecutorsRatio = 0.8
\# Whatever minRegisteredExecutorsRatio is arrived, submit tasks after the maxRegisteredWaitingTime(millisecond), default value is 30000
spark.scheduler.maxRegisteredExecutorsWaitingTime = 5000
Author: li-zhihui <zhihui.li@intel.com>
Closes#900 from li-zhihui/master and squashes the following commits:
b9f8326 [li-zhihui] Add logs & edit docs
1ac08b1 [li-zhihui] Add new configs to user docs
22ead12 [li-zhihui] Move waitBackendReady to postStartHook
c6f0522 [li-zhihui] Bug fix: numExecutors wasn't set & use constant DEFAULT_NUMBER_EXECUTORS
4d6d847 [li-zhihui] Move waitBackendReady to TaskSchedulerImpl.start & some code refactor
0ecee9a [li-zhihui] Move waitBackendReady from DAGScheduler.submitStage to TaskSchedulerImpl.submitTasks
4261454 [li-zhihui] Add docs for new configs & code style
ce0868a [li-zhihui] Code style, rename configuration property name of minRegisteredRatio & maxRegisteredWaitingTime
6cfb9ec [li-zhihui] Code style, revert default minRegisteredRatio of yarn to 0, driver get --num-executors in yarn/alpha
812c33c [li-zhihui] Fix driver lost --num-executors option in yarn-cluster mode
e7b6272 [li-zhihui] support yarn-cluster
37f7dc2 [li-zhihui] support yarn mode(percentage style)
3f8c941 [li-zhihui] submit stage after (configured ratio of) executors have been registered
Patch introduces the new way of working also retaining the existing ways of doing things.
For example build instruction for yarn in maven is
`mvn -Pyarn -PHadoop2.2 clean package -DskipTests`
in sbt it can become
`MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
Also supports
`sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Patrick Wendell <pwendell@gmail.com>
Closes#772 from ScrapCodes/sbt-maven and squashes the following commits:
a8ac951 [Prashant Sharma] Updated sbt version.
62b09bb [Prashant Sharma] Improvements.
fa6221d [Prashant Sharma] Excluding sql from mima
4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
72651ca [Prashant Sharma] Addresses code reivew comments.
acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
ac4312c [Prashant Sharma] Revert "minor fix"
6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
65cf06c [Prashant Sharma] Servelet API jars mess up with the other servlet jars on the class path.
446768e [Prashant Sharma] minor fix
89b9777 [Prashant Sharma] Merge conflicts
d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
dccc8ac [Prashant Sharma] updated mima to check against 1.0
a49c61b [Prashant Sharma] Fix for tools jar
a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
cf88758 [Prashant Sharma] cleanup
9439ea3 [Prashant Sharma] Small fix to run-examples script.
96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
4973dbd [Patrick Wendell] Example build using pom reader.
Author: Reynold Xin <rxin@apache.org>
Closes#1260 from rxin/signalhandler1 and squashes the following commits:
8e73552 [Reynold Xin] Uh add Logging back in ApplicationMaster.
0402ba8 [Reynold Xin] Synchronize SignalLogger.register.
dc70705 [Reynold Xin] Added SignalLogger to YARN ApplicationMaster.
79a21b4 [Reynold Xin] Added license header.
0da052c [Reynold Xin] Added the SignalLogger itself.
e587d2e [Reynold Xin] [SPARK-2318] When exiting on a signal, print the signal name first.
spark.worker.instances was added as part of this commit: 1617816090
My understanding is that SPARK_WORKER_INSTANCES is supported for backwards compatibility,
but spark.worker.instances is never used (SparkSubmit.scala sets spark.executor.instances) so should
not have been added.
@sryza @pwendell @tgravescs LMK if I'm understanding this correctly
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes#1214 from kayousterhout/yarn_config and squashes the following commits:
3d7c491 [Kay Ousterhout] Remove use of spark.worker.instances
Recent changes ignored the fact that path may be defined with "local:"
URIs, which means they need to be explicitly added to the classpath
everywhere a remote process is started. This change fixes that by:
- Using the correct methods to add paths to the classpath
- Creating SparkConf settings for the Spark jar itself and for the
user's jar
- Propagating those two settings to the remote processes where needed
This ensures that both in client and in cluster mode, the driver has
the necessary info to build the executor's classpath and have things
still work when they contain "local:" references.
The change also fixes some confusion in ClientBase about whether
to use SparkConf or system properties to propagate config options to
the driver and executors, by standardizing on using data held by
SparkConf.
On the cleanup front, I removed the hacky way that log4j configuration
was being propagated to handle the "local:" case. It's much more cleanly
(and generically) handled by using spark-submit arguments (--files to
upload a config file, or setting spark.executor.extraJavaOptions to pass
JVM arguments and use a local file).
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#560 from vanzin/yarn-local-2 and squashes the following commits:
4e7f066 [Marcelo Vanzin] Correctly propagate SPARK_JAVA_OPTS to driver/executor.
6a454ea [Marcelo Vanzin] Use constants for PWD in test.
6dd5943 [Marcelo Vanzin] Fix propagation of config options to driver / executor.
b2e377f [Marcelo Vanzin] Review feedback.
93c3f85 [Marcelo Vanzin] Fix ClassCastException in test.
e5c682d [Marcelo Vanzin] Fix cluster mode, restore SPARK_LOG4J_CONF.
1dfbb40 [Marcelo Vanzin] Add documentation for spark.yarn.jar.
bbdce05 [Marcelo Vanzin] [SPARK-1395] Fix "local:" URI support in Yarn mode (again).
All the changes is in the package of "org.apache.spark.deploy.yarn":
1) Throw exception in ClinetArguments and ClientBase instead of exit directly.
2) in Client's main method, if exception is caught, it will exit with code 1, otherwise exit with code 0.
After the fix, if user integrate the spark yarn client into their applications, when the argument is wrong or the running is finished, the application won't be terminated.
Author: John Zhao <jzhao@alpinenow.com>
Closes#490 from codeboyyong/jira_1516_systemexit_inyarnclient and squashes the following commits:
138cb48 [John Zhao] [SPARK-1516]Throw exception in yarn clinet instead of run system.exit directly. All the changes is in the package of "org.apache.spark.deploy.yarn": 1) Add a ClientException with an exitCode 2) Throws exception in ClinetArguments and ClientBase instead of exit directly 3) in Client's main method, catch exception and exit with the exitCode.
Yarn client mode was not setting the app's tracking URL to the
History Server's URL when configured by the user. Now client mode
behaves the same as cluster mode.
In SparkContext.scala, the "user.name" system property had precedence
over the SPARK_USER environment variable. This means that SPARK_USER
was never used, since "user.name" is always set by the JVM. In Yarn
cluster mode, this means the application always reported itself as
being run by user "yarn" (or whatever user was running the Yarn NM).
One could argue that the correct fix would be to use UGI.getCurrentUser()
here, but at least for Yarn that will match what SPARK_USER is set
to.
Author: Marcelo Vanzin <vanzin@cloudera.com>
This patch had conflicts when merged, resolved by
Committer: Thomas Graves <tgraves@apache.org>
Closes#1002 from vanzin/yarn-client-url and squashes the following commits:
4046e04 [Marcelo Vanzin] Set HS link in yarn-alpha also.
4c692d9 [Marcelo Vanzin] Yarn: report HS URL in client mode, correct user in cluster mode.
This contains a bunch of small tidyings of the Spark on YARN code.
I focused on the yarn stable code. @tgravescs, let me know if you'd like me to make these for the alpha code as well.
Author: Sandy Ryza <sandy@cloudera.com>
Closes#561 from sryza/sandy-spark-1639 and squashes the following commits:
72b6a02 [Sandy Ryza] Fix comment and set name on driver thread
c2190b2 [Sandy Ryza] SPARK-1639. Tidy up some Spark on YARN code
Author: witgo <witgo@qq.com>
Closes#921 from witgo/allocateExecutors and squashes the following commits:
bc3aa66 [witgo] review commit
8800eba [witgo] Merge branch 'master' of https://github.com/apache/spark into allocateExecutors
32ac7af [witgo] review commit
056b8c7 [witgo] Merge branch 'master' of https://github.com/apache/spark into allocateExecutors
04c6f7e [witgo] Merge branch 'master' into allocateExecutors
aff827c [witgo] review commit
5c376e0 [witgo] Merge branch 'master' of https://github.com/apache/spark into allocateExecutors
1faf4f4 [witgo] Merge branch 'master' into allocateExecutors
3c464bd [witgo] add time limit to allocateExecutors
e00b656 [witgo] In some cases, yarn does not automatically restart the container
Author: DB Tsai <dbtsai@dbtsai.com>
Closes#1027 from dbtsai/dbtsai-classloader and squashes the following commits:
9ac6be3 [DB Tsai] Fixed line too long
c9c7ad7 [DB Tsai] Make sure that empty string is filtered out when we get the secondary jars from conf.
The current implementation of ClientBase.getDefaultYarnApplicationClasspath inspects
the MRJobConfig class for the field DEFAULT_YARN_APPLICATION_CLASSPATH when it should
be really looking into YarnConfiguration. If the Application Configuration has no
yarn.application.classpath defined a NPE exception will be thrown.
Additional Changes include:
* Test Suite for ClientBase added
[ticket: SPARK-1522] : https://issues.apache.org/jira/browse/SPARK-1522
Author : bernardo.gomezpalacio@gmail.com
Testing : SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true ./sbt/sbt test
Author: Bernardo Gomez Palacio <bernardo.gomezpalacio@gmail.com>
Closes#433 from berngp/feature/SPARK-1522 and squashes the following commits:
2c2e118 [Bernardo Gomez Palacio] [SPARK-1522]: YARN ClientBase throws a NPE if there is no YARN Application specific CP
https://issues.apache.org/jira/browse/SPARK-1898
Author: Colin Patrick McCabe <cmccabe@cloudera.com>
Closes#850 from cmccabe/master and squashes the following commits:
d66eddc [Colin Patrick McCabe] SPARK-1898: In deploy.yarn.Client, use YarnClient rather than YarnClientImpl
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes#974 from ueshin/issues/SPARK-2029 and squashes the following commits:
e19e8f4 [Takuya UESHIN] Bump version number to 1.1.0-SNAPSHOT.
Author: witgo <witgo@qq.com>
Closes#786 from witgo/maven_plugin and squashes the following commits:
5de86a2 [witgo] Merge branch 'master' of https://github.com/apache/spark into maven_plugin
c35ef73 [witgo] Improve maven plugin configuration
Sent secondary jars to distributed cache of all containers and add the cached jars to classpath before executors start. Tested on a YARN cluster (CDH-5.0).
`spark-submit --jars` also works in standalone server and `yarn-client`. Thanks for @andrewor14 for testing!
I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested mesos yet.
CC: @dbtsai @sryza
Author: Xiangrui Meng <meng@databricks.com>
Closes#848 from mengxr/yarn-classpath and squashes the following commits:
23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid confliction apped $CWD/ and $CWD/* to the classpath remove unused methods
a40f6ed [Xiangrui Meng] standalone -> cluster
65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client
11e5354 [Xiangrui Meng] minor changes
3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf
dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn
Author: Andrew Or <andrewor14@gmail.com>
Closes#847 from andrewor14/yarn-typo and squashes the following commits:
c1906af [Andrew Or] Stoped -> Stopped
SparkSubmit ignores `--jars` for YARN client. This is a bug.
This PR also automatically adds the application jar to `spark.jar`. Previously, when running as yarn-client, you must specify the jar additionally through `--files` (because `--jars` didn't work). Now you don't have to explicitly specify it through either.
Tested on a YARN cluster.
Author: Andrew Or <andrewor14@gmail.com>
Closes#710 from andrewor14/yarn-jars and squashes the following commits:
35d1928 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
c27bf6c [Andrew Or] For yarn-cluster and python, do not add primaryResource to spark.jar
c92c5bf [Andrew Or] Minor cleanups
269f9f3 [Andrew Or] Fix format
013d840 [Andrew Or] Fix tests
1407474 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
3bb75e8 [Andrew Or] Allow SparkSubmit --jars to take effect in yarn-client mode
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#539 from vanzin/yarn-app-name and squashes the following commits:
7d1ca4f [Marcelo Vanzin] [SPARK-1631] Correctly set the Yarn app name when launching the AM.
Pass the configs as java options since the executor needs to know before it registers whether to create the connection using authentication or not. We could see about passing only the authentication configs but for now I just had it pass them all.
I also updating it to use a list to construct the command to make it the same as ClientBase and avoid any issues with spaces.
Author: Thomas Graves <tgraves@apache.org>
Closes#649 from tgravescs/SPARK-1569 and squashes the following commits:
0178ab8 [Thomas Graves] add akka settings
22a8735 [Thomas Graves] Change to only path spark.auth* configs
8ccc1d4 [Thomas Graves] SPARK-1569 Spark on Yarn, authentication broken
We use org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter in spark on yarn but are not included it in the assembly jar.
I tested this on yarn cluster by removing the yarn jars from the classpath and spark runs fine now.
Author: Thomas Graves <tgraves@apache.org>
Closes#406 from tgravescs/SPARK-1474 and squashes the following commits:
1548bf9 [Thomas Graves] SPARK-1474: Spark on yarn assembly doesn't include org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
This is a part of [PR 590](https://github.com/apache/spark/pull/590)
Author: witgo <witgo@qq.com>
Closes#626 from witgo/yarn_version and squashes the following commits:
c390631 [witgo] restore the yarn dependency declarations
f8a4ad8 [witgo] revert remove the dependency of avro in yarn-alpha
2df6cf5 [witgo] review commit
a1d876a [witgo] review commit
20e7e3e [witgo] review commit
c76763b [witgo] The default value of yarn.version is equal to hadoop.version
Move the doAs in Executor higher up so that we only have 1 ugi and aren't leaking filesystems.
Fix spark on yarn to work when the cluster is running as user "yarn" but the clients are launched as the user and want to read/write to hdfs as the user.
Note this hasn't been fully tested yet. Need to test in standalone mode.
Putting this up for people to look at and possibly test. I don't have access to a mesos cluster.
This is alternative to https://github.com/apache/spark/pull/607
Author: Thomas Graves <tgraves@apache.org>
Closes#621 from tgravescs/SPARK-1676 and squashes the following commits:
244d55a [Thomas Graves] fix line length
44163d4 [Thomas Graves] Rework
9398853 [Thomas Graves] change to have doAs in executor higher up.
Author: Sandy Ryza <sandy@cloudera.com>
Closes#586 from sryza/sandy-spark-1588 and squashes the following commits:
35eb38e [Sandy Ryza] Scalify
b361684 [Sandy Ryza] SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.
This is a straightforward fix.
Author: Patrick Wendell <pwendell@gmail.com>
This patch had conflicts when merged, resolved by
Committer: Patrick Wendell <pwendell@gmail.com>
Closes#578 from pwendell/spark-submit-yarn and squashes the following commits:
96027c7 [Patrick Wendell] Test fixes
b5be173 [Patrick Wendell] Review feedback
4ac9cac [Patrick Wendell] SPARK-1652: spark-submit for yarn prints warnings even though calling as expected
Sorry folks. This should make the change for SPARK-1607 compile again. Verified this time with the yarn build enabled.
Author: Sean Owen <sowen@cloudera.com>
Closes#556 from srowen/SPARK-1607.2 and squashes the following commits:
e3fe7a3 [Sean Owen] Fix syntax adapting Int result to Short
Octal literals like "0700" are deprecated in Scala 2.10, generating a warning. They have been removed entirely in 2.11. See https://issues.scala-lang.org/browse/SI-7618
This change simply replaces two uses of octals with hex literals, which seemed the next-best representation since they express a bit mask (file permission in particular)
Author: Sean Owen <sowen@cloudera.com>
Closes#529 from srowen/SPARK-1607 and squashes the following commits:
1ee0e67 [Sean Owen] Use Integer.parseInt(...,8) for octal literal instead of hex equivalent
0102f3d [Sean Owen] Replace octal literals, removed in Scala 2.11, with hex literals
Unfortunately, this is not exhaustive - particularly hive tests still fail due to path issues.
Author: Mridul Muralidharan <mridulm80@apache.org>
This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>
Closes#505 from mridulm/windows_fixes and squashes the following commits:
ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy appparently
cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch
3267f4b [Mridul Muralidharan] Fix build failures
35b277a [Mridul Muralidharan] Fix Scalastyle failures
bc69d14 [Mridul Muralidharan] Change from hardcoded path separator
10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes
1337abd [Mridul Muralidharan] fix classpath while running in windows
In particular when the HADOOP_CONF_DIR is not not specified.
Author: Patrick Wendell <pwendell@gmail.com>
Closes#488 from pwendell/hadoop-cleanup and squashes the following commits:
fe95f13 [Patrick Wendell] Changes based on Andrew's feeback
18d09c1 [Patrick Wendell] Review comments from Andrew
17929cc [Patrick Wendell] Assorted clean-up for Spark-on-YARN.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#483 from vanzin/yarn-2.4 and squashes the following commits:
0fc57d8 [Marcelo Vanzin] Fix compilation on Hadoop 2.4.x.
Over time as we've added more deployment modes, this have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements:
1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file.
2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
3. Adds ability to set these same variables for the driver using `spark-submit`.
4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`.
5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.
Author: Patrick Wendell <pwendell@gmail.com>
Closes#299 from pwendell/config-cleanup and squashes the following commits:
127f301 [Patrick Wendell] Improvements to testing
a006464 [Patrick Wendell] Moving properties file template.
b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf
0086939 [Patrick Wendell] Minor style fixes
af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
af0adf7 [Patrick Wendell] Automatically add user jar
a56b125 [Patrick Wendell] Responses to Tom's review
d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
a762901 [Patrick Wendell] Fixing test failures
ffa00fe [Patrick Wendell] Review feedback
fda0301 [Patrick Wendell] Note
308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
c2a2909 [Patrick Wendell] Test compile fixes
4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
b08893b [Patrick Wendell] Additional improvements.
ace4ead [Patrick Wendell] Responses to review feedback.
b72d183 [Patrick Wendell] Review feedback for spark env file
46555c1 [Patrick Wendell] Review feedback and import clean-ups
437aed1 [Patrick Wendell] Small fix
761ebcd [Patrick Wendell] Library path and classpath for drivers
7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
5b0ba8e [Patrick Wendell] Don't ship executor envs
84cc5e5 [Patrick Wendell] Small clean-up
1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
6eaf7d0 [Patrick Wendell] executorJavaOpts
0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
...finishes
Note this is dependent on https://github.com/apache/spark/pull/204 to have a working history server, but there are no code dependencies.
This also fixes SPARK-1288 yarn stable finishApplicationMaster incomplete. Since I was in there I made the diagnostic message be passed properly.
Author: Thomas Graves <tgraves@apache.org>
Closes#362 from tgravescs/SPARK-1408 and squashes the following commits:
ec89705 [Thomas Graves] Fix typo.
446122d [Thomas Graves] Make config yarn specific
f5d5373 [Thomas Graves] SPARK-1408 Modify Spark on Yarn to point to the history server when app finishes