Commit graph

153 commits

Author SHA1 Message Date
Xiangrui Meng dba314029b [SPARK-1870] Make spark-submit --jars work in yarn-cluster mode.
Sent secondary jars to distributed cache of all containers and add the cached jars to classpath before executors start. Tested on a YARN cluster (CDH-5.0).

`spark-submit --jars` also works in standalone server and `yarn-client`. Thanks for @andrewor14 for testing!

I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested mesos yet.

CC: @dbtsai @sryza

Author: Xiangrui Meng <meng@databricks.com>

Closes #848 from mengxr/yarn-classpath and squashes the following commits:

23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid confliction apped $CWD/ and $CWD/* to the classpath remove unused methods
a40f6ed [Xiangrui Meng] standalone -> cluster
65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client
11e5354 [Xiangrui Meng] minor changes
3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf
dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn
2014-05-22 01:52:50 -07:00
Andrew Or ba5d4a9942 [Typo] Stoped -> Stopped
Author: Andrew Or <andrewor14@gmail.com>

Closes #847 from andrewor14/yarn-typo and squashes the following commits:

c1906af [Andrew Or] Stoped -> Stopped
2014-05-21 11:59:05 -07:00
Andrew Or 83e0424d87 [SPARK-1774] Respect SparkSubmit --jars on YARN (client)
SparkSubmit ignores `--jars` for YARN client. This is a bug.

This PR also automatically adds the application jar to `spark.jar`. Previously, when running as yarn-client, you must specify the jar additionally through `--files` (because `--jars` didn't work). Now you don't have to explicitly specify it through either.

Tested on a YARN cluster.

Author: Andrew Or <andrewor14@gmail.com>

Closes #710 from andrewor14/yarn-jars and squashes the following commits:

35d1928 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
c27bf6c [Andrew Or] For yarn-cluster and python, do not add primaryResource to spark.jar
c92c5bf [Andrew Or] Minor cleanups
269f9f3 [Andrew Or] Fix format
013d840 [Andrew Or] Fix tests
1407474 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
3bb75e8 [Andrew Or] Allow SparkSubmit --jars to take effect in yarn-client mode
2014-05-10 20:58:02 -07:00
Marcelo Vanzin 3f779d872d [SPARK-1631] Correctly set the Yarn app name when launching the AM.
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #539 from vanzin/yarn-app-name and squashes the following commits:

7d1ca4f [Marcelo Vanzin] [SPARK-1631] Correctly set the Yarn app name when launching the AM.
2014-05-08 20:46:11 -07:00
Thomas Graves 4bec84b6a2 SPARK-1569 Spark on Yarn, authentication broken by pr299
Pass the configs as java options since the executor needs to know before it registers whether to create the connection using authentication or not.    We could see about passing only the authentication configs but for now I just had it pass them all.

I also updating it to use a list to construct the command to make it the same as ClientBase and avoid any issues with spaces.

Author: Thomas Graves <tgraves@apache.org>

Closes #649 from tgravescs/SPARK-1569 and squashes the following commits:

0178ab8 [Thomas Graves] add akka settings
22a8735 [Thomas Graves] Change to only path spark.auth* configs
8ccc1d4 [Thomas Graves] SPARK-1569 Spark on Yarn, authentication broken
2014-05-07 15:51:53 -07:00
Thomas Graves 1e829905c7 SPARK-1474: Spark on yarn assembly doesn't include AmIpFilter
We use org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter in spark on yarn but are not included it in the assembly jar.

I tested this on yarn cluster by removing the yarn jars from the classpath and spark runs fine now.

Author: Thomas Graves <tgraves@apache.org>

Closes #406 from tgravescs/SPARK-1474 and squashes the following commits:

1548bf9 [Thomas Graves] SPARK-1474: Spark on yarn assembly doesn't include org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2014-05-06 12:00:09 -07:00
witgo fb0543224b The default version of yarn is equal to the hadoop version
This is a part of [PR 590](https://github.com/apache/spark/pull/590)

Author: witgo <witgo@qq.com>

Closes #626 from witgo/yarn_version and squashes the following commits:

c390631 [witgo] restore  the yarn dependency declarations
f8a4ad8 [witgo] revert remove the dependency of avro in yarn-alpha
2df6cf5 [witgo] review commit
a1d876a [witgo] review commit
20e7e3e [witgo] review commit
c76763b [witgo] The default value of yarn.version is equal to hadoop.version
2014-05-03 23:32:12 -07:00
Thomas Graves 3d0a02dff3 [WIP] SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak
Move the doAs in Executor higher up so that we only have 1 ugi and aren't leaking filesystems.
Fix spark on yarn to work when the cluster is running as user "yarn" but the clients are launched as the user and want to read/write to hdfs as the user.

Note this hasn't been fully tested yet.  Need to test in standalone mode.

Putting this up for people to look at and possibly test.  I don't have access to a mesos cluster.

This is alternative to https://github.com/apache/spark/pull/607

Author: Thomas Graves <tgraves@apache.org>

Closes #621 from tgravescs/SPARK-1676 and squashes the following commits:

244d55a [Thomas Graves] fix line length
44163d4 [Thomas Graves] Rework
9398853 [Thomas Graves] change to have doAs in executor higher up.
2014-05-03 10:59:05 -07:00
Sandy Ryza bf8d0aa278 SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.
Author: Sandy Ryza <sandy@cloudera.com>

Closes #586 from sryza/sandy-spark-1588 and squashes the following commits:

35eb38e [Sandy Ryza] Scalify
b361684 [Sandy Ryza] SPARK-1588.  Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.
2014-04-29 12:54:02 -07:00
witgo 030f2c2126 Improved build configuration
1, Fix SPARK-1441: compile spark core error with hadoop 0.23.x
2, Fix SPARK-1491: maven hadoop-provided profile fails to build
3, Fix org.scala-lang: * ,org.apache.avro:* inconsistent versions dependency
4, A modified on the sql/catalyst/pom.xml,sql/hive/pom.xml,sql/core/pom.xml (Four spaces formatted into two spaces)

Author: witgo <witgo@qq.com>

Closes #480 from witgo/format_pom and squashes the following commits:

03f652f [witgo] review commit
b452680 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
bee920d [witgo] revert fix SPARK-1629: Spark Core missing commons-lang dependence
7382a07 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
6902c91 [witgo] fix SPARK-1629: Spark Core missing commons-lang dependence
0da4bc3 [witgo] merge master
d1718ed [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
e345919 [witgo] add avro dependency to yarn-alpha
77fad08 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
62d0862 [witgo] Fix org.scala-lang: * inconsistent versions dependency
1a162d7 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
934f24d [witgo] review commit
cf46edc [witgo] exclude jruby
06e7328 [witgo] Merge branch 'SparkBuild' into format_pom
99464d2 [witgo] fix maven hadoop-provided profile fails to build
0c6c1fc [witgo] Fix compile spark core error with hadoop 0.23.x
6851bec [witgo] Maintain consistent SparkBuild.scala, pom.xml
2014-04-28 22:51:46 -07:00
Patrick Wendell 9f7a095184 SPARK-1652: Remove incorrect deprecation warning in spark-submit
This is a straightforward fix.

Author: Patrick Wendell <pwendell@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Patrick Wendell <pwendell@gmail.com>

Closes #578 from pwendell/spark-submit-yarn and squashes the following commits:

96027c7 [Patrick Wendell] Test fixes
b5be173 [Patrick Wendell] Review feedback
4ac9cac [Patrick Wendell] SPARK-1652: spark-submit for yarn prints warnings even though calling as expected
2014-04-28 18:14:59 -07:00
Sean Owen df6d81425b SPARK-1607. HOTFIX: Fix syntax adapting Int result to Short
Sorry folks. This should make the change for SPARK-1607 compile again. Verified this time with the yarn build enabled.

Author: Sean Owen <sowen@cloudera.com>

Closes #556 from srowen/SPARK-1607.2 and squashes the following commits:

e3fe7a3 [Sean Owen] Fix syntax adapting Int result to Short
2014-04-25 14:17:38 -07:00
Sean Owen 6e101f1183 SPARK-1607. Replace octal literals, removed in Scala 2.11, with hex literals
Octal literals like "0700" are deprecated in Scala 2.10, generating a warning. They have been removed entirely in 2.11. See https://issues.scala-lang.org/browse/SI-7618

This change simply replaces two uses of octals with hex literals, which seemed the next-best representation since they express a bit mask (file permission in particular)

Author: Sean Owen <sowen@cloudera.com>

Closes #529 from srowen/SPARK-1607 and squashes the following commits:

1ee0e67 [Sean Owen] Use Integer.parseInt(...,8) for octal literal instead of hex equivalent
0102f3d [Sean Owen] Replace octal literals, removed in Scala 2.11, with hex literals
2014-04-24 23:34:00 -07:00
Mridul Muralidharan 968c0187a1 SPARK-1586 Windows build fixes
Unfortunately, this is not exhaustive - particularly hive tests still fail due to path issues.

Author: Mridul Muralidharan <mridulm80@apache.org>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>

Closes #505 from mridulm/windows_fixes and squashes the following commits:

ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy appparently
cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch
3267f4b [Mridul Muralidharan] Fix build failures
35b277a [Mridul Muralidharan] Fix Scalastyle failures
bc69d14 [Mridul Muralidharan] Change from hardcoded path separator
10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes
1337abd [Mridul Muralidharan] fix classpath while running in windows
2014-04-24 20:48:33 -07:00
Sandeep a03ac222d8 Fix Scala Style
Any comments are welcome

Author: Sandeep <sandeep@techaddict.me>

Closes #531 from techaddict/stylefix-1 and squashes the following commits:

7492730 [Sandeep] Pass 4
98b2428 [Sandeep] fix rxin suggestions
b5e2e6f [Sandeep] Pass 3
05932d7 [Sandeep] fix if else styling 2
08690e5 [Sandeep] fix if else styling
2014-04-24 15:07:23 -07:00
Patrick Wendell 995fdc96bc Assorted clean-up for Spark-on-YARN.
In particular when the HADOOP_CONF_DIR is not not specified.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #488 from pwendell/hadoop-cleanup and squashes the following commits:

fe95f13 [Patrick Wendell] Changes based on Andrew's feeback
18d09c1 [Patrick Wendell] Review comments from Andrew
17929cc [Patrick Wendell] Assorted clean-up for Spark-on-YARN.
2014-04-22 19:22:06 -07:00
Marcelo Vanzin 0ea0b1a2d6 Fix compilation on Hadoop 2.4.x.
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #483 from vanzin/yarn-2.4 and squashes the following commits:

0fc57d8 [Marcelo Vanzin] Fix compilation on Hadoop 2.4.x.
2014-04-22 14:28:41 -07:00
Patrick Wendell fb98488fc8 Clean up and simplify Spark configuration
Over time as we've added more deployment modes, this have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements:

1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file.
2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
3. Adds ability to set these same variables for the driver using `spark-submit`.
4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`.
5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #299 from pwendell/config-cleanup and squashes the following commits:

127f301 [Patrick Wendell] Improvements to testing
a006464 [Patrick Wendell] Moving properties file template.
b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf
0086939 [Patrick Wendell] Minor style fixes
af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
af0adf7 [Patrick Wendell] Automatically add user jar
a56b125 [Patrick Wendell] Responses to Tom's review
d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
a762901 [Patrick Wendell] Fixing test failures
ffa00fe [Patrick Wendell] Review feedback
fda0301 [Patrick Wendell] Note
308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
c2a2909 [Patrick Wendell] Test compile fixes
4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
b08893b [Patrick Wendell] Additional improvements.
ace4ead [Patrick Wendell] Responses to review feedback.
b72d183 [Patrick Wendell] Review feedback for spark env file
46555c1 [Patrick Wendell] Review feedback and import clean-ups
437aed1 [Patrick Wendell] Small fix
761ebcd [Patrick Wendell] Library path and classpath for drivers
7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
5b0ba8e [Patrick Wendell] Don't ship executor envs
84cc5e5 [Patrick Wendell] Small clean-up
1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
6eaf7d0 [Patrick Wendell] executorJavaOpts
0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
2014-04-21 10:26:33 -07:00
Thomas Graves 0058b5d2c7 SPARK-1408 Modify Spark on Yarn to point to the history server when app ...
...finishes

Note this is dependent on https://github.com/apache/spark/pull/204 to have a working history server, but there are no code dependencies.

This also fixes SPARK-1288 yarn stable finishApplicationMaster incomplete. Since I was in there I made the diagnostic message be passed properly.

Author: Thomas Graves <tgraves@apache.org>

Closes #362 from tgravescs/SPARK-1408 and squashes the following commits:

ec89705 [Thomas Graves] Fix typo.
446122d [Thomas Graves] Make config yarn specific
f5d5373 [Thomas Graves] SPARK-1408 Modify Spark on Yarn to point to the history server when app finishes
2014-04-17 16:36:37 -05:00
Marcelo Vanzin 69047506bf [SPARK-1395] Allow "local:" URIs to work on Yarn.
This only works for the three paths defined in the environment
(SPARK_JAR, SPARK_YARN_APP_JAR and SPARK_LOG4J_CONF).

Tested by running SparkPi with local: and file: URIs against Yarn cluster (no "upload" shows up in logs in the local case).

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #303 from vanzin/yarn-local and squashes the following commits:

82219c1 [Marcelo Vanzin] [SPARK-1395] Allow "local:" URIs to work on Yarn.
2014-04-17 10:29:38 -05:00
xuan 725925cf21 SPARK-1465: Spark compilation is broken with the latest hadoop-2.4.0 release
YARN-1824 changes the APIs (addToEnvironment, setEnvFromInputString) in Apps, which causes the spark build to break if built against a version 2.4.0. To fix this, create the spark own function to do that functionality which will not break compiling against 2.3 and other 2.x versions.

Author: xuan <xuan@MacBook-Pro.local>
Author: xuan <xuan@macbook-pro.home>

Closes #396 from xgong/master and squashes the following commits:

42b5984 [xuan] Remove two extra imports
bc0926f [xuan] Remove usage of org.apache.hadoop.util.Shell
be89fa7 [xuan] fix Spark compilation is broken with the latest hadoop-2.4.0 release
2014-04-16 14:41:22 -05:00
Sean Owen 77f8367996 SPARK-1497. Fix scalastyle warnings in YARN, Hive code
(I wasn't sure how to automatically set `SPARK_YARN=true` and `SPARK_HIVE=true` when running scalastyle, but these are the errors that turn up.)

Author: Sean Owen <sowen@cloudera.com>

Closes #413 from srowen/SPARK-1497 and squashes the following commits:

f0c9318 [Sean Owen] Fix more scalastyle warnings in yarn
80bf4c3 [Sean Owen] Add YARN alpha / YARN profile to scalastyle check
026319c [Sean Owen] Fix scalastyle warnings in YARN, Hive code
2014-04-16 09:34:59 -07:00
Thomas Graves 446bb3417a SPARK-1417: Spark on Yarn - spark UI link from resourcemanager is broken
Author: Thomas Graves <tgraves@apache.org>

Closes #344 from tgravescs/SPARK-1417 and squashes the following commits:

c450b5f [Thomas Graves] fix test
e1c1d7e [Thomas Graves] add missing $ to appUIAddress
e982ddb [Thomas Graves] use appUIHostPort in appUIAddress
0803ec2 [Thomas Graves] Review comment updates - remove extra newline, simplify assert in test
658a8ec [Thomas Graves] Add a appUIHostPort routine
0614208 [Thomas Graves] Fix test
2a6b1b7 [Thomas Graves] SPARK-1417: Spark on Yarn - spark UI link from resourcemanager is broken
2014-04-11 13:17:48 +05:30
Henry Saputra 3bc054893b Remove extra semicolon in import statement and unused import in ApplicationMaster
Small nit cleanup to remove extra semicolon and unused import in Yarn's stable ApplicationMaster (it bothers me every time I saw it)

Author: Henry Saputra <hsaputra@apache.org>

Closes #358 from hsaputra/nitcleanup_removesemicolon_import_applicationmaster and squashes the following commits:

bffb685 [Henry Saputra] Remove extra semicolon in import statement and unused import in ApplicationMaster.scala
2014-04-08 14:23:16 -07:00
Sandy Ryza 9dd8b91662 SPARK-1252. On YARN, use container-log4j.properties for executors
container-log4j.properties is a file that YARN provides so that containers can have log4j.properties distinct from that of the NodeManagers.

Logs now go to syslog, and stderr and stdout just have the process's standard err and standard out.

I tested this on pseudo-distributed clusters for both yarn (Hadoop 2.2) and yarn-alpha (Hadoop 0.23.7)/

Author: Sandy Ryza <sandy@cloudera.com>

Closes #148 from sryza/sandy-spark-1252 and squashes the following commits:

c0043b8 [Sandy Ryza] Put log4j.properties file under common
55823da [Sandy Ryza] Add license headers to new files
10934b8 [Sandy Ryza] Add log4j-spark-container.properties and support SPARK_LOG4J_CONF
e74450b [Sandy Ryza] SPARK-1252. On YARN, use container-log4j.properties for executors
2014-04-07 13:28:14 -05:00
Sandy Ryza 7f32fd42aa SPARK-1350. Always use JAVA_HOME to run executor container JVMs.
Author: Sandy Ryza <sandy@cloudera.com>

Closes #313 from sryza/sandy-spark-1350 and squashes the following commits:

bb6d187 [Sandy Ryza] SPARK-1350. Always use JAVA_HOME to run executor container JVMs.
2014-04-04 08:54:04 -05:00
Sandy Ryza 564f1c137c SPARK-1376. In the yarn-cluster submitter, rename "args" option to "arg"
Author: Sandy Ryza <sandy@cloudera.com>

Closes #279 from sryza/sandy-spark-1376 and squashes the following commits:

d8aebfa [Sandy Ryza] SPARK-1376. In the yarn-cluster submitter, rename "args" option to "arg"
2014-04-01 08:26:31 +05:30
Patrick Wendell 841721e03c SPARK-1352: Improve robustness of spark-submit script
1. Better error messages when required arguments are missing.
2. Support for unit testing cases where presented arguments are invalid.
3. Bug fix: Only use environment varaibles when they are set (otherwise will cause NPE).
4. A verbose mode to aid debugging.
5. Visibility of several variables is set to private.
6. Deprecation warning for existing scripts.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #271 from pwendell/spark-submit and squashes the following commits:

9146def [Patrick Wendell] SPARK-1352: Improve robustness of spark-submit script
2014-03-31 12:07:14 -07:00
Sandy Ryza 1617816090 SPARK-1126. spark-app preliminary
This is a starting version of the spark-app script for running compiled binaries against Spark.  It still needs tests and some polish.  The only testing I've done so far has been using it to launch jobs in yarn-standalone mode against a pseudo-distributed cluster.

This leaves out the changes required for launching python scripts.  I think it might be best to save those for another JIRA/PR (while keeping to the design so that they won't require backwards-incompatible changes).

Author: Sandy Ryza <sandy@cloudera.com>

Closes #86 from sryza/sandy-spark-1126 and squashes the following commits:

d428d85 [Sandy Ryza] Commenting, doc, and import fixes from Patrick's comments
e7315c6 [Sandy Ryza] Fix failing tests
34de899 [Sandy Ryza] Change --more-jars to --jars and fix docs
299ddca [Sandy Ryza] Fix scalastyle
a94c627 [Sandy Ryza] Add newline at end of SparkSubmit
04bc4e2 [Sandy Ryza] SPARK-1126. spark-submit script
2014-03-29 14:41:36 -07:00
Prashant Sharma 60abc25254 SPARK-1096, a space after comment start style checker.
Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #124 from ScrapCodes/SPARK-1096/scalastyle-comment-check and squashes the following commits:

214135a [Prashant Sharma] Review feedback.
5eba88c [Prashant Sharma] Fixed style checks for ///+ comments.
e54b2f8 [Prashant Sharma] improved message, work around.
83e7144 [Prashant Sharma] removed dependency on scalastyle in plugin, since scalastyle sbt plugin already depends on the right version. Incase we update the plugin we will have to adjust our spark-style project to depend on right scalastyle version.
810a1d6 [Prashant Sharma] SPARK-1096, a space after comment style checker.
ba33193 [Prashant Sharma] scala style as a project
2014-03-28 00:21:49 -07:00
Tianshuo Deng 181b130a0c [bugfix] wrong client arg, should use executor-cores
client arg is wrong, it should be executor-cores. it causes executor fail to start when executor-cores is specified

Author: Tianshuo Deng <tdeng@twitter.com>

Closes #138 from tsdeng/bugfix_wrong_client_args and squashes the following commits:

304826d [Tianshuo Deng] wrong client arg, should use executor-cores
2014-03-13 20:27:36 -07:00
Sandy Ryza 698373211e SPARK-1183. Don't use "worker" to mean executor
Author: Sandy Ryza <sandy@cloudera.com>

Closes #120 from sryza/sandy-spark-1183 and squashes the following commits:

5066a4a [Sandy Ryza] Remove "worker" in a couple comments
0bd1e46 [Sandy Ryza] Remove --am-class from usage
bfc8fe0 [Sandy Ryza] Remove am-class from doc and fix yarn-alpha
607539f [Sandy Ryza] Address review comments
74d087a [Sandy Ryza] SPARK-1183. Don't use "worker" to mean executor
2014-03-13 12:11:33 -07:00
Thomas Graves b5162f4426 [SPARK-1233] Fix running hadoop 0.23 due to java.lang.NoSuchFieldException: DEFAULT_M...
...APREDUCE_APPLICATION_CLASSPATH

Author: Thomas Graves <tgraves@apache.org>

Closes #129 from tgravescs/SPARK-1233 and squashes the following commits:

85ff5a6 [Thomas Graves] Fix running hadoop 0.23 due to java.lang.NoSuchFieldException: DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH
2014-03-12 11:25:41 -07:00
Sandy Ryza 2409af9dcf SPARK-1064
This reopens PR 649 from incubator-spark against the new repo

Author: Sandy Ryza <sandy@cloudera.com>

Closes #102 from sryza/sandy-spark-1064 and squashes the following commits:

270e490 [Sandy Ryza] Handle different application classpath variables in different versions
88b04e0 [Sandy Ryza] SPARK-1064. Make it possible to run on YARN without bundling Hadoop jars in Spark assembly
2014-03-11 22:39:17 -07:00
Sandy Ryza 2a2c9645e4 SPARK-1211. In ApplicationMaster, set spark.master system property to "y...
...arn-cluster"

Author: Sandy Ryza <sandy@cloudera.com>

Closes #118 from sryza/sandy-spark-1211 and squashes the following commits:

d4001c7 [Sandy Ryza] SPARK-1211. In ApplicationMaster, set spark.master system property to "yarn-cluster"
2014-03-10 17:42:33 -07:00
Sandy Ryza a99fb3747a SPARK-1193. Fix indentation in pom.xmls
Author: Sandy Ryza <sandy@cloudera.com>

Closes #91 from sryza/sandy-spark-1193 and squashes the following commits:

a878124 [Sandy Ryza] SPARK-1193. Fix indentation in pom.xmls
2014-03-07 23:10:35 -08:00
Sandy Ryza 328c73d037 SPARK-1197. Change yarn-standalone to yarn-cluster and fix up running on YARN docs
This patch changes "yarn-standalone" to "yarn-cluster" (but still supports the former).  It also cleans up the Running on YARN docs and adds a section on how to view logs.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #95 from sryza/sandy-spark-1197 and squashes the following commits:

563ef3a [Sandy Ryza] Review feedback
6ad06d4 [Sandy Ryza] Change yarn-standalone to yarn-cluster and fix up running on YARN docs
2014-03-06 17:12:58 -08:00
Thomas Graves 7edbea41b4 SPARK-1189: Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets
resubmit pull request.  was https://github.com/apache/incubator-spark/pull/332.

Author: Thomas Graves <tgraves@apache.org>

Closes #33 from tgravescs/security-branch-0.9-with-client-rebase and squashes the following commits:

dfe3918 [Thomas Graves] Fix merge conflict since startUserClass now using runAsUser
05eebed [Thomas Graves] Fix dependency lost in upmerge
d1040ec [Thomas Graves] Fix up various imports
05ff5e0 [Thomas Graves] Fix up imports after upmerging to master
ac046b3 [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase
13733e1 [Thomas Graves] Pass securityManager and SparkConf around where we can. Switch to use sparkConf for reading config whereever possible. Added ConnectionManagerSuite unit tests.
4a57acc [Thomas Graves] Change UI createHandler routines to createServlet since they now return servlets
2f77147 [Thomas Graves] Rework from comments
50dd9f2 [Thomas Graves] fix header in SecurityManager
ecbfb65 [Thomas Graves] Fix spacing and formatting
b514bec [Thomas Graves] Fix reference to config
ed3d1c1 [Thomas Graves] Add security.md
6f7ddf3 [Thomas Graves] Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to spark.ui.acls.enable, and fix up various other things from review comments
2d9e23e [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase_rework
5721c5a [Thomas Graves] update AkkaUtilsSuite test for the actorSelection changes, fix typos based on comments, and remove extra lines I missed in rebase from AkkaUtils
f351763 [Thomas Graves] Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets
2014-03-06 18:27:50 -06:00
Patrick Wendell c3f5e07533 SPARK-1121: Include avro for yarn-alpha builds
This lets us explicitly include Avro based on a profile for 0.23.X
builds. It makes me sad how convoluted it is to express this logic
in Maven. @tgraves and @sryza curious if this works for you.

I'm also considering just reverting to how it was before. The only
real problem was that Spark advertised a dependency on Avro
even though it only really depends transitively on Avro through
other deps.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #49 from pwendell/avro-build-fix and squashes the following commits:

8d6ee92 [Patrick Wendell] SPARK-1121: Add avro to yarn-alpha profile
2014-03-02 15:18:19 -08:00
Patrick Wendell 1fd2bfd3dd Remove remaining references to incubation
This removes some loose ends not caught by the other (incubating -> tlp) patches. @markhamstra this updates the version as you mentioned earlier.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #51 from pwendell/tlp and squashes the following commits:

d553b1b [Patrick Wendell] Remove remaining references to incubation
2014-03-02 01:00:16 -08:00
Sandy Ryza 46dff34458 SPARK-1051. On YARN, executors don't doAs submitting user
This reopens https://github.com/apache/incubator-spark/pull/538 against the new repo

Author: Sandy Ryza <sandy@cloudera.com>

Closes #29 from sryza/sandy-spark-1051 and squashes the following commits:

708ce49 [Sandy Ryza] SPARK-1051. doAs submitting user in YARN
2014-02-28 12:43:01 -06:00
Sandy Ryza 5f419bf9f4 SPARK-1032. If Yarn app fails before registering, app master stays aroun...
...d long after

This reopens https://github.com/apache/incubator-spark/pull/648 against the new repo.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #28 from sryza/sandy-spark-1032 and squashes the following commits:

5953f50 [Sandy Ryza] SPARK-1032. If Yarn app fails before registering, app master stays around long after
2014-02-28 09:40:47 -06:00
Sean Owen 12bbca2065 SPARK 1084.1 (resubmitted)
(Ported from https://github.com/apache/incubator-spark/pull/637 )

Author: Sean Owen <sowen@cloudera.com>

Closes #31 from srowen/SPARK-1084.1 and squashes the following commits:

6c4a32c [Sean Owen] Suppress warnings about legitimate unchecked array creations, or change code to avoid it
f35b833 [Sean Owen] Fix two misc javadoc problems
254e8ef [Sean Owen] Fix one new style error introduced in scaladoc warning commit
5b2fce2 [Sean Owen] Fix scaladoc invocation warning, and enable javac warnings properly, with plugin config updates
007762b [Sean Owen] Remove dead scaladoc links
b8ff8cb [Sean Owen] Replace deprecated Ant <tasks> with <target>
2014-02-27 11:12:21 -08:00
Sandy Ryza b8a1871953 SPARK-1053. Don't require SPARK_YARN_APP_JAR
It looks this just requires taking out the checks.

I verified that, with the patch, I was able to run spark-shell through yarn without setting the environment variable.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #553 from sryza/sandy-spark-1053 and squashes the following commits:

b037676 [Sandy Ryza] SPARK-1053.  Don't require SPARK_YARN_APP_JAR
2014-02-26 10:00:02 -06:00
Mark Hamstra c2341c92bb Merge pull request #542 from markhamstra/versionBump. Closes #542.
Version number to 1.0.0-SNAPSHOT

Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore.

@pwendell

Author: Mark Hamstra <markhamstra@gmail.com>

== Merge branch commits ==

commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
Author: Mark Hamstra <markhamstra@gmail.com>
Date:   Wed Feb 5 09:30:32 2014 -0800

    Version number to 1.0.0-SNAPSHOT
2014-02-08 16:00:43 -08:00
Thomas Graves 38020961d1 Merge pull request #526 from tgravescs/yarn_client_stop_am_fix. Closes #526.
spark on yarn - yarn-client mode doesn't always exit immediately

https://spark-project.atlassian.net/browse/SPARK-1049

If you run in the yarn-client mode but you don't get all the workers you requested right away and then you exit your application, the application master stays around until it gets the number of workers you initially requested. This is a waste of resources.  The AM should exit immediately upon the client going away.

This fix simply checks to see if the driver closed while its waiting for the initial # of workers.

Author: Thomas Graves <tgraves@apache.org>

== Merge branch commits ==

commit 03f40a62584b6bdd094ba91670cd4aa6afe7cd81
Author: Thomas Graves <tgraves@apache.org>
Date:   Fri Jan 31 11:23:10 2014 -0600

    spark on yarn - yarn-client mode doesn't always exit immediately
2014-02-05 23:37:07 -08:00
Sandy Ryza adf42611f1 Incorporate Tom's comments - update doc and code to reflect that core requests may not always be honored 2014-01-21 00:38:02 -08:00
Sandy Ryza 3e85b87d90 SPARK-1033. Ask for cores in Yarn container requests 2014-01-20 14:42:32 -08:00
Raymond Liu 4c22c55ad6 Address comments to fix code formats 2014-01-14 10:41:42 +08:00
Raymond Liu 161ab93989 Yarn workerRunnable refactor 2014-01-14 10:36:00 +08:00