This was added by me in 61a5cced04. The real fix will be added in [SPARK-4281](https://issues.apache.org/jira/browse/SPARK-4281).
Author: Andrew Or <andrew@databricks.com>
Closes #3145 from andrewor14/fix-make-distribution and squashes the following commits:
c78be61 [Andrew Or] Hot fix make distribution
This creates a new module `network/yarn` that depends on the `network/shuffle` module recently created in #3001. This PR introduces a custom Yarn auxiliary service that runs the external shuffle service. As of the changes here, this shuffle service is required for using dynamic allocation with Spark.
This is still WIP mainly because it doesn't handle security yet. I have tested this on a stable Yarn cluster.
Author: Andrew Or <andrew@databricks.com>
Closes #3082 from andrewor14/yarn-shuffle-service and squashes the following commits:
ef3ddae [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
0ee67a2 [Andrew Or] Minor wording suggestions
1c66046 [Andrew Or] Remove unused provided dependencies
0eb6233 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
6489db5 [Andrew Or] Try catch at the right places
7b71d8f [Andrew Or] Add detailed java docs + reword a few comments
d1124e4 [Andrew Or] Add security to shuffle service (INCOMPLETE)
5f8a96f [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
9b6e058 [Andrew Or] Address various feedback
f48b20c [Andrew Or] Fix tests again
f39daa6 [Andrew Or] Do not make network-yarn an assembly module
761f58a [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
15a5b37 [Andrew Or] Fix build for Hadoop 1.x
baff916 [Andrew Or] Fix tests
5bf9b7e [Andrew Or] Address a few minor comments
5b419b8 [Andrew Or] Add missing license header
804e7ff [Andrew Or] Include the Yarn shuffle service jar in the distribution
cd076a4 [Andrew Or] Require external shuffle service for dynamic allocation
ea764e0 [Andrew Or] Connect to Yarn shuffle service only if it's enabled
1bf5109 [Andrew Or] Use the shuffle service port specified through hadoop config
b4b1f0c [Andrew Or] 4 tabs -> 2 tabs
43dcb96 [Andrew Or] First cut integration of shuffle service with Yarn aux service
b54a0c4 [Andrew Or] Initial skeleton for Yarn shuffle service
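A minimal sketch of how an application opts into this setup once the aux service is deployed on each NodeManager; the configuration keys follow Spark's standard naming and `your-app.jar` is a placeholder, so treat this as an illustration rather than part of the change itself:

```shell
# Sketch: enabling dynamic allocation, which (per the change above)
# requires the external shuffle service to be running on each node.
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  your-app.jar
```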
Change 0dc868e removed the `conf/slaves` file and made it a template like most of the other configuration files. This means you can no longer run `make-distribution.sh` unless you manually create a slaves file to be statically bundled in your distribution, which seems at odds with making it a template file.
Author: Sarah Gerweck <sarah.a180@gmail.com>
Closes #2549 from sarahgerweck/noMoreSlaves and squashes the following commits:
d11d99a [Sarah Gerweck] Slaves file is now a template.
Here's my crack at Bertrand's suggestion. The GitHub `README.md` contains outdated build info. It should just point to the current online docs, and reflect that Maven is the primary build now.
(Incidentally, the stanza at the end about contributions of original work should go in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark too. It won't hurt to be crystal clear about the agreement to license, given that ICLAs are not required of anyone here.)
Author: Sean Owen <sowen@cloudera.com>
Closes #2014 from srowen/SPARK-3069 and squashes the following commits:
501507e [Sean Owen] Note that Zinc is for Maven builds too
db2bd97 [Sean Owen] sbt -> sbt/sbt and add note about zinc
be82027 [Sean Owen] Fix additional occurrences of building-with-maven -> building-spark
91c921f [Sean Owen] Move building-with-maven to building-spark and create a redirect. Update doc links to building-spark.html Add jekyll-redirect-from plugin and make associated config changes (including fixing pygments deprecation). Add example of SBT to README.md
999544e [Sean Owen] Change "Building Spark with Maven" title to "Building Spark"; reinstate tl;dr info about dev/run-tests in README.md; add brief note about building with SBT
c18d140 [Sean Owen] Optionally, remove the copy of contributing text from main README.md
8e83934 [Sean Owen] Add CONTRIBUTING.md to trigger notice on new pull request page
b1c04a1 [Sean Owen] Refer to current online documentation for building, and remove slightly outdated copy in README.md
...
Tested! TBH, it isn't a great idea to have a directory with spaces in it: Emacs doesn't like it, then Hadoop doesn't like it, and so on...
Author: Prashant Sharma <prashant.s@imaginea.com>
Closes #2229 from ScrapCodes/SPARK-3337/quoting-shell-scripts and squashes the following commits:
d4ad660 [Prashant Sharma] SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within.
`hadoop.version` and `yarn.version` are properties rather than profiles, so they should be set with `-D` instead of `-P`.
/cc pwendell
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2121 from liancheng/fix-make-dist and squashes the following commits:
4c49158 [Cheng Lian] Also mentions Hadoop version related Maven profiles
ed5b42a [Cheng Lian] Fixed typos in make-distribution.sh
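The `-D` vs `-P` distinction above, sketched with illustrative version numbers:

```shell
# Profiles are selected with -P; properties are set with -D.
# Passing -Phadoop.version=2.4.0 would silently select a nonexistent
# profile instead of overriding the property.
mvn -Pyarn -Dhadoop.version=2.4.0 -Dyarn.version=2.4.0 -DskipTests package
```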
The directory paths for the dependency jars and resources have changed in Tachyon 0.5.0.
Author: Prudhvi Krishna <prudhvi953@gmail.com>
Closes #2228 from prudhvije/SPARK-3328/make-dist-fix and squashes the following commits:
d1d2c22 [Prudhvi Krishna] SPARK-3328 fixed make-distribution script --with-tachyon option.
Please refer to [SPARK-3234](https://issues.apache.org/jira/browse/SPARK-3234) for details.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2208 from liancheng/spark-3234 and squashes the following commits:
fb26de8 [Cheng Lian] Fixed SPARK-3234
Any time you use the directory name (`FWDIR`) it needs to be surrounded
in quotes. If you're also using wildcards, you can safely put the quotes
around just `$FWDIR`.
Author: Sarah Gerweck <sarah.a180@gmail.com>
Closes #1756 from sarahgerweck/folderSpaces and squashes the following commits:
732629d [Sarah Gerweck] Fix some bugs with spaces in directory name.
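The quoting rule above can be sketched as follows (the directory name is illustrative):

```shell
#!/usr/bin/env bash
# Create a directory whose name contains a space.
FWDIR="$(mktemp -d '/tmp/spark test.XXXXXX')"
touch "$FWDIR/a.jar" "$FWDIR/b.jar"

# Quote the variable, but leave the wildcard outside the quotes so the
# glob still expands: "$FWDIR"/*.jar
count=0
for jar in "$FWDIR"/*.jar; do
  count=$((count + 1))
done
echo "$count"   # → 2

rm -rf "$FWDIR"
```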
Author: Haoyuan Li <haoyuan@cs.berkeley.edu>
Closes #1651 from haoyuan/upgrade-tachyon and squashes the following commits:
6f3f98f [Haoyuan Li] upgrade tachyon to 0.5.0
make-distribution.sh gives a slightly off error message when using --with-hive.
Author: Mark Wagner <mwagner@mwagner-ld.linkedin.biz>
Closes #1489 from wagnermarkd/SPARK-2587 and squashes the following commits:
7b5d3ff [Mark Wagner] SPARK-2587: Fix error message in make-distribution.sh
Right now we have a bunch of parallel logic in make-distribution.sh
that's just extra work to maintain. We should just pass through
Maven profiles in this case and keep the script simple. See
the JIRA for more details.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #1445 from pwendell/make-distribution.sh and squashes the following commits:
f1294ea [Patrick Wendell] Simplify options in make-distribution.sh.
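The pass-through idea can be sketched like this; `build_command` and its flags are hypothetical, not the script's actual interface:

```shell
#!/usr/bin/env bash
# Sketch: handle only the script's own flags and forward everything
# else (e.g. -P profiles and -D properties) straight to Maven,
# instead of re-deriving them with parallel logic.
build_command() {
  local make_tgz=false
  local mvn_args=()
  for arg in "$@"; do
    case "$arg" in
      --tgz) make_tgz=true ;;
      *)     mvn_args+=("$arg") ;;
    esac
  done
  echo "mvn ${mvn_args[*]} -DskipTests package"
}

build_command --tgz -Phive -Dhadoop.version=2.0.5-alpha
# → mvn -Phive -Dhadoop.version=2.0.5-alpha -DskipTests package
```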
This patch adds the git revision hash (short version) to the RELEASE file. It uses git instead of simply checking for the existence of .git, so as to make sure that this is a functional repository.
Author: Guillaume Ballet <gballet@gmail.com>
Closes #1216 from gballet/master and squashes the following commits:
eabc50f [Guillaume Ballet] Refactored the script to take comments into account.
d93e5e8 [Guillaume Ballet] [SPARK 2233] make-distribution script now lists the git hash tag in the RELEASE file.
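A sketch of the approach, assuming `git rev-parse` is the probe (the helper name is hypothetical): because the hash is asked of git itself, a non-functional repository simply skips the line.

```shell
#!/usr/bin/env bash
# Append the short git revision to a RELEASE file, but only if git can
# actually resolve HEAD (i.e. we are in a functional repository).
add_git_revision() {
  local dist_dir="$1" rev
  if rev=$(git rev-parse --short HEAD 2>/dev/null); then
    echo "Git revision $rev" >> "$dist_dir/RELEASE"
  fi
}

# usage: add_git_revision dist
```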
Author: Matthew Farrellee <matt@redhat.com>
Closes #1185 from mattf/master-1 and squashes the following commits:
42150fc [Matthew Farrellee] Autodetect JAVA_HOME on RPM-based systems
When mvn is not detected (not on the executing user's path), `set -e` causes the
detection to terminate the script before the helpful error message can
be displayed.
Author: Matthew Farrellee <matt@redhat.com>
Closes #1181 from mattf/master-0 and squashes the following commits:
506549f [Matthew Farrellee] Fix mvn detection
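The interaction with `set -e` can be sketched like this; `require_command` is a hypothetical helper, not the script's actual code:

```shell
#!/usr/bin/env bash
set -e

# Under `set -e`, invoking a missing `mvn` directly would abort the
# script before the friendly message prints. A probe inside an `if`
# condition is exempt from `set -e`, so the message survives.
require_command() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "Error: could not find '$1'; please install it and retry." >&2
    return 1
  fi
}

# usage: require_command mvn || exit 1
```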
This commit requires the user to manually say "yes" when building Spark
without Java 6. The prompt can be bypassed with a flag (e.g. if the user
is scripting around make-distribution).
Author: Patrick Wendell <pwendell@gmail.com>
Closes #859 from pwendell/java6 and squashes the following commits:
4921133 [Patrick Wendell] Adding Pyspark Notice
fee8c9e [Patrick Wendell] SPARK-1911: Emphasize that Spark jars should be built with Java 6.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #818 from pwendell/reamde and squashes the following commits:
4020b11 [Patrick Wendell] SPARK-1873: Add README.md file when making distributions
Gives a nicely formatted message to the user when `run-example` is run to
tell them to use `spark-submit`.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #704 from pwendell/examples and squashes the following commits:
1996ee8 [Patrick Wendell] Feedback from Andrew
3eb7803 [Patrick Wendell] Suggestions from TD
2474668 [Patrick Wendell] SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`.
Author: Andrew Ash <andrew@andrewash.com>
Closes #680 from ash211/patch-3 and squashes the following commits:
9ce3746 [Andrew Ash] Typo fix: fetchting -> fetching
Also moves a few lines of code around in make-distribution.sh.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #669 from pwendell/make-distribution and squashes the following commits:
8bfac49 [Patrick Wendell] Small fix
46918ec [Patrick Wendell] SPARK-1737: Warn rather than fail when Java 7+ is used to create distributions.
73b0cbcc24 introduced a few special profiles that are not covered in `make-distribution.sh`. This affects Hadoop versions 2.2.x, 2.3.x, and 2.4.x. Without these special profiles, a Java version error for protobuf is thrown at run time.
I took the opportunity to rewrite the way we construct the Maven command. Previously, the only Hadoop version that triggered the `yarn-alpha` profile was 0.23.x, which was inconsistent with the [docs](https://github.com/apache/spark/blob/master/docs/building-with-maven.md). This is now generalized to Hadoop versions from 0.23.x to 2.1.x.
Author: Andrew Or <andrewor14@gmail.com>
Closes #660 from andrewor14/hadoop-distribution and squashes the following commits:
6740126 [Andrew Or] Generalize the yarn profile to hadoop versions 2.2+
88f192d [Andrew Or] Add the required special profiles to make-distribution.sh
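The generalized selection can be sketched as a case on the Hadoop version; the `hadoop-2.x` profile names here follow the convention described above but are assumptions, not an exact copy of the script:

```shell
#!/usr/bin/env bash
# Map a Hadoop version string to the Maven profiles it needs.
yarn_profiles() {
  case "$1" in
    0.23.*|2.0.*|2.1.*) echo "-Pyarn-alpha" ;;          # 0.23.x through 2.1.x
    2.2.*)              echo "-Pyarn -Phadoop-2.2" ;;   # special profiles for
    2.3.*)              echo "-Pyarn -Phadoop-2.3" ;;   # protobuf compatibility
    2.4.*)              echo "-Pyarn -Phadoop-2.4" ;;
    *)                  echo "-Pyarn" ;;
  esac
}

yarn_profiles 2.0.5-alpha   # → -Pyarn-alpha
yarn_profiles 2.4.0         # → -Pyarn -Phadoop-2.4
```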
This copies the datanucleus jars over from `lib_managed` into `dist/lib`, if any. The `CLASSPATH` must also be updated to reflect this change.
Author: Andrew Or <andrewor14@gmail.com>
Closes #610 from andrewor14/hive-distribution and squashes the following commits:
a4bc96f [Andrew Or] Rename search path in jar error check
fa205e1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into hive-distribution
7855f58 [Andrew Or] Have jar command respect JAVA_HOME + check for jar errors both cases
c16bbfd [Andrew Or] Merge branch 'master' of github.com:apache/spark into hive-distribution
32f6826 [Andrew Or] Leave the double colons
940a1bb [Andrew Or] Add back 2>/dev/null
58357cc [Andrew Or] Include datanucleus jars in Spark distribution built with Hive support
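The copy step can be sketched as follows; `copy_datanucleus` is a hypothetical helper, and the paths mirror the ones named above:

```shell
#!/usr/bin/env bash
# Copy any datanucleus jars produced by the Hive build from
# lib_managed into the distribution's lib directory, if present.
copy_datanucleus() {
  local src="$1" dest="$2" jar
  mkdir -p "$dest"
  for jar in "$src"/datanucleus-*.jar; do
    [ -e "$jar" ] && cp "$jar" "$dest"/
  done
  return 0
}

# usage: copy_datanucleus lib_managed/jars dist/lib
```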
This adds some guards and good warning messages if users hit this issue. /cc @aarondav with whom I discussed parts of the design.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #627 from pwendell/jdk6 and squashes the following commits:
a38a958 [Patrick Wendell] Code review feedback
94e9f84 [Patrick Wendell] SPARK-1703 Warn users if Spark is run on JRE6 but compiled with JDK7.
The current test checks the exit code of "tail" rather than "mvn".
The new check makes sure that mvn is installed and was able to
execute the version command.
Author: Rahul Singhal <rahul.singhal@guavus.com>
Closes #580 from rahulsinghaliitd/SPARK-1658 and squashes the following commits:
83c0313 [Rahul Singhal] SPARK-1658: Correctly identify if maven is installed and working
bf821b9 [Rahul Singhal] SPARK-1658: Correctly identify if maven is installed and working
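The pitfall above comes from pipeline exit-status semantics; a minimal demonstration (not the script itself):

```shell
#!/usr/bin/env bash
# A pipeline's exit status is that of its *last* command, so
# `mvn ... | tail -n1` reports tail's success even when mvn failed.
false | tail -n1
echo "pipeline status: $?"                  # → pipeline status: 0

# Bash records each stage's status in PIPESTATUS; read it immediately
# after the pipeline, before any other command overwrites it.
false | tail -n1
echo "first-stage status: ${PIPESTATUS[0]}" # → first-stage status: 1
```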
Small bug fix to make sure the "spark contents" are copied to the
deployment directory correctly.
Author: Rahul Singhal <rahul.singhal@guavus.com>
Closes #573 from rahulsinghaliitd/SPARK-1651 and squashes the following commits:
402c999 [Rahul Singhal] SPARK-1651: Delete existing deployment directory
Better account for various side-effect outputs while executing
"mvn help:evaluate -Dexpression=project.version"
Author: Rahul Singhal <rahul.singhal@guavus.com>
Closes #572 from rahulsinghaliitd/SPARK-1650 and squashes the following commits:
fd6a611 [Rahul Singhal] SPARK-1650: Correctly identify maven project version
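The filtering idea can be sketched like this; `extract_version` is a hypothetical helper, and the simulated output stands in for real Maven logs:

```shell
#!/usr/bin/env bash
# `mvn help:evaluate` mixes [INFO]/[WARNING] lines and download
# progress into stdout, so drop anything that looks like log output
# and keep the last remaining line.
extract_version() {
  grep -v '^\[' | grep -v 'Download' | tail -n1
}

# Simulated `mvn help:evaluate -Dexpression=project.version` output:
printf '[INFO] Scanning for projects...\n1.2.0-SNAPSHOT\n' | extract_version
# → 1.2.0-SNAPSHOT
```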
This simplifies the shell a bunch and passes all arguments through to spark-submit.
There is a tiny incompatibility with 0.9.1: you can no longer pass `-c`, only `--cores`. However, spark-submit will give a good error message in this case; I don't think many people used `-c`, and it's a trivial change for users.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #542 from pwendell/spark-shell and squashes the following commits:
9eb3e6f [Patrick Wendell] Updating Spark docs
b552459 [Patrick Wendell] Andrew's feedback
97720fa [Patrick Wendell] Review feedback
aa2900b [Patrick Wendell] SPARK-1619 Launch spark-shell with spark-submit
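Why passing everything through works: quoted `"$@"` forwards each argument as a single word, spaces and all. A small demonstration with a hypothetical stub standing in for spark-submit:

```shell
#!/usr/bin/env bash
# Stub standing in for the real spark-submit (hypothetical).
spark_submit_stub() { echo "got $# arguments"; }

# The shell wrapper just forwards its arguments untouched.
forward_to_spark_submit() {
  spark_submit_stub "$@"
}

forward_to_spark_submit --master yarn --name "my app"
# → got 4 arguments
```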
1. Makes assembly and examples jar naming consistent in maven/sbt.
2. Updates make-distribution.sh to use Maven and fixes some bugs.
3. Updates the create-release script to call make-distribution script.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #502 from pwendell/make-distribution and squashes the following commits:
1a97f0d [Patrick Wendell] SPARK-1119 and other build improvements
Author: Nick Lanham <nick@afternight.org>
Closes #264 from nicklan/make-distribution-fixes and squashes the following commits:
172b981 [Nick Lanham] fix path for jar, make sed actually work on OSX
I don't have access to an OSX machine, so if someone could test this that would be great.
Author: Nick Lanham <nick@afternight.org>
Closes #258 from nicklan/osx-sed-fix and squashes the following commits:
a6f158f [Nick Lanham] Also make mktemp work on OSX
558fd6e [Nick Lanham] Make sed do -i '' on OSX
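The portability issue can be sketched like this; `replace_in_file` is a hypothetical helper. BSD sed on OS X requires an explicit (possibly empty) backup suffix after `-i`, while GNU sed must not receive a separate empty argument there:

```shell
#!/usr/bin/env bash
# BSD sed (OS X) needs an explicit backup suffix after -i, even an
# empty one; GNU sed takes the suffix attached to -i (or nothing).
replace_in_file() {
  if [ "$(uname)" = "Darwin" ]; then
    sed -i '' -e "$1" "$2"
  else
    sed -i -e "$1" "$2"
  fi
}

tmpfile=$(mktemp)
echo "OSX" > "$tmpfile"
replace_in_file 's/OSX/OS X/' "$tmpfile"
cat "$tmpfile"   # → OS X
rm -f "$tmpfile"
```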
This should all work as expected with the current version of the tachyon tarball (0.4.1)
Author: Nick Lanham <nick@afternight.org>
Closes #137 from nicklan/bundle-tachyon and squashes the following commits:
2eee15b [Nick Lanham] Put back in exec, start tachyon first
738ba23 [Nick Lanham] Move tachyon out of sbin
f2f9bc6 [Nick Lanham] More checks for tachyon script
111e8e1 [Nick Lanham] Only try tachyon operations if tachyon script exists
0561574 [Nick Lanham] Copy over web resources so web interface can run
4dc9809 [Nick Lanham] Update to tachyon 0.4.1
0a1a20c [Nick Lanham] Add scripts using tachyon tarball
This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.
As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.