---
layout: global
title: Building Spark
redirect_from: "building-with-maven.html"
---
* This will become a table of contents (this text will be scraped).
{:toc}
# Building Apache Spark
## Apache Maven
The Maven-based build is the build of reference for Apache Spark.
Building Spark using Maven requires Maven 3.3.9 or newer and Java 7+.
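You can confirm that your toolchain meets these requirements before starting:

    # Both commands print the version currently on your PATH:
    mvn -version
    java -version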
### Setting up Maven's Memory Usage
You'll need to configure Maven to use more memory than usual by setting `MAVEN_OPTS`. We recommend the following settings:
    export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
If you don't run this, you may see errors like the following:
    [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_BINARY_VERSION}}/classes...
    [ERROR] PermGen space -> [Help 1]

    [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_BINARY_VERSION}}/classes...
    [ERROR] Java heap space -> [Help 1]
You can fix this by setting the `MAVEN_OPTS` variable as discussed before.
**Note:**
* For Java 8 and above this step is not required.
* If using `build/mvn` with no `MAVEN_OPTS` set, the script will automate this for you.
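
If you do set `MAVEN_OPTS` by hand, one way to make the setting persist across shell sessions is to append the export to your shell profile (a sketch assuming bash; adjust the file name for your shell):

    # Persist the Maven memory settings for future shells (assumes bash):
    echo 'export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"' >> ~/.bashrc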
### build/mvn
Spark now comes packaged with a self-contained Maven installation, located under the `build/` directory, to ease building and deploying Spark from source. This script will automatically download and set up all necessary build requirements ([Maven](https://maven.apache.org/), [Scala](http://www.scala-lang.org/), and [Zinc](https://github.com/typesafehub/zinc)) locally within the `build/` directory itself. It honors any `mvn` binary already present, but will pull down its own copy of Scala and Zinc regardless, to ensure the proper version requirements are met. `build/mvn` executes as a pass-through to the `mvn` call, allowing easy transition from previous build methods. As an example, one can build a version of Spark as follows:
    ./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
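
Because the script simply forwards its arguments, any ordinary Maven invocation also works through the wrapper, for instance:

    # Standard Maven goals and flags pass straight through:
    ./build/mvn -version
    ./build/mvn dependency:tree -pl core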
Other build examples can be found below.
## Building a Runnable Distribution
To create a Spark distribution like those distributed by the
[Spark Downloads](http://spark.apache.org/downloads.html) page, and that is laid out so as
to be runnable, use `./dev/make-distribution.sh` in the project root directory. It can be configured
with Maven profile settings and so on like the direct Maven build. Example:
    ./dev/make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn
For more information on usage, run `./dev/make-distribution.sh --help`.
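
Assuming the example above succeeds, the distribution tarball is written to the project root; the `--name` flag above yields a `-bin-custom-spark` suffix, and the exact filename also includes your Spark version. A sketch of unpacking and smoke-testing it:

    # Unpack the distribution and launch a shell as a quick sanity check:
    tar -xzf spark-*-bin-custom-spark.tgz
    cd spark-*-bin-custom-spark
    ./bin/spark-shell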
## Specifying the Hadoop Version
Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the `hadoop.version` property. If unset, Spark will build against Hadoop 2.2.0 by default. Note that certain build profiles are required for particular Hadoop versions:
<table class="table">
<thead>
<tr><th>Hadoop version</th><th>Profile required</th></tr>
</thead>
<tbody>
<tr><td>2.2.x</td><td>hadoop-2.2</td></tr>
<tr><td>2.3.x</td><td>hadoop-2.3</td></tr>
<tr><td>2.4.x</td><td>hadoop-2.4</td></tr>
<tr><td>2.6.x</td><td>hadoop-2.6</td></tr>
<tr><td>2.7.x and later 2.x</td><td>hadoop-2.7</td></tr>
</tbody>
</table>
You can enable the `yarn` profile and optionally set the `yarn.version` property if it is different from `hadoop.version`. Spark only supports YARN versions 2.2.0 and later.
Examples:
    # Apache Hadoop 2.2.X
    ./build/mvn -Pyarn -Phadoop-2.2 -DskipTests clean package

    # Apache Hadoop 2.3.X
    ./build/mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package

    # Apache Hadoop 2.4.X or 2.5.X
    ./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=VERSION -DskipTests clean package

    # Apache Hadoop 2.6.X
    ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package

    # Apache Hadoop 2.7.X and later
    ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=VERSION -DskipTests clean package

    # Different versions of HDFS and YARN.
    ./build/mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Dyarn.version=2.2.0 -DskipTests clean package
## Building With Hive and JDBC Support
To enable Hive integration for Spark SQL along with its JDBC server and CLI,
add the `-Phive` and `-Phive-thriftserver` profiles to your existing build options.
By default Spark will build with Hive 1.2.1 bindings.
    # Apache Hadoop 2.4.X with Hive 1.2.1 support
    ./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
## Packaging without Hadoop Dependencies for YARN
The assembly directory produced by `mvn package` will, by default, include all of Spark's
dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this
causes multiple versions of these to appear on executor classpaths: the version packaged in
the Spark assembly and the version on each node, included with `yarn.application.classpath`.
The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects,
like ZooKeeper and Hadoop itself.
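
For example, a build that leaves Hadoop out of the assembly might combine this profile with the options shown earlier (the Hadoop profile and version here are illustrative):

    # Mark Hadoop and its ecosystem projects as provided by the cluster:
    ./build/mvn -Pyarn -Phadoop-2.7 -Phadoop-provided -DskipTests clean package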
## Building for Scala 2.10
To produce a Spark package compiled with Scala 2.10, use the `-Dscala-2.10` property:
    ./dev/change-scala-version.sh 2.10
    ./build/mvn -Pyarn -Phadoop-2.4 -Dscala-2.10 -DskipTests clean package
## Building submodules individually
It's possible to build Spark sub-modules using the `mvn -pl` option.
For instance, you can build the Spark Streaming module using:
    ./build/mvn -pl :spark-streaming_2.11 clean install
where `spark-streaming_2.11` is the `artifactId` defined in the `streaming/pom.xml` file.
## Continuous Compilation
We use the scala-maven-plugin which supports incremental and continuous compilation. E.g.
    ./build/mvn scala:cc
should run continuous compilation (i.e. wait for changes). However, this has not been tested
extensively. A couple of gotchas to note:
* it only scans the paths `src/main` and `src/test` (see
[docs](http://scala-tools.org/mvnsites/maven-scala-plugin/usage_cc.html)), so it will only work
from within certain submodules that have that structure.
* you'll typically need to run `mvn install` from the project root for compilation within
specific submodules to work; this is because submodules that depend on other submodules do so via
the `spark-parent` module.
Thus, the full flow for running continuous-compilation of the `core` submodule may look more like:
    $ ./build/mvn install
    $ cd core
    $ ../build/mvn scala:cc
## Speeding up Compilation with Zinc
[Zinc](https://github.com/typesafehub/zinc) is a long-running server version of SBT's incremental
compiler. When run locally as a background process, it speeds up builds of Scala-based projects
like Spark. Developers who regularly recompile Spark with Maven will be the most interested in
Zinc. The project site gives instructions for building and running `zinc`; OS X users can
install it using `brew install zinc`.
If using `build/mvn`, `zinc` will automatically be downloaded and leveraged for all
builds. This process will auto-start after the first time `build/mvn` is called and bind to port
3030 unless the `ZINC_PORT` environment variable is set. The `zinc` process can subsequently be
shut down at any time by running `build/zinc-<version>/bin/zinc -shutdown`, and will automatically
restart whenever `build/mvn` is called.
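
For instance, a sketch of building against a non-default Zinc port and then stopping the background server (the glob stands in for the versioned directory name under `build/`):

    # Build with zinc bound to a custom port, then shut the server down:
    ZINC_PORT=3031 ./build/mvn -DskipTests clean package
    ./build/zinc-*/bin/zinc -shutdown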
## Building with SBT
Maven is the official build tool recommended for packaging Spark, and is the *build of reference*.
But SBT is supported for day-to-day development since it can provide much faster iterative
compilation. More advanced developers may wish to use SBT.
The SBT build is derived from the Maven POM files, and so the same Maven profiles and variables
can be set to control the SBT build. For example:
    ./build/sbt -Pyarn -Phadoop-2.3 package
To avoid the overhead of launching sbt each time you need to re-compile, you can launch sbt
in interactive mode by running `build/sbt`, and then run all build commands at the command
prompt. For more recommendations on reducing build time, refer to the
[wiki page](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-ReducingBuildTimes).
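
A sketch of such an interactive session, where `>` is the sbt prompt and the tasks shown are ordinary sbt commands:

    $ ./build/sbt -Pyarn -Phadoop-2.3
    > compile      # incremental compile across all modules
    > core/test    # run the tests of just the core project
    > exit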
## Encrypted Filesystems
When building on an encrypted filesystem (if your home directory is encrypted, for example), the Spark build might fail with a "Filename too long" error. As a workaround, add the following to the configuration args of the `scala-maven-plugin` in the project `pom.xml`:
    <arg>-Xmax-classfile-name</arg>
    <arg>128</arg>
and in `project/SparkBuild.scala` add:
    scalacOptions in Compile ++= Seq("-Xmax-classfile-name", "128"),
to the `sharedSettings` val. See also [this PR](https://github.com/apache/spark/pull/2883/files) if you are unsure of where to add these lines.
## IntelliJ IDEA or Eclipse
For help in setting up IntelliJ IDEA or Eclipse for Spark development, and troubleshooting, refer to the
[wiki page for IDE setup](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESetup).
# Running Tests
Tests are run by default via the [ScalaTest Maven plugin](http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin).
Some of the tests require Spark to be packaged first, so always run `mvn package` with `-DskipTests` the first time. The following is an example of a correct (build, test) sequence:
    ./build/mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-thriftserver clean package
    ./build/mvn -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver test
The ScalaTest plugin also supports running only a specific Scala test suite as follows:
    ./build/mvn -P... -Dtest=none -DwildcardSuites=org.apache.spark.repl.ReplSuite test
    ./build/mvn -P... -Dtest=none -DwildcardSuites=org.apache.spark.repl.* test
or a Java test:
    ./build/mvn test -P... -DwildcardSuites=none -Dtest=org.apache.spark.streaming.JavaAPISuite
## Testing with SBT
Some of the tests require Spark to be packaged first, so always run `build/sbt package` the first time. The following is an example of a correct (build, test) sequence:
    ./build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver package
    ./build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver test
To run only a specific test suite:
    ./build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver "test-only org.apache.spark.repl.ReplSuite"
    ./build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver "test-only org.apache.spark.repl.*"
To run the test suites of a specific sub-project:
    ./build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver core/test
## Running Java 8 Test Suites
To run only the Java 8 tests and nothing else:
    ./build/mvn install -DskipTests
    ./build/mvn -pl :java8-tests_2.11 test
or
    ./build/sbt java8-tests/test
Java 8 tests are automatically enabled when a Java 8 JDK is detected.
If you have JDK 8 installed but it is not the system default, you can set `JAVA_HOME` to point to JDK 8 before running the tests.
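
For example (the JDK path below is illustrative; substitute your own JDK 8 location):

    # Point the build at a JDK 8 installation for this shell session:
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    ./build/mvn -pl :java8-tests_2.11 test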
## PySpark Tests with Maven
If you are building PySpark and wish to run the PySpark tests you will need to build Spark with Hive support.
    ./build/mvn -DskipTests clean package -Phive
    ./python/run-tests
The run-tests script can also be limited to a specific Python version or a specific module:
    ./python/run-tests --python-executables=python --modules=pyspark-sql
**Note:** You can also run Python tests with an sbt build, provided you build Spark with Hive support.
## Running R Tests
To run the SparkR tests you will need to install the R package `testthat`
(run `install.packages("testthat")` from the R shell). You can run just the SparkR tests using
the command:
    ./R/run-tests.sh
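
If you prefer to install the dependency non-interactively, something like the following also works (the CRAN mirror URL is only an example):

    # Install testthat from the command line, then run the SparkR tests:
    R -e 'install.packages("testthat", repos = "https://cran.r-project.org")'
    ./R/run-tests.sh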
## Running Docker-based Integration Test Suites
In order to run Docker integration tests, you have to install the `docker` engine on your box.
The instructions for installation can be found at [the Docker site](https://docs.docker.com/engine/installation/).
Once installed, the `docker` service needs to be started, if not already running.
On Linux, this can be done by `sudo service docker start`.
    ./build/mvn install -DskipTests
    ./build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 test
or
    ./build/sbt docker-integration-tests/test