### What changes were proposed in this pull request?
According to the dev mailing list discussion, this PR aims to switch the default Apache Hadoop dependency from 2.7.4 to 3.2.0 for Apache Spark 3.1.0 on December 2020.
| Item | Default Hadoop Dependency |
|------|-----------------------------|
| Apache Spark Website | 3.2.0 |
| Apache Download Site | 3.2.0 |
| Apache Snapshot | 3.2.0 |
| Maven Central | 3.2.0 |
| PyPI | 2.7.4 (We will switch later) |
| CRAN | 2.7.4 (We will switch later) |
| Homebrew | 3.2.0 (already) |
In Apache Spark 3.0.0 release, we focused on the other features. This PR targets for [Apache Spark 3.1.0 scheduled on December 2020](https://spark.apache.org/versioning-policy.html).
### Why are the changes needed?
Apache Hadoop 3.2 has many fixes and new cloud-friendly features.
**Reference**
- 2017-08-04: https://hadoop.apache.org/release/2.7.4.html
- 2019-01-16: https://hadoop.apache.org/release/3.2.0.html
### Does this PR introduce _any_ user-facing change?
Since the default Hadoop dependency changes, the users will get a better support in a cloud environment.
### How was this patch tested?
Pass the Jenkins.
Closes#28897 from dongjoon-hyun/SPARK-32058.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
When removing non-existing files in the release script, do not fail.
### Why are the changes needed?
This is to make the release script more robust, as we don't care if the files exist before we remove them.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
tested when cutting 3.0.0 RC
Closes#28815 from cloud-fan/release.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Only push the release tag after the build has finished.
### Why are the changes needed?
If the build fails we don't need a release tag.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Running locally with a fake user in https://github.com/apache/spark/pull/28667Closes#28700 from holdenk/SPARK-31860-build-master-only-push-tags-on-success.
Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
This PR aims to use the style that is compatible with both python 2 and 3.
### Why are the changes needed?
This will help python 3 migration.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual.
Closes#28632 from williamhyun/use_python3_style.
Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR adds `i` option to ignore additional `build/mvn` output which is irrelevant to version string.
### Why are the changes needed?
SPARK-28963 added additional output message, `Falling back to archive.apache.org to download Maven` in build/mvn. This breaks `dev/create-release/release-build.sh` and currently Spark Packaging Jenkins job is hitting this issue consistently and broken.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2912/console
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This happens only when the mirror fails. So, this is verified manually hiject the script. It works like the following.
```
$ echo 'Falling back to archive.apache.org to download Maven' > out
$ build/mvn help:evaluate -Dexpression=project.version >> out
Using `mvn` from path: /Users/dongjoon/PRS/SPARK_RELEASE_2/build/apache-maven-3.6.3/bin/mvn
$ cat out | grep -v INFO | grep -v WARNING | grep -v Download
Falling back to archive.apache.org to download Maven
3.1.0-SNAPSHOT
$ cat out | grep -v INFO | grep -v WARNING | grep -vi Download
3.1.0-SNAPSHOT
```
Closes#28514 from dongjoon-hyun/SPARK_RELEASE_2.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR fix incorrect pyspark version when releasing preview versions.
### Why are the changes needed?
Failed to make Spark binary distribution:
```
cp: cannot stat 'spark-3.0.0-preview2-bin-hadoop2.7/python/dist/pyspark-3.0.0.dev02.tar.gz': No such file or directory
gpg: can't open 'pyspark-3.0.0.dev02.tar.gz': No such file or directory
gpg: signing failed: No such file or directory
gpg: pyspark-3.0.0.dev02.tar.gz: No such file or directory
```
```
yumwangubuntu-3513086:~/spark-release/output$ ll spark-3.0.0-preview2-bin-hadoop2.7/python/dist/
total 214140
drwxr-xr-x 2 yumwang stack 4096 Dec 16 06:17 ./
drwxr-xr-x 9 yumwang stack 4096 Dec 16 06:17 ../
-rw-r--r-- 1 yumwang stack 219267173 Dec 16 06:17 pyspark-3.0.0.dev2.tar.gz
```
```
/usr/local/lib/python3.6/dist-packages/setuptools/dist.py:476: UserWarning: Normalizing '3.0.0.dev02' to '3.0.0.dev2'
normalized_version,
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
manual test:
```
LM-SHC-16502798:spark yumwang$ SPARK_VERSION=3.0.0-preview2
LM-SHC-16502798:spark yumwang$ echo "$SPARK_VERSION" | sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/"
3.0.0.dev2
```
Closes#26909 from wangyum/SPARK-30268.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to add another pre-built binary distribution with `-Phadoop-2.7 -Phive-1.2` at `Apache Spark 3.0.0`.
**PRE-BUILT BINARY DISTRIBUTION**
```
spark-3.0.0-SNAPSHOT-bin-hadoop2.7-hive1.2.tgz
spark-3.0.0-SNAPSHOT-bin-hadoop2.7-hive1.2.tgz.asc
spark-3.0.0-SNAPSHOT-bin-hadoop2.7-hive1.2.tgz.sha512
```
**CONTENTS (snippet)**
```
$ ls *hadoop-*
hadoop-annotations-2.7.4.jar hadoop-mapreduce-client-shuffle-2.7.4.jar
hadoop-auth-2.7.4.jar hadoop-yarn-api-2.7.4.jar
hadoop-client-2.7.4.jar hadoop-yarn-client-2.7.4.jar
hadoop-common-2.7.4.jar hadoop-yarn-common-2.7.4.jar
hadoop-hdfs-2.7.4.jar hadoop-yarn-server-common-2.7.4.jar
hadoop-mapreduce-client-app-2.7.4.jar hadoop-yarn-server-web-proxy-2.7.4.jar
hadoop-mapreduce-client-common-2.7.4.jar parquet-hadoop-1.10.1.jar
hadoop-mapreduce-client-core-2.7.4.jar parquet-hadoop-bundle-1.6.0.jar
hadoop-mapreduce-client-jobclient-2.7.4.jar
$ ls *hive-*
hive-beeline-1.2.1.spark2.jar hive-jdbc-1.2.1.spark2.jar
hive-cli-1.2.1.spark2.jar hive-metastore-1.2.1.spark2.jar
hive-exec-1.2.1.spark2.jar spark-hive-thriftserver_2.12-3.0.0-SNAPSHOT.jar
```
### Why are the changes needed?
Since Apache Spark switched to use `-Phive-2.3` by default, all pre-built binary distribution will use `-Phive-2.3`. This PR adds `hadoop-2.7/hive-1.2` distribution to provide a similar combination like `Apache Spark 2.4` line.
### Does this PR introduce any user-facing change?
Yes. This is additional distribution which resembles to `Apache Spark 2.4` line in terms of `hive` version.
### How was this patch tested?
Manual.
Please note that we need a dry-run mode, but the AS-IS release script do not generate additional combinations including this in `dry-run` mode.
Closes#26688 from dongjoon-hyun/SPARK-29989.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
### What changes were proposed in this pull request?
This PR aims to add `-Phive-2.3` to publish profiles.
Since Apache Spark 3.0.0, Maven artifacts will be publish with Apache Hive 2.3 profile only.
This PR also will recover `SNAPSHOT` publishing Jenkins job.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
We will provide the pre-built distributions (with Hive 1.2.1 also) like Apache Spark 2.4.
SPARK-29989 will update the release script to generate all combinations.
### Why are the changes needed?
This will reduce the explicit dependency on the illegitimate Hive fork in Maven repository.
### Does this PR introduce any user-facing change?
Yes, but this is dev only changes.
### How was this patch tested?
Manual.
Closes#26648 from dongjoon-hyun/SPARK-30007.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
`release-build.sh` fail to publish release under dry run mode with the following error message:
```
/opt/spark-rm/release-build.sh: line 429: pushd: spark-repo-g4MBm/org/apache/spark: No such file or directory
```
We need to at least run the `mvn clean install` command once to create the `$tmp_repo` path, but now those steps are all skipped under dry-run mode. This PR fixes the issue.
### How was this patch tested?
Tested locally.
Closes#26329 from jiangxb1987/dryrun.
Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Update `release-build.sh`, to allow pyspark version name format `${versionNumber}-preview`, otherwise the release script won't generate pyspark release tarballs.
### How was this patch tested?
Tested locally.
Closes#26306 from jiangxb1987/buildPython.
Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to add `hadoop-3.2` profile to pre-built binary package releases.
### Why are the changes needed?
Since Apache Spark 3.0.0, we provides Hadoop 3.2 pre-built binary.
### Does this PR introduce any user-facing change?
No. (Although the artifacts are available, this change is for release managers).
### How was this patch tested?
Manual. Please note that `DRY_RUN=0` disables these combination.
```
$ dev/create-release/release-build.sh package
...
Packages to build: without-hadoop hadoop3.2 hadoop2.7
make_binary_release without-hadoop -Pscala-2.12 -Phadoop-provided 2.12
make_binary_release hadoop3.2 -Pscala-2.12 -Phadoop-3.2 -Phive -Phive-thriftserver 2.12
make_binary_release hadoop2.7 -Pscala-2.12 -Phadoop-2.7 -Phive -Phive-thriftserver withpip,withr 2.12
```
Closes#26260 from dongjoon-hyun/SPARK-29608.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR allows `bin/spark-submit --version` to show the correct information while the previous versions, which were created by `dev/create-release/do-release-docker.sh`, show incorrect information.
There are two root causes to show incorrect information:
1. Did not pass `USER` environment variable to the docker container
1. Did not keep `.git` directory in the work directory
### Why are the changes needed?
The information is missing while the previous versions show the correct information.
### Does this PR introduce any user-facing change?
Yes, the following is the console output in branch-2.3
```
$ bin/spark-submit --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.3.4
/_/
Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_212
Branch HEAD
Compiled by user ishizaki on 2019-09-02T02:18:10Z
Revision 8c6f8150f3
Url https://gitbox.apache.org/repos/asf/spark.git
Type --help for more information.
```
Without this PR, the console output is as follows
```
$ spark-submit --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.3.4
/_/
Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_212
Branch
Compiled by user on 2019-08-26T08:29:39Z
Revision
Url
Type --help for more information.
```
### How was this patch tested?
After building the package, I manually executed `bin/spark-submit --version`
Closes#25655 from kiszk/SPARK-28906.
Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Remove Scala 2.11 support in build files and docs, and in various parts of code that accommodated 2.11. See some targeted comments below.
## How was this patch tested?
Existing tests.
Closes#23098 from srowen/SPARK-26132.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Upgrade Docker image for release build to Ubuntu 18.04LTS
## How was this patch tested?
Manually tested.
Closes#23932 from dbtsai/ubuntu18.04.
Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
To skip some steps to remove binary license/notice files in a source release for branch2.3 (these files only exist in master/branch-2.4 now), this pr checked a Spark release version in `dev/create-release/release-build.sh`.
## How was this patch tested?
Manually checked.
Closes#23538 from maropu/FixReleaseScript.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
When running the release script, you will be interrupted unexpectedly
```
ATTENTION! Your password for authentication realm:
<https://dist.apache.org:443> ASF Committers
can only be stored to disk unencrypted! You are advised to configure
your system so that Subversion can store passwords encrypted, if
possible. See the documentation for details.
You can avoid future appearances of this warning by setting the value
of the 'store-plaintext-passwords' option to either 'yes' or 'no' in
'/home/spark-rm/.subversion/servers'.
-----------------------------------------------------------------------
Store password unencrypted (yes/no)?
```
We can avoid it by adding `--no-auth-cache` when running svn command.
## How was this patch tested?
manually verified with 2.4.0 RC5
Closes#22885 from cloud-fan/svn.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
Remove Kafka 0.8 integration
## How was this patch tested?
Existing tests, build scripts
Closes#22703 from srowen/SPARK-25705.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Removes all vestiges of Flume in the build, for Spark 3.
I don't think this needs Jenkins config changes.
## How was this patch tested?
Existing tests.
Closes#22692 from srowen/SPARK-25598.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Remove Hadoop 2.6 references and make 2.7 the default.
Obviously, this is for master/3.0.0 only.
After this we can also get rid of the separate test jobs for Hadoop 2.6.
## How was this patch tested?
Existing tests
Closes#22615 from srowen/SPARK-25016.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This is a follow up for #22441.
1. Remove flag "-Pkafka-0-8" for Scala 2.12 build.
2. Clean up the script, simpler logic.
3. Switch to Scala version to 2.11 before script exit.
## How was this patch tested?
Manual test.
Closes#22454 from gengliangwang/revise_release_build.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
update the package and publish steps, to support scala 2.12
## How was this patch tested?
manual test
Closes#22441 from cloud-fan/scala.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
These 2 changes are required to build the docs for Spark 2.4.0 RC1:
1. install `mkdocs` in the docker image
2. set locale to C.UTF-8. Otherwise jekyll fails to build the doc.
## How was this patch tested?
tested manually when doing the 2.4.0 RC1
Closes#22438 from cloud-fan/infra.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
Fix location of licenses-binary in binary release, and remove binary items from source release
## How was this patch tested?
N/A
Closes#22436 from srowen/SPARK-24654.2.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
`without-hadoop` profile doesn't exist in Maven, instead the name should be `hadoop-provided`, this is a regression introduced by SPARK-24685. So here fix it.
## How was this patch tested?
Local test.
Closes#22434 from jerryshao/SPARK-24685-followup.
Authored-by: jerryshao <sshao@hortonworks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Update the release scripts to build binary packages for older versions
of Hadoop when building Spark 2.1. Also did some minor refactoring of that
part of the script so that changing these later is easier.
This was used to build the missing packages from 2.1.3-rc2.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#21661 from vanzin/SPARK-24685.
The "do-release.sh" script asks questions about the RC being prepared,
trying to find out as much as possible automatically, and then executes
the existing scripts with proper arguments to prepare the release. This
script was used to prepare the 2.3.1 release candidates, so was tested
in that context.
The docker version runs that same script inside a docker image especially
crafted for building Spark releases. That image is based on the work
by Felix C. linked in the bug. At this point is has been only midly
tested.
I also added a template for the vote e-mail, with placeholders for
things that need to be replaced, although there is no automation around
that for the moment. It shouldn't be hard to hook up certain things like
version and tags to this, or to figure out certain things like the
repo URL from the output of the release scripts.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#21515 from vanzin/SPARK-24372.
The repository.apache.org server still requires md5 checksums or
it won't publish the staging repo.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#21338 from vanzin/SPARK-23601.
## What changes were proposed in this pull request?
Remove .md5 files from release artifacts
## How was this patch tested?
N/A
Author: Sean Owen <sowen@cloudera.com>
Closes#20737 from srowen/SPARK-23601.
## What changes were proposed in this pull request?
Including the `-Pkubernetes` flag in a few places it was missed.
## How was this patch tested?
checkstyle, mima through manual tests.
Author: foxish <ramanathana@google.com>
Closes#20256 from foxish/SPARK-23063.
## What changes were proposed in this pull request?
Change to dist.apache.org instead of home directory
sha512 should have .sha512 extension. From ASF release signing doc: "The checksum SHOULD be generated using SHA-512. A .sha file SHOULD contain a SHA-1 checksum, for historical reasons."
NOTE: I *think* should require some changes to work with Jenkins' release build
## How was this patch tested?
manually
Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes#19754 from felixcheung/releasescript.
## What changes were proposed in this pull request?
Move flume behind a profile, take 2. See https://github.com/apache/spark/pull/19365 for most of the back-story.
This change should fix the problem by removing the examples module dependency and moving Flume examples to the module itself. It also adds deprecation messages, per a discussion on dev about deprecating for 2.3.0.
## How was this patch tested?
Existing tests, which still enable flume integration.
Author: Sean Owen <sowen@cloudera.com>
Closes#19412 from srowen/SPARK-22142.2.
## What changes were proposed in this pull request?
Use the GPG_KEY param, fix lsof to non-hardcoded path, remove version swap since it wasn't really needed. Use EXPORT on JAVA_HOME for downstream scripts as well.
## How was this patch tested?
Rolled 2.1.2 RC2
Author: Holden Karau <holden@us.ibm.com>
Closes#19359 from holdenk/SPARK-22129-fix-signing.
## What changes were proposed in this pull request?
Add 'flume' profile to enable Flume-related integration modules
## How was this patch tested?
Existing tests; no functional change
Author: Sean Owen <sowen@cloudera.com>
Closes#19365 from srowen/SPARK-22142.
## What changes were proposed in this pull request?
Check JDK version (with javac) and use SPARK_VERSION for publish-release
## How was this patch tested?
Manually tried local build with wrong JDK / JAVA_HOME & built a local release (LFTP disabled)
Author: Holden Karau <holden@us.ibm.com>
Closes#19312 from holdenk/improve-release-scripts-r2.
## What changes were proposed in this pull request?
Put Kafka 0.8 support behind a kafka-0-8 profile.
## How was this patch tested?
Existing tests, but, until PR builder and Jenkins configs are updated the effect here is to not build or test Kafka 0.8 support at all.
Author: Sean Owen <sowen@cloudera.com>
Closes#19134 from srowen/SPARK-21893.
…build; fix some things that will be warnings or errors in 2.12; restore Scala 2.12 profile infrastructure
## What changes were proposed in this pull request?
This change adds back the infrastructure for a Scala 2.12 build, but does not enable it in the release or Python test scripts.
In order to make that meaningful, it also resolves compile errors that the code hits in 2.12 only, in a way that still works with 2.11.
It also updates dependencies to the earliest minor release of dependencies whose current version does not yet support Scala 2.12. This is in a sense covered by other JIRAs under the main umbrella, but implemented here. The versions below still work with 2.11, and are the _latest_ maintenance release in the _earliest_ viable minor release.
- Scalatest 2.x -> 3.0.3
- Chill 0.8.0 -> 0.8.4
- Clapper 1.0.x -> 1.1.2
- json4s 3.2.x -> 3.4.2
- Jackson 2.6.x -> 2.7.9 (required by json4s)
This change does _not_ fully enable a Scala 2.12 build:
- It will also require dropping support for Kafka before 0.10. Easy enough, just didn't do it yet here
- It will require recreating `SparkILoop` and `Main` for REPL 2.12, which is SPARK-14650. Possible to do here too.
What it does do is make changes that resolve much of the remaining gap without affecting the current 2.11 build.
## How was this patch tested?
Existing tests and build. Manually tested with `./dev/change-scala-version.sh 2.12` to verify it compiles, modulo the exceptions above.
Author: Sean Owen <sowen@cloudera.com>
Closes#18645 from srowen/SPARK-14280.
## What changes were proposed in this pull request?
- Remove Scala 2.10 build profiles and support
- Replace some 2.10 support in scripts with commented placeholders for 2.12 later
- Remove deprecated API calls from 2.10 support
- Remove usages of deprecated context bounds where possible
- Remove Scala 2.10 workarounds like ScalaReflectionLock
- Other minor Scala warning fixes
## How was this patch tested?
Existing tests
Author: Sean Owen <sowen@cloudera.com>
Closes#17150 from srowen/SPARK-19810.
## What changes were proposed in this pull request?
Drop the hadoop distirbution name from the Python version (PEP440 - https://www.python.org/dev/peps/pep-0440/). We've been using the local version string to disambiguate between different hadoop versions packaged with PySpark, but PEP0440 states that local versions should not be used when publishing up-stream. Since we no longer make PySpark pip packages for different hadoop versions, we can simply drop the hadoop information. If at a later point we need to start publishing different hadoop versions we can look at make different packages or similar.
## How was this patch tested?
Ran `make-distribution` locally
Author: Holden Karau <holden@us.ibm.com>
Closes#17885 from holdenk/SPARK-20627-remove-pip-local-version-string.
## What changes were proposed in this pull request?
The master snapshot publisher builds are currently broken due to two minor build issues:
1. For unknown reasons, the LFTP `mkdir -p` command began throwing errors when the remote directory already exists. This change of behavior might have been caused by configuration changes in the ASF's SFTP server, but I'm not entirely sure of that. To work around this problem, this patch updates the script to ignore errors from the `lftp mkdir -p` commands.
2. The PySpark `setup.py` file references a non-existent `pyspark.ml.stat` module, causing Python packaging to fail by complaining about a missing directory. The fix is to simply drop that line from the setup script.
## How was this patch tested?
The LFTP fix was tested by manually running the failing commands on AMPLab Jenkins against the ASF SFTP server. The PySpark fix was tested locally.
Author: Josh Rosen <joshrosen@databricks.com>
Closes#17437 from JoshRosen/spark-20102.
- Move external/java8-tests tests into core, streaming, sql and remove
- Remove MaxPermGen and related options
- Fix some reflection / TODOs around Java 8+ methods
- Update doc references to 1.7/1.8 differences
- Remove Java 7/8 related build profiles
- Update some plugins for better Java 8 compatibility
- Fix a few Java-related warnings
For the future:
- Update Java 8 examples to fully use Java 8
- Update Java tests to use lambdas for simplicity
- Update Java internal implementations to use lambdas
## How was this patch tested?
Existing tests
Author: Sean Owen <sowen@cloudera.com>
Closes#16871 from srowen/SPARK-19493.
## What changes were proposed in this pull request?
- Remove support for Hadoop 2.5 and earlier
- Remove reflection and code constructs only needed to support multiple versions at once
- Update docs to reflect newer versions
- Remove older versions' builds and profiles.
## How was this patch tested?
Existing tests
Author: Sean Owen <sowen@cloudera.com>
Closes#16810 from srowen/SPARK-19464.
Fix SparkR package copy regex. The existing code leads to
```
Copying release tarballs to /home/****/public_html/spark-nightly/spark-branch-2.1-bin/spark-2.1.1-SNAPSHOT-2016_12_08_22_38-e8f351f-bin
mput: SparkR-*: no files found
```
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes#16231 from shivaram/typo-sparkr-build.
## What changes were proposed in this pull request?
Copy pyspark and SparkR packages to latest release dir, as per comment [here](https://github.com/apache/spark/pull/16226#discussion_r91664822)
Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes#16227 from felixcheung/pyrftp.
This PR adds a line in release-build.sh to copy the SparkR source archive using LFTP
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes#16226 from shivaram/fix-sparkr-copy-build.