Commit graph

206 commits

Author SHA1 Message Date
Nicholas Chammas bda0669110 [SPARK-30665][DOCS][BUILD][PYTHON] Eliminate pypandoc dependency
### What changes were proposed in this pull request?

This PR removes any dependencies on pypandoc. It also makes related tweaks to the docs README to clarify the dependency on pandoc (not pypandoc).

### Why are the changes needed?

We are using pypandoc to convert the Spark README from Markdown to ReST for PyPI. PyPI now natively supports Markdown, so we don't need pypandoc anymore. The dependency on pypandoc also sometimes causes issues when installing Python packages that depend on PySpark, as described in #18981.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually:

```sh
python -m venv venv
source venv/bin/activate
pip install -U pip

cd python/
python setup.py sdist
pip install dist/pyspark-3.0.0.dev0.tar.gz
pyspark --version
```

I also built the PySpark and R API docs with `jekyll` and reviewed them locally.

It would be good if a maintainer could also test this by creating a PySpark distribution and uploading it to [Test PyPI](https://test.pypi.org) to confirm the README looks as it should.

Closes #27376 from nchammas/SPARK-30665-pypandoc.

Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-01-30 16:40:38 +09:00
Yuming Wang 5de5e46624 [SPARK-30268][INFRA] Fix incorrect pyspark version when releasing preview versions
### What changes were proposed in this pull request?

This PR fix incorrect pyspark version when releasing preview versions.

### Why are the changes needed?

Failed to make Spark binary distribution:
```
cp: cannot stat 'spark-3.0.0-preview2-bin-hadoop2.7/python/dist/pyspark-3.0.0.dev02.tar.gz': No such file or directory
gpg: can't open 'pyspark-3.0.0.dev02.tar.gz': No such file or directory
gpg: signing failed: No such file or directory
gpg: pyspark-3.0.0.dev02.tar.gz: No such file or directory
```

```
yumwangubuntu-3513086:~/spark-release/output$ ll spark-3.0.0-preview2-bin-hadoop2.7/python/dist/
total 214140
drwxr-xr-x 2 yumwang stack      4096 Dec 16 06:17 ./
drwxr-xr-x 9 yumwang stack      4096 Dec 16 06:17 ../
-rw-r--r-- 1 yumwang stack 219267173 Dec 16 06:17 pyspark-3.0.0.dev2.tar.gz
```

```
/usr/local/lib/python3.6/dist-packages/setuptools/dist.py:476: UserWarning: Normalizing '3.0.0.dev02' to '3.0.0.dev2'
  normalized_version,
```

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
manual test:
```
LM-SHC-16502798:spark yumwang$ SPARK_VERSION=3.0.0-preview2
LM-SHC-16502798:spark yumwang$ echo "$SPARK_VERSION" |  sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/"
3.0.0.dev2

```

Closes #26909 from wangyum/SPARK-30268.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-17 10:22:29 +09:00
Yuming Wang ba0f59bfaf [SPARK-30265][INFRA] Do not change R version when releasing preview versions
### What changes were proposed in this pull request?
This PR makes it do not change R version when releasing preview versions.

### Why are the changes needed?
Failed to make Spark binary distribution:
```
++ . /opt/spark-rm/output/spark-3.0.0-preview2-bin-hadoop2.7/R/find-r.sh
+++ '[' -z /usr/bin ']'
++ /usr/bin/Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtools); devtools::document(pkg="./pkg", roclets=c("rd")) }'
Loading required package: usethis
Updating SparkR documentation
First time using roxygen2. Upgrading automatically...
Loading SparkR
Invalid DESCRIPTION:
Malformed package version.

See section 'The DESCRIPTION file' in the 'Writing R Extensions'
manual.

Error: invalid version specification '3.0.0-preview2'
In addition: Warning message:
roxygen2 requires Encoding: UTF-8
Execution halted
[ERROR] Command execution failed.
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal (DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.execute (DefaultExecutor.java:166)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:804)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:751)
    at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:313)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-preview2:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 18.619 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 13.652 s]
[INFO] Spark Project Sketch ............................... SUCCESS [  5.673 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  2.081 s]
[INFO] Spark Project Networking ........................... SUCCESS [  3.509 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  0.993 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [  7.556 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  5.522 s]
[INFO] Spark Project Core ................................. FAILURE [01:06 min]
[INFO] Spark Project ML Local Library ..................... SKIPPED
[INFO] Spark Project GraphX ............................... SKIPPED
[INFO] Spark Project Streaming ............................ SKIPPED
[INFO] Spark Project Catalyst ............................. SKIPPED
[INFO] Spark Project SQL .................................. SKIPPED
[INFO] Spark Project ML Library ........................... SKIPPED
[INFO] Spark Project Tools ................................ SKIPPED
[INFO] Spark Project Hive ................................. SKIPPED
[INFO] Spark Project Graph API ............................ SKIPPED
[INFO] Spark Project Cypher ............................... SKIPPED
[INFO] Spark Project Graph ................................ SKIPPED
[INFO] Spark Project REPL ................................. SKIPPED
[INFO] Spark Project Assembly ............................. SKIPPED
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SKIPPED
[INFO] Spark Integration for Kafka 0.10 ................... SKIPPED
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SKIPPED
[INFO] Spark Avro ......................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  02:04 min
[INFO] Finished at: 2019-12-16T08:02:45Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:exec (sparkr-pkg) on project spark-core_2.12: Command execution failed.: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :spark-core_2.12
```

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
manual test:
```diff
diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R
index cdb59093781..b648c51e010 100644
--- a/R/pkg/R/sparkR.R
+++ b/R/pkg/R/sparkR.R
 -336,8 +336,8  sparkR.session <- function(

   # Check if version number of SparkSession matches version number of SparkR package
   jvmVersion <- callJMethod(sparkSession, "version")
-  # Remove -SNAPSHOT from jvm versions
-  jvmVersionStrip <- gsub("-SNAPSHOT", "", jvmVersion)
+  # Remove -preview2 from jvm versions
+  jvmVersionStrip <- gsub("-preview2", "", jvmVersion)
   rPackageVersion <- paste0(packageVersion("SparkR"))

   if (jvmVersionStrip != rPackageVersion) {

```

Closes #26904 from wangyum/SPARK-30265.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-12-16 04:54:12 -07:00
Yuming Wang 26b658f6fb [SPARK-30253][INFRA] Do not add commits when releasing preview version
### What changes were proposed in this pull request?

This PR add support do not add commits to master branch when releasing preview version.

### Why are the changes needed?

We need manual revert this change, example:
![image](https://user-images.githubusercontent.com/5399861/70788945-f9d15180-1dcc-11ea-81f5-c0d89c28440a.png)

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?

manual test

Closes #26879 from wangyum/SPARK-30253.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-12-15 19:44:29 -07:00
Yuming Wang e1ee3fb72f [SPARK-30216][INFRA] Use python3 in Docker release image
### What changes were proposed in this pull request?

- Reverts commit 1f94bf4 and d6be46e
- Switches python to python3 in Docker release image.

### Why are the changes needed?
`dev/make-distribution.sh` and `python/setup.py` are use python3.
https://github.com/apache/spark/pull/26844/files#diff-ba2c046d92a1d2b5b417788bfb5cb5f8L236
https://github.com/apache/spark/pull/26330/files#diff-8cf6167d58ce775a08acafcfe6f40966

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?

manual test:
```
yumwangubuntu-3513086:~/spark$ dev/create-release/do-release-docker.sh -n -d /home/yumwang/spark-release
Output directory already exists. Overwrite and continue? [y/n] y
Branch [branch-2.4]: master
Current branch version is 3.0.0-SNAPSHOT.
Release [3.0.0]: 3.0.0-preview2
RC # [1]:
This is a dry run. Please confirm the ref that will be built for testing.
Ref [master]:
ASF user [yumwang]:
Full name [Yuming Wang]:
GPG key [yumwangapache.org]: DBD447010C1B4F7DAD3F7DFD6E1B4122F6A3A338
================
Release details:
BRANCH:     master
VERSION:    3.0.0-preview2
TAG:        v3.0.0-preview2-rc1
NEXT:       3.0.1-SNAPSHOT

ASF USER:   yumwang
GPG KEY:    DBD447010C1B4F7DAD3F7DFD6E1B4122F6A3A338
FULL NAME:  Yuming Wang
E-MAIL:     yumwangapache.org
================
Is this info correct [y/n]? y
GPG passphrase:

========================
= Building spark-rm image with tag latest...
Command: docker build -t spark-rm:latest --build-arg UID=110302528 /home/yumwang/spark/dev/create-release/spark-rm
Log file: docker-build.log
Building v3.0.0-preview2-rc1; output will be at /home/yumwang/spark-release/output

gpg: directory '/home/spark-rm/.gnupg' created
gpg: keybox '/home/spark-rm/.gnupg/pubring.kbx' created
gpg: /home/spark-rm/.gnupg/trustdb.gpg: trustdb created
gpg: key 6E1B4122F6A3A338: public key "Yuming Wang <yumwangapache.org>" imported
gpg: key 6E1B4122F6A3A338: secret key imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg:       secret keys read: 1
gpg:   secret keys imported: 1
========================
= Creating release tag v3.0.0-preview2-rc1...
Command: /opt/spark-rm/release-tag.sh
Log file: tag.log
It may take some time for the tag to be synchronized to github.
Press enter when you've verified that the new tag (v3.0.0-preview2-rc1) is available.
========================
= Building Spark...
Command: /opt/spark-rm/release-build.sh package
Log file: build.log
========================
= Building documentation...
Command: /opt/spark-rm/release-build.sh docs
Log file: docs.log
========================
= Publishing release
Command: /opt/spark-rm/release-build.sh publish-release
Log file: publish.log
```
Generated doc:
![image](https://user-images.githubusercontent.com/5399861/70693075-a7723100-1cf7-11ea-9f88-9356a02349a1.png)

Closes #26848 from wangyum/SPARK-30216.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-13 11:31:31 -08:00
Dongjoon Hyun 9459833eae [SPARK-29989][INFRA] Add hadoop-2.7/hive-2.3 pre-built distribution
### What changes were proposed in this pull request?

This PR aims to add another pre-built binary distribution with `-Phadoop-2.7 -Phive-1.2` at `Apache Spark 3.0.0`.

**PRE-BUILT BINARY DISTRIBUTION**
```
spark-3.0.0-SNAPSHOT-bin-hadoop2.7-hive1.2.tgz
spark-3.0.0-SNAPSHOT-bin-hadoop2.7-hive1.2.tgz.asc
spark-3.0.0-SNAPSHOT-bin-hadoop2.7-hive1.2.tgz.sha512
```

**CONTENTS (snippet)**
```
$ ls *hadoop-*
hadoop-annotations-2.7.4.jar                hadoop-mapreduce-client-shuffle-2.7.4.jar
hadoop-auth-2.7.4.jar                       hadoop-yarn-api-2.7.4.jar
hadoop-client-2.7.4.jar                     hadoop-yarn-client-2.7.4.jar
hadoop-common-2.7.4.jar                     hadoop-yarn-common-2.7.4.jar
hadoop-hdfs-2.7.4.jar                       hadoop-yarn-server-common-2.7.4.jar
hadoop-mapreduce-client-app-2.7.4.jar       hadoop-yarn-server-web-proxy-2.7.4.jar
hadoop-mapreduce-client-common-2.7.4.jar    parquet-hadoop-1.10.1.jar
hadoop-mapreduce-client-core-2.7.4.jar      parquet-hadoop-bundle-1.6.0.jar
hadoop-mapreduce-client-jobclient-2.7.4.jar

$ ls *hive-*
hive-beeline-1.2.1.spark2.jar                   hive-jdbc-1.2.1.spark2.jar
hive-cli-1.2.1.spark2.jar                       hive-metastore-1.2.1.spark2.jar
hive-exec-1.2.1.spark2.jar                      spark-hive-thriftserver_2.12-3.0.0-SNAPSHOT.jar
```

### Why are the changes needed?

Since Apache Spark switched to use `-Phive-2.3` by default, all pre-built binary distribution will use `-Phive-2.3`. This PR adds `hadoop-2.7/hive-1.2` distribution to provide a similar combination like `Apache Spark 2.4` line.

### Does this PR introduce any user-facing change?

Yes. This is additional distribution which resembles to `Apache Spark 2.4` line in terms of `hive` version.

### How was this patch tested?

Manual.

Please note that we need a dry-run mode, but the AS-IS release script do not generate additional combinations including this in `dry-run` mode.

Closes #26688 from dongjoon-hyun/SPARK-29989.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-11-27 15:55:52 -08:00
Dongjoon Hyun bdf0c606b6 [SPARK-28752][BUILD][FOLLOWUP] Fix to install rouge instead of rogue
### What changes were proposed in this pull request?

This PR aims to fix a type; `rogue` -> `rouge` .
This is a follow-up of https://github.com/apache/spark/pull/26521.

### Why are the changes needed?

To support `Python 3`, we upgraded from `pygments` to `rouge`.

### Does this PR introduce any user-facing change?

No. (This is for only document generation.)

### How was this patch tested?

Manually.
```
$ docker build -t test dev/create-release/spark-rm/
...
1 gem installed
Successfully installed rouge-3.13.0
Parsing documentation for rouge-3.13.0
Installing ri documentation for rouge-3.13.0
Done installing documentation for rouge after 4 seconds
1 gem installed
Removing intermediate container 9bd8707d9e84
 ---> a18b2f6b0bb9
...
```

Closes #26686 from dongjoon-hyun/SPARK-28752.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-26 14:58:20 -08:00
Dongjoon Hyun a60da23d64 [SPARK-30007][INFRA] Publish snapshot/release artifacts with -Phive-2.3 only
### What changes were proposed in this pull request?

This PR aims to add `-Phive-2.3` to publish profiles.
Since Apache Spark 3.0.0, Maven artifacts will be publish with Apache Hive 2.3 profile only.

This PR also will recover `SNAPSHOT` publishing Jenkins job.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/

We will provide the pre-built distributions (with Hive 1.2.1 also) like Apache Spark 2.4.
SPARK-29989 will update the release script to generate all combinations.

### Why are the changes needed?

This will reduce the explicit dependency on the illegitimate Hive fork in Maven repository.

### Does this PR introduce any user-facing change?

Yes, but this is dev only changes.

### How was this patch tested?

Manual.

Closes #26648 from dongjoon-hyun/SPARK-30007.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-23 22:34:21 -08:00
HyukjinKwon d1ac25ba33 [SPARK-28752][BUILD][DOCS] Documentation build to support Python 3
### What changes were proposed in this pull request?

This PR proposes to switch `pygments.rb`, which only support Python 2 and seems inactive for the last few years (https://github.com/tmm1/pygments.rb), to Rouge which is pure Ruby code highlighter that is compatible with Pygments.

I thought it would be pretty difficult to change but thankfully Rouge does a great job as the alternative.

### Why are the changes needed?

We're moving to Python 3 and drop Python 2 completely.

### Does this PR introduce any user-facing change?

Maybe a little bit of different syntax style but should not have a notable change.

### How was this patch tested?

Manually tested the build and checked the documentation.

Closes #26521 from HyukjinKwon/SPARK-28752.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-15 13:44:20 +09:00
Xingbo Jiang 155a67d00c [SPARK-29666][BUILD] Fix the publish release failure under dry-run mode
### What changes were proposed in this pull request?

`release-build.sh` fail to publish release under dry run mode with the following error message:
```
/opt/spark-rm/release-build.sh: line 429: pushd: spark-repo-g4MBm/org/apache/spark: No such file or directory
```

We need to at least run the `mvn clean install` command once to create the `$tmp_repo` path, but now those steps are all skipped under dry-run mode. This PR fixes the issue.

### How was this patch tested?

Tested locally.

Closes #26329 from jiangxb1987/dryrun.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-30 14:57:51 -07:00
Xingbo Jiang fd6cfb1be3 [SPARK-29646][BUILD] Allow pyspark version name format ${versionNumber}-preview in release script
### What changes were proposed in this pull request?

Update `release-build.sh`, to allow pyspark version name format `${versionNumber}-preview`, otherwise the release script won't generate pyspark release tarballs.

### How was this patch tested?

Tested locally.

Closes #26306 from jiangxb1987/buildPython.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-30 14:51:50 -07:00
Dongjoon Hyun 2baf7a1d8f [SPARK-29608][BUILD] Add hadoop-3.2 profile to release build
### What changes were proposed in this pull request?

This PR aims to add `hadoop-3.2` profile to pre-built binary package releases.

### Why are the changes needed?

Since Apache Spark 3.0.0, we provides Hadoop 3.2 pre-built binary.

### Does this PR introduce any user-facing change?

No. (Although the artifacts are available, this change is for release managers).

### How was this patch tested?

Manual. Please note that `DRY_RUN=0` disables these combination.
```
$ dev/create-release/release-build.sh package
...
Packages to build: without-hadoop hadoop3.2 hadoop2.7
make_binary_release without-hadoop -Pscala-2.12 -Phadoop-provided  2.12
make_binary_release hadoop3.2 -Pscala-2.12 -Phadoop-3.2 -Phive -Phive-thriftserver  2.12
make_binary_release hadoop2.7 -Pscala-2.12 -Phadoop-2.7 -Phive -Phive-thriftserver withpip,withr 2.12
```

Closes #26260 from dongjoon-hyun/SPARK-29608.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-25 13:57:26 -07:00
Kazuaki Ishizaki 8d1b5ba766 [SPARK-28906][BUILD] Fix incorrect information in bin/spark-submit --version
### What changes were proposed in this pull request?
This PR allows `bin/spark-submit --version` to show the correct information while the previous versions, which were created by `dev/create-release/do-release-docker.sh`, show incorrect information.

There are two root causes to show incorrect information:

1. Did not pass `USER` environment variable to the docker container
1. Did not keep `.git` directory in the work directory

### Why are the changes needed?
The information is missing while the previous versions show the correct information.

### Does this PR introduce any user-facing change?
Yes, the following is the console output in branch-2.3

```
$ bin/spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.4
      /_/

Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_212
Branch HEAD
Compiled by user ishizaki on 2019-09-02T02:18:10Z
Revision 8c6f8150f3
Url https://gitbox.apache.org/repos/asf/spark.git
Type --help for more information.
```

Without this PR, the console output is as follows
```
$ spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.4
      /_/

Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_212
Branch
Compiled by user on 2019-08-26T08:29:39Z
Revision
Url
Type --help for more information.
```

### How was this patch tested?
After building the package, I manually executed `bin/spark-submit --version`

Closes #25655 from kiszk/SPARK-28906.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-11 08:12:44 -05:00
Dongjoon Hyun 560df0ea8e [SPARK-28951][INFRA] Add release announce template
### What changes were proposed in this pull request?

This PR adds a release announce template.

### Why are the changes needed?

- We want to use a formal template including HTTPS in the future release.
- The future release managers don't need to search mailing list to find this form.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A.

Closes #25656 from dongjoon-hyun/SPARK-28951.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-02 14:55:05 -07:00
Dongjoon Hyun 6214b6a541 [SPARK-28868][INFRA] Specify Jekyll version to 3.8.6 in release docker image
### What changes were proposed in this pull request?

This PR aims to specify Jekyll Version explicitly in our release docker image.

### Why are the changes needed?
Recently, Jekyll 4.0 is released and it dropped Ruby 2.3 support.
This breaks our release docker image build.
```
Building native extensions.  This could take a while...
ERROR:  Error installing jekyll:
        jekyll-sass-converter requires Ruby version >= 2.4.0.
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

The following should succeed.
```
$ docker build -t spark-rm:test --build-arg UID=501 dev/create-release/spark-rm
...
Successfully tagged spark-rm:test
```

Closes #25578 from dongjoon-hyun/SPARK-28868.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-25 15:38:41 -07:00
Dongjoon Hyun 0c6874fb37 [SPARK-28606][INFRA] Update CRAN key to recover docker image generation
## What changes were proposed in this pull request?

CRAN repo changed the key and it causes our release script failure. This is a release blocker for Apache Spark 2.4.4 and 3.0.0.
- https://cran.r-project.org/bin/linux/ubuntu/README.html
```
Err:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 51716619E084DAB9
...
W: GPG error: https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 51716619E084DAB9
E: The repository 'https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease' is not signed.
```

Note that they are reusing `cran35` for R 3.6 although they changed the key.
```
Even though R has moved to version 3.6, for compatibility the sources.list entry still uses the cran3.5 designation.
```

This PR aims to recover the docker image generation first. We will verify the R doc generation in a separate JIRA and PR.

## How was this patch tested?

Manual. After `docker-build.log`, it should continue to the next stage, `Building v3.0.0-rc1`.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n -s docs
...
Log file: docker-build.log
Building v3.0.0-rc1; output will be at /tmp/spark-3.0.0/output
```

Closes #25339 from dongjoon-hyun/SPARK-28606.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
2019-08-02 23:41:00 +00:00
Dongjoon Hyun dbd0a2aa37 [SPARK-28511][INFRA] Get REV from RELEASE_VERSION instead of VERSION
## What changes were proposed in this pull request?

Unlike the other versions, `x.x.0-SNAPSHOT` causes `x.x.-1`. Although this will not happen in the tags (there is no `SNAPSHOT` postfix), we had better fix this.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n
Output directory already exists. Overwrite and continue? [y/n] y
Branch [branch-2.4]: master
Current branch version is 3.0.0-SNAPSHOT.
Release [3.0.-1]:
```

Since we already have `RELEASE_VERSION` by removing `SNAPSHOT`. This PR uses `RELEASE_VERSION` instead of `VERSION`.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n
Branch [branch-2.4]: master
Current branch version is 3.0.0-SNAPSHOT.
Release [3.0.0]:
```

## How was this patch tested?

Manually do `dev/create-release/do-release-docker.sh -d /tmp/spark-3.0.0 -n` and see the default value of `Release`.

Closes #25254 from dongjoon-hyun/SPARK-28511.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-07-25 10:54:24 -07:00
Dongjoon Hyun cfca26e973 [SPARK-28496][INFRA] Use branch name instead of tag during dry-run
## What changes were proposed in this pull request?

There are two cases when we use `dry run`.

First, when the tag already exists, we can ask `confirmation` on the existing tag name.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs
Output directory already exists. Overwrite and continue? [y/n] y
Branch [branch-2.4]:
Current branch version is 2.4.4-SNAPSHOT.
Release [2.4.4]: 2.4.3
RC # [1]:
v2.4.3-rc1 already exists. Continue anyway [y/n]? y
This is a dry run. Please confirm the ref that will be built for testing.
Ref [v2.4.3-rc1]:
```

Second, when the tag doesn't exist, we had better ask `confirmation` on the branch name. If we do not change the default value, it will fail eventually.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs
Branch [branch-2.4]:
Current branch version is 2.4.4-SNAPSHOT.
Release [2.4.4]:
RC # [1]:
This is a dry run. Please confirm the ref that will be built for testing.
Ref [v2.4.4-rc1]:
```

This PR improves the second case by providing the branch name instead. This helps the release testing before tagging.

## How was this patch tested?

Manually do the following and check the default value of `Ref` field.
```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs
Branch [branch-2.4]:
Current branch version is 2.4.4-SNAPSHOT.
Release [2.4.4]:
RC # [1]:
This is a dry run. Please confirm the ref that will be built for testing.
Ref [branch-2.4]:
...
```

Closes #25240 from dongjoon-hyun/SPARK-28496.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-07-24 14:20:25 -07:00
Sean Owen 6c5827c723 [SPARK-27794][R][DOCS] Use https URL for CRAN repo
## What changes were proposed in this pull request?

Use https URL for CRAN repo (and for a Scala download in a Dockerfile)

## How was this patch tested?

Existing tests.

Closes #24664 from srowen/SPARK-27794.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-22 14:28:21 -07:00
Sean Owen eed6de1a65 [MINOR][DOCS] Tighten up some key links to the project and download pages to use HTTPS
## What changes were proposed in this pull request?

Tighten up some key links to the project and download pages to use HTTPS

## How was this patch tested?

N/A

Closes #24665 from srowen/HTTPSURLs.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-21 10:56:42 -07:00
Sean Owen 8bc304f97e [SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0
## What changes were proposed in this pull request?

Remove Scala 2.11 support in build files and docs, and in various parts of code that accommodated 2.11. See some targeted comments below.

## How was this patch tested?

Existing tests.

Closes #23098 from srowen/SPARK-26132.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-25 10:46:42 -05:00
DB Tsai 2b9ad2516e [MINOR][BUILD] Add Scala 2.12 profile back for branch-2.4 build
Closes #24074 from dbtsai/scala-2.12.

Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-03-12 20:08:52 -07:00
DB Tsai b6375097bc [SPARK-27026][BUILD] Upgrade Docker image for release build to Ubuntu 18.04 LTS
## What changes were proposed in this pull request?

Upgrade Docker image for release build to Ubuntu 18.04LTS

## How was this patch tested?

Manually tested.

Closes #23932 from dbtsai/ubuntu18.04.

Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-03-06 13:58:21 -08:00
Marcelo Vanzin d00eca75b3 [SPARK-26048][BUILD] Enable flume profile when creating 2.x releases.
Closes #23931 from vanzin/SPARK-26048.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-03-02 08:14:06 -08:00
wright 4915cb3adf [MINOR][BUILD] ensure call to translate_component has correct number of arguments
## What changes were proposed in this pull request?

The call to `translate_component` only supplied 2 out of the 3 required arguments. I added a default empty list for the missing argument to avoid a run-time error.

I work for Semmle, and noticed the bug with our LGTM code analyzer:
0655f1624f/files/dev/create-release/releaseutils.py?sort=name&dir=ASC&mode=heatmap#x1434915b6576fb40:1

## How was this patch tested?

I checked that  `./dev/run-tests` pass OK.

Closes #23567 from ipwright/wrong-number-of-arguments-fix.

Authored-by: wright <wright@semmle.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-01-16 21:00:58 -06:00
Takeshi Yamamuro abc937b247 [MINOR][BUILD] Remove binary license/notice files in a source release for branch-2.4+ only
## What changes were proposed in this pull request?
To skip some steps to remove binary license/notice files in a source release for branch2.3 (these files only exist in master/branch-2.4 now), this pr checked a Spark release version in `dev/create-release/release-build.sh`.

## How was this patch tested?
Manually checked.

Closes #23538 from maropu/FixReleaseScript.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-01-14 19:17:39 -06:00
Dongjoon Hyun 6f35ede31c
[SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER
## What changes were proposed in this pull request?

This PR uses GitHub repository instead of GitBox because GitHub repo returns HTTP header status correctly.

## How was this patch tested?

Manual.

```
$ ./do-release-docker.sh -d /tmp/test -n
Branch [branch-2.4]:
Current branch version is 2.4.1-SNAPSHOT.
Release [2.4.1]:
RC # [1]:
This is a dry run. Please confirm the ref that will be built for testing.
Ref [v2.4.1-rc1]:
```

Closes #23482 from dongjoon-hyun/SPARK-26554-2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2019-01-07 17:54:05 -08:00
Dongjoon Hyun 468d25ec74
[MINOR][BUILD] Fix script name in release-tag.sh usage message
## What changes were proposed in this pull request?

This PR fixes the old script name in `release-tag.sh`.

    $ ./release-tag.sh --help | head -n1
    usage: tag-release.sh

## How was this patch tested?

Manual.

    $ ./release-tag.sh --help | head -n1
    usage: release-tag.sh

Closes #23477 from dongjoon-hyun/SPARK-RELEASE-TAG.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2019-01-06 22:45:18 -08:00
Dongjoon Hyun fe039faddf
[SPARK-26554][BUILD] Update release-util.sh to avoid GitBox fake 200 headers
## What changes were proposed in this pull request?

Unlike the previous Apache Git repository, new GitBox repository returns a fake HTTP 200 header instead of `404 Not Found` header. This makes release scripts out of order. This PR aims to fix it to handle the html body message instead of the fake HTTP headers. This is a release blocker.

```bash
$ curl -s --head --fail "https://gitbox.apache.org/repos/asf?p=spark.git;a=commit;h=v3.0.0"
HTTP/1.1 200 OK
Date: Sun, 06 Jan 2019 22:42:39 GMT
Server: Apache/2.4.18 (Ubuntu)
Vary: Accept-Encoding
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, GET, OPTIONS
Access-Control-Allow-Headers: X-PINGOTHER
Access-Control-Max-Age: 1728000
Content-Type: text/html; charset=utf-8
```

**BEFORE**
```bash
$ ./do-release-docker.sh -d /tmp/test -n
Branch [branch-2.4]:
Current branch version is 2.4.1-SNAPSHOT.
Release [2.4.1]:
RC # [1]:
v2.4.1-rc1 already exists. Continue anyway [y/n]?
```

**AFTER**
```bash
$ ./do-release-docker.sh -d /tmp/test -n
Branch [branch-2.4]:
Current branch version is 2.4.1-SNAPSHOT.
Release [2.4.1]:
RC # [1]:
This is a dry run. Please confirm the ref that will be built for testing.
Ref [v2.4.1-rc1]:
```

## How was this patch tested?

Manual.

Closes #23476 from dongjoon-hyun/SPARK-26554.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2019-01-06 19:59:31 -08:00
shane knapp bccb8602d7
[SPARK-26537][BUILD] change git-wip-us to gitbox
## What changes were proposed in this pull request?

due to apache recently moving from git-wip-us.apache.org to gitbox.apache.org, we need to update the packaging scripts to point to the new repo location.

this will also need to be backported to 2.4, 2.3, 2.1, 2.0 and 1.6.

## How was this patch tested?

the build system will test this.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #23454 from shaneknapp/update-apache-repo.

Authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2019-01-04 18:27:26 -08:00
Wenchen Fan a241a150d5 [MINOR] update known_translations
## What changes were proposed in this pull request?

update known_translations after running `translate-contributors.py` during 2.4.0 release

## How was this patch tested?

N/A

Closes #22949 from cloud-fan/contributors.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2018-11-06 14:52:02 -08:00
Wenchen Fan 327456b482 [BUILD][MINOR] release script should not interrupt by svn
## What changes were proposed in this pull request?

When running the release script, you will be interrupted unexpectedly
```
ATTENTION!  Your password for authentication realm:

   <https://dist.apache.org:443> ASF Committers

can only be stored to disk unencrypted!  You are advised to configure
your system so that Subversion can store passwords encrypted, if
possible.  See the documentation for details.

You can avoid future appearances of this warning by setting the value
of the 'store-plaintext-passwords' option to either 'yes' or 'no' in
'/home/spark-rm/.subversion/servers'.
-----------------------------------------------------------------------
Store password unencrypted (yes/no)?
```

We can avoid it by adding `--no-auth-cache` when running svn command.

## How was this patch tested?

manually verified with 2.4.0 RC5

Closes #22885 from cloud-fan/svn.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2018-10-30 21:17:40 +08:00
Wenchen Fan ac586bbb01 fix security issue of zinc(simplier version) 2018-10-19 23:54:15 +08:00
Sean Owen 703e6da1ec [SPARK-25705][BUILD][STREAMING][TEST-MAVEN] Remove Kafka 0.8 integration
## What changes were proposed in this pull request?

Remove Kafka 0.8 integration

## How was this patch tested?

Existing tests, build scripts

Closes #22703 from srowen/SPARK-25705.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-10-16 09:10:24 -05:00
Sean Owen a001814189 [SPARK-25598][STREAMING][BUILD][TEST-MAVEN] Remove flume connector in Spark 3
## What changes were proposed in this pull request?

Removes all vestiges of Flume in the build, for Spark 3.
I don't think this needs Jenkins config changes.

## How was this patch tested?

Existing tests.

Closes #22692 from srowen/SPARK-25598.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-10-11 14:28:06 -07:00
Sean Owen 80813e1980 [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6
## What changes were proposed in this pull request?

Remove Hadoop 2.6 references and make 2.7 the default.
Obviously, this is for master/3.0.0 only.
After this we can also get rid of the separate test jobs for Hadoop 2.6.

## How was this patch tested?

Existing tests

Closes #22615 from srowen/SPARK-25016.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-10-10 12:07:53 -07:00
Wenchen Fan d6be46eb9c [SPARK-24530][FOLLOWUP] run Sphinx with python 3 in docker
## What changes were proposed in this pull request?

SPARK-24530 discovered a problem of generation python doc, and provided a fix: setting SPHINXPYTHON to python 3.

This PR makes this fix automatic in the release script using docker.

## How was this patch tested?

verified by the 2.4.0 rc2

Closes #22607 from cloud-fan/python.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2018-10-02 10:10:22 -07:00
Gengliang Wang 5534a3a58e [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release-build.sh for publishing scala-2.12 build
## What changes were proposed in this pull request?

This is a follow up for #22441.

1. Remove flag "-Pkafka-0-8" for Scala 2.12 build.
2. Clean up the script, simpler logic.
3. Switch to Scala version to 2.11 before script exit.

## How was this patch tested?

Manual test.

Closes #22454 from gengliangwang/revise_release_build.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2018-09-19 18:30:46 +08:00
Wenchen Fan 1c0423b287 [SPARK-25445][BUILD] the release script should be able to publish a scala-2.12 build
## What changes were proposed in this pull request?

update the package and publish steps, to support scala 2.12

## How was this patch tested?

manual test

Closes #22441 from cloud-fan/scala.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2018-09-18 22:29:00 +08:00
Wenchen Fan 0f1413e320 [SPARK-25443][BUILD] fix issues when building docs with release scripts in docker
## What changes were proposed in this pull request?

These 2 changes are required to build the docs for Spark 2.4.0 RC1:
1. install `mkdocs` in the docker image
2. set locale to C.UTF-8. Otherwise jekyll fails to build the doc.

## How was this patch tested?

tested manually when doing the 2.4.0 RC1

Closes #22438 from cloud-fan/infra.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2018-09-18 10:10:20 +08:00
Sean Owen 30aa37fca4 [SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and NOTICE, and specialize for source vs binary
## What changes were proposed in this pull request?

Fix location of licenses-binary in binary release, and remove binary items from source release

## How was this patch tested?

N/A

Closes #22436 from srowen/SPARK-24654.2.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-09-17 08:54:44 -05:00
jerryshao b66e14dc96 [SPARK-24685][BUILD][FOLLOWUP] Fix the nonexist profile name in release script
## What changes were proposed in this pull request?

`without-hadoop` profile doesn't exist in Maven, instead the name should be `hadoop-provided`, this is a regression introduced by SPARK-24685. So here fix it.

## How was this patch tested?

Local test.

Closes #22434 from jerryshao/SPARK-24685-followup.

Authored-by: jerryshao <sshao@hortonworks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2018-09-17 15:21:18 +08:00
Sean Owen 08c76b5d39 [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4
(This change is a subset of the changes needed for the JIRA; see https://github.com/apache/spark/pull/22231)

## What changes were proposed in this pull request?

Use raw strings and simpler regex syntax consistently in Python, which also avoids warnings from pycodestyle about accidentally relying Python's non-escaping of non-reserved chars in normal strings. Also, fix a few long lines.

## How was this patch tested?

Existing tests, and some manual double-checking of the behavior of regexes in Python 2/3 to be sure.

Closes #22400 from srowen/SPARK-25238.2.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
2018-09-13 11:19:43 +08:00
cclauss 71f38ac242 [SPARK-23698][PYTHON] Resolve undefined names in Python 3
## What changes were proposed in this pull request?

Fix issues arising from the fact that builtins __file__, __long__, __raw_input()__, __unicode__, __xrange()__, etc. were all removed from Python 3.  __Undefined names__ have the potential to raise [NameError](https://docs.python.org/3/library/exceptions.html#NameError) at runtime.

## How was this patch tested?
* $ __python2 -m flake8 . --count --select=E9,F82 --show-source --statistics__
* $ __python3 -m flake8 . --count --select=E9,F82 --show-source --statistics__

holdenk

flake8 testing of https://github.com/apache/spark on Python 3.6.3

$ __python3 -m flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics__
```
./dev/merge_spark_pr.py:98:14: F821 undefined name 'raw_input'
    result = raw_input("\n%s (y/n): " % prompt)
             ^
./dev/merge_spark_pr.py:136:22: F821 undefined name 'raw_input'
    primary_author = raw_input(
                     ^
./dev/merge_spark_pr.py:186:16: F821 undefined name 'raw_input'
    pick_ref = raw_input("Enter a branch name [%s]: " % default_branch)
               ^
./dev/merge_spark_pr.py:233:15: F821 undefined name 'raw_input'
    jira_id = raw_input("Enter a JIRA id [%s]: " % default_jira_id)
              ^
./dev/merge_spark_pr.py:278:20: F821 undefined name 'raw_input'
    fix_versions = raw_input("Enter comma-separated fix version(s) [%s]: " % default_fix_versions)
                   ^
./dev/merge_spark_pr.py:317:28: F821 undefined name 'raw_input'
            raw_assignee = raw_input(
                           ^
./dev/merge_spark_pr.py:430:14: F821 undefined name 'raw_input'
    pr_num = raw_input("Which pull request would you like to merge? (e.g. 34): ")
             ^
./dev/merge_spark_pr.py:442:18: F821 undefined name 'raw_input'
        result = raw_input("Would you like to use the modified title? (y/n): ")
                 ^
./dev/merge_spark_pr.py:493:11: F821 undefined name 'raw_input'
    while raw_input("\n%s (y/n): " % pick_prompt).lower() == "y":
          ^
./dev/create-release/releaseutils.py:58:16: F821 undefined name 'raw_input'
    response = raw_input("%s [y/n]: " % msg)
               ^
./dev/create-release/releaseutils.py:152:38: F821 undefined name 'unicode'
        author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
                                     ^
./python/setup.py:37:11: F821 undefined name '__version__'
VERSION = __version__
          ^
./python/pyspark/cloudpickle.py:275:18: F821 undefined name 'buffer'
        dispatch[buffer] = save_buffer
                 ^
./python/pyspark/cloudpickle.py:807:18: F821 undefined name 'file'
        dispatch[file] = save_file
                 ^
./python/pyspark/sql/conf.py:61:61: F821 undefined name 'unicode'
        if not isinstance(obj, str) and not isinstance(obj, unicode):
                                                            ^
./python/pyspark/sql/streaming.py:25:21: F821 undefined name 'long'
    intlike = (int, long)
                    ^
./python/pyspark/streaming/dstream.py:405:35: F821 undefined name 'long'
        return self._sc._jvm.Time(long(timestamp * 1000))
                                  ^
./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:21:10: F821 undefined name 'xrange'
for i in xrange(50):
         ^
./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:22:14: F821 undefined name 'xrange'
    for j in xrange(5):
             ^
./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:23:18: F821 undefined name 'xrange'
        for k in xrange(20022):
                 ^
20    F821 undefined name 'raw_input'
20
```

Closes #20838 from cclauss/fix-undefined-names.

Authored-by: cclauss <cclauss@bluewin.ch>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
2018-08-22 10:06:59 -07:00
Marcelo Vanzin 717f58e9ce [SPARK-24685][BUILD] Restore support for building old Hadoop versions of 2.1.
Update the release scripts to build binary packages for older versions
of Hadoop when building Spark 2.1. Also did some minor refactoring of that
part of the script so that changing these later is easier.

This was used to build the missing packages from 2.1.3-rc2.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #21661 from vanzin/SPARK-24685.
2018-08-15 14:42:48 -07:00
cclauss b42fda8ab3 [SPARK-23698] Remove raw_input() from Python 2
Signed-off-by: cclauss <cclaussbluewin.ch>

## What changes were proposed in this pull request?

Humans will be able to enter text in Python 3 prompts which they can not do today.
The Python builtin __raw_input()__ was removed in Python 3 in favor of __input()__.  This PR does the same thing in Python 2.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
flake8 testing

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: cclauss <cclauss@bluewin.ch>

Closes #21702 from cclauss/python-fix-raw_input.
2018-07-04 09:40:58 +08:00
Marcelo Vanzin 4e7d8678a3 [SPARK-24372][BUILD] Add scripts to help with preparing releases.
The "do-release.sh" script asks questions about the RC being prepared,
trying to find out as much as possible automatically, and then executes
the existing scripts with proper arguments to prepare the release. This
script was used to prepare the 2.3.1 release candidates, so was tested
in that context.

The docker version runs that same script inside a docker image especially
crafted for building Spark releases. That image is based on the work
by Felix C. linked in the bug. At this point is has been only midly
tested.

I also added a template for the vote e-mail, with placeholders for
things that need to be replaced, although there is no automation around
that for the moment. It shouldn't be hard to hook up certain things like
version and tags to this, or to figure out certain things like the
repo URL from the output of the release scripts.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #21515 from vanzin/SPARK-24372.
2018-06-22 12:38:34 -05:00
Marcelo Vanzin 8e60a16b73 [SPARK-23601][BUILD][FOLLOW-UP] Keep md5 checksums for nexus artifacts.
The repository.apache.org server still requires md5 checksums or
it won't publish the staging repo.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #21338 from vanzin/SPARK-23601.
2018-05-16 13:34:54 -07:00
Sean Owen 8bceb899dc [SPARK-23601][BUILD] Remove .md5 files from release
## What changes were proposed in this pull request?

Remove .md5 files from release artifacts

## How was this patch tested?

N/A

Author: Sean Owen <sowen@cloudera.com>

Closes #20737 from srowen/SPARK-23601.
2018-03-06 08:52:28 -06:00
foxish c3548d11c3 [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)
## What changes were proposed in this pull request?

Including the `-Pkubernetes` flag in a few places it was missed.

## How was this patch tested?

checkstyle, mima through manual tests.

Author: foxish <ramanathana@google.com>

Closes #20256 from foxish/SPARK-23063.
2018-01-13 21:34:28 -08:00
Felix Cheung ab1b6ee731 [BUILD] update release scripts
## What changes were proposed in this pull request?

Change to dist.apache.org instead of home directory
sha512 should have .sha512 extension. From ASF release signing doc: "The checksum SHOULD be generated using SHA-512. A .sha file SHOULD contain a SHA-1 checksum, for historical reasons."

NOTE: I *think* should require some changes to work with Jenkins' release build

## How was this patch tested?

manually

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #19754 from felixcheung/releasescript.
2017-12-09 09:28:46 -06:00
hyukjinkwon c8b7f97b8a [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not exists in release-build.sh
## What changes were proposed in this pull request?

This PR proposes to use `/usr/sbin/lsof` if `lsof` is missing in the path to fix nightly snapshot jenkins jobs. Please refer https://github.com/apache/spark/pull/19359#issuecomment-340139557:

> Looks like some of the snapshot builds are having lsof issues:
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.1-maven-snapshots/182/console
>
>https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.2-maven-snapshots/134/console
>
>spark-build/dev/create-release/release-build.sh: line 344: lsof: command not found
>usage: kill [ -s signal | -p ] [ -a ] pid ...
>kill -l [ signal ]

Up to my knowledge,  the full path of `lsof` is required for non-root user in few OSs.

## How was this patch tested?

Manually tested as below:

```bash
#!/usr/bin/env bash

LSOF=lsof
if ! hash $LSOF 2>/dev/null; then
  echo "a"
  LSOF=/usr/sbin/lsof
fi

$LSOF -P | grep "a"
```

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #19695 from HyukjinKwon/SPARK-22377.
2017-11-14 08:28:13 +09:00
Sean Owen 0c03297bf0 [SPARK-22142][BUILD][STREAMING] Move Flume support behind a profile, take 2
## What changes were proposed in this pull request?

Move flume behind a profile, take 2. See https://github.com/apache/spark/pull/19365 for most of the back-story.

This change should fix the problem by removing the examples module dependency and moving Flume examples to the module itself. It also adds deprecation messages, per a discussion on dev about deprecating for 2.3.0.

## How was this patch tested?

Existing tests, which still enable flume integration.

Author: Sean Owen <sowen@cloudera.com>

Closes #19412 from srowen/SPARK-22142.2.
2017-10-06 15:08:28 +01:00
gatorsmile 472864014c Revert "[SPARK-22142][BUILD][STREAMING] Move Flume support behind a profile"
This reverts commit a2516f41ae.
2017-09-29 11:45:58 -07:00
Holden Karau ecbe416ab5 [SPARK-22129][SPARK-22138] Release script improvements
## What changes were proposed in this pull request?

Use the GPG_KEY param, fix lsof to non-hardcoded path, remove version swap since it wasn't really needed. Use EXPORT on JAVA_HOME for downstream scripts as well.

## How was this patch tested?

Rolled 2.1.2 RC2

Author: Holden Karau <holden@us.ibm.com>

Closes #19359 from holdenk/SPARK-22129-fix-signing.
2017-09-29 08:04:14 -07:00
Sean Owen a2516f41ae [SPARK-22142][BUILD][STREAMING] Move Flume support behind a profile
## What changes were proposed in this pull request?

Add 'flume' profile to enable Flume-related integration modules

## How was this patch tested?

Existing tests; no functional change

Author: Sean Owen <sowen@cloudera.com>

Closes #19365 from srowen/SPARK-22142.
2017-09-29 08:26:53 +01:00
Holden Karau 8f130ad401 [SPARK-22072][SPARK-22071][BUILD] Improve release build scripts
## What changes were proposed in this pull request?

Check JDK version (with javac) and use SPARK_VERSION for publish-release

## How was this patch tested?

Manually tried local build with wrong JDK / JAVA_HOME & built a local release (LFTP disabled)

Author: Holden Karau <holden@us.ibm.com>

Closes #19312 from holdenk/improve-release-scripts-r2.
2017-09-22 00:14:57 -07:00
Sean Owen 4fbf748bf8 [SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behind a profile
## What changes were proposed in this pull request?

Put Kafka 0.8 support behind a kafka-0-8 profile.

## How was this patch tested?

Existing tests, but, until PR builder and Jenkins configs are updated the effect here is to not build or test Kafka 0.8 support at all.

Author: Sean Owen <sowen@cloudera.com>

Closes #19134 from srowen/SPARK-21893.
2017-09-13 10:10:40 +01:00
Sean Owen 12ab7f7e89 [SPARK-14280][BUILD][WIP] Update change-version.sh and pom.xml to add Scala 2.12 profiles and enable 2.12 compilation
…build; fix some things that will be warnings or errors in 2.12; restore Scala 2.12 profile infrastructure

## What changes were proposed in this pull request?

This change adds back the infrastructure for a Scala 2.12 build, but does not enable it in the release or Python test scripts.

In order to make that meaningful, it also resolves compile errors that the code hits in 2.12 only, in a way that still works with 2.11.

It also updates dependencies to the earliest minor release of dependencies whose current version does not yet support Scala 2.12. This is in a sense covered by other JIRAs under the main umbrella, but implemented here. The versions below still work with 2.11, and are the _latest_ maintenance release in the _earliest_ viable minor release.

- Scalatest 2.x -> 3.0.3
- Chill 0.8.0 -> 0.8.4
- Clapper 1.0.x -> 1.1.2
- json4s 3.2.x -> 3.4.2
- Jackson 2.6.x -> 2.7.9 (required by json4s)

This change does _not_ fully enable a Scala 2.12 build:

- It will also require dropping support for Kafka before 0.10. Easy enough, just didn't do it yet here
- It will require recreating `SparkILoop` and `Main` for REPL 2.12, which is SPARK-14650. Possible to do here too.

What it does do is make changes that resolve much of the remaining gap without affecting the current 2.11 build.

## How was this patch tested?

Existing tests and build. Manually tested with `./dev/change-scala-version.sh 2.12` to verify it compiles, modulo the exceptions above.

Author: Sean Owen <sowen@cloudera.com>

Closes #18645 from srowen/SPARK-14280.
2017-09-01 19:21:21 +01:00
Sean Owen 425c4ada4c [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10
## What changes were proposed in this pull request?

- Remove Scala 2.10 build profiles and support
- Replace some 2.10 support in scripts with commented placeholders for 2.12 later
- Remove deprecated API calls from 2.10 support
- Remove usages of deprecated context bounds where possible
- Remove Scala 2.10 workarounds like ScalaReflectionLock
- Other minor Scala warning fixes

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #17150 from srowen/SPARK-19810.
2017-07-13 17:06:24 +08:00
Holden Karau 1b85bcd929 [SPARK-20627][PYSPARK] Drop the hadoop distirbution name from the Python version
## What changes were proposed in this pull request?

Drop the hadoop distirbution name from the Python version (PEP440 - https://www.python.org/dev/peps/pep-0440/). We've been using the local version string to disambiguate between different hadoop versions packaged with PySpark, but PEP0440 states that local versions should not be used when publishing up-stream. Since we no longer make PySpark pip packages for different hadoop versions, we can simply drop the hadoop information. If at a later point we need to start publishing different hadoop versions we can look at make different packages or similar.

## How was this patch tested?

Ran `make-distribution` locally

Author: Holden Karau <holden@us.ibm.com>

Closes #17885 from holdenk/SPARK-20627-remove-pip-local-version-string.
2017-05-09 11:25:29 -07:00
Josh Rosen 314cf51ded [SPARK-20102] Fix nightly packaging and RC packaging scripts w/ two minor build fixes
## What changes were proposed in this pull request?

The master snapshot publisher builds are currently broken due to two minor build issues:

1. For unknown reasons, the LFTP `mkdir -p` command began throwing errors when the remote directory already exists. This change of behavior might have been caused by configuration changes in the ASF's SFTP server, but I'm not entirely sure of that. To work around this problem, this patch updates the script to ignore errors from the `lftp mkdir -p` commands.
2. The PySpark `setup.py` file references a non-existent `pyspark.ml.stat` module, causing Python packaging to fail by complaining about a missing directory. The fix is to simply drop that line from the setup script.

## How was this patch tested?

The LFTP fix was tested by manually running the failing commands on AMPLab Jenkins against the ASF SFTP server. The PySpark fix was tested locally.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #17437 from JoshRosen/spark-20102.
2017-03-27 10:23:28 -07:00
Sean Owen 0e2405490f
[SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support
- Move external/java8-tests tests into core, streaming, sql and remove
- Remove MaxPermGen and related options
- Fix some reflection / TODOs around Java 8+ methods
- Update doc references to 1.7/1.8 differences
- Remove Java 7/8 related build profiles
- Update some plugins for better Java 8 compatibility
- Fix a few Java-related warnings

For the future:

- Update Java 8 examples to fully use Java 8
- Update Java tests to use lambdas for simplicity
- Update Java internal implementations to use lambdas

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #16871 from srowen/SPARK-19493.
2017-02-16 12:32:45 +00:00
Sean Owen e8d3fca450
[SPARK-19464][CORE][YARN][TEST-HADOOP2.6] Remove support for Hadoop 2.5 and earlier
## What changes were proposed in this pull request?

- Remove support for Hadoop 2.5 and earlier
- Remove reflection and code constructs only needed to support multiple versions at once
- Update docs to reflect newer versions
- Remove older versions' builds and profiles.

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #16810 from srowen/SPARK-19464.
2017-02-08 12:20:07 +00:00
Yin Huai 0c92318588 Update known_translations for contributor names
## What changes were proposed in this pull request?
Update known_translations per https://github.com/apache/spark/pull/16423#issuecomment-269739634

Author: Yin Huai <yhuai@databricks.com>

Closes #16628 from yhuai/known_translations.
2017-01-18 18:18:51 -08:00
hyukjinkwon 46b2126024
[SPARK-19002][BUILD][PYTHON] Check pep8 against all Python scripts
## What changes were proposed in this pull request?

This PR proposes to check pep8 against all other Python scripts and fix the errors as below:

```bash
./dev/create-release/generate-contributors.py
./dev/create-release/releaseutils.py
./dev/create-release/translate-contributors.py
./dev/lint-python
./python/docs/epytext.py
./examples/src/main/python/mllib/decision_tree_classification_example.py
./examples/src/main/python/mllib/decision_tree_regression_example.py
./examples/src/main/python/mllib/gradient_boosting_classification_example.py
./examples/src/main/python/mllib/gradient_boosting_regression_example.py
./examples/src/main/python/mllib/linear_regression_with_sgd_example.py
./examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py
./examples/src/main/python/mllib/naive_bayes_example.py
./examples/src/main/python/mllib/random_forest_classification_example.py
./examples/src/main/python/mllib/random_forest_regression_example.py
./examples/src/main/python/mllib/svm_with_sgd_example.py
./examples/src/main/python/streaming/network_wordjoinsentiments.py
./sql/hive/src/test/resources/data/scripts/cat.py
./sql/hive/src/test/resources/data/scripts/cat_error.py
./sql/hive/src/test/resources/data/scripts/doubleescapedtab.py
./sql/hive/src/test/resources/data/scripts/dumpdata_script.py
./sql/hive/src/test/resources/data/scripts/escapedcarriagereturn.py
./sql/hive/src/test/resources/data/scripts/escapednewline.py
./sql/hive/src/test/resources/data/scripts/escapedtab.py
./sql/hive/src/test/resources/data/scripts/input20_script.py
./sql/hive/src/test/resources/data/scripts/newline.py
```

## How was this patch tested?

- `./python/docs/epytext.py`

  ```bash
  cd ./python/docs $$ make html
  ```

- pep8 check (Python 2.7 / Python 3.3.6)

  ```
  ./dev/lint-python
  ```

- `./dev/merge_spark_pr.py` (Python 2.7 only / Python 3.3.6 not working)

  ```bash
  python -m doctest -v ./dev/merge_spark_pr.py
  ```

- `./dev/create-release/releaseutils.py` `./dev/create-release/generate-contributors.py` `./dev/create-release/translate-contributors.py` (Python 2.7 only / Python 3.3.6 not working)

  ```bash
  python generate-contributors.py
  python translate-contributors.py
  ```

- Examples (Python 2.7 / Python 3.3.6)

  ```bash
  ./bin/spark-submit examples/src/main/python/mllib/decision_tree_classification_example.py
  ./bin/spark-submit examples/src/main/python/mllib/decision_tree_regression_example.py
  ./bin/spark-submit examples/src/main/python/mllib/gradient_boosting_classification_example.py
  ./bin/spark-submit examples/src/main/python/mllib/gradient_boosting_regression_example.p
  ./bin/spark-submit examples/src/main/python/mllib/random_forest_classification_example.py
  ./bin/spark-submit examples/src/main/python/mllib/random_forest_regression_example.py
  ```

- Examples (Python 2.7 only / Python 3.3.6 not working)
  ```
  ./bin/spark-submit examples/src/main/python/mllib/linear_regression_with_sgd_example.py
  ./bin/spark-submit examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py
  ./bin/spark-submit examples/src/main/python/mllib/naive_bayes_example.py
  ./bin/spark-submit examples/src/main/python/mllib/svm_with_sgd_example.py
  ```

- `sql/hive/src/test/resources/data/scripts/*.py` (Python 2.7 / Python 3.3.6 within suggested changes)

  Manually tested only changed ones.

- `./dev/github_jira_sync.py` (Python 2.7 only / Python 3.3.6 not working)

  Manually tested this after disabling actually adding comments and links.

And also via Jenkins tests.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #16405 from HyukjinKwon/minor-pep8.
2017-01-02 15:23:19 +00:00
Yin Huai 63036aee22 Update known_translations for contributor names and also fix a small issue in translate-contributors.py
## What changes were proposed in this pull request?
This PR updates dev/create-release/known_translations to add more contributor name mapping. It also fixes a small issue in translate-contributors.py

## How was this patch tested?
manually tested

Author: Yin Huai <yhuai@databricks.com>

Closes #16423 from yhuai/contributors.
2016-12-29 14:20:56 -08:00
Shivaram Venkataraman be5fc6ef72 [MINOR][SPARKR] Fix SparkR regex in copy command
Fix SparkR package copy regex. The existing code leads to
```
Copying release tarballs to /home/****/public_html/spark-nightly/spark-branch-2.1-bin/spark-2.1.1-SNAPSHOT-2016_12_08_22_38-e8f351f-bin
mput: SparkR-*: no files found
```

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #16231 from shivaram/typo-sparkr-build.
2016-12-09 10:12:56 -08:00
Felix Cheung c074c96dc5 Copy pyspark and SparkR packages to latest release dir too
## What changes were proposed in this pull request?

Copy pyspark and SparkR packages to latest release dir, as per comment [here](https://github.com/apache/spark/pull/16226#discussion_r91664822)

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16227 from felixcheung/pyrftp.
2016-12-08 22:52:34 -08:00
Shivaram Venkataraman 934035ae7c Copy the SparkR source package with LFTP
This PR adds a line in release-build.sh to copy the SparkR source archive using LFTP

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #16226 from shivaram/fix-sparkr-copy-build.
2016-12-08 22:21:24 -08:00
Shivaram Venkataraman 202fcd21ce [SPARK-18590][SPARKR] Change the R source build to Hadoop 2.6
This PR changes the SparkR source release tarball to be built using the Hadoop 2.6 profile. Previously it was using the without hadoop profile which leads to an error as discussed in https://github.com/apache/spark/pull/16014#issuecomment-265843991

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #16218 from shivaram/fix-sparkr-release-build.
2016-12-08 13:01:46 -08:00
Felix Cheung c3d3a9d0e8 [SPARK-18590][SPARKR] build R source package when making distribution
## What changes were proposed in this pull request?

This PR has 2 key changes. One, we are building source package (aka bundle package) for SparkR which could be released on CRAN. Two, we should include in the official Spark binary distributions SparkR installed from this source package instead (which would have help/vignettes rds needed for those to work when the SparkR package is loaded in R, whereas earlier approach with devtools does not)

But, because of various differences in how R performs different tasks, this PR is a fair bit more complicated. More details below.

This PR also includes a few minor fixes.

### more details

These are the additional steps in make-distribution; please see [here](https://github.com/apache/spark/blob/master/R/CRAN_RELEASE.md) on what's going to a CRAN release, which is now run during make-distribution.sh.
1. package needs to be installed because the first code block in vignettes is `library(SparkR)` without lib path
2. `R CMD build` will build vignettes (this process runs Spark/SparkR code and captures outputs into pdf documentation)
3. `R CMD check` on the source package will install package and build vignettes again (this time from source packaged) - this is a key step required to release R package on CRAN
 (will skip tests here but tests will need to pass for CRAN release process to success - ideally, during release signoff we should install from the R source package and run tests)
4. `R CMD Install` on the source package (this is the only way to generate doc/vignettes rds files correctly, not in step # 1)
 (the output of this step is what we package into Spark dist and sparkr.zip)

Alternatively,
   R CMD build should already be installing the package in a temp directory though it might just be finding this location and set it to lib.loc parameter; another approach is perhaps we could try calling `R CMD INSTALL --build pkg` instead.
 But in any case, despite installing the package multiple times this is relatively fast.
Building vignettes takes a while though.

## How was this patch tested?

Manually, CI.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16014 from felixcheung/rdist.
2016-12-08 11:29:31 -08:00
Reynold Xin 37e52f8793 [SPARK-18639] Build only a single pip package
## What changes were proposed in this pull request?
We current build 5 separate pip binary tar balls, doubling the release script runtime. It'd be better to build one, especially for use cases that are just using Spark locally. In the long run, it would make more sense to have Hadoop support be pluggable.

## How was this patch tested?
N/A - this is a release build script that doesn't have any automated test coverage. We will know if it goes wrong when we prepare releases.

Author: Reynold Xin <rxin@databricks.com>

Closes #16072 from rxin/SPARK-18639.
2016-12-01 17:58:28 -08:00
Holden Karau a36a76ac43 [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed
## What changes were proposed in this pull request?

This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129).

Done:
- pip installable on conda [manual tested]
- setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested]
- Automated testing of this (virtualenv)
- packaging and signing with release-build*

Possible follow up work:
- release-build update to publish to PyPI (SPARK-18128)
- figure out who owns the pyspark package name on prod PyPI (is it someone with in the project or should we ask PyPI or should we choose a different name to publish with like ApachePySpark?)
- Windows support and or testing ( SPARK-18136 )
- investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
- consider how we want to number our dev/snapshot versions

Explicitly out of scope:
- Using pip installed PySpark to start a standalone cluster
- Using pip installed PySpark for non-Python Spark programs

*I've done some work to test release-build locally but as a non-committer I've just done local testing.
## How was this patch tested?

Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.

release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)

Author: Holden Karau <holden@us.ibm.com>
Author: Juliet Hougland <juliet@cloudera.com>
Author: Juliet Hougland <not@myemail.com>

Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
2016-11-16 14:22:15 -08:00
Holden Karau 1386fd28da [SPARK-18418] Fix flags for make_binary_release for hadoop profile
## What changes were proposed in this pull request?

Fix the flags used to specify the hadoop version

## How was this patch tested?

Manually tested as part of https://github.com/apache/spark/pull/15659 by having the build succeed.

cc joshrosen

Author: Holden Karau <holden@us.ibm.com>

Closes #15860 from holdenk/minor-fix-release-build-script.
2016-11-12 14:50:37 -08:00
Adam Roberts 3f8a0222e2
[SPARK-17828][DOCS] Remove unused generate-changelist.py
## What changes were proposed in this pull request?
We can remove this file based on discussion at https://issues.apache.org/jira/browse/SPARK-17828 it's evident this file has been redundant for a while, JIRA release notes serves this purpose for us already.

For ease of future reference you can find detailed release notes at, for example:

http://spark.apache.org/downloads.html -> http://spark.apache.org/releases/spark-release-2-0-1.html -> "Detailed changes" which links to https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12336857

## How was this patch tested?
Searched the codebase and saw nothing referencing this, hasn't been used in a while (probably manually invoked a long time ago)

Author: Adam Roberts <aroberts@uk.ibm.com>

Closes #15419 from a-roberts/patch-7.
2016-10-10 23:16:40 +02:00
Shivaram Venkataraman 7c382524a9 [SPARK-17651][SPARKR] Set R package version number along with mvn
## What changes were proposed in this pull request?

This PR sets the R package version while tagging releases. Note that since R doesn't accept `-SNAPSHOT` in version number field, we remove that while setting the next version

## How was this patch tested?

Tested manually by running locally

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #15223 from shivaram/sparkr-version-change.
2016-09-23 14:35:18 -07:00
Michael Gummelt 8e5475be3c [SPARK-16967] move mesos to module
## What changes were proposed in this pull request?

Move Mesos code into a mvn module

## How was this patch tested?

unit tests
manually submitting a client mode and cluster mode job
spark/mesos integration test suite

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #14637 from mgummelt/mesos-module.
2016-08-26 12:25:22 -07:00
Yin Huai 60ba436b70 [SPARK-16453][BUILD] release-build.sh is missing hive-thriftserver for scala 2.10
## What changes were proposed in this pull request?
This PR adds hive-thriftserver profile to scala 2.10 build created by release-build.sh.

Author: Yin Huai <yhuai@databricks.com>

Closes #14108 from yhuai/SPARK-16453.
2016-07-08 15:56:46 -07:00
Sean Owen 0eea12a3d9 [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs
## What changes were proposed in this pull request?

Move many top-level files in dev/ or other appropriate directory. In particular, put `make-distribution.sh` in `dev` and update docs accordingly. Remove deprecated `sbt/sbt`.

I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable but edits to the project `.sbt` files didn't work; config file location is updatable for compile but not test scope.

## How was this patch tested?

`./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest.

Author: Sean Owen <sowen@cloudera.com>

Closes #11522 from srowen/SPARK-13596.
2016-03-07 14:48:02 -08:00
Josh Rosen f77dc4e1e2 [SPARK-13474][PROJECT INFRA] Update packaging scripts to push artifacts to home.apache.org
Due to the people.apache.org -> home.apache.org migration, we need to update our packaging scripts to publish artifacts to the new server. Because the new server only supports sftp instead of ssh, we need to update the scripts to use lftp instead of ssh + rsync.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #11350 from JoshRosen/update-release-scripts-for-apache-home.
2016-02-26 18:40:00 -08:00
Josh Rosen 289373b28c [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version
This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).

The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).

After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10608 from JoshRosen/SPARK-6363.
2016-01-30 00:20:28 -08:00
Josh Rosen 8dbbf3e75e [SPARK-12842][TEST-HADOOP2.7] Add Hadoop 2.7 build profile
This patch adds a Hadoop 2.7 build profile in order to let us automate tests against that version.

/cc rxin srowen

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10775 from JoshRosen/add-hadoop-2.7-profile.
2016-01-15 17:07:24 -08:00
Reynold Xin 5b0d544339 [SPARK-12735] Consolidate & move spark-ec2 to AMPLab managed repository.
Author: Reynold Xin <rxin@databricks.com>

Closes #10673 from rxin/SPARK-12735.
2016-01-09 20:28:20 -08:00
Reynold Xin 0a38637d05 [SPARK-11807] Remove support for Hadoop < 2.2
i.e. Hadoop 1 and Hadoop 2.0

Author: Reynold Xin <rxin@databricks.com>

Closes #10404 from rxin/SPARK-11807.
2015-12-21 22:15:52 -08:00
Josh Rosen d0482f6af3 [SPARK-10932] [PROJECT INFRA] Port two minor changes to release-build.sh from scripts' old repo
Spark's release packaging scripts used to live in a separate repository. Although these scripts are now part of the Spark repo, there are some minor patches made against the old repos that are missing in Spark's copy of the script. This PR ports those changes.

/cc shivaram, who originally submitted these changes against https://github.com/rxin/spark-utils

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8986 from JoshRosen/port-release-build-fixes-from-rxin-repo.
2015-10-13 15:18:20 -07:00
Luciano Resende 1894653edc [SPARK-10511] [BUILD] Reset git repository before packaging source distro
The calculation of Spark version is downloading
Scala and Zinc in the build directory which is
inflating the size of the source distribution.

Reseting the repo before packaging the source
distribution fix this issue.

Author: Luciano Resende <lresende@apache.org>

Closes #8774 from lresende/spark-10511.
2015-09-16 10:47:30 +01:00
Holden Karau 48817cc111 [SPARK-10497] [BUILD] [TRIVIAL] Handle both locations for JIRAError with python-jira
Location of JIRAError has moved between old and new versions of python-jira package.
Longer term it probably makes sense to pin to specific versions (as mentioned in https://issues.apache.org/jira/browse/SPARK-10498 ) but for now, making release tools works with both new and old versions of python-jira.

Author: Holden Karau <holden@pigscanfly.ca>

Closes #8661 from holdenk/SPARK-10497-release-utils-does-not-work-with-new-jira-python.
2015-09-10 16:42:12 +02:00
Reynold Xin ae74c3fa84 [RELEASE] Add more contributors & only show names in release notes.
Author: Reynold Xin <rxin@databricks.com>

Closes #8660 from rxin/contrib.
2015-09-08 17:36:00 -07:00
Josh Rosen 12de348332 [SPARK-10126] [PROJECT INFRA] Fix typo in release-build.sh which broke snapshot publishing for Scala 2.11
The current `release-build.sh` has a typo which breaks snapshot publication for Scala 2.11. We should change the Scala version to 2.11 and clean before building a 2.11 snapshot.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8325 from JoshRosen/fix-2.11-snapshots.
2015-08-20 11:31:03 -07:00
Patrick Wendell 3ef0f32928 [SPARK-1517] Refactor release scripts to facilitate nightly publishing
This update contains some code changes to the release scripts that allow easier nightly publishing. I've been using these new scripts on Jenkins for cutting and publishing nightly snapshots for the last month or so, and it has been going well. I'd like to get them merged back upstream so this can be maintained by the community.

The main changes are:
1. Separates the release tagging from various build possibilities for an already tagged release (`release-tag.sh` and `release-build.sh`).
2. Allow for injecting credentials through the environment, including GPG keys. This is then paired with secure key injection in Jenkins.
3. Support for copying build results to a remote directory, and also "rotating" results, e.g. the ability to keep the last N copies of binary or doc builds.

I'm happy if anyone wants to take a look at this - it's not user facing but an internal utility used for generating releases.

Author: Patrick Wendell <patrick@databricks.com>

Closes #7411 from pwendell/release-script-updates and squashes the following commits:

74f9beb [Patrick Wendell] Moving maven build command to a variable
233ce85 [Patrick Wendell] [SPARK-1517] Refactor release scripts to facilitate nightly publishing
2015-08-11 21:16:48 -07:00
Sean Owen 6e5fd613ea [SPARK-9507] [BUILD] Remove dependency reduced POM hack now that shade plugin is updated
Update to shade plugin 2.4.1, which removes the need for the dependency-reduced-POM workaround and the 'release' profile. Fix management of shade plugin version so children inherit it; bump assembly plugin version while here

See https://issues.apache.org/jira/browse/SPARK-8819

I verified that `mvn clean package -DskipTests` works with Maven 3.3.3.

pwendell are you up for trying this for the 1.5.0 release?

Author: Sean Owen <sowen@cloudera.com>

Closes #7826 from srowen/SPARK-9507 and squashes the following commits:

e0b0fd2 [Sean Owen] Update to shade plugin 2.4.1, which removes the need for the dependency-reduced-POM workaround and the 'release' profile. Fix management of shade plugin version so children inherit it; bump assembly plugin version while here
2015-07-31 21:51:55 +01:00
Michael Allman f5b6dc5e3e [SPARK-8401] [BUILD] Scala version switching build enhancements
These commits address a few minor issues in the Scala cross-version support in the build:

  1. Correct two missing `${scala.binary.version}` pom file substitutions.
  2. Don't update `scala.binary.version` in parent POM. This property is set through profiles.
  3. Update the source of the generated scaladocs in `docs/_plugins/copy_api_dirs.rb`.
  4. Factor common code out of `dev/change-version-to-*.sh` and add some validation. We also test `sed` to see if it's GNU sed and try `gsed` as an alternative if not. This prevents the script from running with a non-GNU sed.

This is my original work and I license this work to the Spark project under the Apache License.

Author: Michael Allman <michael@videoamp.com>

Closes #6832 from mallman/scala-versions and squashes the following commits:

cde2f17 [Michael Allman] Delete dev/change-version-to-*.sh, replacing them with single dev/change-scala-version.sh script that takes a version as argument
02296f2 [Michael Allman] Make the scala version change scripts cross-platform by restricting ourselves to POSIX sed syntax instead of looking for GNU sed
ad9b40a [Michael Allman] Factor change-scala-version.sh out of change-version-to-*.sh, adding command line argument validation and testing for GNU sed
bdd20bf [Michael Allman] Update source of scaladocs when changing Scala version
475088e [Michael Allman] Replace jackson-module-scala_2.10 with jackson-module-scala_${scala.binary.version}
2015-07-21 11:14:31 +01:00
Patrick Wendell 5572fd0c51 [HOTFIX] Adding new names to known contributors 2015-07-14 21:44:47 -07:00
Patrick Wendell 1cb2629f1a [HOTFIX] Rename release-profile to release
when publishing releases. We named it as 'release-profile' because that is
the Maven convention. However, it turns out this special name causes several
other things to kick-in when we are creating releases that are not desirable.
For instance, it triggers the javadoc plugin to run, which actually fails
in our current build set-up.

The fix is just to rename this to a different profile to have no
collateral damage associated with its use.
2015-07-06 22:17:30 -07:00
Andrew Or 9eae5fa642 [SPARK-8819] Fix build for maven 3.3.x
This is a workaround for MSHADE-148, which leads to an infinite loop when building Spark with maven 3.3.x. This was originally caused by #6441, which added a bunch of test dependencies on the spark-core test module. Recently, it was revealed by #7193.

This patch adds a `-Prelease` profile. If present, it will set `createDependencyReducedPom` to true. The consequences are:
- If you are releasing Spark with this profile, you are fine as long as you use maven 3.2.x or before.
- If you are releasing Spark without this profile, you will run into SPARK-8781.
- If you are not releasing Spark but you are using this profile, you may run into SPARK-8819.
- If you are not releasing Spark and you did not include this profile, you are fine.

This is all documented in `pom.xml` and tested locally with both versions of maven.

Author: Andrew Or <andrew@databricks.com>

Closes #7219 from andrewor14/fix-maven-build and squashes the following commits:

1d37e87 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-maven-build
3574ae4 [Andrew Or] Review comments
f39199c [Andrew Or] Create a -Prelease profile that flags `createDependencyReducedPom`
2015-07-06 19:22:30 -07:00
Patrick Wendell e84545fa77 [HOTFIX] Fixing errors in name mappings 2015-06-10 22:56:36 -07:00
Patrick Wendell a777eb04bf [HOTFIX] Adding more contributor name bindings 2015-06-10 21:14:03 -07:00
Shivaram Venkataraman 3dc005282a [SPARK-8027] [SPARKR] Move man pages creation to install-dev.sh
This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available

Related to discussion in #6567

cc pwendell srowen -- Let me know if this looks better

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6593 from shivaram/sparkr-pom-cleanup and squashes the following commits:

b282241 [Shivaram Venkataraman] Remove sparkr-docs from release script as well
8f100a5 [Shivaram Venkataraman] Move man pages creation to install-dev.sh This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available
2015-06-04 12:52:16 -07:00
Shivaram Venkataraman cae9306c4f [SPARK-8027] [SPARKR] Add maven profile to build R package docs
Also use that profile in create-release.sh

cc pwendell -- Note that this means that we need `knitr` and `roxygen` installed on the machines used for building the release. Let me know if you need help with that.

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6567 from shivaram/SPARK-8027 and squashes the following commits:

8dc8ecf [Shivaram Venkataraman] Add maven profile to build R package docs Also use that profile in create-release.sh
2015-06-01 21:21:45 -07:00