Commit graph

791 commits

Author SHA1 Message Date
Yuming Wang 5de5e46624 [SPARK-30268][INFRA] Fix incorrect pyspark version when releasing preview versions
### What changes were proposed in this pull request?

This PR fixes the incorrect PySpark version when releasing preview versions.

### Why are the changes needed?

Failed to make Spark binary distribution:
```
cp: cannot stat 'spark-3.0.0-preview2-bin-hadoop2.7/python/dist/pyspark-3.0.0.dev02.tar.gz': No such file or directory
gpg: can't open 'pyspark-3.0.0.dev02.tar.gz': No such file or directory
gpg: signing failed: No such file or directory
gpg: pyspark-3.0.0.dev02.tar.gz: No such file or directory
```

```
yumwangubuntu-3513086:~/spark-release/output$ ll spark-3.0.0-preview2-bin-hadoop2.7/python/dist/
total 214140
drwxr-xr-x 2 yumwang stack      4096 Dec 16 06:17 ./
drwxr-xr-x 9 yumwang stack      4096 Dec 16 06:17 ../
-rw-r--r-- 1 yumwang stack 219267173 Dec 16 06:17 pyspark-3.0.0.dev2.tar.gz
```

```
/usr/local/lib/python3.6/dist-packages/setuptools/dist.py:476: UserWarning: Normalizing '3.0.0.dev02' to '3.0.0.dev2'
  normalized_version,
```

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
manual test:
```
LM-SHC-16502798:spark yumwang$ SPARK_VERSION=3.0.0-preview2
LM-SHC-16502798:spark yumwang$ echo "$SPARK_VERSION" |  sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/"
3.0.0.dev2

```

Closes #26909 from wangyum/SPARK-30268.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-17 10:22:29 +09:00
Yuming Wang ba0f59bfaf [SPARK-30265][INFRA] Do not change R version when releasing preview versions
### What changes were proposed in this pull request?
This PR makes the release scripts keep the R version unchanged when releasing preview versions.

### Why are the changes needed?
Failed to make Spark binary distribution:
```
++ . /opt/spark-rm/output/spark-3.0.0-preview2-bin-hadoop2.7/R/find-r.sh
+++ '[' -z /usr/bin ']'
++ /usr/bin/Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtools); devtools::document(pkg="./pkg", roclets=c("rd")) }'
Loading required package: usethis
Updating SparkR documentation
First time using roxygen2. Upgrading automatically...
Loading SparkR
Invalid DESCRIPTION:
Malformed package version.

See section 'The DESCRIPTION file' in the 'Writing R Extensions'
manual.

Error: invalid version specification '3.0.0-preview2'
In addition: Warning message:
roxygen2 requires Encoding: UTF-8
Execution halted
[ERROR] Command execution failed.
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal (DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.execute (DefaultExecutor.java:166)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:804)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:751)
    at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:313)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-preview2:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 18.619 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 13.652 s]
[INFO] Spark Project Sketch ............................... SUCCESS [  5.673 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  2.081 s]
[INFO] Spark Project Networking ........................... SUCCESS [  3.509 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  0.993 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [  7.556 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  5.522 s]
[INFO] Spark Project Core ................................. FAILURE [01:06 min]
[INFO] Spark Project ML Local Library ..................... SKIPPED
[INFO] Spark Project GraphX ............................... SKIPPED
[INFO] Spark Project Streaming ............................ SKIPPED
[INFO] Spark Project Catalyst ............................. SKIPPED
[INFO] Spark Project SQL .................................. SKIPPED
[INFO] Spark Project ML Library ........................... SKIPPED
[INFO] Spark Project Tools ................................ SKIPPED
[INFO] Spark Project Hive ................................. SKIPPED
[INFO] Spark Project Graph API ............................ SKIPPED
[INFO] Spark Project Cypher ............................... SKIPPED
[INFO] Spark Project Graph ................................ SKIPPED
[INFO] Spark Project REPL ................................. SKIPPED
[INFO] Spark Project Assembly ............................. SKIPPED
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SKIPPED
[INFO] Spark Integration for Kafka 0.10 ................... SKIPPED
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SKIPPED
[INFO] Spark Avro ......................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  02:04 min
[INFO] Finished at: 2019-12-16T08:02:45Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:exec (sparkr-pkg) on project spark-core_2.12: Command execution failed.: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :spark-core_2.12
```

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
manual test:
```diff
diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R
index cdb59093781..b648c51e010 100644
--- a/R/pkg/R/sparkR.R
+++ b/R/pkg/R/sparkR.R
@@ -336,8 +336,8 @@ sparkR.session <- function(

   # Check if version number of SparkSession matches version number of SparkR package
   jvmVersion <- callJMethod(sparkSession, "version")
-  # Remove -SNAPSHOT from jvm versions
-  jvmVersionStrip <- gsub("-SNAPSHOT", "", jvmVersion)
+  # Remove -preview2 from jvm versions
+  jvmVersionStrip <- gsub("-preview2", "", jvmVersion)
   rPackageVersion <- paste0(packageVersion("SparkR"))

   if (jvmVersionStrip != rPackageVersion) {

```

Closes #26904 from wangyum/SPARK-30265.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-12-16 04:54:12 -07:00
Yuming Wang 1fc353d51a Revert "[SPARK-30056][INFRA] Skip building test artifacts in dev/make-distribution.sh"
### What changes were proposed in this pull request?

This reverts commit 7c0ce285.

### Why are the changes needed?

Failed to make distribution:
```
[INFO] -----------------< org.apache.spark:spark-sketch_2.12 >-----------------
[INFO] Building Spark Project Sketch 3.0.0-preview2                      [3/33]
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] Downloading from central: https://repo.maven.apache.org/maven2/org/apache/spark/spark-tags_2.12/3.0.0-preview2/spark-tags_2.12-3.0.0-preview2-tests.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-preview2:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 26.513 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 48.393 s]
[INFO] Spark Project Sketch ............................... FAILURE [  0.034 s]
[INFO] Spark Project Local DB ............................. SKIPPED
[INFO] Spark Project Networking ........................... SKIPPED
[INFO] Spark Project Shuffle Streaming Service ............ SKIPPED
[INFO] Spark Project Unsafe ............................... SKIPPED
[INFO] Spark Project Launcher ............................. SKIPPED
[INFO] Spark Project Core ................................. SKIPPED
[INFO] Spark Project ML Local Library ..................... SKIPPED
[INFO] Spark Project GraphX ............................... SKIPPED
[INFO] Spark Project Streaming ............................ SKIPPED
[INFO] Spark Project Catalyst ............................. SKIPPED
[INFO] Spark Project SQL .................................. SKIPPED
[INFO] Spark Project ML Library ........................... SKIPPED
[INFO] Spark Project Tools ................................ SKIPPED
[INFO] Spark Project Hive ................................. SKIPPED
[INFO] Spark Project Graph API ............................ SKIPPED
[INFO] Spark Project Cypher ............................... SKIPPED
[INFO] Spark Project Graph ................................ SKIPPED
[INFO] Spark Project REPL ................................. SKIPPED
[INFO] Spark Project YARN Shuffle Service ................. SKIPPED
[INFO] Spark Project YARN ................................. SKIPPED
[INFO] Spark Project Mesos ................................ SKIPPED
[INFO] Spark Project Kubernetes ........................... SKIPPED
[INFO] Spark Project Hive Thrift Server ................... SKIPPED
[INFO] Spark Project Assembly ............................. SKIPPED
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SKIPPED
[INFO] Spark Integration for Kafka 0.10 ................... SKIPPED
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SKIPPED
[INFO] Spark Avro ......................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:15 min
[INFO] Finished at: 2019-12-16T05:29:43Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project spark-sketch_2.12: Could not resolve dependencies for project org.apache.spark:spark-sketch_2.12:jar:3.0.0-preview2: Could not find artifact org.apache.spark:spark-tags_2.12:jar:tests:3.0.0-preview2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
[ERROR]
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

manual test.

Closes #26902 from wangyum/SPARK-30056.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-12-15 23:16:17 -07:00
Yuming Wang 26b658f6fb [SPARK-30253][INFRA] Do not add commits when releasing preview version
### What changes were proposed in this pull request?

This PR adds support for not adding commits to the master branch when releasing a preview version.

### Why are the changes needed?

Otherwise we need to manually revert this change afterwards, for example:
![image](https://user-images.githubusercontent.com/5399861/70788945-f9d15180-1dcc-11ea-81f5-c0d89c28440a.png)

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?

manual test

Closes #26879 from wangyum/SPARK-30253.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-12-15 19:44:29 -07:00
Yuming Wang e1ee3fb72f [SPARK-30216][INFRA] Use python3 in Docker release image
### What changes were proposed in this pull request?

- Reverts commits 1f94bf4 and d6be46e
- Switches python to python3 in Docker release image.

### Why are the changes needed?
`dev/make-distribution.sh` and `python/setup.py` use python3:
https://github.com/apache/spark/pull/26844/files#diff-ba2c046d92a1d2b5b417788bfb5cb5f8L236
https://github.com/apache/spark/pull/26330/files#diff-8cf6167d58ce775a08acafcfe6f40966

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?

manual test:
```
yumwangubuntu-3513086:~/spark$ dev/create-release/do-release-docker.sh -n -d /home/yumwang/spark-release
Output directory already exists. Overwrite and continue? [y/n] y
Branch [branch-2.4]: master
Current branch version is 3.0.0-SNAPSHOT.
Release [3.0.0]: 3.0.0-preview2
RC # [1]:
This is a dry run. Please confirm the ref that will be built for testing.
Ref [master]:
ASF user [yumwang]:
Full name [Yuming Wang]:
GPG key [yumwang@apache.org]: DBD447010C1B4F7DAD3F7DFD6E1B4122F6A3A338
================
Release details:
BRANCH:     master
VERSION:    3.0.0-preview2
TAG:        v3.0.0-preview2-rc1
NEXT:       3.0.1-SNAPSHOT

ASF USER:   yumwang
GPG KEY:    DBD447010C1B4F7DAD3F7DFD6E1B4122F6A3A338
FULL NAME:  Yuming Wang
E-MAIL:     yumwang@apache.org
================
Is this info correct [y/n]? y
GPG passphrase:

========================
= Building spark-rm image with tag latest...
Command: docker build -t spark-rm:latest --build-arg UID=110302528 /home/yumwang/spark/dev/create-release/spark-rm
Log file: docker-build.log
Building v3.0.0-preview2-rc1; output will be at /home/yumwang/spark-release/output

gpg: directory '/home/spark-rm/.gnupg' created
gpg: keybox '/home/spark-rm/.gnupg/pubring.kbx' created
gpg: /home/spark-rm/.gnupg/trustdb.gpg: trustdb created
gpg: key 6E1B4122F6A3A338: public key "Yuming Wang <yumwang@apache.org>" imported
gpg: key 6E1B4122F6A3A338: secret key imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg:       secret keys read: 1
gpg:   secret keys imported: 1
========================
= Creating release tag v3.0.0-preview2-rc1...
Command: /opt/spark-rm/release-tag.sh
Log file: tag.log
It may take some time for the tag to be synchronized to github.
Press enter when you've verified that the new tag (v3.0.0-preview2-rc1) is available.
========================
= Building Spark...
Command: /opt/spark-rm/release-build.sh package
Log file: build.log
========================
= Building documentation...
Command: /opt/spark-rm/release-build.sh docs
Log file: docs.log
========================
= Publishing release
Command: /opt/spark-rm/release-build.sh publish-release
Log file: publish.log
```
Generated doc:
![image](https://user-images.githubusercontent.com/5399861/70693075-a7723100-1cf7-11ea-9f88-9356a02349a1.png)

Closes #26848 from wangyum/SPARK-30216.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-13 11:31:31 -08:00
Dongjoon Hyun cc276f8a6e [SPARK-30243][BUILD][K8S] Upgrade K8s client dependency to 4.6.4
### What changes were proposed in this pull request?

This PR aims to upgrade K8s client library from 4.6.1 to 4.6.4 for `3.0.0-preview2`.

### Why are the changes needed?

This will bring the latest bug fixes.
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.4
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.3
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.2

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with K8s integration test.

Closes #26874 from dongjoon-hyun/SPARK-30243.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-13 08:25:51 -08:00
Yuming Wang 39c0696a39 [MINOR] Fix google style guide address
### What changes were proposed in this pull request?

This PR updates the Google style guide address to `https://google.github.io/styleguide/javaguide.html`.

### Why are the changes needed?

`https://google-styleguide.googlecode.com/svn-history/r130/trunk/javaguide.html` **404**:

![image](https://user-images.githubusercontent.com/5399861/70717915-431c9500-1d2a-11ea-895b-024be953a116.png)

### Does this PR introduce any user-facing change?
No

### How was this patch tested?

Closes #26865 from wangyum/fix-google-styleguide.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2019-12-12 11:04:01 -06:00
Dongjoon Hyun b709091b4f [SPARK-30228][BUILD] Update zstd-jni to 1.4.4-3
### What changes were proposed in this pull request?

This PR aims to update zstd-jni library to 1.4.4-3.

### Why are the changes needed?

This will bring the latest bug fixes in zstd itself and some performance improvement.
- https://github.com/facebook/zstd/releases/tag/v1.4.4

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.

Closes #26856 from dongjoon-hyun/SPARK-ZSTD-144.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-12 14:16:32 +09:00
Yuming Wang eb509968a7 [SPARK-30211][INFRA] Use python3 in make-distribution.sh
### What changes were proposed in this pull request?

This PR switches python to python3 in `make-distribution.sh`.

### Why are the changes needed?

SPARK-29672 changed this:
- https://github.com/apache/spark/pull/26330/files#diff-8cf6167d58ce775a08acafcfe6f40966

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #26844 from wangyum/SPARK-30211.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-10 23:30:12 -08:00
Takeshi Yamamuro be867e8a9e [SPARK-30196][BUILD] Bump lz4-java version to 1.7.0
### What changes were proposed in this pull request?

This PR intends to upgrade lz4-java from 1.6.0 to 1.7.0.

### Why are the changes needed?

This release includes a fix for a performance bug (https://github.com/lz4/lz4-java/pull/143) by JoshRosen and some improvements (e.g., an LZ4 binary update). See the link below for the changes:
https://github.com/lz4/lz4-java/blob/master/CHANGES.md#170

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #26823 from maropu/LZ4_1_7_0.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-10 12:22:03 +09:00
Dongjoon Hyun afc4fa02bd [SPARK-30156][BUILD] Upgrade Jersey from 2.29 to 2.29.1
### What changes were proposed in this pull request?

This PR aims to upgrade `Jersey` from 2.29 to 2.29.1.

### Why are the changes needed?

This will bring several bug fixes and important dependency upgrades.
- https://eclipse-ee4j.github.io/jersey.github.io/release-notes/2.29.1.html

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.

Closes #26785 from dongjoon-hyun/SPARK-30156.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-06 18:49:43 -08:00
Dongjoon Hyun 1e0037b5e9 [SPARK-30157][BUILD][TEST-HADOOP3.2][TEST-JAVA11] Upgrade Apache HttpCore from 4.4.10 to 4.4.12
### What changes were proposed in this pull request?

This PR aims to upgrade `Apache HttpCore` from 4.4.10 to 4.4.12.

### Why are the changes needed?

`Apache HttpCore v4.4.11` is the first official release for JDK11.
> This is a maintenance release that corrects a number of defects in non-blocking SSL session code that caused compatibility issues with TLSv1.3 protocol implementation shipped with Java 11.

For the full release note, please see the following.
- https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.

Closes #26786 from dongjoon-hyun/SPARK-30157.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-07 10:59:10 +09:00
Dongjoon Hyun 1595e46a4e [SPARK-30142][TEST-MAVEN][BUILD] Upgrade Maven to 3.6.3
### What changes were proposed in this pull request?

This PR aims to upgrade Maven from 3.6.2 to 3.6.3.

### Why are the changes needed?

This will bring bug fixes like the following.
- MNG-6759 Maven fails to use <repositories> section from dependency when resolving transitive dependencies in some cases
- MNG-6760 ExclusionArtifactFilter result invalid when wildcard exclusion is followed by other exclusions

The following is the full release note.
- https://maven.apache.org/docs/3.6.3/release-notes.html

### Does this PR introduce any user-facing change?

No. (This is a dev-environment change.)

### How was this patch tested?

Pass the Jenkins with both SBT and Maven.

Closes #26770 from dongjoon-hyun/SPARK-30142.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-06 23:41:59 +09:00
Dongjoon Hyun f3abee377d [SPARK-30051][BUILD] Clean up hadoop-3.2 dependency
### What changes were proposed in this pull request?

This PR aims to cut the `org.eclipse.jetty:jetty-webapp` and `org.eclipse.jetty:jetty-xml` transitive dependencies from `hadoop-common`.

### Why are the changes needed?

This will simplify our dependency management by the removal of unused dependencies.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action with all combinations and the Jenkins UT with (Hadoop-3.2).

Closes #26742 from dongjoon-hyun/SPARK-30051.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-03 14:33:36 -08:00
Sean Owen 4193d2f4cc [SPARK-30012][CORE][SQL] Change classes extending scala collection classes to work with 2.13
### What changes were proposed in this pull request?

Move some classes extending Scala collections into parallel source trees, to support 2.13; other minor collection-related modifications.

Modify some classes extending Scala collections to work with 2.13 as well as 2.12. In many cases, this means introducing parallel source trees, as the type hierarchy changed in ways that one class can't support both.

### Why are the changes needed?

To support building for Scala 2.13 in the future.

### Does this PR introduce any user-facing change?

There should be no behavior change.

### How was this patch tested?

Existing tests. Note that the 2.13 changes are not tested by the PR builder, of course. They compile in 2.13 but can't even be tested locally. Later, once the project can be compiled for 2.13, thus tested, it's possible the 2.13 implementations will need updates.

Closes #26728 from srowen/SPARK-30012.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-03 08:59:43 -08:00
HyukjinKwon 32af7004a2 [SPARK-25016][INFRA][FOLLOW-UP] Remove leftover for dropping Hadoop 2.6 in Jenkins's test script
### What changes were proposed in this pull request?

This PR proposes to remove the leftover. After https://github.com/apache/spark/pull/22615, we don't have Hadoop 2.6 profile anymore in master.

### Why are the changes needed?

Using "test-hadoop2.6" against master branch in a PR wouldn't work.

### Does this PR introduce any user-facing change?

No (dev only).

### How was this patch tested?

Manually tested at https://github.com/apache/spark/pull/26707 and Jenkins build will test.

Without this fix, with hadoop2.6 in the PR title, it shows as below:

```
========================================================================
Building Spark
========================================================================
[error] Could not find hadoop2.6 in the list. Valid options  are dict_keys(['hadoop2.7', 'hadoop3.2'])
Attempting to post to Github...
```

Closes #26708 from HyukjinKwon/SPARK-25016.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-30 12:49:14 +09:00
HyukjinKwon 4a73bed318 [SPARK-29991][INFRA] Support Hive 1.2 and Hive 2.3 (default) in PR builder
### What changes were proposed in this pull request?

Currently, the Apache Spark PR Builder uses `hive-1.2` for `hadoop-2.7` and `hive-2.3` for `hadoop-3.2`. This PR aims to support:

- `[test-hive1.2]`  in PR builder
- `[test-hive2.3]` in PR builder to be consistent and independent of the default profile
- After this PR, all PR builders will use Hive 2.3 by default (because Spark uses Hive 2.3 by default as of c98e5eb339)
- Use default profile in AppVeyor build.

Note that this was reverted due to an unexpected test failure in `ThriftServerPageSuite`, which was investigated in https://github.com/apache/spark/pull/26706 . This PR fixed it by letting the suite use its own forked JVM. There is no explicit evidence for this fix; it was just my speculation, but thankfully it worked.

### Why are the changes needed?
This new tag gives us more flexibility.

### Does this PR introduce any user-facing change?
No. (This is a dev-only change.)

### How was this patch tested?
Check the Jenkins triggers in this PR.

Default:

```
========================================================================
Building Spark
========================================================================
[info] Building Spark using SBT with these arguments:  -Phadoop-2.7 -Phive-2.3 -Phive-thriftserver -Pmesos -Pspark-ganglia-lgpl -Phadoop-cloud -Phive -Pkubernetes -Pkinesis-asl -Pyarn test:package streaming-kinesis-asl-assembly/assembly
```

`[test-hive1.2][test-hadoop3.2]`:

```
========================================================================
Building Spark
========================================================================
[info] Building Spark using SBT with these arguments:  -Phadoop-3.2 -Phive-1.2 -Phadoop-cloud -Pyarn -Pspark-ganglia-lgpl -Phive -Phive-thriftserver -Pmesos -Pkubernetes -Pkinesis-asl test:package streaming-kinesis-asl-assembly/assembly
```

`[test-maven][test-hive-2.3]`:

```
========================================================================
Building Spark
========================================================================
[info] Building Spark using Maven with these arguments:  -Phadoop-2.7 -Phive-2.3 -Pspark-ganglia-lgpl -Pyarn -Phive -Phadoop-cloud -Pkinesis-asl -Pmesos -Pkubernetes -Phive-thriftserver clean package -DskipTests
```

Closes #26710 from HyukjinKwon/SPARK-29991.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-30 12:48:15 +09:00
HyukjinKwon 9351e3e76f Revert "[SPARK-29991][INFRA] Support test-hive1.2 in PR Builder"
This reverts commit dde0d2fcad.
2019-11-29 13:23:22 +09:00
Dongjoon Hyun dde0d2fcad [SPARK-29991][INFRA] Support test-hive1.2 in PR Builder
### What changes were proposed in this pull request?

Currently, the Apache Spark PR Builder uses `hive-1.2` for `hadoop-2.7` and `hive-2.3` for `hadoop-3.2`. This PR aims to support `[test-hive1.2]` in the PR Builder in order to decouple `hive-1.2/2.3` from `hadoop-2.7/3.2`. After this PR, the PR Builder will use `hive-2.3` by default for all profiles (if there is no `test-hive1.2`).

### Why are the changes needed?

This new tag gives us more flexibility.

### Does this PR introduce any user-facing change?

No. (This is a dev-only change.)

### How was this patch tested?

Check the Jenkins triggers in this PR.

**BEFORE**
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using SBT with these arguments:  -Phadoop-2.7 -Phive-1.2 -Pyarn -Pkubernetes -Phive -Phadoop-cloud -Pspark-ganglia-lgpl -Phive-thriftserver -Pkinesis-asl -Pmesos test:package streaming-kinesis-asl-assembly/assembly
```

**AFTER**
1. Title: [[SPARK-29991][INFRA][test-hive1.2] Support `test-hive1.2` in PR Builder](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114550/testReport)
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using SBT with these arguments:  -Phadoop-2.7 -Phive-1.2 -Pkinesis-asl -Phadoop-cloud -Pyarn -Phive -Pmesos -Pspark-ganglia-lgpl -Pkubernetes -Phive-thriftserver test:package streaming-kinesis-asl-assembly/assembly
```

2. Title: [[SPARK-29991][INFRA] Support `test hive1.2` in PR Builder](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114551/testReport)
- Note that I removed the hyphen intentionally from `test-hive1.2`.
```
========================================================================
Building Spark
========================================================================
[info] Building Spark using SBT with these arguments:  -Phadoop-2.7 -Phive-thriftserver -Pkubernetes -Pspark-ganglia-lgpl -Phadoop-cloud -Phive -Pmesos -Pyarn -Pkinesis-asl test:package streaming-kinesis-asl-assembly/assembly
```

Closes #26695 from dongjoon-hyun/SPARK-29991.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-27 21:14:40 -08:00
Dongjoon Hyun 9459833eae [SPARK-29989][INFRA] Add hadoop-2.7/hive-2.3 pre-built distribution
### What changes were proposed in this pull request?

This PR aims to add another pre-built binary distribution with `-Phadoop-2.7 -Phive-1.2` for `Apache Spark 3.0.0`.

**PRE-BUILT BINARY DISTRIBUTION**
```
spark-3.0.0-SNAPSHOT-bin-hadoop2.7-hive1.2.tgz
spark-3.0.0-SNAPSHOT-bin-hadoop2.7-hive1.2.tgz.asc
spark-3.0.0-SNAPSHOT-bin-hadoop2.7-hive1.2.tgz.sha512
```

**CONTENTS (snippet)**
```
$ ls *hadoop-*
hadoop-annotations-2.7.4.jar                hadoop-mapreduce-client-shuffle-2.7.4.jar
hadoop-auth-2.7.4.jar                       hadoop-yarn-api-2.7.4.jar
hadoop-client-2.7.4.jar                     hadoop-yarn-client-2.7.4.jar
hadoop-common-2.7.4.jar                     hadoop-yarn-common-2.7.4.jar
hadoop-hdfs-2.7.4.jar                       hadoop-yarn-server-common-2.7.4.jar
hadoop-mapreduce-client-app-2.7.4.jar       hadoop-yarn-server-web-proxy-2.7.4.jar
hadoop-mapreduce-client-common-2.7.4.jar    parquet-hadoop-1.10.1.jar
hadoop-mapreduce-client-core-2.7.4.jar      parquet-hadoop-bundle-1.6.0.jar
hadoop-mapreduce-client-jobclient-2.7.4.jar

$ ls *hive-*
hive-beeline-1.2.1.spark2.jar                   hive-jdbc-1.2.1.spark2.jar
hive-cli-1.2.1.spark2.jar                       hive-metastore-1.2.1.spark2.jar
hive-exec-1.2.1.spark2.jar                      spark-hive-thriftserver_2.12-3.0.0-SNAPSHOT.jar
```

### Why are the changes needed?

Since Apache Spark switched to `-Phive-2.3` by default, all pre-built binary distributions will use `-Phive-2.3`. This PR adds a `hadoop-2.7/hive-1.2` distribution to provide a combination similar to the `Apache Spark 2.4` line.

### Does this PR introduce any user-facing change?

Yes. This is an additional distribution that resembles the `Apache Spark 2.4` line in terms of the `hive` version.

### How was this patch tested?

Manual.

Please note that we need a dry-run mode, but the AS-IS release script does not generate additional combinations, including this one, in `dry-run` mode.

Closes #26688 from dongjoon-hyun/SPARK-29989.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-11-27 15:55:52 -08:00
Dongjoon Hyun 7c0ce28501 [SPARK-30056][INFRA] Skip building test artifacts in dev/make-distribution.sh
### What changes were proposed in this pull request?

This PR aims to skip building test artifacts in `dev/make-distribution.sh`.
Since Apache Spark 3.0.0 we need to build additional binary distributions, so this change helps the release process by speeding up building multiple binary distributions.

### Why are the changes needed?

Since the test jars are irrelevant to the generated binary artifacts, we can skip building them.

**BEFORE**
```
$ time dev/make-distribution.sh
      726.86 real      2526.04 user        45.63 sys
```

**AFTER**
```
$ time dev/make-distribution.sh
      305.54 real      1099.99 user        26.52 sys
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually check `dev/make-distribution.sh` result and time.

Closes #26689 from dongjoon-hyun/SPARK-30056.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-27 18:19:21 +09:00
Dongjoon Hyun bdf0c606b6 [SPARK-28752][BUILD][FOLLOWUP] Fix to install rouge instead of rogue
### What changes were proposed in this pull request?

This PR aims to fix a typo: `rogue` -> `rouge`.
This is a follow-up of https://github.com/apache/spark/pull/26521.

### Why are the changes needed?

To support `Python 3`, we upgraded from `pygments` to `rouge`.

### Does this PR introduce any user-facing change?

No. (This is for only document generation.)

### How was this patch tested?

Manually.
```
$ docker build -t test dev/create-release/spark-rm/
...
1 gem installed
Successfully installed rouge-3.13.0
Parsing documentation for rouge-3.13.0
Installing ri documentation for rouge-3.13.0
Done installing documentation for rouge after 4 seconds
1 gem installed
Removing intermediate container 9bd8707d9e84
 ---> a18b2f6b0bb9
...
```

Closes #26686 from dongjoon-hyun/SPARK-28752.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-26 14:58:20 -08:00
Sean Owen e23c135e56 [SPARK-29293][BUILD] Move scalafmt to Scala 2.12 profile; bump to 0.12
### What changes were proposed in this pull request?

Move scalafmt to Scala 2.12 profile; bump to 0.12.

### Why are the changes needed?

To facilitate a future Scala 2.13 build.

### Does this PR introduce any user-facing change?

None.

### How was this patch tested?

This isn't covered by tests, it's a convenience for contributors.

Closes #26655 from srowen/SPARK-29293.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-26 09:59:19 -08:00
Dongjoon Hyun c2d513f8e9 [SPARK-30035][BUILD] Upgrade to Apache Commons Lang 3.9
### What changes were proposed in this pull request?

This PR aims to upgrade to `Apache Commons Lang 3.9`.

### Why are the changes needed?

`Apache Commons Lang 3.9` is the first official release to support JDK9+. The following is the full release note.
- https://commons.apache.org/proper/commons-lang/release-notes/RELEASE-NOTES-3.9.txt

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #26672 from dongjoon-hyun/SPARK-30035.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-26 21:31:02 +09:00
Dongjoon Hyun 53e19f3678 [SPARK-30032][BUILD] Upgrade to ORC 1.5.8
### What changes were proposed in this pull request?

This PR aims to upgrade to Apache ORC 1.5.8.

### Why are the changes needed?

This will bring the latest bug fixes. The following is the full release note.
- https://issues.apache.org/jira/projects/ORC/versions/12346462

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #26669 from dongjoon-hyun/SPARK-ORC-1.5.8.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-25 20:08:11 -08:00
Dongjoon Hyun a1706e2fa7 [SPARK-30005][INFRA] Update test-dependencies.sh to check hive-1.2/2.3 profile
### What changes were proposed in this pull request?

This PR aims to update `test-dependencies.sh` to validate all available `Hadoop/Hive` combination.

### Why are the changes needed?

Previously, we have been checking only `Hadoop2.7/Hive1.2` and `Hadoop3.2/Hive2.3`.
We need to validate `Hadoop2.7/Hive2.3` additionally for Apache Spark 3.0.

### Does this PR introduce any user-facing change?

No. (This is a dev-only change).

### How was this patch tested?

Pass the GitHub Action (Linter) with the newly updated manifest, because this is only a dependency check.

Closes #26646 from dongjoon-hyun/SPARK-30005.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-24 10:14:02 -08:00
Dongjoon Hyun a60da23d64 [SPARK-30007][INFRA] Publish snapshot/release artifacts with -Phive-2.3 only
### What changes were proposed in this pull request?

This PR aims to add `-Phive-2.3` to publish profiles.
Since Apache Spark 3.0.0, Maven artifacts will be published with the Apache Hive 2.3 profile only.

This PR will also recover the `SNAPSHOT`-publishing Jenkins job.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/

We will provide the pre-built distributions (with Hive 1.2.1 also) like Apache Spark 2.4.
SPARK-29989 will update the release script to generate all combinations.

### Why are the changes needed?

This will reduce the explicit dependency on the illegitimate Hive fork in the Maven repository.

### Does this PR introduce any user-facing change?

Yes, but these are dev-only changes.

### How was this patch tested?

Manual.

Closes #26648 from dongjoon-hyun/SPARK-30007.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-23 22:34:21 -08:00
Dongjoon Hyun c98e5eb339 [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles
### What changes were proposed in this pull request?

This PR aims at the following:
- Add two profiles, `hive-1.2` and `hive-2.3` (default)
- Validate that we keep at least the existing combinations (Hadoop 2.7 + Hive 1.2 / Hadoop 3.2 + Hive 2.3).

For now, we assume that `hive-1.2` is explicitly used with `hadoop-2.7` and `hive-2.3` with `hadoop-3.2`. The following are beyond the scope of this PR:

- SPARK-29988 Adjust Jenkins jobs for `hive-1.2/2.3` combination
- SPARK-29989 Update release-script for `hive-1.2/2.3` combination
- SPARK-29991 Support `hive-1.2/2.3` in PR Builder

### Why are the changes needed?

This will make it easier to switch and update the exposed dependencies.

### Does this PR introduce any user-facing change?

This is a dev-only change: the build profile combinations change as follows.
- `-Phadoop-2.7` => `-Phadoop-2.7 -Phive-1.2`
- `-Phadoop-3.2` => `-Phadoop-3.2 -Phive-2.3`

### How was this patch tested?

Pass the Jenkins with the dependency check and tests to make sure we don't change anything for now.

- [Jenkins (-Phadoop-2.7 -Phive-1.2)](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114192/consoleFull)
- [Jenkins (-Phadoop-3.2 -Phive-2.3)](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114192/consoleFull)

Also, from now, GitHub Action validates the following combinations.
![gha](https://user-images.githubusercontent.com/9700541/69355365-822d5e00-0c36-11ea-93f7-e00e5459e1d0.png)

Closes #26619 from dongjoon-hyun/SPARK-29981.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-23 10:02:22 -08:00
Dongjoon Hyun 42f8f79ff0 [SPARK-29936][R] Fix SparkR lint errors and add lint-r GitHub Action
### What changes were proposed in this pull request?

This PR fixes SparkR lint errors and adds `lint-r` GitHub Action to protect the branch.

### Why are the changes needed?

It turns out that we currently don't run it; it was recovered yesterday. However, since then, our Jenkins linter jobs (`master`/`branch-2.4`) have been broken on the `lint-r` tasks.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action on this PR in addition to Jenkins R and AppVeyor R.

Closes #26564 from dongjoon-hyun/SPARK-29936.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-17 21:01:01 -08:00
Dongjoon Hyun e1fc38b3e4 [SPARK-29932][R][TESTS] lint-r should do non-zero exit in case of errors
### What changes were proposed in this pull request?

This PR aims to make `lint-r` exit with a non-zero status in case of errors. Please note that `lint-r` works correctly when everything is installed correctly.

### Why are the changes needed?

There are two cases that hide errors from Jenkins/AppVeyor/GitHub Actions:
1. `lint-r` exits with zero if there is no R installation.
```bash
$ dev/lint-r
dev/lint-r: line 25: type: Rscript: not found
ERROR: You should install R
$ echo $?
0
```

2. `lint-r` exits with zero if we didn't do `R/install-dev.sh`.
```bash
$ dev/lint-r
Error: You should install SparkR in a local directory with `R/install-dev.sh`.
In addition: Warning message:
In library(SparkR, lib.loc = LOCAL_LIB_LOC, logical.return = TRUE) :
  no library trees found in 'lib.loc'
Execution halted
lintr checks passed.        // <=== Please note here
$ echo $?
0
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually check the above two cases.

Closes #26561 from dongjoon-hyun/SPARK-29932.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-17 10:09:46 -08:00
Dongjoon Hyun d0470d6394 [MINOR][TESTS] Ignore GitHub Action and AppVeyor file changes in testing
### What changes were proposed in this pull request?

This PR aims to ignore `GitHub Action` and `AppVeyor` file changes. When we touch these files, the Jenkins job should not trigger full testing.

### Why are the changes needed?

Currently, these files are categorized as `root`, so they trigger full testing and end up wasting Jenkins resources.
- https://github.com/apache/spark/pull/26555
```
[info] Using build tool sbt with Hadoop profile hadoop2.7 under environment amplab_jenkins
From https://github.com/apache/spark
 * [new branch]      master     -> master
[info] Found the following changed modules: sparkr, root
[info] Setup the following environment variables for tests:
```

### Does this PR introduce any user-facing change?

No. (Jenkins testing only).

### How was this patch tested?

Manually.
```
$ dev/run-tests.py -h -v
...
Trying:
    [x.name for x in determine_modules_for_files([".github/workflows/master.yml", "appveyor.xml"])]
Expecting:
    []
...
```

Closes #26556 from dongjoon-hyun/SPARK-IGNORE-APPVEYOR.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-16 09:26:01 -08:00
HyukjinKwon d1ac25ba33 [SPARK-28752][BUILD][DOCS] Documentation build to support Python 3
### What changes were proposed in this pull request?

This PR proposes to switch from `pygments.rb`, which only supports Python 2 and seems inactive for the last few years (https://github.com/tmm1/pygments.rb), to Rouge, a pure-Ruby code highlighter that is compatible with Pygments.

I thought it would be pretty difficult to change but thankfully Rouge does a great job as the alternative.

### Why are the changes needed?

We're moving to Python 3 and dropping Python 2 completely.

### Does this PR introduce any user-facing change?

Maybe a slightly different highlighting style, but there should be no notable change.

### How was this patch tested?

Manually tested the build and checked the documentation.

Closes #26521 from HyukjinKwon/SPARK-28752.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-15 13:44:20 +09:00
Bryan Cutler 65a189c7a1 [SPARK-29376][SQL][PYTHON] Upgrade Apache Arrow to version 0.15.1
### What changes were proposed in this pull request?

Upgrade Apache Arrow to version 0.15.1. This includes Java artifacts and increases the minimum required version of PyArrow also.

Versions 0.12.0 to 0.15.1 include the following selected fixes/improvements relevant to Spark users:

* ARROW-6898 - [Java] Fix potential memory leak in ArrowWriter and several test classes
* ARROW-6874 - [Python] Memory leak in Table.to_pandas() when conversion to object dtype
* ARROW-5579 - [Java] shade flatbuffer dependency
* ARROW-5843 - [Java] Improve the readability and performance of BitVectorHelper#getNullCount
* ARROW-5881 - [Java] Provide functionalities to efficiently determine if a validity buffer has completely 1 bits/0 bits
* ARROW-5893 - [C++] Remove arrow::Column class from C++ library
* ARROW-5970 - [Java] Provide pointer to Arrow buffer
* ARROW-6070 - [Java] Avoid creating new schema before IPC sending
* ARROW-6279 - [Python] Add Table.slice method or allow slices in \_\_getitem\_\_
* ARROW-6313 - [Format] Tracking for ensuring flatbuffer serialized values are aligned in stream/files.
* ARROW-6557 - [Python] Always return pandas.Series from Array/ChunkedArray.to_pandas, propagate field names to Series from RecordBatch, Table
* ARROW-2015 - [Java] Use Java Time and Date APIs instead of JodaTime
* ARROW-1261 - [Java] Add container type for Map logical type
* ARROW-1207 - [C++] Implement Map logical type

Changelog can be seen at https://arrow.apache.org/release/0.15.0.html

### Why are the changes needed?

Upgrade to get bug fixes and improvements, and to maintain compatibility with future versions of PyArrow.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests, manually tested with Python 3.7, 3.8

Closes #26133 from BryanCutler/arrow-upgrade-015-SPARK-29376.

Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-15 13:27:30 +09:00
shane knapp 04e99c1e1b [SPARK-29672][PYSPARK] update spark testing framework to use python3
### What changes were proposed in this pull request?

remove python2.7 tests and test infra for 3.0+

### Why are the changes needed?

because python2.7 is finally going the way of the dodo.

### Does this PR introduce any user-facing change?

newp.

### How was this patch tested?

the build system will test this

Closes #26330 from shaneknapp/remove-py27-tests.

Lead-authored-by: shane knapp <incomplete@gmail.com>
Co-authored-by: shane <incomplete@gmail.com>
Signed-off-by: shane knapp <incomplete@gmail.com>
2019-11-14 10:18:55 -08:00
Liang-Chi Hsieh ef1abf2e2c [SPARK-29747][BUILD] Bump joda-time version to 2.10.5
### What changes were proposed in this pull request?

This upgrades joda-time from 2.9 to 2.10.5.

### Why are the changes needed?

Joda 2.9 is almost 4 years old, and there have been bug fixes and tz database updates since.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #26389 from viirya/upgrade-joda.

Authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-05 10:08:19 +09:00
Sean Owen 19b8c71436 [SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+
### What changes were proposed in this pull request?

Update the version of dropwizard metrics that Spark uses for metrics to 4.1.x, from 3.2.x.

### Why are the changes needed?

This helps JDK 9+ support; see for example https://github.com/dropwizard/metrics/pull/1236

### Does this PR introduce any user-facing change?

No, although downstream users with custom metrics may be affected.

### How was this patch tested?

Existing tests.

Closes #26332 from srowen/SPARK-29674.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 15:13:06 -08:00
Dongjoon Hyun 4bcfe5033c [SPARK-29731][INFRA] Use public JIRA REST API to read-only access
### What changes were proposed in this pull request?

This PR replaces `jira_client` API call for read-only access with public Apache JIRA REST API invocation.

### Why are the changes needed?

This will reduce the number of authenticated API invocations. I hope this will reduce the chance of CAPTCHA challenges from the Apache JIRA site.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manual.
```
$ echo 26375 > .github-jira-max
$ dev/github_jira_sync.py
Read largest PR number previously seen: 26375
Retrieved 100 JIRA PR's from Github
1 PR's remain after excluding visted ones
Checking issue SPARK-29731
Writing largest PR number seen: 26376
Build PR dictionary
SPARK-29731
26376
Set 26376 with labels "PROJECT INFRA"
```

Closes #26376 from dongjoon-hyun/SPARK-29731.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 11:17:53 -08:00
Dongjoon Hyun 1ac6bd9f79 [SPARK-29729][BUILD] Upgrade ASM to 7.2
### What changes were proposed in this pull request?

This PR aims to upgrade ASM to 7.2.
- https://issues.apache.org/jira/browse/XBEAN-322 (Upgrade to ASM 7.2)
- https://asm.ow2.io/versions.html

### Why are the changes needed?

This will bring the following patches.
- 317875: Infinite loop when parsing invalid method descriptor
- 317873: Add support for RET instruction in AdviceAdapter
- 317872: Throw an exception if visitFrame used incorrectly
- add support for Java 14

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing UTs.

Closes #26373 from dongjoon-hyun/SPARK-29729.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 10:42:38 -08:00
Xingbo Jiang 155a67d00c [SPARK-29666][BUILD] Fix the publish release failure under dry-run mode
### What changes were proposed in this pull request?

`release-build.sh` fail to publish release under dry run mode with the following error message:
```
/opt/spark-rm/release-build.sh: line 429: pushd: spark-repo-g4MBm/org/apache/spark: No such file or directory
```

We need to at least run the `mvn clean install` command once to create the `$tmp_repo` path, but now those steps are all skipped under dry-run mode. This PR fixes the issue.

### How was this patch tested?

Tested locally.

Closes #26329 from jiangxb1987/dryrun.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-30 14:57:51 -07:00
Xingbo Jiang fd6cfb1be3 [SPARK-29646][BUILD] Allow pyspark version name format ${versionNumber}-preview in release script
### What changes were proposed in this pull request?

Update `release-build.sh`, to allow pyspark version name format `${versionNumber}-preview`, otherwise the release script won't generate pyspark release tarballs.

### How was this patch tested?

Tested locally.

Closes #26306 from jiangxb1987/buildPython.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-30 14:51:50 -07:00
Dongjoon Hyun ba9d1610b6 [SPARK-29617][BUILD] Upgrade to ORC 1.5.7
### What changes were proposed in this pull request?

This PR aims to upgrade to Apache ORC 1.5.7.

### Why are the changes needed?

This will bring the latest bug fixes. The following is the full release note.
- https://issues.apache.org/jira/projects/ORC/versions/12345702

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #26276 from dongjoon-hyun/SPARK-29617.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-27 21:11:17 -07:00
Dongjoon Hyun 2baf7a1d8f [SPARK-29608][BUILD] Add hadoop-3.2 profile to release build
### What changes were proposed in this pull request?

This PR aims to add `hadoop-3.2` profile to pre-built binary package releases.

### Why are the changes needed?

Since Apache Spark 3.0.0, we provides Hadoop 3.2 pre-built binary.

### Does this PR introduce any user-facing change?

No. (Although the artifacts are available, this change is for release managers).

### How was this patch tested?

Manual. Please note that `DRY_RUN=0` disables these combination.
```
$ dev/create-release/release-build.sh package
...
Packages to build: without-hadoop hadoop3.2 hadoop2.7
make_binary_release without-hadoop -Pscala-2.12 -Phadoop-provided  2.12
make_binary_release hadoop3.2 -Pscala-2.12 -Phadoop-3.2 -Phive -Phive-thriftserver  2.12
make_binary_release hadoop2.7 -Pscala-2.12 -Phadoop-2.7 -Phive -Phive-thriftserver withpip,withr 2.12
```

Closes #26260 from dongjoon-hyun/SPARK-29608.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-25 13:57:26 -07:00
Luca Canali 5867707835 [SPARK-29557][BUILD] Update dropwizard/codahale metrics library to 3.2.6
### What changes were proposed in this pull request?
This proposes to update the dropwizard/codahale metrics library version used by Spark to `3.2.6`, which is the last version supporting Ganglia.

### Why are the changes needed?
Spark is currently using Dropwizard metrics version 3.1.5, a version that is no longer actively developed or maintained, according to the project's GitHub README.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing tests + manual tests on a YARN cluster.

Closes #26212 from LucaCanali/updateDropwizardVersion.

Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-23 10:45:11 -07:00
Xianyang Liu 0a7095156b [SPARK-29499][CORE][PYSPARK] Add mapPartitionsWithIndex for RDDBarrier
### What changes were proposed in this pull request?

Add mapPartitionsWithIndex for RDDBarrier.

### Why are the changes needed?

There is only one method in `RDDBarrier`. We often use the partition index as a label for the current partition, but with `mapPartitions` we have to fetch it from the `TaskContext`, which is not convenient. See the sketch below.
### Does this PR introduce any user-facing change?

No

### How was this patch tested?

New UT.

Closes #26148 from ConeyLiu/barrier-index.

Authored-by: Xianyang Liu <xianyang.liu@intel.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
2019-10-23 13:46:09 +02:00
igor.calabria 78bdcfade1 [SPARK-27812][K8S] Bump K8S client version to 4.6.1
### What changes were proposed in this pull request?

Updated kubernetes client.

### Why are the changes needed?

https://issues.apache.org/jira/browse/SPARK-27812
https://issues.apache.org/jira/browse/SPARK-27927

We need the fix https://github.com/fabric8io/kubernetes-client/pull/1768 that was released in version 4.6 of the client. The root cause of the problem is better explained in https://github.com/apache/spark/pull/25785

### Does this PR introduce any user-facing change?

Nope, it should be transparent to users

### How was this patch tested?

This patch was tested manually using a simple pyspark job

```python
from pyspark.sql import SparkSession

if __name__ == '__main__':
    spark = SparkSession.builder.getOrCreate()
```

The expected behaviour of this "job" is that both the Python and JVM processes exit automatically after the main runs. This is the case for Spark versions <= 2.4. On version 2.4.3, the JVM process hangs because there is a non-daemon thread running:

```
"OkHttp WebSocket https://10.96.0.1/..." #121 prio=5 os_prio=0 tid=0x00007fb27c005800 nid=0x24b waiting on condition [0x00007fb300847000]
"OkHttp WebSocket https://10.96.0.1/..." #117 prio=5 os_prio=0 tid=0x00007fb28c004000 nid=0x247 waiting on condition [0x00007fb300e4b000]
```
This is caused by a bug in the `kubernetes-client` library, which is fixed in the version that we are upgrading to.

When the mentioned job is run with this patch applied, the behaviour from spark <= 2.4.3 is restored and both processes terminate successfully

Closes #26093 from igorcalabria/k8s-client-update.

Authored-by: igor.calabria <igor.calabria@ubee.in>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-17 12:23:24 -07:00
Fokko Driesprong 8eb8f7478c [SPARK-29483][BUILD] Bump Jackson to 2.10.0
### What changes were proposed in this pull request?

Release blog: https://medium.com/cowtowncoder/jackson-2-10-features-cd880674d8a2

Fixes the following CVE's:
https://www.cvedetails.com/cve/CVE-2019-16942/
https://www.cvedetails.com/cve/CVE-2019-16943/

Looking back, there were 3 major goals for this minor release:

- Resolve the growing problem of “endless CVE patches”, a stream of fixes for reported CVEs related to “Polymorphic Deserialization” problem (described in “On Jackson CVEs… ”) that resulted in security tools forcing Jackson upgrades. 2.10 now includes “Safe Default Typing” that is hoped to resolve this problem.
- Evolve 2.x API towards 3.0, based on changes that were done in master, within limits of 2.x API backwards-compatibility requirements.
- Add JDK support for versions beyond Java 8: specifically, add “module-info.class” for JDK9+, defining proper module definitions for Jackson components

Full changelog: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10

Improved Scala 2.13 support: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10#scala

### Why are the changes needed?

Patches CVE's reported by the vulnerability scanner.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Ran `mvn clean install -DskipTests` locally.

Closes #26131 from Fokko/SPARK-29483.

Authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-16 15:38:54 -07:00
Jeff Evans 95de93b24e [SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read
Updating univocity-parsers version to 2.8.3, which adds support for multiple character delimiters

Moving univocity-parsers version to spark-parent pom dependencyManagement section

Adding new utility method to build multi-char delimiter string, which delegates to existing one

Adding tests for multiple character delimited CSV

### What changes were proposed in this pull request?

Adds support for parsing CSV data using multiple-character delimiters.  Existing logic for converting the input delimiter string to characters was kept and invoked in a loop.  Project dependencies were updated to remove redundant declaration of `univocity-parsers` version, and also to change that version to the latest.

### Why are the changes needed?

It is quite common for people to have delimited data, where the delimiter is not a single character, but rather a sequence of characters.  Currently, it is difficult to handle such data in Spark (typically needs pre-processing).

### Does this PR introduce any user-facing change?

Yes. Specifying the "delimiter" option for the DataFrame read, and providing more than one character, will no longer result in an exception.  Instead, it will be converted as before and passed to the underlying library (Univocity), which has accepted multiple character delimiters since 2.8.0.

### How was this patch tested?

The `CSVSuite` tests were confirmed passing (including new methods), and `sbt` tests for `sql` were executed.

Closes #26027 from jeff303/SPARK-24540.

Authored-by: Jeff Evans <jeffrey.wayne.evans@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-15 15:44:51 -05:00
Dongjoon Hyun 39d53d3e74 [SPARK-29470][BUILD] Update plugins to latest versions
### What changes were proposed in this pull request?

This PR updates plugins to latest versions.

### Why are the changes needed?

This brings bug fixes like the following.
- https://issues.apache.org/jira/projects/MCOMPILER/versions/12343484 (maven-compiler-plugin)
- https://issues.apache.org/jira/projects/MJAVADOC/versions/12345060 (maven-javadoc-plugin)
- https://issues.apache.org/jira/projects/MCHECKSTYLE/versions/12342397 (maven-checkstyle-plugin)
- https://checkstyle.sourceforge.io/releasenotes.html#Release_8.25 (checkstyle)
- https://checkstyle.sourceforge.io/releasenotes.html#Release_8.24 (checkstyle)

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins building and testing with the existing code.

Closes #26117 from dongjoon-hyun/SPARK-29470.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-15 11:55:52 -07:00
angerszhu ef81525a1a [SPARK-29308][BUILD] Update deps in dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2
### What changes were proposed in this pull request?

The current dev/deps/spark-deps-hadoop-3.2 has some wrong deps, caused by `dev/test-dependencies.sh` when building assembly dependencies.
Add the Maven compile parameter `-am` to make it build with all deps and get the right result.

Also update NOTICE-binary for the updated result.

### Why are the changes needed?
Update dev/deps/spark-deps-hadoop-3.2.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #25984 from AngersZhuuuu/SPARK=29308.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-13 12:53:12 -05:00
Fokko Driesprong b5b1b69f79 [SPARK-29445][CORE] Bump netty-all from 4.1.39.Final to 4.1.42.Final
### What changes were proposed in this pull request?

Minor version bump of Netty to patch reported CVE.

Patches: https://www.cvedetails.com/cve/CVE-2019-16869/

### Why are the changes needed?

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Compiled locally using `mvn clean install -DskipTests`

Closes #26099 from Fokko/SPARK-29445.

Authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-12 09:43:16 -05:00