Commit graph

1008 commits

Author SHA1 Message Date
Liang-Chi Hsieh e39948fada [SPARK-36670][SQL][TEST] Add FileSourceCodecSuite
### What changes were proposed in this pull request?

This patch mainly proposes to add some e2e test cases in Spark for codec used by main datasources.

### Why are the changes needed?

We found there is no e2e test cases available for main datasources like Parquet, Orc. It makes developers harder to identify possible bugs early. We should add such tests in Spark.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added tests.

Closes #33912 from viirya/SPARK-36670.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 5a0ae694d0)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
2021-09-07 16:53:25 -07:00
Dongjoon Hyun cdf21f729a [SPARK-36629][BUILD] Upgrade aircompressor to 1.21
### What changes were proposed in this pull request?

This PR aims to upgrade `aircompressor` dependency from 1.19 to 1.21.

### Why are the changes needed?

This will bring the latest bug fix which exists in `aircompressor` 1.17 ~ 1.20.
- 1e364f7133

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #33883 from dongjoon-hyun/SPARK-36629.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit ff8cc4b800)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-08-31 22:35:53 -07:00
Gengliang Wang 1bad04d028 Preparing development version 3.2.1-SNAPSHOT 2021-08-31 17:04:14 +00:00
Gengliang Wang 03f5d23e96 Preparing Spark release v3.2.0-rc2 2021-08-31 17:04:08 +00:00
Gengliang Wang 69be513c5e Preparing development version 3.2.1-SNAPSHOT 2021-08-20 12:40:47 +00:00
Gengliang Wang 6bb3523d8e Preparing Spark release v3.2.0-rc1 2021-08-20 12:40:40 +00:00
Gengliang Wang fafdc1482b Revert "Preparing Spark release v3.2.0-rc1"
This reverts commit 8e58fafb05.
2021-08-20 20:07:02 +08:00
Gengliang Wang c829ed53ff Revert "Preparing development version 3.2.1-SNAPSHOT"
This reverts commit 4f1d21571d.
2021-08-20 20:07:01 +08:00
Gengliang Wang 6357b22ba8 [SPARK-36547][BUILD] Downgrade scala-maven-plugin to 4.3.0
### What changes were proposed in this pull request?

When preparing Spark 3.2.0 RC1, I hit the same issue of https://github.com/apache/spark/pull/31031.
```
[INFO] Compiling 21 Scala sources and 3 Java sources to /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes ...
[ERROR] ## Exception when compiling 24 sources to /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s signer information does not match signer information of other classes in the same package
java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
```
This PR is to apply the same fix again by downgrading scala-maven-plugin to 4.3.0

### Why are the changes needed?

To unblock the release process.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

Build test

Closes #33791 from gengliangwang/downgrade.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit f0775d215e)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
2021-08-20 10:45:35 +08:00
Gengliang Wang 4f1d21571d Preparing development version 3.2.1-SNAPSHOT 2021-08-19 14:08:32 +00:00
Gengliang Wang 8e58fafb05 Preparing Spark release v3.2.0-rc1 2021-08-19 14:08:26 +00:00
Sean Owen b8c1014e23 Update Spark key negotiation protocol 2021-08-14 09:08:29 -05:00
William Hyun 1a371fbfa1 [SPARK-36482][BUILD] Bump orc to 1.6.10
### What changes were proposed in this pull request?
This PR aims to bump ORC to 1.6.10

### Why are the changes needed?
This will bring the latest bug fixes.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

Closes #33712 from williamhyun/orc.

Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit aff1b5594a)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-08-11 11:32:18 -07:00
Sajith Ariyarathna f0844f70b1 [SPARK-36432][BUILD] Upgrade Jetty version to 9.4.43
### What changes were proposed in this pull request?
This PR upgrades Jetty version to `9.4.43.v20210629`.

### Why are the changes needed?
To address vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-34429 which affects Jetty `9.4.42.v20210604`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI

Closes #33656 from this/upgrade-jetty-9.4.43.

Lead-authored-by: Sajith Ariyarathna <sajith.janaprasad@gmail.com>
Co-authored-by: Sajith Ariyarathna <this@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 5a22f9ceaf)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-08-09 10:14:16 +09:00
Liang-Chi Hsieh 712c311736 [SPARK-36393][BUILD] Try to raise memory for GHA
### What changes were proposed in this pull request?

According to the feedback from GitHub, the change causing memory issue has been rolled back. We can try to raise memory again for GA.

### Why are the changes needed?

Trying higher memory settings for GA. It could speed up the testing time.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

GA

Closes #33623 from viirya/increasing-mem-ga.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 7d13ac177b)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
2021-08-05 01:31:45 -07:00
Sean Owen c7d246ba4e [SPARK-35310][MLLIB] Update to breeze 1.2
Update to the latest breeze 1.2

Minor bug fixes

No.

Existing tests

Closes #33449 from srowen/SPARK-35310.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-07-24 08:20:25 -05:00
Liang-Chi Hsieh a6418a3463 [SPARK-36270][BUILD] Change memory settings for enabling GA
### What changes were proposed in this pull request?

Trying to adjust build memory settings and serial execution to re-enable GA.

### Why are the changes needed?

GA tests are failed recently due to return code 137. We need to adjust build settings to make GA work.

### Does this PR introduce _any_ user-facing change?

No, dev only.

### How was this patch tested?

GA

Closes #33447 from viirya/test-ga.

Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit fd36ed4550)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-07-23 19:11:09 +09:00
Dongjoon Hyun 60566f9d8e [SPARK-36262][BUILD] Upgrade ZSTD-JNI to 1.5.0-4
### What changes were proposed in this pull request?

This PR aims to upgrade ZSTD-JNI to 1.5.0-4.

### Why are the changes needed?

ZSTD-JNI 1.5.0-3 has a packaging issue. 1.5.0-4 is recommended to be used instead.
- https://github.com/luben/zstd-jni/issues/181#issuecomment-885138495

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #33483 from dongjoon-hyun/SPARK-36262.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a1a197403b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-07-22 14:04:14 -07:00
Kousuke Saruta fef7bf9fcc [SPARK-36244][BUILD] Upgrade zstd-jni to 1.5.0-3 to avoid a bug about buffer size calculation
### What changes were proposed in this pull request?

This PR upgrades `zstd-jni` from `1.5.0-2` to `1.5.0-3`.
`1.5.0-3` was released few days ago.
This release resolves an issue about buffer size calculation, which can affect usage in Spark.
https://github.com/luben/zstd-jni/releases/tag/v1.5.0-3

### Why are the changes needed?

It might be a corner case that skipping length is greater than `2^31 - 1` but it's possible to affect Spark.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #33464 from sarutak/upgrade-zstd-jni-1.5.0-3.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit dcb7db5370)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-07-21 19:37:18 -07:00
William Hyun b93fa15ce2 [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2
### What changes were proposed in this pull request?
This PR aims to upgrade scalatest-maven-plugin to version 2.0.2.

### Why are the changes needed?
2.0.2 supports build on JDK 11 officially.
- f45ce192f3

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

Closes #33408 from williamhyun/SMP.

Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit df8bae0689)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-07-18 22:14:43 -07:00
Dongjoon Hyun 8059a7e5e6 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g
### What changes were proposed in this pull request?

This PR aims to set `MaxMetaspaceSize` to `2g` because it's increasing the native memory consumption unlimitedly by default. The unlimited increasing memory causes GitHub Action flakiness. The value I observed during `hive` module test was over 1.8G and growing.

- https://docs.oracle.com/javase/10/gctuning/other-considerations.htm#JSGCT-GUID-BFB89453-60C0-42AC-81CA-87D59B0ACE2E
> Starting with JDK 8, the permanent generation was removed and the class metadata is allocated in native memory. The amount of native memory that can be used for class metadata is by default unlimited. Use the option -XX:MaxMetaspaceSize to put an upper limit on the amount of native memory used for class metadata.

In addition, I increased the following memory limit to 4g consistently from two places.
```xml
- <jvmArg>-Xms2048m</jvmArg>
- <jvmArg>-Xmx2048m</jvmArg>
+ <jvmArg>-Xms4g</jvmArg>
+ <jvmArg>-Xmx4g</jvmArg>
```

```scala
- javaOptions += "-Xmx3g",
+ javaOptions ++= "-Xmx4g -XX:MaxMetaspaceSize=2g".split(" ").toSeq,
```

### Why are the changes needed?

This will reduce the flakiness in CI environment by limiting the memory usage explicitly.

When we limit it with `1g`, Hive module fails with `OOM` like the following.
```
java.lang.OutOfMemoryError: Metaspace
Error: Exception in thread "dispatcher-event-loop-110" java.lang.OutOfMemoryError: Metaspace
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #33405 from dongjoon-hyun/SPARK-36195.

Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Kyle Bendickson <kbendickson@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit d7df7a805f)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-07-18 10:15:26 -07:00
Kousuke Saruta ca8d2670b7 [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs
### What changes were proposed in this pull request?

This PR upgrades `commons-compress` from `1.20` to `1.21` to deal with CVEs.

### Why are the changes needed?

Some CVEs which affect `commons-compress 1.20` are reported and fixed in `1.21`.
https://commons.apache.org/proper/commons-compress/security-reports.html

* CVE-2021-35515
* CVE-2021-35516
* CVE-2021-35517
* CVE-2021-36090

The severities are reported as low for all the CVEs but it would be better to deal with them just in case.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #33333 from sarutak/upgrade-commons-compress-1.21.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit fd06cc211d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-07-13 22:53:22 -07:00
Wenchen Fan c1d8ccfb64 Revert "[SPARK-35253][SPARK-35398][SQL][BUILD] Bump up the janino version to v3.1.4"
### What changes were proposed in this pull request?

This PR reverts https://github.com/apache/spark/pull/32455 and its followup https://github.com/apache/spark/pull/32536 , because the new janino version has a bug that is not fixed yet: https://github.com/janino-compiler/janino/pull/148

### Why are the changes needed?

avoid regressions

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #33302 from cloud-fan/revert.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit ae6199af44)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-07-13 12:14:21 +09:00
Dongjoon Hyun d7990943c3 [SPARK-35992][BUILD] Upgrade ORC to 1.6.9
### What changes were proposed in this pull request?

This PR aims to upgrade Apache ORC to 1.6.9.

### Why are the changes needed?

This is required to bring ORC-804 in order to fix ORC encryption masking bug.

### Does this PR introduce _any_ user-facing change?

No. This is not released yet.

### How was this patch tested?

Pass the newly added test case.

Closes #33189 from dongjoon-hyun/SPARK-35992.

Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c55b9fd1e0)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-07-02 09:50:00 -07:00
Holden Karau 34286ae5bf [SPARK-35960][BUILD][TEST] Bump the scalatest version to 3.2.9
### What changes were proposed in this pull request?

Bump the scalatest version to 3.2.9

### Why are the changes needed?

With the scalatestplus change to 3.2.9.0, recent sbt fails to handle the mismatch between scalatest and scalatestplus and resolve resulting in test:compile errors of not being able to find the org.scalatest package.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

sbt tags/test:compile failed before and passes with this change.

Closes #33163 from holdenk/SPARK-35960-test-compile-sbt-issue.

Authored-by: Holden Karau <hkarau@netflix.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-30 21:39:12 -07:00
Kousuke Saruta 7ad682aaa1 Revert "[SPARK-34549][BUILD] Upgrade aws kinesis to 1.14.0 and java sdk 1.11.844"
### What changes were proposed in this pull request?

This PR reverts the change of SPARK-34549 ( #31658).

### Why are the changes needed?

See #33133.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Closes #33145 from sarutak/revert-SPARK-34549.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-06-30 10:45:41 +09:00
Dongjoon Hyun 7e7028282c [SPARK-35928][BUILD] Upgrade ASM to 9.1
### What changes were proposed in this pull request?

This PR aims to upgrade ASM to 9.1

### Why are the changes needed?

The latest `xbean-asm9-shaded` is built with ASM 9.1.

- https://mvnrepository.com/artifact/org.apache.xbean/xbean-asm9-shaded/4.20
- 5e0e3c0c64/pom.xml (L67)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #33130 from dongjoon-hyun/SPARK-35928.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-29 10:27:51 -07:00
Dongjoon Hyun c45a6f5d09 [SPARK-35922][BUILD] Upgrade maven-shade-plugin to 3.2.4
### What changes were proposed in this pull request?

This PR aims to upgrade `maven-shade-plugin` to 3.2.4.

### Why are the changes needed?

This is required to build with Java 17-ea.

Since `maven-shade-plugin` 3.2.3, `asm` 8.0 is used now. We should remove our custom dependency of `7.3.1`.
- https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-shade-plugin/3.2.4
- https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-shade-plugin/3.2.3

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #33122 from dongjoon-hyun/SPARK-35922.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-28 22:08:44 -07:00
Dongjoon Hyun b999e6bd90 [SPARK-35920][BUILD] Upgrade to Chill 0.10.0
### What changes were proposed in this pull request?

This PR aims to upgrade Chill to 0.10.0.

### Why are the changes needed?

This is a maintenance release having cross-compilation to 2.12.14 and 2.13.6 .
- https://github.com/twitter/chill/releases/tag/v0.10.0

### Does this PR introduce _any_ user-facing change?

No, this is a dependency change.

### How was this patch tested?

Pass the CIs.

Closes #33119 from dongjoon-hyun/SPARK-35920.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-28 22:06:41 -07:00
Adam Binford 939ea3d5da [SPARK-35863][BUILD] Update Ivy to 2.5.0
### What changes were proposed in this pull request?

Update Ivy from 2.4.0 to 2.5.0.

- https://ant.apache.org/ivy/history/2.5.0/release-notes.html

### Why are the changes needed?

This brings various improvements and bug fixes. Most notably, the adding of `ivy.maven.lookup.sources` and `ivy.maven.lookup.javadoc` configs can significantly speed up module resolution time if these are turned off, especially behind a proxy. These could arguably be turned off by default, because when submitting jobs you probably don't care about the sources or javadoc jars. I didn't include that here but happy to look into if it's desired.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UT and build passes

Closes #33088 from Kimahriman/feature/ivy-update.

Authored-by: Adam Binford <adamq43@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-25 07:37:36 -07:00
Kousuke Saruta 7b78d56f34 [SPARK-35870][BUILD] Upgrade Jetty to 9.4.42
### What changes were proposed in this pull request?

This PR upgrades Jetty to `9.4.42`.
In the current master, `9.4.40` is used.
`9.4.41` and `9.4.42` include the following updates.
https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.41.v20210516
https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.42.v20210604

### Why are the changes needed?

Mainly for CVE-2021-28169.
https://nvd.nist.gov/vuln/detail/CVE-2021-28169
This CVE might little affect Spark, but just in case.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #33053 from sarutak/upgrade-jetty-9.4.42.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
2021-06-25 03:32:32 +09:00
William Hyun 89dbf514f5 [SPARK-35850][BUILD] Upgrade scala-maven-plugin to 4.5.3
### What changes were proposed in this pull request?
This PR aims to upgrade the scala-maven-plugin version to 4.5.3.

### Why are the changes needed?
This will upgrade `sbt-compiler-bridge` from 1.3.1 to 1.5.5 in order to bring the latest bug fixes.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

Closes #33007 from williamhyun/scalamvnplugin.

Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-21 21:35:42 -07:00
Gengliang Wang 74d647d2ca [SPARK-35825][INFRA] Increase the heap and stack size for Maven build
### What changes were proposed in this pull request?

Increase memory configuration for Maven build.
Stack size: 64MB => 128MB
Initial heap size: 1024MB => 2048MB
Maximum heap size: 1024MB => 2048MB

The SBT builds are ok so let's keep the current configuration.

### Why are the changes needed?

The jenkins jobs are unstable due to the stackoverflow errors:
 https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-3.2-jdk-11/
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/2274/

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Jenkins test

Closes #32961 from gengliangwang/increaseXss.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-19 10:44:46 -07:00
David Christle 7fcb127674 [SPARK-35670][BUILD] Upgrade ZSTD-JNI to 1.5.0-2
### What changes were proposed in this pull request?
This PR aims to upgrade `zstd-jni` to 1.5.0-2, which uses `zstd` version 1.5.0.

### Why are the changes needed?
Major improvements to Zstd support are targeted for the upcoming 3.2.0 release of Spark. Zstd 1.5.0 introduces significant compression (+25% to 140%) and decompression (~15%) speed improvements in benchmarks described in more detail on the releases page:

- https://github.com/facebook/zstd/releases/tag/v1.5.0

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Build passes build tests, but the benchmark tests seem flaky. I am unsure if this change is responsible. The error is:
```
Running org.apache.spark.rdd.CoalescedRDDBenchmark:
21/06/08 18:53:10 ERROR SparkContext: Failed to add file:/home/runner/work/spark/spark/./core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar to Spark environment
java.lang.IllegalArgumentException: requirement failed: File spark-core_2.12-3.2.0-SNAPSHOT-tests.jar was already registered with a different path (old path = /home/runner/work/spark/spark/core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar, new path = /home/runner/work/spark/spark/./core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar
```

https://github.com/dchristle/spark/runs/2776123749?check_suite_focus=true

cc: dongjoon-hyun

Closes #32826 from dchristle/ZSTD150.

Lead-authored-by: David Christle <dchristle@squareup.com>
Co-authored-by: David Christle <dchristle@users.noreply.github.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-17 11:06:50 -07:00
Chao Sun 506ef9aad7 [SPARK-29250][BUILD] Upgrade to Hadoop 3.3.1
### What changes were proposed in this pull request?

This upgrade default Hadoop version from 3.2.1 to 3.3.1. The changes here are simply update the version number and dependency file.

### Why are the changes needed?

Hadoop 3.3.1 just came out, which comes with many client-side improvements such as for S3A/ABFS (20% faster when accessing S3). These are important for users who want to use Spark in a cloud environment.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Existing unit tests in Spark
- Manually tested using my S3 bucket for event log dir:
```
bin/spark-shell \
  -c spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID \
  -c spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY \
  -c spark.eventLog.enabled=true
  -c spark.eventLog.dir=s3a://<my-bucket>
```
- Manually tested against docker-based YARN dev cluster, by running `SparkPi`.

Closes #30135 from sunchao/SPARK-29250.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-16 13:28:07 -07:00
Sumeet Gajjar 864ff67746 [SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 profile due to EOL and CVEs
### What changes were proposed in this pull request?

Remove commons-httpclient as a direct dependency for Hadoop-3.2 profile.
Hadoop-2.7 profile distribution still has it, hadoop-client has a compile dependency on commons-httpclient, thus we cannot remove it for Hadoop-2.7 profile.
```
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.7.4:compile
[INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.7.4:compile
[INFO] |  |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  |  +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |  |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
```

### Why are the changes needed?

Spark is pulling in commons-httpclient as a dependency directly. commons-httpclient went EOL years ago and there are most likely CVEs not being reported against it, thus we should remove it.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Existing unittests
- Checked the dependency tree before and after introducing the changes

Before:
```
./build/mvn dependency:tree -Phadoop-3.2 | grep -i "commons-httpclient"
Using `mvn` from path: /usr/bin/mvn
[INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:provided
```

After
```
./build/mvn dependency:tree | grep -i "commons-httpclient"
Using `mvn` from path: /Users/sumeet.gajjar/cloudera/upstream-spark/build/apache-maven-3.6.3/bin/mvn
```

P.S. Reopening this since [spark upgraded](463daabd5a) its `hive.version` to `2.3.9` which does not have a dependency on `commons-httpclient`.

Closes #32912 from sumeetgajjar/SPARK-35429.

Authored-by: Sumeet Gajjar <sumeetgajjar93@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-15 14:43:30 -07:00
Gengliang Wang 62be22e929 [SPARK-35694][INFRA][FOLLOWUP] Increase the default JVM stack size of SBT/Maven
### What changes were proposed in this pull request?

In https://github.com/apache/spark/pull/32838, we set the default JVM stack size to 16M from 4M.
However, there are still stackoverflow error in builds:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139672/console

Let's update the value to 64M

### Why are the changes needed?

Make test build stable.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manual trigger test builds.

Closes #32879 from gengliangwang/increaseStackAgain.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-06-11 18:51:07 +09:00
Yuming Wang 463daabd5a [SPARK-34512][BUILD][SQL] Upgrade built-in Hive to 2.3.9
### What changes were proposed in this pull request?

This pr upgrades built-in Hive to 2.3.9. Hive 2.3.9 changes:
- [HIVE-17155] - findConfFile() in HiveConf.java has some issues with the conf path
- [HIVE-24797] - Disable validate default values when parsing Avro schemas
- [HIVE-24608] - Switch back to get_table in HMS client for Hive 2.3.x
- [HIVE-21200] - Vectorization: date column throwing java.lang.UnsupportedOperationException for parquet
- [HIVE-21563] - Improve Table#getEmptyTable performance by disabling registerAllFunctionsOnce
- [HIVE-19228] - Remove commons-httpclient 3.x usage

### Why are the changes needed?

Fix regression caused by AVRO-2035.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #32750 from wangyum/SPARK-34512.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-06-10 20:44:35 -07:00
Gengliang Wang 0b5683a4d5 [SPARK-35694][INFRA] Increase the default JVM stack size of SBT/Maven
### What changes were proposed in this pull request?

The jenkins SBT/Maven build keep failing with stack overflow error:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139542

We should increase the JVM stack size to 16MB.
Also, https://github.com/apache/spark/pull/32521 set the stack size to 256MB for Java 11 build, which might be too big since every thread will allocate this memory for the stack. This PR also set it as 16MB to make the config consistent.

### Why are the changes needed?

Fix SBT/Maven build.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Jenkins and GA tests.

Closes #32838 from gengliangwang/increaseSBTStackSize.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
2021-06-09 19:36:29 +08:00
Dongjoon Hyun 6f2ffccb5e [SPARK-35660][BUILD][K8S] Upgrade kubernetes-client to 5.4.1
### What changes were proposed in this pull request?

This PR aims to upgrade kubernetes-client to 5.4.1.

### Why are the changes needed?

This will bring a few bug fixes.
- https://github.com/fabric8io/kubernetes-client/releases/tag/v5.4.1

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #32798 from dongjoon-hyun/SPARK-35660.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-06-06 22:27:08 -07:00
Kousuke Saruta 510bde460a [SPARK-35655][BUILD] Upgrade HtmlUnit and its related artifacts to 2.50
### What changes were proposed in this pull request?

This PR upgrades HtmlUnit and its related artifacts to `2.50`.

### Why are the changes needed?

We currently uses 2.40 but 2.41+ seem to include bunch of bug fixes and improvements especially for JavaScript and CSS.
https://htmlunit.sourceforge.io/changes-report.html#a2.50.0

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

All the `UISeleniumSuite` which use HtmlUnit pass on my laptop.

Closes #32789 from sarutak/upgrade-htmlunit250.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
2021-06-05 17:45:44 +08:00
yangjie01 4a549f2de2 [SPARK-35574][BUILD] Add a compile arg to turn compilation warnings related to procedure syntax to compilation errors in Scala 2.13
### What changes were proposed in this pull request?
There are several pr to fix compilation warnings  related to `procedure syntax` like SPARK-29291, SPARK-33352 and SPARK-35526, in order to prevent the recurrence of similar problems, this pr add a compile arg to convert `procedure syntax` related compilation warnings to compilation errors in Scala 2.13.

### Why are the changes needed?
Prevent the recurrence of compilation warnings related to `procedure syntax is deprecated`

### Does this PR introduce _any_ user-facing change?
`procedure syntax` is no longer allowed in Spark code with Scala 2.13, for constructors methods definition should be `this(...) = { }` not `this(...) { }`, for without  `return type` methods definition should be `def methodName(...): Unit = {}` not `def methodName(...) {}`.

### How was this patch tested?

- Pass the GitHub Action Scala 2.13 job

- Manual test:

Do some code change like:

```
Index: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
===================================================================
 -67,7 +67,7
 private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
   extends SparkListener with ThreadSafeRpcEndpoint with Logging {

-  def this(sc: SparkContext) = {
+  def this(sc: SparkContext) {
     this(sc, new SystemClock)
   }

Index: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
===================================================================
 -720,7 +720,7
     }
   }

-  def registerMergeResult(shuffleId: Int, reduceId: Int, status: MergeStatus): Unit = {
+  def registerMergeResult(shuffleId: Int, reduceId: Int, status: MergeStatus) {
     shuffleStatuses(shuffleId).addMergeResult(reduceId, status)
   }
```

**sbt  with Scala 2.13 profile compile  failed as follows:***

```
[error] /home/runner/work/spark/spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70:29: procedure syntax is deprecated for constructors: add `=`, as in method definition
[error]   def this(sc: SparkContext) {
[error]                             ^
[error] /home/runner/work/spark/spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:723:79: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `registerMergeResult`'s return type
[error]   def registerMergeResult(shuffleId: Int, reduceId: Int, status: MergeStatus) {
[error]                                                                               ^
[error] two errors found
[error] (core / Compile / compileIncremental) Compilation failed
[error] Total time: 136 s (02:16), completed May 31, 2021 10:06:50 AM

Error: Process completed with exit code 1.
```

**maven  with Scala 2.13 profile compile  failed as follows:**

```
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: procedure syntax is deprecated for constructors: add `=`, as in method definition
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:723: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `registerMergeResult`'s return type
[ERROR] two errors found
```

Closes #32710 from LuciferYang/SPARK-35574.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-06-03 13:52:04 +09:00
Dongjoon Hyun 1a55019b1f [SPARK-31168][BUILD][FOLLOWUP] Update scala-2.12 profile
### What changes were proposed in this pull request?

This PR is a follow-up of https://github.com/apache/spark/pull/32697 to update the missed part.
After SPARK-34774, we have Scala 2.12 version in `scala-2.12` profile.

### Why are the changes needed?

To be consistent.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and manual.

**BEFORE**
```
$ build/mvn help:evaluate -Pscala-2.12 -Dexpression=scala.version | grep "^2.12"
Using `mvn` from path: /usr/local/bin/mvn
2.12.10
```

**AFTER**
```
$ build/mvn help:evaluate -Pscala-2.12 -Dexpression=scala.version | grep "^2.12"
Using `mvn` from path: /usr/local/bin/mvn
2.12.14
```

Closes #32707 from dongjoon-hyun/SPARK-31168-2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-05-30 21:27:24 -07:00
yangjie01 ff27264ae5 [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3
### What changes were proposed in this pull request?
This pr upgrade Jackson version to 2.12.3.
Jackson Release 2.12.3: [https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.12.3](https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.12.3)

### Why are the changes needed?
Upgrade to a new version to bring potential bug fixes like [https://github.com/FasterXML/jackson-modules-java8/issues/207](https://github.com/FasterXML/jackson-modules-java8/issues/207)  and avro's master has been upgraded to Jackson to 2.12.3

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Action

Closes #32688 from LuciferYang/SPARK-35550.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-05-31 10:28:43 +09:00
Dongjoon Hyun 6c4b60f3b3 [SPARK-31168][BUILD] Upgrade Scala to 2.12.14
### What changes were proposed in this pull request?

This PR is the 4th try to upgrade Scala 2.12.x in order to see the feasibility.
- https://github.com/apache/spark/pull/27929 (Upgrade Scala to 2.12.11, wangyum )
- https://github.com/apache/spark/pull/30940 (Upgrade Scala to 2.12.12, viirya )
- https://github.com/apache/spark/pull/31223 (Upgrade Scala to 2.12.13, dongjoon-hyun )

Note that Scala 2.12.14 has the following fix for Apache Spark community.
- Fix cyclic error in runtime reflection (protobuf), a regression that prevented Spark upgrading to 2.12.13

REQUIREMENTS:
- [x] `silencer` library is released via https://github.com/ghik/silencer/pull/66
- [x] `genjavadoc` library is released via https://github.com/lightbend/genjavadoc/issues/282

### Why are the changes needed?

Apache Spark was stuck to 2.12.10 due to the regression in Scala 2.12.11/2.12.12/2.12.13. This will bring all the bug fixes.
- https://github.com/scala/scala/releases/tag/v2.12.14
- https://github.com/scala/scala/releases/tag/v2.12.13
- https://github.com/scala/scala/releases/tag/v2.12.12
- https://github.com/scala/scala/releases/tag/v2.12.11

### Does this PR introduce _any_ user-facing change?

Yes, but this is a bug-fixed version.

### How was this patch tested?

Pass the CIs.

Closes #32697 from dongjoon-hyun/SPARK-31168.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-05-30 16:08:13 -07:00
Kousuke Saruta 116a97e153 [SPARK-35501][SQL][TESTS] Add a feature for removing pulled container image for docker integration tests
### What changes were proposed in this pull request?

This PR adds a feature for removing pulled container image after every docker integration test finish.
This feature is enabled by the new propoerty `spark.tes.docker.removePulledImage`.

### Why are the changes needed?

For idempotent.
I'm trying to add docker integration tests to GA in SPARK-35483 (#32631) but I noticed that `jdbc.OracleIntegrationSuite` consistently fails(https://github.com/sarutak/spark/runs/2646707235?check_suite_focus=true).
I investigated the reason and I found it's short of the storage capacity of the host on GA.
```
 ORACLE PASSWORD FOR SYS AND SYSTEM: oracle
The location '/opt/oracle' specified for database files has insufficient space.
Database creation needs at least '4.5GB' disk space.
Specify a different database file destination that has enough space in the configuration file '/etc/sysconfig/oracle-xe-18c.conf'.
mv: cannot stat '/opt/oracle/product/18c/dbhomeXE/dbs/spfileXE.ora': No such file or directory
mv: cannot stat '/opt/oracle/product/18c/dbhomeXE/dbs/orapwXE': No such file or directory
ORACLE_HOME = [/home/oracle] ? ORACLE_BASE environment variable is not being set since this
information is not available for the current user ID .
You can set ORACLE_BASE manually if it is required.
Resetting ORACLE_BASE to its previous value or ORACLE_HOME
The Oracle base remains unchanged with value /opt/oracle
#####################################
########### E R R O R ###############
DATABASE SETUP WAS NOT SUCCESSFUL!
Please check output for further info!
########### E R R O R ###############
#####################################
The following output is now a tail of the alert.log:
tail: cannot open '/opt/oracle/diag/rdbms/*/*/trace/alert*.log' for reading: No such file or directory
tail: no files remaining
```

With this feature, pulled container image is removed and keep the capacity for `jdbc.OracleIntegrationSuite` in GA.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed the following things.

* A container image which is absent in the local repository is removed after test finished if `spark.test.container.removePulledImage` is `true`.
* A container image which is present in the local repository is not removed after the finished even if `spark.test.container.removePulledImage` is `true`.
* A container image is not removed regardless of presence of the container image in the local repository even if `spark.test.container.removePulledImage` is `true`.

Closes #32652 from sarutak/docker-image-rm.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-05-26 17:24:29 +09:00
Vinod KC e3c6907c99 [SPARK-35490][BUILD] Update json4s to 3.7.0-M11
### What changes were proposed in this pull request?
This PR aims to upgrade json4s from   3.7.0-M5  to 3.7.0-M11

Note: json4s version greater than 3.7.0-M11 is not binary compatible with Spark third party jars

### Why are the changes needed?
Multiple defect fixes and improvements  like

https://github.com/json4s/json4s/issues/750
https://github.com/json4s/json4s/issues/554
https://github.com/json4s/json4s/issues/715

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Ran with the existing UTs

Closes #32636 from vinodkc/br_build_upgrade_json4s.

Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-05-26 11:10:14 +03:00
Vinod KC 4ba1db91f0 [SPARK-35513][BUILD] Update joda-time to 2.10.10
### What changes were proposed in this pull request?
This PR aims to upgrade joda-time from 2.10.5 to 2.10.10

### Why are the changes needed?
Improvement and bug fixes in joda-time
https://www.joda.org/joda-time/changes-report.html#a2.10.10

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Ran with the existing UTs

Closes #32661 from vinodkc/br_build_upgrade_joda_time.

Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-05-25 11:29:03 -07:00
Vinod KC d5868ebc39 [SPARK-35492][BUILD] Upgrade httpcore to 4.4.14
### What changes were proposed in this pull request?
This PR aims to upgrade Apache HttpCore from 4.4.12 to 4.4.14.

### Why are the changes needed?
Stability improvements in httpcore 4.4.14

- Bug fix: Non-blocking TLSv1.3 connections can end up in an infinite event spin when closed concurrently by the local and the remote endpoints.
- HTTPCORE-647: Non-blocking connection terminated due to 'java.io.IOException: Broken pipe' can enter an infinite loop flushing buffered output data.
- PR #201, HTTPCORE-634: Fix race condition in AbstractConnPool that can cause internal state
-   corruption
- HTTPCORE-612: DefaultConnectionReuseStrategy incorrectly used int to represent Content-Length value
-   instead of long

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
 With Jenkins Tests

Closes #32638 from vinodkc/br_build_upgrade_httpcore.

Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-05-23 08:16:50 -07:00
Dongjoon Hyun fa424ac2b8 [SPARK-35489][BUILD] Upgrade ORC to 1.6.8
### What changes were proposed in this pull request?

This PR aims to upgrade ORC to 1.6.8.

### Why are the changes needed?

This will bring the latest bug fixes.
- https://orc.apache.org/news/2021/05/21/ORC-1.6.8/

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the existing CIs.

Closes #32635 from dongjoon-hyun/SPARK-35489.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-05-22 10:35:40 -07:00