### What changes were proposed in this pull request?
Enable `createDependencyReducedPom` for Spark's Maven Shade plugin so that the effective POM no longer contains shaded artifacts such as `org.eclipse.jetty`.
### Why are the changes needed?
At the moment, the effective POM leaks the transitive dependencies of those shaded artifacts to downstream apps, which can potentially cause issues.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
I tested manually and confirmed that `core/dependency-reduced-pom.xml` no longer contains dependencies such as `jetty-XX`.
Closes #34085 from sunchao/SPARK-36835.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit ed88e610f0)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
Remove `com.github.rdblue:brotli-codec:0.1.1` dependency.
### Why are the changes needed?
As Stephen Coy pointed out on the dev list, we should not depend on `com.github.rdblue:brotli-codec:0.1.1`, which is not available on Maven Central. Removing it avoids exposure to possible artifact changes on `Jitpack.io`.
Also, the dependency is for tests only. I suggest that we remove it now to unblock the 3.2.0 release ASAP.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
GA tests.
Closes #34059 from gengliangwang/removeDeps.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit ba5708d944)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This is a follow-up to fix a leftover from switching the Scala version.
### Why are the changes needed?
This should be consistent.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This is not tested by UT, so we need to check manually. No real `2.12.14` references remain; the matches below are unrelated.
```
$ git grep 2.12.14
R/pkg/tests/fulltests/test_sparkSQL.R: c(as.Date("2012-12-14"), as.Date("2013-12-15"), as.Date("2014-12-16")))
data/mllib/ridge-data/lpsa.data:3.5307626,0.987291634724086 -0.36279314978779 -0.922212414640967 0.232904453212813 -0.522940888712441 1.79270085261407 0.342627053981254 1.26288870310799
sql/hive/src/test/resources/data/files/over10k:-3|454|65705|4294967468|62.12|14.32|true|mike white|2013-03-01 09:11:58.703087|40.18|joggying
```
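The unrelated matches above appear because each `.` in the pattern is a regex wildcard. A minimal sketch, assuming plain `grep` in a scratch directory with hypothetical sample files (not the Spark tree), shows how a fixed-string search avoids them; `git grep -F` treats the pattern literally in the same way.

```shell
# Scratch directory with hypothetical sample files (not from the Spark tree).
tmp=$(mktemp -d)
printf 'as.Date("2012-12-14")\n' > "$tmp/dates.R"
printf '<scala.version>2.12.14</scala.version>\n' > "$tmp/pom.xml"

# As a regex, each "." matches any character, so the date line matches too.
regex_hits=$(grep -rl '2.12.14' "$tmp" | wc -l)

# With -F the pattern is a literal string, so only the real version matches.
fixed_hits=$(grep -rlF '2.12.14' "$tmp" | wc -l)

echo "regex: $regex_hits, fixed: $fixed_hits"
rm -rf "$tmp"
```

In this sketch the regex search reports both files, while the fixed-string search reports only the file that really contains the version string.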
Closes #34020 from dongjoon-hyun/SPARK-36759-2.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit adbea252db)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade Apache ORC to 1.6.11 to bring the latest bug fixes.
### Why are the changes needed?
Apache ORC 1.6.11 has the following fixes.
- https://issues.apache.org/jira/projects/ORC/versions/12350499
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33971 from dongjoon-hyun/SPARK-36732.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c217797297)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade Scala to 2.12.15 to support Java 17/18 better.
### Why are the changes needed?
Scala 2.12.15 improves compatibility with JDK 17 and 18:
https://github.com/scala/scala/releases/tag/v2.12.15
- Avoids IllegalArgumentException in JDK 17+ for lambda deserialization
- Upgrades to ASM 9.2, for JDK 18 support in optimizer
### Does this PR introduce _any_ user-facing change?
Yes, this is a Scala version change.
### How was this patch tested?
Pass the CIs
Closes #33999 from dongjoon-hyun/SPARK-36759.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 16f1f71ba5)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Upgrade Apache Parquet to 1.12.1
### Why are the changes needed?
Parquet 1.12.1 contains the following bug fixes:
- PARQUET-2064: Make Range public accessible in RowRanges
- PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream`
- PARQUET-2052: Integer overflow when writing huge binary using dictionary encoding
- PARQUET-1633: Fix integer overflow
- PARQUET-2054: fix TCP leaking when calling ParquetFileWriter.appendFile
- PARQUET-2072: Do Not Determine Both Min/Max for Binary Stats
- PARQUET-2073: Fix estimate remaining row count in ColumnWriteStoreBase
- PARQUET-2078: Failed to read parquet file after writing with the same
In particular PARQUET-2078 is a blocker for the upcoming Apache Spark 3.2.0 release.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests + a new test for the issue in SPARK-36696
Closes #33969 from sunchao/upgrade-parquet-12.1.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
(cherry picked from commit a927b0836b)
Signed-off-by: DB Tsai <d_tsai@apple.com>
As [reported on `dev@spark.apache.org`](https://lists.apache.org/thread.html/r84cff66217de438f1389899e6d6891b573780159cd45463acf3657aa%40%3Cdev.spark.apache.org%3E), the published POMs when building with Scala 2.13 have the `scala-parallel-collections` dependency only in the `scala-2.13` profile of the pom.
### What changes were proposed in this pull request?
This PR suggests working around this by un-commenting the `scala-parallel-collections` dependency when switching to 2.13 using the `change-scala-version.sh` script.
I included an upgrade to scala-parallel-collections version 1.0.3. The changes compared to 0.2.0 are minor:
- removed OSGi metadata
- renamed some internal inner classes
- added `Automatic-Module-Name`
### Why are the changes needed?
According to the posts, this solves issues for developers that write unit tests for their applications.
Stephen Coy suggested using the https://www.mojohaus.org/flatten-maven-plugin. While this sounds like a more principled solution, it is possibly too risky to do at this specific point in time?
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Locally
Closes #33948 from lrytz/parCollDep.
Authored-by: Lukas Rytz <lukas.rytz@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit 1a62e6a2c1)
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This patch mainly proposes to add some e2e test cases in Spark for the codecs used by the main datasources.
### Why are the changes needed?
We found there are no e2e test cases available for main datasources like Parquet and ORC. This makes it harder for developers to identify possible bugs early. We should add such tests in Spark.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added tests.
Closes #33912 from viirya/SPARK-36670.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 5a0ae694d0)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
### What changes were proposed in this pull request?
This PR aims to upgrade `aircompressor` dependency from 1.19 to 1.21.
### Why are the changes needed?
This will bring the latest fix for a bug that exists in `aircompressor` 1.17 ~ 1.20.
- 1e364f7133
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33883 from dongjoon-hyun/SPARK-36629.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit ff8cc4b800)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
When preparing Spark 3.2.0 RC1, I hit the same issue as https://github.com/apache/spark/pull/31031.
```
[INFO] Compiling 21 Scala sources and 3 Java sources to /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes ...
[ERROR] ## Exception when compiling 24 sources to /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s signer information does not match signer information of other classes in the same package
java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
```
This PR applies the same fix again by downgrading scala-maven-plugin to 4.3.0.
### Why are the changes needed?
To unblock the release process.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Build test
Closes #33791 from gengliangwang/downgrade.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit f0775d215e)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
This PR aims to bump ORC to 1.6.10
### Why are the changes needed?
This will bring the latest bug fixes.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33712 from williamhyun/orc.
Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit aff1b5594a)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR upgrades Jetty version to `9.4.43.v20210629`.
### Why are the changes needed?
To address vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-34429 which affects Jetty `9.4.42.v20210604`.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI
Closes #33656 from this/upgrade-jetty-9.4.43.
Lead-authored-by: Sajith Ariyarathna <sajith.janaprasad@gmail.com>
Co-authored-by: Sajith Ariyarathna <this@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 5a22f9ceaf)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
According to the feedback from GitHub, the change causing the memory issue has been rolled back. We can try to raise the memory for GA again.
### Why are the changes needed?
Trying higher memory settings for GA. It could speed up the testing time.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
GA
Closes #33623 from viirya/increasing-mem-ga.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 7d13ac177b)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
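### What changes were proposed in this pull request?
Update to the latest breeze 1.2
### Why are the changes needed?
Minor bug fixes
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests
Closes #33449 from srowen/SPARK-35310.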
Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
Trying to adjust build memory settings and serial execution to re-enable GA.
### Why are the changes needed?
GA tests have been failing recently with return code 137. We need to adjust the build settings to make GA work.
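Return code 137 is 128 + 9, meaning the process was terminated by SIGKILL; on CI runners this is commonly, though not exclusively, the result of the kernel's out-of-memory killer. A small sketch that reproduces the status locally, assuming a shell such as bash or dash that reports a signal death as 128 plus the signal number:

```shell
# Run a child shell that kills itself with SIGKILL (signal 9).
# The "|| status=$?" form captures the status even under "set -e".
status=0
sh -c 'kill -9 $$' || status=$?

# A process killed by signal N is reported as exit status 128 + N.
signal=$((status - 128))
echo "exit status: $status, signal: $signal"
```

Seeing 137 in a build log is therefore a hint to look at memory limits rather than at test assertions.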
### Does this PR introduce _any_ user-facing change?
No, dev only.
### How was this patch tested?
GA
Closes #33447 from viirya/test-ga.
Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit fd36ed4550)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade ZSTD-JNI to 1.5.0-4.
### Why are the changes needed?
ZSTD-JNI 1.5.0-3 has a packaging issue. 1.5.0-4 is recommended to be used instead.
- https://github.com/luben/zstd-jni/issues/181#issuecomment-885138495
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33483 from dongjoon-hyun/SPARK-36262.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a1a197403b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR upgrades `zstd-jni` from `1.5.0-2` to `1.5.0-3`.
`1.5.0-3` was released a few days ago.
This release resolves an issue about buffer size calculation, which can affect usage in Spark.
https://github.com/luben/zstd-jni/releases/tag/v1.5.0-3
### Why are the changes needed?
Skipping a length greater than `2^31 - 1` might be a corner case, but it can still affect Spark.
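For context, `2^31 - 1` is the largest value a signed 32-bit integer can hold. The arithmetic below is only an illustration of what happens to a larger length when it is squeezed into 32 bits; it is not the zstd-jni code, the 3 GiB skip length is a hypothetical value, and the sketch assumes a shell with 64-bit arithmetic such as bash.

```shell
# Largest value a signed 32-bit integer can hold: 2^31 - 1.
max_int=$(( (1 << 31) - 1 ))

# Hypothetical skip length of 3 GiB, chosen only for illustration.
skip=$(( 3 * 1024 * 1024 * 1024 ))

# Reinterpret the value as a signed 32-bit integer (two's complement wrap).
wrapped=$(( (skip + (1 << 31)) % (1 << 32) - (1 << 31) ))

echo "max int: $max_int"
echo "skip:    $skip"
echo "wrapped: $wrapped"
```

Here the 3 GiB length exceeds the 32-bit maximum and wraps to a negative number, which is the kind of value a size calculation cannot use safely.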
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #33464 from sarutak/upgrade-zstd-jni-1.5.0-3.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit dcb7db5370)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade scalatest-maven-plugin to version 2.0.2.
### Why are the changes needed?
2.0.2 officially supports building on JDK 11.
- f45ce192f3
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33408 from williamhyun/SMP.
Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit df8bae0689)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to set `MaxMetaspaceSize` to `2g` because, by default, Metaspace can grow without limit and keeps increasing native memory consumption. This unbounded growth causes GitHub Action flakiness. The value I observed during the `hive` module test was over 1.8G and still growing.
- https://docs.oracle.com/javase/10/gctuning/other-considerations.htm#JSGCT-GUID-BFB89453-60C0-42AC-81CA-87D59B0ACE2E
> Starting with JDK 8, the permanent generation was removed and the class metadata is allocated in native memory. The amount of native memory that can be used for class metadata is by default unlimited. Use the option -XX:MaxMetaspaceSize to put an upper limit on the amount of native memory used for class metadata.
In addition, I increased the following memory limits to 4g consistently in two places.
```xml
- <jvmArg>-Xms2048m</jvmArg>
- <jvmArg>-Xmx2048m</jvmArg>
+ <jvmArg>-Xms4g</jvmArg>
+ <jvmArg>-Xmx4g</jvmArg>
```
```scala
- javaOptions += "-Xmx3g",
+ javaOptions ++= "-Xmx4g -XX:MaxMetaspaceSize=2g".split(" ").toSeq,
```
### Why are the changes needed?
This will reduce the flakiness in CI environment by limiting the memory usage explicitly.
When we limit it to `1g`, the Hive module fails with an `OOM` like the following.
```
java.lang.OutOfMemoryError: Metaspace
Error: Exception in thread "dispatcher-event-loop-110" java.lang.OutOfMemoryError: Metaspace
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33405 from dongjoon-hyun/SPARK-36195.
Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Kyle Bendickson <kbendickson@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit d7df7a805f)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR upgrades `commons-compress` from `1.20` to `1.21` to deal with CVEs.
### Why are the changes needed?
Some CVEs which affect `commons-compress 1.20` are reported and fixed in `1.21`.
https://commons.apache.org/proper/commons-compress/security-reports.html
* CVE-2021-35515
* CVE-2021-35516
* CVE-2021-35517
* CVE-2021-36090
The severities are reported as low for all the CVEs but it would be better to deal with them just in case.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #33333 from sarutak/upgrade-commons-compress-1.21.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit fd06cc211d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR reverts https://github.com/apache/spark/pull/32455 and its followup https://github.com/apache/spark/pull/32536 , because the new janino version has a bug that is not fixed yet: https://github.com/janino-compiler/janino/pull/148
### Why are the changes needed?
avoid regressions
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
existing tests
Closes #33302 from cloud-fan/revert.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit ae6199af44)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade Apache ORC to 1.6.9.
### Why are the changes needed?
This is required to bring in ORC-804, which fixes an ORC encryption masking bug.
### Does this PR introduce _any_ user-facing change?
No. This is not released yet.
### How was this patch tested?
Pass the newly added test case.
Closes #33189 from dongjoon-hyun/SPARK-35992.
Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c55b9fd1e0)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Bump the scalatest version to 3.2.9
### Why are the changes needed?
With the scalatestplus change to 3.2.9.0, recent sbt versions fail to resolve the version mismatch between scalatest and scalatestplus, resulting in `test:compile` errors about not being able to find the `org.scalatest` package.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
sbt tags/test:compile failed before and passes with this change.
Closes #33163 from holdenk/SPARK-35960-test-compile-sbt-issue.
Authored-by: Holden Karau <hkarau@netflix.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR reverts the change of SPARK-34549 (#31658).
### Why are the changes needed?
See #33133.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Closes #33145 from sarutak/revert-SPARK-34549.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade ASM to 9.1
### Why are the changes needed?
The latest `xbean-asm9-shaded` is built with ASM 9.1.
- https://mvnrepository.com/artifact/org.apache.xbean/xbean-asm9-shaded/4.20
- 5e0e3c0c64/pom.xml (L67)
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33130 from dongjoon-hyun/SPARK-35928.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade `maven-shade-plugin` to 3.2.4.
### Why are the changes needed?
This is required to build with Java 17-ea.
`maven-shade-plugin` has used `asm` 8.0 since 3.2.3, so we should remove our custom dependency on `7.3.1`.
- https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-shade-plugin/3.2.4
- https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-shade-plugin/3.2.3
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33122 from dongjoon-hyun/SPARK-35922.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade Chill to 0.10.0.
### Why are the changes needed?
This is a maintenance release with cross-compilation to 2.12.14 and 2.13.6.
- https://github.com/twitter/chill/releases/tag/v0.10.0
### Does this PR introduce _any_ user-facing change?
No, this is a dependency change.
### How was this patch tested?
Pass the CIs.
Closes #33119 from dongjoon-hyun/SPARK-35920.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Update Ivy from 2.4.0 to 2.5.0.
- https://ant.apache.org/ivy/history/2.5.0/release-notes.html
### Why are the changes needed?
This brings various improvements and bug fixes. Most notably, the addition of the `ivy.maven.lookup.sources` and `ivy.maven.lookup.javadoc` configs can significantly speed up module resolution when they are turned off, especially behind a proxy. These could arguably be turned off by default, because when submitting jobs you probably don't care about the sources or javadoc jars. I didn't include that here but am happy to look into it if desired.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing UT and build passes
Closes #33088 from Kimahriman/feature/ivy-update.
Authored-by: Adam Binford <adamq43@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade the scala-maven-plugin version to 4.5.3.
### Why are the changes needed?
This will upgrade `sbt-compiler-bridge` from 1.3.1 to 1.5.5 in order to bring the latest bug fixes.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33007 from williamhyun/scalamvnplugin.
Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade `zstd-jni` to 1.5.0-2, which uses `zstd` version 1.5.0.
### Why are the changes needed?
Major improvements to Zstd support are targeted for the upcoming 3.2.0 release of Spark. Zstd 1.5.0 introduces significant compression (+25% to 140%) and decompression (~15%) speed improvements in benchmarks described in more detail on the releases page:
- https://github.com/facebook/zstd/releases/tag/v1.5.0
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The build passes the build tests, but the benchmark tests seem flaky. I am unsure if this change is responsible. The error is:
```
Running org.apache.spark.rdd.CoalescedRDDBenchmark:
21/06/08 18:53:10 ERROR SparkContext: Failed to add file:/home/runner/work/spark/spark/./core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar to Spark environment
java.lang.IllegalArgumentException: requirement failed: File spark-core_2.12-3.2.0-SNAPSHOT-tests.jar was already registered with a different path (old path = /home/runner/work/spark/spark/core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar, new path = /home/runner/work/spark/spark/./core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar
```
https://github.com/dchristle/spark/runs/2776123749?check_suite_focus=true
cc: dongjoon-hyun
Closes #32826 from dchristle/ZSTD150.
Lead-authored-by: David Christle <dchristle@squareup.com>
Co-authored-by: David Christle <dchristle@users.noreply.github.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This upgrades the default Hadoop version from 3.2.1 to 3.3.1. The changes here simply update the version number and the dependency file.
### Why are the changes needed?
Hadoop 3.3.1 just came out, which comes with many client-side improvements such as for S3A/ABFS (20% faster when accessing S3). These are important for users who want to use Spark in a cloud environment.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Existing unit tests in Spark
- Manually tested using my S3 bucket for event log dir:
```
bin/spark-shell \
-c spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID \
-c spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY \
-c spark.eventLog.enabled=true \
-c spark.eventLog.dir=s3a://<my-bucket>
```
- Manually tested against docker-based YARN dev cluster, by running `SparkPi`.
Closes #30135 from sunchao/SPARK-29250.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Remove commons-httpclient as a direct dependency for Hadoop-3.2 profile.
The Hadoop-2.7 profile distribution still has it: hadoop-client has a compile dependency on commons-httpclient, so we cannot remove it for the Hadoop-2.7 profile.
```
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.7.4:compile
[INFO] | +- org.apache.hadoop:hadoop-common:jar:2.7.4:compile
[INFO] | | +- commons-cli:commons-cli:jar:1.2:compile
[INFO] | | +- xmlenc:xmlenc:jar:0.52:compile
[INFO] | | +- commons-httpclient:commons-httpclient:jar:3.1:compile
```
### Why are the changes needed?
Spark is pulling in commons-httpclient as a dependency directly. commons-httpclient went EOL years ago and there are most likely CVEs not being reported against it, thus we should remove it.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Existing unit tests
- Checked the dependency tree before and after introducing the changes
Before:
```
./build/mvn dependency:tree -Phadoop-3.2 | grep -i "commons-httpclient"
Using `mvn` from path: /usr/bin/mvn
[INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] | +- commons-httpclient:commons-httpclient:jar:3.1:provided
```
After:
```
./build/mvn dependency:tree | grep -i "commons-httpclient"
Using `mvn` from path: /Users/sumeet.gajjar/cloudera/upstream-spark/build/apache-maven-3.6.3/bin/mvn
```
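The before/after check can also be scripted so that it fails loudly if the artifact reappears. The sketch below is an assumption-laden illustration: `assert_absent` is a hypothetical helper, and it runs against a short sample of dependency-tree text rather than a live `mvn dependency:tree` run.

```shell
# Sample dependency-tree text, abbreviated from the Hadoop 2.7 listing above.
tree='[INFO] +- org.apache.hadoop:hadoop-client:jar:2.7.4:compile
[INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.7.4:compile
[INFO] |  |  +- commons-httpclient:commons-httpclient:jar:3.1:compile'

# Hypothetical helper: succeed only if the artifact is absent from the text.
assert_absent() {
  if printf '%s\n' "$1" | grep -qiF "$2"; then
    echo "unexpected dependency: $2" >&2
    return 1
  fi
}

assert_absent "$tree" "commons-logging" && echo "commons-logging: absent"
assert_absent "$tree" "commons-httpclient" || echo "commons-httpclient: still present"
```

In a real build, the first argument would be the captured output of the dependency-tree command for the profile being checked.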
P.S. Reopening this since [spark upgraded](463daabd5a) its `hive.version` to `2.3.9` which does not have a dependency on `commons-httpclient`.
Closes #32912 from sumeetgajjar/SPARK-35429.
Authored-by: Sumeet Gajjar <sumeetgajjar93@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
In https://github.com/apache/spark/pull/32838, we raised the default JVM stack size from 4M to 16M.
However, there are still stack overflow errors in builds:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139672/console
Let's update the value to 64M.
### Why are the changes needed?
Make test builds stable.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manual trigger test builds.
Closes #32879 from gengliangwang/increaseStackAgain.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR upgrades the built-in Hive to 2.3.9. Hive 2.3.9 changes:
- [HIVE-17155] - findConfFile() in HiveConf.java has some issues with the conf path
- [HIVE-24797] - Disable validate default values when parsing Avro schemas
- [HIVE-24608] - Switch back to get_table in HMS client for Hive 2.3.x
- [HIVE-21200] - Vectorization: date column throwing java.lang.UnsupportedOperationException for parquet
- [HIVE-21563] - Improve Table#getEmptyTable performance by disabling registerAllFunctionsOnce
- [HIVE-19228] - Remove commons-httpclient 3.x usage
### Why are the changes needed?
Fix regression caused by AVRO-2035.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #32750 from wangyum/SPARK-34512.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>