### What changes were proposed in this pull request?
Move `spark.yarn.isHadoopProvided` to Spark parent pom, so that under `resource-managers/yarn` we can make `hadoop-3.2` as the default profile.
### Why are the changes needed?
Currently under `resource-managers/yarn` there are 3 maven profiles : `hadoop-provided`, `hadoop-2.7`, and `hadoop-3.2`, of which `hadoop-3.2` is activated by default (via `activeByDefault`). The activation, however, doesn't work when there is other explicitly activated profiles. In specific, if users build Spark with `hadoop-provided`, maven will fail because it can't find Hadoop 3.2 related dependencies, which are defined in the `hadoop-3.2` profile section.
To fix the issue, this proposes to move the `hadoop-provided` section to the parent pom. Currently this is only used to define a property `spark.yarn.isHadoopProvided`, and it shouldn't matter where we define it.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested via running the command:
```
build/mvn clean package -DskipTests -B -Pmesos -Pyarn -Pkubernetes -Pscala-2.12 -Phadoop-provided
```
which was failing before this PR but is succeeding with it.
Also checked active profiles with the command:
```
build/mvn -Pyarn -Phadoop-provided help:active-profiles
```
and it shows that `hadoop-3.2` is active for `spark-yarn` module now.
Closes#34110 from sunchao/SPARK-36835-followup2.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit f9efdeea8c)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
Fix an issue where Maven may stuck in an infinite loop when building Spark, for Hadoop 2.7 profile.
### Why are the changes needed?
After re-enabling `createDependencyReducedPom` for `maven-shade-plugin`, Spark build stopped working for Hadoop 2.7 profile and will stuck in an infinitely loop, likely due to a Maven shade plugin bug similar to https://issues.apache.org/jira/browse/MSHADE-148. This seems to be caused by the fact that, under `hadoop-2.7` profile, variable `hadoop-client-runtime.artifact` and `hadoop-client-api.artifact`are both `hadoop-client` which triggers the issue.
As a workaround, this changes `hadoop-client-runtime.artifact` to be `hadoop-yarn-api` when using `hadoop-2.7`. Since `hadoop-yarn-api` is a dependency of `hadoop-client`, this essentially moves the former to the same level as the latter. It should have no effect as both are dependencies of Spark.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
N/A
Closes#34100 from sunchao/SPARK-36835-followup.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 937a74e6e7)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
Enable `createDependencyReducedPom` for Spark's Maven shaded plugin so that the effective pom won't contain those shaded artifacts such as `org.eclipse.jetty`
### Why are the changes needed?
At the moment, the effective pom leaks transitive dependencies to downstream apps for those shaded artifacts, which potentially will cause issues.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
I manually tested and the `core/dependency-reduced-pom.xml` no longer contains dependencies such as `jetty-XX`.
Closes#34085 from sunchao/SPARK-36835.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit ed88e610f0)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
Remove `com.github.rdblue:brotli-codec:0.1.1` dependency.
### Why are the changes needed?
As Stephen Coy pointed out in the dev list, we should not have `com.github.rdblue:brotli-codec:0.1.1` dependency which is not available on Maven Central. This is to avoid possible artifact changes on `Jitpack.io`.
Also, the dependency is for tests only. I suggest that we remove it now to unblock the 3.2.0 release ASAP.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
GA tests.
Closes#34059 from gengliangwang/removeDeps.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit ba5708d944)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This is a follow-up to fix the leftover during switching the Scala version.
### Why are the changes needed?
This should be consistent.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This is not tested by UT. We need to check manually. There is no more `2.12.14`.
```
$ git grep 2.12.14
R/pkg/tests/fulltests/test_sparkSQL.R: c(as.Date("2012-12-14"), as.Date("2013-12-15"), as.Date("2014-12-16")))
data/mllib/ridge-data/lpsa.data:3.5307626,0.987291634724086 -0.36279314978779 -0.922212414640967 0.232904453212813 -0.522940888712441 1.79270085261407 0.342627053981254 1.26288870310799
sql/hive/src/test/resources/data/files/over10k:-3|454|65705|4294967468|62.12|14.32|true|mike white|2013-03-01 09:11:58.703087|40.18|joggying
```
Closes#34020 from dongjoon-hyun/SPARK-36759-2.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit adbea252db)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade Apache ORC to 1.6.11 to bring the latest bug fixes.
### Why are the changes needed?
Apache ORC 1.6.11 has the following fixes.
- https://issues.apache.org/jira/projects/ORC/versions/12350499
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#33971 from dongjoon-hyun/SPARK-36732.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c217797297)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade Scala to 2.12.15 to support Java 17/18 better.
### Why are the changes needed?
Scala 2.12.15 improves compatibility with JDK 17 and 18:
https://github.com/scala/scala/releases/tag/v2.12.15
- Avoids IllegalArgumentException in JDK 17+ for lambda deserialization
- Upgrades to ASM 9.2, for JDK 18 support in optimizer
### Does this PR introduce _any_ user-facing change?
Yes, this is a Scala version change.
### How was this patch tested?
Pass the CIs
Closes#33999 from dongjoon-hyun/SPARK-36759.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 16f1f71ba5)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Upgrade Apache Parquet to 1.12.1
### Why are the changes needed?
Parquet 1.12.1 contains the following bug fixes:
- PARQUET-2064: Make Range public accessible in RowRanges
- PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream`
- PARQUET-2052: Integer overflow when writing huge binary using dictionary encoding
- PARQUET-1633: Fix integer overflow
- PARQUET-2054: fix TCP leaking when calling ParquetFileWriter.appendFile
- PARQUET-2072: Do Not Determine Both Min/Max for Binary Stats
- PARQUET-2073: Fix estimate remaining row count in ColumnWriteStoreBase
- PARQUET-2078: Failed to read parquet file after writing with the same
In particular PARQUET-2078 is a blocker for the upcoming Apache Spark 3.2.0 release.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests + a new test for the issue in SPARK-36696
Closes#33969 from sunchao/upgrade-parquet-12.1.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
(cherry picked from commit a927b0836b)
Signed-off-by: DB Tsai <d_tsai@apple.com>
As [reported on `devspark.apache.org`](https://lists.apache.org/thread.html/r84cff66217de438f1389899e6d6891b573780159cd45463acf3657aa%40%3Cdev.spark.apache.org%3E), the published POMs when building with Scala 2.13 have the `scala-parallel-collections` dependency only in the `scala-2.13` profile of the pom.
### What changes were proposed in this pull request?
This PR suggests to work around this by un-commenting the `scala-parallel-collections` dependency when switching to 2.13 using the the `change-scala-version.sh` script.
I included an upgrade to scala-parallel-collections version 1.0.3, the changes compared to 0.2.0 are minor.
- removed OSGi metadata
- renamed some internal inner classes
- added `Automatic-Module-Name`
### Why are the changes needed?
According to the posts, this solves issues for developers that write unit tests for their applications.
Stephen Coy suggested to use the https://www.mojohaus.org/flatten-maven-plugin. While this sounds like a more principled solution, it is possibly too risky to do at this specific point in time?
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Locally
Closes#33948 from lrytz/parCollDep.
Authored-by: Lukas Rytz <lukas.rytz@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit 1a62e6a2c1)
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This patch mainly proposes to add some e2e test cases in Spark for codec used by main datasources.
### Why are the changes needed?
We found there is no e2e test cases available for main datasources like Parquet, Orc. It makes developers harder to identify possible bugs early. We should add such tests in Spark.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added tests.
Closes#33912 from viirya/SPARK-36670.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 5a0ae694d0)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
### What changes were proposed in this pull request?
This PR aims to upgrade `aircompressor` dependency from 1.19 to 1.21.
### Why are the changes needed?
This will bring the latest bug fix which exists in `aircompressor` 1.17 ~ 1.20.
- 1e364f7133
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#33883 from dongjoon-hyun/SPARK-36629.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit ff8cc4b800)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
When preparing Spark 3.2.0 RC1, I hit the same issue of https://github.com/apache/spark/pull/31031.
```
[INFO] Compiling 21 Scala sources and 3 Java sources to /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes ...
[ERROR] ## Exception when compiling 24 sources to /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s signer information does not match signer information of other classes in the same package
java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
```
This PR is to apply the same fix again by downgrading scala-maven-plugin to 4.3.0
### Why are the changes needed?
To unblock the release process.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Build test
Closes#33791 from gengliangwang/downgrade.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit f0775d215e)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
This PR aims to bump ORC to 1.6.10
### Why are the changes needed?
This will bring the latest bug fixes.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#33712 from williamhyun/orc.
Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit aff1b5594a)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR upgrades Jetty version to `9.4.43.v20210629`.
### Why are the changes needed?
To address vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-34429 which affects Jetty `9.4.42.v20210604`.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI
Closes#33656 from this/upgrade-jetty-9.4.43.
Lead-authored-by: Sajith Ariyarathna <sajith.janaprasad@gmail.com>
Co-authored-by: Sajith Ariyarathna <this@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 5a22f9ceaf)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
According to the feedback from GitHub, the change causing memory issue has been rolled back. We can try to raise memory again for GA.
### Why are the changes needed?
Trying higher memory settings for GA. It could speed up the testing time.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
GA
Closes#33623 from viirya/increasing-mem-ga.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 7d13ac177b)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
Update to the latest breeze 1.2
Minor bug fixes
No.
Existing tests
Closes#33449 from srowen/SPARK-35310.
Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
Trying to adjust build memory settings and serial execution to re-enable GA.
### Why are the changes needed?
GA tests are failed recently due to return code 137. We need to adjust build settings to make GA work.
### Does this PR introduce _any_ user-facing change?
No, dev only.
### How was this patch tested?
GA
Closes#33447 from viirya/test-ga.
Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit fd36ed4550)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade ZSTD-JNI to 1.5.0-4.
### Why are the changes needed?
ZSTD-JNI 1.5.0-3 has a packaging issue. 1.5.0-4 is recommended to be used instead.
- https://github.com/luben/zstd-jni/issues/181#issuecomment-885138495
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#33483 from dongjoon-hyun/SPARK-36262.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a1a197403b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR upgrades `zstd-jni` from `1.5.0-2` to `1.5.0-3`.
`1.5.0-3` was released few days ago.
This release resolves an issue about buffer size calculation, which can affect usage in Spark.
https://github.com/luben/zstd-jni/releases/tag/v1.5.0-3
### Why are the changes needed?
It might be a corner case that skipping length is greater than `2^31 - 1` but it's possible to affect Spark.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes#33464 from sarutak/upgrade-zstd-jni-1.5.0-3.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit dcb7db5370)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade scalatest-maven-plugin to version 2.0.2.
### Why are the changes needed?
2.0.2 supports build on JDK 11 officially.
- f45ce192f3
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#33408 from williamhyun/SMP.
Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit df8bae0689)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to set `MaxMetaspaceSize` to `2g` because it's increasing the native memory consumption unlimitedly by default. The unlimited increasing memory causes GitHub Action flakiness. The value I observed during `hive` module test was over 1.8G and growing.
- https://docs.oracle.com/javase/10/gctuning/other-considerations.htm#JSGCT-GUID-BFB89453-60C0-42AC-81CA-87D59B0ACE2E
> Starting with JDK 8, the permanent generation was removed and the class metadata is allocated in native memory. The amount of native memory that can be used for class metadata is by default unlimited. Use the option -XX:MaxMetaspaceSize to put an upper limit on the amount of native memory used for class metadata.
In addition, I increased the following memory limit to 4g consistently from two places.
```xml
- <jvmArg>-Xms2048m</jvmArg>
- <jvmArg>-Xmx2048m</jvmArg>
+ <jvmArg>-Xms4g</jvmArg>
+ <jvmArg>-Xmx4g</jvmArg>
```
```scala
- javaOptions += "-Xmx3g",
+ javaOptions ++= "-Xmx4g -XX:MaxMetaspaceSize=2g".split(" ").toSeq,
```
### Why are the changes needed?
This will reduce the flakiness in CI environment by limiting the memory usage explicitly.
When we limit it with `1g`, Hive module fails with `OOM` like the following.
```
java.lang.OutOfMemoryError: Metaspace
Error: Exception in thread "dispatcher-event-loop-110" java.lang.OutOfMemoryError: Metaspace
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#33405 from dongjoon-hyun/SPARK-36195.
Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Kyle Bendickson <kbendickson@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit d7df7a805f)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR upgrades `commons-compress` from `1.20` to `1.21` to deal with CVEs.
### Why are the changes needed?
Some CVEs which affect `commons-compress 1.20` are reported and fixed in `1.21`.
https://commons.apache.org/proper/commons-compress/security-reports.html
* CVE-2021-35515
* CVE-2021-35516
* CVE-2021-35517
* CVE-2021-36090
The severities are reported as low for all the CVEs but it would be better to deal with them just in case.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes#33333 from sarutak/upgrade-commons-compress-1.21.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit fd06cc211d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR reverts https://github.com/apache/spark/pull/32455 and its followup https://github.com/apache/spark/pull/32536 , because the new janino version has a bug that is not fixed yet: https://github.com/janino-compiler/janino/pull/148
### Why are the changes needed?
avoid regressions
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
existing tests
Closes#33302 from cloud-fan/revert.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit ae6199af44)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade Apache ORC to 1.6.9.
### Why are the changes needed?
This is required to bring ORC-804 in order to fix ORC encryption masking bug.
### Does this PR introduce _any_ user-facing change?
No. This is not released yet.
### How was this patch tested?
Pass the newly added test case.
Closes#33189 from dongjoon-hyun/SPARK-35992.
Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c55b9fd1e0)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Bump the scalatest version to 3.2.9
### Why are the changes needed?
With the scalatestplus change to 3.2.9.0, recent sbt fails to handle the mismatch between scalatest and scalatestplus and resolve resulting in test:compile errors of not being able to find the org.scalatest package.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
sbt tags/test:compile failed before and passes with this change.
Closes#33163 from holdenk/SPARK-35960-test-compile-sbt-issue.
Authored-by: Holden Karau <hkarau@netflix.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR reverts the change of SPARK-34549 ( #31658).
### Why are the changes needed?
See #33133.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Closes#33145 from sarutak/revert-SPARK-34549.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade ASM to 9.1
### Why are the changes needed?
The latest `xbean-asm9-shaded` is built with ASM 9.1.
- https://mvnrepository.com/artifact/org.apache.xbean/xbean-asm9-shaded/4.20
- 5e0e3c0c64/pom.xml (L67)
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#33130 from dongjoon-hyun/SPARK-35928.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade `maven-shade-plugin` to 3.2.4.
### Why are the changes needed?
This is required to build with Java 17-ea.
Since `maven-shade-plugin` 3.2.3, `asm` 8.0 is used now. We should remove our custom dependency of `7.3.1`.
- https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-shade-plugin/3.2.4
- https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-shade-plugin/3.2.3
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#33122 from dongjoon-hyun/SPARK-35922.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>