### What changes were proposed in this pull request?
This PR aims to upgrade the scala-maven-plugin version to 4.5.3.
### Why are the changes needed?
This will upgrade `sbt-compiler-bridge` from 1.3.1 to 1.5.5 in order to bring the latest bug fixes.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#33007 from williamhyun/scalamvnplugin.
Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade `zstd-jni` to 1.5.0-2, which uses `zstd` version 1.5.0.
### Why are the changes needed?
Major improvements to Zstd support are targeted for the upcoming 3.2.0 release of Spark. Zstd 1.5.0 introduces significant compression (+25% to 140%) and decompression (~15%) speed improvements in benchmarks described in more detail on the releases page:
- https://github.com/facebook/zstd/releases/tag/v1.5.0
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Build passes build tests, but the benchmark tests seem flaky. I am unsure if this change is responsible. The error is:
```
Running org.apache.spark.rdd.CoalescedRDDBenchmark:
21/06/08 18:53:10 ERROR SparkContext: Failed to add file:/home/runner/work/spark/spark/./core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar to Spark environment
java.lang.IllegalArgumentException: requirement failed: File spark-core_2.12-3.2.0-SNAPSHOT-tests.jar was already registered with a different path (old path = /home/runner/work/spark/spark/core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar, new path = /home/runner/work/spark/spark/./core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar
```
https://github.com/dchristle/spark/runs/2776123749?check_suite_focus=true
cc: dongjoon-hyun
Closes#32826 from dchristle/ZSTD150.
Lead-authored-by: David Christle <dchristle@squareup.com>
Co-authored-by: David Christle <dchristle@users.noreply.github.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This upgrade default Hadoop version from 3.2.1 to 3.3.1. The changes here are simply update the version number and dependency file.
### Why are the changes needed?
Hadoop 3.3.1 just came out, which comes with many client-side improvements such as for S3A/ABFS (20% faster when accessing S3). These are important for users who want to use Spark in a cloud environment.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Existing unit tests in Spark
- Manually tested using my S3 bucket for event log dir:
```
bin/spark-shell \
-c spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID \
-c spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY \
-c spark.eventLog.enabled=true
-c spark.eventLog.dir=s3a://<my-bucket>
```
- Manually tested against docker-based YARN dev cluster, by running `SparkPi`.
Closes#30135 from sunchao/SPARK-29250.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Remove commons-httpclient as a direct dependency for Hadoop-3.2 profile.
Hadoop-2.7 profile distribution still has it, hadoop-client has a compile dependency on commons-httpclient, thus we cannot remove it for Hadoop-2.7 profile.
```
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.7.4:compile
[INFO] | +- org.apache.hadoop:hadoop-common:jar:2.7.4:compile
[INFO] | | +- commons-cli:commons-cli:jar:1.2:compile
[INFO] | | +- xmlenc:xmlenc:jar:0.52:compile
[INFO] | | +- commons-httpclient:commons-httpclient:jar:3.1:compile
```
### Why are the changes needed?
Spark is pulling in commons-httpclient as a dependency directly. commons-httpclient went EOL years ago and there are most likely CVEs not being reported against it, thus we should remove it.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Existing unittests
- Checked the dependency tree before and after introducing the changes
Before:
```
./build/mvn dependency:tree -Phadoop-3.2 | grep -i "commons-httpclient"
Using `mvn` from path: /usr/bin/mvn
[INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] | +- commons-httpclient:commons-httpclient:jar:3.1:provided
```
After
```
./build/mvn dependency:tree | grep -i "commons-httpclient"
Using `mvn` from path: /Users/sumeet.gajjar/cloudera/upstream-spark/build/apache-maven-3.6.3/bin/mvn
```
P.S. Reopening this since [spark upgraded](463daabd5a) its `hive.version` to `2.3.9` which does not have a dependency on `commons-httpclient`.
Closes#32912 from sumeetgajjar/SPARK-35429.
Authored-by: Sumeet Gajjar <sumeetgajjar93@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
In https://github.com/apache/spark/pull/32838, we set the default JVM stack size to 16M from 4M.
However, there are still stackoverflow error in builds:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139672/console
Let's update the value to 64M
### Why are the changes needed?
Make test build stable.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manual trigger test builds.
Closes#32879 from gengliangwang/increaseStackAgain.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This pr upgrades built-in Hive to 2.3.9. Hive 2.3.9 changes:
- [HIVE-17155] - findConfFile() in HiveConf.java has some issues with the conf path
- [HIVE-24797] - Disable validate default values when parsing Avro schemas
- [HIVE-24608] - Switch back to get_table in HMS client for Hive 2.3.x
- [HIVE-21200] - Vectorization: date column throwing java.lang.UnsupportedOperationException for parquet
- [HIVE-21563] - Improve Table#getEmptyTable performance by disabling registerAllFunctionsOnce
- [HIVE-19228] - Remove commons-httpclient 3.x usage
### Why are the changes needed?
Fix regression caused by AVRO-2035.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes#32750 from wangyum/SPARK-34512.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
The jenkins SBT/Maven build keep failing with stack overflow error:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139542
We should increase the JVM stack size to 16MB.
Also, https://github.com/apache/spark/pull/32521 set the stack size to 256MB for Java 11 build, which might be too big since every thread will allocate this memory for the stack. This PR also set it as 16MB to make the config consistent.
### Why are the changes needed?
Fix SBT/Maven build.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Jenkins and GA tests.
Closes#32838 from gengliangwang/increaseSBTStackSize.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade kubernetes-client to 5.4.1.
### Why are the changes needed?
This will bring a few bug fixes.
- https://github.com/fabric8io/kubernetes-client/releases/tag/v5.4.1
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#32798 from dongjoon-hyun/SPARK-35660.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR upgrades HtmlUnit and its related artifacts to `2.50`.
### Why are the changes needed?
We currently uses 2.40 but 2.41+ seem to include bunch of bug fixes and improvements especially for JavaScript and CSS.
https://htmlunit.sourceforge.io/changes-report.html#a2.50.0
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
All the `UISeleniumSuite` which use HtmlUnit pass on my laptop.
Closes#32789 from sarutak/upgrade-htmlunit250.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
There are several pr to fix compilation warnings related to `procedure syntax` like SPARK-29291, SPARK-33352 and SPARK-35526, in order to prevent the recurrence of similar problems, this pr add a compile arg to convert `procedure syntax` related compilation warnings to compilation errors in Scala 2.13.
### Why are the changes needed?
Prevent the recurrence of compilation warnings related to `procedure syntax is deprecated`
### Does this PR introduce _any_ user-facing change?
`procedure syntax` is no longer allowed in Spark code with Scala 2.13, for constructors methods definition should be `this(...) = { }` not `this(...) { }`, for without `return type` methods definition should be `def methodName(...): Unit = {}` not `def methodName(...) {}`.
### How was this patch tested?
- Pass the GitHub Action Scala 2.13 job
- Manual test:
Do some code change like:
```
Index: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
===================================================================
-67,7 +67,7
private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
extends SparkListener with ThreadSafeRpcEndpoint with Logging {
- def this(sc: SparkContext) = {
+ def this(sc: SparkContext) {
this(sc, new SystemClock)
}
Index: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
===================================================================
-720,7 +720,7
}
}
- def registerMergeResult(shuffleId: Int, reduceId: Int, status: MergeStatus): Unit = {
+ def registerMergeResult(shuffleId: Int, reduceId: Int, status: MergeStatus) {
shuffleStatuses(shuffleId).addMergeResult(reduceId, status)
}
```
**sbt with Scala 2.13 profile compile failed as follows:***
```
[error] /home/runner/work/spark/spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70:29: procedure syntax is deprecated for constructors: add `=`, as in method definition
[error] def this(sc: SparkContext) {
[error] ^
[error] /home/runner/work/spark/spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:723:79: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `registerMergeResult`'s return type
[error] def registerMergeResult(shuffleId: Int, reduceId: Int, status: MergeStatus) {
[error] ^
[error] two errors found
[error] (core / Compile / compileIncremental) Compilation failed
[error] Total time: 136 s (02:16), completed May 31, 2021 10:06:50 AM
Error: Process completed with exit code 1.
```
**maven with Scala 2.13 profile compile failed as follows:**
```
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: procedure syntax is deprecated for constructors: add `=`, as in method definition
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:723: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `registerMergeResult`'s return type
[ERROR] two errors found
```
Closes#32710 from LuciferYang/SPARK-35574.
Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR is a follow-up of https://github.com/apache/spark/pull/32697 to update the missed part.
After SPARK-34774, we have Scala 2.12 version in `scala-2.12` profile.
### Why are the changes needed?
To be consistent.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs and manual.
**BEFORE**
```
$ build/mvn help:evaluate -Pscala-2.12 -Dexpression=scala.version | grep "^2.12"
Using `mvn` from path: /usr/local/bin/mvn
2.12.10
```
**AFTER**
```
$ build/mvn help:evaluate -Pscala-2.12 -Dexpression=scala.version | grep "^2.12"
Using `mvn` from path: /usr/local/bin/mvn
2.12.14
```
Closes#32707 from dongjoon-hyun/SPARK-31168-2.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR adds a feature for removing pulled container image after every docker integration test finish.
This feature is enabled by the new propoerty `spark.tes.docker.removePulledImage`.
### Why are the changes needed?
For idempotent.
I'm trying to add docker integration tests to GA in SPARK-35483 (#32631) but I noticed that `jdbc.OracleIntegrationSuite` consistently fails(https://github.com/sarutak/spark/runs/2646707235?check_suite_focus=true).
I investigated the reason and I found it's short of the storage capacity of the host on GA.
```
ORACLE PASSWORD FOR SYS AND SYSTEM: oracle
The location '/opt/oracle' specified for database files has insufficient space.
Database creation needs at least '4.5GB' disk space.
Specify a different database file destination that has enough space in the configuration file '/etc/sysconfig/oracle-xe-18c.conf'.
mv: cannot stat '/opt/oracle/product/18c/dbhomeXE/dbs/spfileXE.ora': No such file or directory
mv: cannot stat '/opt/oracle/product/18c/dbhomeXE/dbs/orapwXE': No such file or directory
ORACLE_HOME = [/home/oracle] ? ORACLE_BASE environment variable is not being set since this
information is not available for the current user ID .
You can set ORACLE_BASE manually if it is required.
Resetting ORACLE_BASE to its previous value or ORACLE_HOME
The Oracle base remains unchanged with value /opt/oracle
#####################################
########### E R R O R ###############
DATABASE SETUP WAS NOT SUCCESSFUL!
Please check output for further info!
########### E R R O R ###############
#####################################
The following output is now a tail of the alert.log:
tail: cannot open '/opt/oracle/diag/rdbms/*/*/trace/alert*.log' for reading: No such file or directory
tail: no files remaining
```
With this feature, pulled container image is removed and keep the capacity for `jdbc.OracleIntegrationSuite` in GA.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I confirmed the following things.
* A container image which is absent in the local repository is removed after test finished if `spark.test.container.removePulledImage` is `true`.
* A container image which is present in the local repository is not removed after the finished even if `spark.test.container.removePulledImage` is `true`.
* A container image is not removed regardless of presence of the container image in the local repository even if `spark.test.container.removePulledImage` is `true`.
Closes#32652 from sarutak/docker-image-rm.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade json4s from 3.7.0-M5 to 3.7.0-M11
Note: json4s version greater than 3.7.0-M11 is not binary compatible with Spark third party jars
### Why are the changes needed?
Multiple defect fixes and improvements like
https://github.com/json4s/json4s/issues/750https://github.com/json4s/json4s/issues/554https://github.com/json4s/json4s/issues/715
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Ran with the existing UTs
Closes#32636 from vinodkc/br_build_upgrade_json4s.
Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request?
This PR aims to upgrade joda-time from 2.10.5 to 2.10.10
### Why are the changes needed?
Improvement and bug fixes in joda-time
https://www.joda.org/joda-time/changes-report.html#a2.10.10
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Ran with the existing UTs
Closes#32661 from vinodkc/br_build_upgrade_joda_time.
Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to upgrade Apache HttpCore from 4.4.12 to 4.4.14.
### Why are the changes needed?
Stability improvements in httpcore 4.4.14
- Bug fix: Non-blocking TLSv1.3 connections can end up in an infinite event spin when closed concurrently by the local and the remote endpoints.
- HTTPCORE-647: Non-blocking connection terminated due to 'java.io.IOException: Broken pipe' can enter an infinite loop flushing buffered output data.
- PR #201, HTTPCORE-634: Fix race condition in AbstractConnPool that can cause internal state
- corruption
- HTTPCORE-612: DefaultConnectionReuseStrategy incorrectly used int to represent Content-Length value
- instead of long
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
With Jenkins Tests
Closes#32638 from vinodkc/br_build_upgrade_httpcore.
Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to upgrade ORC to 1.6.8.
### Why are the changes needed?
This will bring the latest bug fixes.
- https://orc.apache.org/news/2021/05/21/ORC-1.6.8/
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the existing CIs.
Closes#32635 from dongjoon-hyun/SPARK-35489.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to upgrade ASM to 7.3.1.
- https://issues.apache.org/jira/browse/XBEAN-323
- https://asm.ow2.io/versions.html
### Why are the changes needed?
ASM 7.3.1 bring following changes
- new V15 constant
- experimental support for PermittedSubtypes and RecordComponent
- bug fixes
- - 317885: SKIP_DEBUG now skips MethodParameters attributes
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Ran with the existing UTs
Closes#32634 from vinodkc/br_build_upgrade_asm.
Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
### What changes were proposed in this pull request?
This PR upgrades Dropwizard metrics to 4.2.0.
I also modified the corresponding links in `docs/monitoring.md`.
### Why are the changes needed?
The latest version was released last week and it contains some improvements.
https://github.com/dropwizard/metrics/releases/tag/v4.2.0
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Build succeeds and all the modified links are reachable.
Closes#32628 from sarutak/upgrade-dropwizard-4.2.0.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to upgrade `kubernetes-client` from 5.3.1 to 5.4.0 to support K8s 1.21 models officially.
### Why are the changes needed?
`kubernetes-client` 5.4.0 has `Kubernetes Model v1.21.0`
- https://github.com/fabric8io/kubernetes-client/releases/tag/v5.4.0
### Does this PR introduce _any_ user-facing change?
No. This is a dev-only change.
### How was this patch tested?
Pass the CIs including Jenkins K8s IT.
- https://github.com/apache/spark/pull/32612#issuecomment-845456039
I tested K8s IT with the following versions.
- minikube version: v1.20.0
- K8s Client Version: v1.21.0
- Server Version: v1.21.0
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 17 minutes, 18 seconds.
Total number of tests run: 26
Suites: completed 2, aborted 0
Tests: succeeded 26, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes#32612 from dongjoon-hyun/SPARK-35462.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR changes the antlr4-runtime version from 4.8-1 to 4.8.
### Why are the changes needed?
Version 4.8 is the official release version, with a proper release note (see https://github.com/antlr/antlr4/releases) and artifiacts listed in https://www.antlr.org/download/index.html.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Will rely on tests in the PR.
Closes#32603 from bozhang2820/antlr-4.8.
Authored-by: Bo Zhang <bo.zhang@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
This PR upgrades the scalatestplus artifacts and scalacheck.
### Why are the changes needed?
scalatestplus artifacts Spark uses are two years old and these artifacts are currently renamed.
So, let's follow up.
Also, the latest releases seem to support Scala 3.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA passed on my repository.
Closes#32581 from sarutak/upgrade-scalatestplus.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to unify two K8s version variables in two `pom.xml`s into one. `kubernetes-client.version` is correct because the artifact ID is `kubernetes-client`.
```
kubernetes.client.version (kubernetes/core module)
kubernetes-client.version (kubernetes/integration-test module)
```
### Why are the changes needed?
Having two variables for the same value is confusing and inconvenient when we upgrade K8s versions.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs. (The compilation test passes are enough.)
Closes#32531 from dongjoon-hyun/SPARK-35394.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR proposes to bump up the janino version from 3.0.16 to v3.1.4.
The major changes of this upgrade are as follows:
- Fixed issue #131: Janino 3.1.2 is 10x slower than 3.0.11: The Compiler's IClassLoader was initialized way too eagerly, thus lots of classes were loaded from the class path, which is very slow.
- Improved the encoding of stack map frames according to JVMS11 4.7.4: Previously, only "full_frame"s were generated.
- Fixed issue #107: Janino requires "org.codehaus.commons.compiler.io", but commons-compiler does not export this package
- Fixed the promotion of the array access index expression (see JLS7 15.13 Array Access Expressions).
For all the changes, please see the change log: http://janino-compiler.github.io/janino/changelog.html
NOTE1: I've checked that there is no obvious performance regression. For all the data, see a link: https://docs.google.com/spreadsheets/d/1srxT9CioGQg1fLKM3Uo8z1sTzgCsMj4pg6JzpdcG6VU/edit?usp=sharing
NOTE2: We upgraded janino to 3.1.2 (#27860) once before, but the commit had been reverted in #29495 because of the correctness issue. Recently, #32374 had checked if Spark could land on v3.1.3 or not, but a new bug was found there. These known issues has been fixed in v3.1.4 by following PRs:
- janino-compiler/janino#145
- janino-compiler/janino#146
### Why are the changes needed?
janino v3.0.X is no longer maintained.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA passed.
Closes#32455 from maropu/janino_v3.1.4.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR increases the stack size for Scala compilation in Maven build to fix the error:
```
java.lang.StackOverflowError
scala.reflect.internal.Trees$UnderConstructionTransformer.transform(Trees.scala:1741)
scala.reflect.internal.Trees$UnderConstructionTransformer.transform$(Trees.scala:1740)
scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.transform(ExplicitOuter.scala:289)
scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:477)
scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:330)
scala.reflect.api.Trees$Transformer.$anonfun$transformStats$1(Trees.scala:2597)
scala.reflect.api.Trees$Transformer.transformStats(Trees.scala:2595)
scala.reflect.internal.Trees.itransform(Trees.scala:1404)
scala.reflect.internal.Trees.itransform$(Trees.scala:1374)
scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:51)
scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.scala$reflect$internal$Trees$UnderConstructionTransformer$$super$transform(ExplicitOuter.scala:212)
scala.reflect.internal.Trees$UnderConstructionTransformer.transform(Trees.scala:1745)
scala.reflect.internal.Trees$UnderConstructionTransformer.transform$(Trees.scala:1740)
scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.transform(ExplicitOuter.scala:289)
scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:477)
scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:330)
scala.reflect.internal.Trees.itransform(Trees.scala:1383)
```
See https://github.com/apache/spark/runs/2554067779
### Why are the changes needed?
To recover JDK 11 compilation
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
CI in this PR will test it out.
Closes#32502 from HyukjinKwon/SPARK-35372.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
### What changes were proposed in this pull request?
This PR upgrades Jersey to 2.34.
### Why are the changes needed?
CVE-2021-28168, a local information disclosure vulnerability, is reported (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-28168).
Spark 3.1.1, 3.0.2 and 3.2.0 use an affected version 2.30.
### Does this PR introduce _any_ user-facing change?
It's not clear how much the impact is but Spark uses an affected version of Jersey so I think it's better to upgrade it just in case.
### How was this patch tested?
CI.
Closes#32453 from sarutak/upgrade-jersey.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to upgrade snappy to version 1.1.8.4.
### Why are the changes needed?
This will bring the latest bug fixes and improvements.
- https://github.com/xerial/snappy-java/blob/master/Milestone.md#snappy-java-1183-2021-01-20
- Make pure-java Snappy thread-safe
- Improved SnappyFramedInput/OutputStream performance by using java.util.zip.CRC32C
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#32402 from williamhyun/snappy1184.
Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This pr aims to upgrade Apache commons-lang3 to 3.12.0
### Why are the changes needed?
This version will bring the latest bug fixes as follows:
- https://commons.apache.org/proper/commons-lang/changes-report.html#a3.12.0
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass the Jenkins or GitHub Action
Closes#32393 from LuciferYang/lang3-to-312.
Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to upgrade Kafka client to 2.8.0.
Note that Kafka 2.8.0 uses ZSTD JNI 1.4.9-1 like Apache Spark 3.2.0.
### Why are the changes needed?
This will bring the latest client-side improvement and bug fixes like the following examples.
- KAFKA-10631 ProducerFencedException is not Handled on Offest Commit
- KAFKA-10134 High CPU issue during rebalance in Kafka consumer after upgrading to 2.5
- KAFKA-12193 Re-resolve IPs when a client is disconnected
- KAFKA-10090 Misleading warnings: The configuration was supplied but isn't a known config
- KAFKA-9263 The new hw is added to incorrect log when ReplicaAlterLogDirsThread is replacing log
- KAFKA-10607 Ensure the error counts contains the NONE
- KAFKA-10458 Need a way to update quota for TokenBucket registered with Sensor
- KAFKA-10503 MockProducer doesn't throw ClassCastException when no partition for topic
**RELEASE NOTE**
- https://downloads.apache.org/kafka/2.8.0/RELEASE_NOTES.html
- https://downloads.apache.org/kafka/2.7.0/RELEASE_NOTES.html
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs with the existing tests because this is a dependency change.
Closes#32325 from dongjoon-hyun/SPARK-33913.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to upgrade Jetty to 9.4.40.
### Why are the changes needed?
SPARK-34988 (#32091) upgraded Jetty to 9.4.39 for CVE-2021-28165.
But after the upgrade, Jetty 9.4.40 was released to fix the ERR_CONNECTION_RESET issue (https://github.com/eclipse/jetty.project/issues/6152).
This issue seems to affect Jetty 9.4.39 when POST method is used with SSL.
For Spark, job submission using REST and ThriftServer with HTTPS protocol can be affected.
### Does this PR introduce _any_ user-facing change?
No. No released version uses Jetty 9.3.39.
### How was this patch tested?
CI.
Closes#32318 from sarutak/upgrade-jetty-9.4.40.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
### What changes were proposed in this pull request?
This PR upgrades the version of Jetty to 9.4.39.
### Why are the changes needed?
CVE-2021-28165 affects the version of Jetty that Spark uses and it seems to be a little bit serious.
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-28165
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes#32091 from sarutak/upgrade-jetty-9.4.39.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request?
Update the Avro version to 1.10.2
### Why are the changes needed?
To stay up to date with upstream and catch compatibility issues with zstd
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests
Closes#31866 from iemejia/SPARK-27733-upgrade-avro-1.10.2.
Authored-by: Ismaël Mejía <iemejia@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
### What changes were proposed in this pull request?
This pr upgrade Jackson to 2.12.2.
Jackson Release 2.12: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.12
### Why are the changes needed?
Make it easy to upgrade Avro 1.10.2.
```
[error] Caused by: sbt.ForkMain$ForkError: com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.11.4 requires Jackson Databind version >= 2.11.0 and < 2.12.0
[error] at com.fasterxml.jackson.module.scala.JacksonModule.setupModule(JacksonModule.scala:61)
[error] at com.fasterxml.jackson.module.scala.JacksonModule.setupModule$(JacksonModule.scala:46)
[error] at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:17)
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tested with Avro 1.10.2 and Parquet 1.12.0: https://github.com/apache/spark/runs/2157735537Closes#31878 from xclyfe/SPARK-34784.
Authored-by: Zhang, Xingchao <xingczhang@ebay.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
### What changes were proposed in this pull request?
This PR aims to update `PostgresIntegrationSuite` with the latest results.
### Why are the changes needed?
The latest PostgreSQL jar version is 42.2.19. Since 42.2.9, the test is broken because it returns `0.0` instead of `0.00`.
- https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.19
42.2.9 (2019-12-06)
42.2.10 (2020-01-30)
42.2.11 (2020-03-09)
42.2.12 (2020-03-31)
42.2.13 (2020-06-04)
42.2.14 (2020-06-10)
42.2.15 (2020-08-14)
42.2.16 (2020-08-20)
42.2.17 (2020-10-09)
42.2.18 (2020-10-15)
42.2.19 (2021-02-18)
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CI with the updated test cases.
```
build/sbt -Pdocker-integration-tests 'docker-integration-tests/testOnly org.apache.spark.sql.jdbc.PostgresIntegrationSuite'
```
Closes#31910 from williamhyun/pg.
Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
### What changes were proposed in this pull request?
After SPARK-34507, execute` change-scala-version.sh` script will update `scala.version` in parent pom, but if we execute the following commands in order:
```
dev/change-scala-version.sh 2.13
dev/change-scala-version.sh 2.12
git status
```
there will generate git diff as follow:
```
diff --git a/pom.xml b/pom.xml
index ddc4ce2f68..f43d8c8f78 100644
--- a/pom.xml
+++ b/pom.xml
-162,7 +162,7
<commons.math3.version>3.4.1</commons.math3.version>
<commons.collections.version>3.2.2</commons.collections.version>
- <scala.version>2.12.10</scala.version>
+ <scala.version>2.13.5</scala.version>
<scala.binary.version>2.12</scala.binary.version>
<scalatest-maven-plugin.version>2.0.0</scalatest-maven-plugin.version>
<scalafmt.parameters>--test</scalafmt.parameters>
```
seem 'scala.version' property was not update correctly.
So this pr add an extra 'scala.version' to scala-2.12 profile to ensure change-scala-version.sh can update the public `scala.version` property correctly.
### Why are the changes needed?
Bug fix.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
**Manual test**
Execute the following commands in order:
```
dev/change-scala-version.sh 2.13
dev/change-scala-version.sh 2.12
git status
```
**Before**
```
diff --git a/pom.xml b/pom.xml
index ddc4ce2f68..f43d8c8f78 100644
--- a/pom.xml
+++ b/pom.xml
-162,7 +162,7
<commons.math3.version>3.4.1</commons.math3.version>
<commons.collections.version>3.2.2</commons.collections.version>
- <scala.version>2.12.10</scala.version>
+ <scala.version>2.13.5</scala.version>
<scala.binary.version>2.12</scala.binary.version>
<scalatest-maven-plugin.version>2.0.0</scalatest-maven-plugin.version>
<scalafmt.parameters>--test</scalafmt.parameters>
```
**After**
No git diff.
Closes#31865 from LuciferYang/SPARK-34774.
Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR fixes the build failure with Scala 2.13 which is related to `commons-cli`.
The last few days, build with Scala 2.13 on GA continues to fail and the error message says like as follows.
```
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1: error: package org.apache.commons.cli does not exist
1278[error] import org.apache.commons.cli.GnuParser;
```
The reason is that `mvn help` in `change-scala-version.sh` downloads the POM file of `commons-cli` but doesn't download the JAR file, leading the build failure.
This PR also adds `commons-cli` to the dependencies explicitly because HiveThriftServer depends on it.
### Why are the changes needed?
Expect to fix the build failure with Scala 2.13.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I confirmed that build successfully finishes with Scala 2.13 on my laptop.
```
find ~/.m2 -name commons-cli -exec rm -rf {} \;
find ~/.ivy2 -name commons-cli -exec rm -rf {} \;
find ~/.cache/ -name commons-cli -exec rm -rf {} \; // For Linux
find ~/Library/Caches -name commons-cli -exec rm -rf {} \; // For macOS
dev/change-scala-version 2.13
./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 clean compile test:compile
```
Closes#31862 from sarutak/commons-cli.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade ZSTD-JNI to 1.4.9-1.
### Why are the changes needed?
ZStandard 1.4.9 and its corresponding JNI brings the following bug fixes and improvements.
- https://github.com/facebook/zstd/releases/tag/v1.4.9
One of notable improvement of ZStandard 1.4.9 is `2x faster Long Distance Mode`, but we are not using it yet.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs with the existing tests and there is no regression in ZStandardBenchmark.
Closes#31784 from dongjoon-hyun/ZSTD-149.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to use `ZstdInputStreamNoFinalizer` and `ZstdOutputStreamNoFinalizer` classes and upgrade ZSTD JNI to 1.4.8-7.
### Why are the changes needed?
`1.4.8-7` makes `NoFinalizer` classes public again. This improves the performance.
- 57d53a09d2
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#31762 from dongjoon-hyun/SPARK-ZSTD-NOFINALIZER.
Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR proposes a new feature that allows developers to debug test code using JDWP with sbt an Maven.
More specifically, this PR introduces the following profile options.
* `jdwp-test-debug`: An profile which controls enable/disable JDWP debug
* `test.jdwp.address`: An option which corresponds to `address` option in JDWP
* `test.jdwp.suspend`: An option which corresponds to `suspend` option in JDWP
* `test.jdwp.server`: An option which corresponds to `server` option in JDWP
* `test.debug.suite`: An option which controls whether debug ScalaStyle suites (Maven only)
For `sbt`, this feature can be used like `build/sbt -Pjdwp-test-debug -Dtest.jdwp.address=localhost:9876 -Dtest.jdwp.suspend=y -Dtest.jdwp.server=y` and can be used for both JUnit tests and ScalaTest tests.
For `Maven`, this feature can be used like as follows:
(For JUnit tests) `build/mvn -Pjdwp-test-debug -Dtest.jdwp.address=localhost:9876 -Dtest.jdwp.suspend=y -Dtest.jdwp.server=y`
(For ScalaTest suites) `build/mvn -Pjdwp-test-debug -Dtest.debug.suite=true -Dtest.jdwp.address=localhost:9876 -Dtest.jdwp.suspend=y -Dtest.jdwp.server=y` (It might be useful to specify specific sub-modules like `-pl sql/core,sql/catalyst`).
### Why are the changes needed?
It's useful to debug test code.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I confirmed the following things.
* `jdwp-tes-debug` can switch JDWP enabled/disabled
* `test.jdwp.address` can change address and port.
* `test.jdwp.suspend` can change the behavior that the target debugee suspends or not.
* `test.jdwp.server` can change the behavior that the JDWP debugger run as a server or client.
* ScalaTest suites can be debugged with Maven with setting `test.debug.suite` to `true`.
Closes#31706 from sarutak/sbt-jdwp.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
1. This PR aims to ignore ORC encryption tests when ORC shim is loaded by old Hadoop library by some other tests. The test coverage is preserved by Jenkins SBT runs and GitHub Action jobs. This PR only aims to recover Maven Jenkins jobs.
2. In addition, this PR simplifies SBT testing by refactor the test config to `SparkBuild.scala/pom.xml` and remove `DedicatedJVMTest`. This will remove one GitHub Action job which was recently added for `DedicatedJVMTest` tag.
### Why are the changes needed?
Currently, Maven test fails when it runs in a batch mode because `HadoopShimsPre2_3$NullKeyProvider` is loaded.
**MVN COMMAND**
```
$ mvn test -pl sql/core --am -Dtest=none -DwildcardSuites=org.apache.spark.sql.execution.datasources.orc.OrcV1QuerySuite,org.apache.spark.sql.execution.datasources.orc.OrcEncryptionSuite
```
**BEFORE**
```
- Write and read an encrypted table *** FAILED ***
...
Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1) (localhost executor driver): java.lang.IllegalArgumentException: Unknown key pii
at org.apache.orc.impl.HadoopShimsPre2_3$NullKeyProvider.getCurrentKeyVersion(HadoopShimsPre2_3.java:71)
at org.apache.orc.impl.WriterImpl.getKey(WriterImpl.java:871)
```
**AFTER**
```
OrcV1QuerySuite
...
OrcEncryptionSuite:
- Write and read an encrypted file !!! CANCELED !!!
[] was empty org.apache.orc.impl.HadoopShimsPre2_3$NullKeyProvider1b705f65 doesn't has the test keys. ORC shim is created with old Hadoop libraries (OrcEncryptionSuite.scala:39)
- Write and read an encrypted table !!! CANCELED !!!
[] was empty org.apache.orc.impl.HadoopShimsPre2_3$NullKeyProvider22adeee1 doesn't has the test keys. ORC shim is created with old Hadoop libraries (OrcEncryptionSuite.scala:67)
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the Jenkins Maven tests.
For SBT command,
- the test suite required a dedicated JVM (Before)
- the test suite doesn't require a dedicated JVM (After)
```
$ build/sbt "sql/testOnly *.OrcV1QuerySuite *.OrcEncryptionSuite"
...
[info] OrcV1QuerySuite
...
[info] - SPARK-20728 Make ORCFileFormat configurable between sql/hive and sql/core (26 milliseconds)
[info] OrcEncryptionSuite:
[info] - Write and read an encrypted file (431 milliseconds)
[info] - Write and read an encrypted table (359 milliseconds)
[info] All tests passed.
[info] Passed: Total 35, Failed 0, Errors 0, Passed 35
```
Closes#31697 from dongjoon-hyun/SPARK-34578-TEST.
Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Cleanup all Zinc standalone server code, and realated coniguration.
### Why are the changes needed?
![image](https://user-images.githubusercontent.com/1736354/109154790-c1d3e580-77a9-11eb-8cde-835deed6e10e.png)
- Zinc is the incremental compiler to speed up builds of compilation.
- The scala-maven-plugin is the mave plugin, which is used by Spark, one of the function is to integrate the Zinc to enable the incremental compiler.
- Since Spark v3.0.0 ([SPARK-28759](https://issues.apache.org/jira/browse/SPARK-28759)), the scala-maven-plugin is upgraded to v4.X, that means Zinc v0.3.13 standalone server is useless anymore.
However, we still download, install, start the standalone Zinc server. we should remove all zinc standalone server code, and all related configuration.
See more in [SPARK-34539](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-34539) or the doc [Zinc standalone server is useless after scala-maven-plugin 4.x](https://docs.google.com/document/d/1u4kCHDx7KjVlHGerfmbcKSB0cZo6AD4cBdHSse-SBsM).
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Run any mvn build:
./build/mvn -DskipTests clean package -pl core
You could see the increamental compilation is still working, the stage of "scala-maven-plugin:4.3.0:compile (scala-compile-first)" with incremental compilation info, like:
```
[INFO] --- scala-maven-plugin:4.3.0:testCompile (scala-test-compile-first) spark-core_2.12 ---
[INFO] Using incremental compilation using Mixed compile order
[INFO] Compiler bridge file: /root/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.12-1.3.1-bin_2.12.10__52.0-1.3.1_20191012T045515.jar
[INFO] compiler plugin: BasicArtifact(com.github.ghik,silencer-plugin_2.12.10,1.6.0,null)
[INFO] Compiling 303 Scala sources and 27 Java sources to /root/spark/core/target/scala-2.12/test-classes ...
```
Closes#31647 from Yikun/cleanup-zinc.
Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>