### What changes were proposed in this pull request?
Although AS-IS master branch already works with K8s 1.20, this PR aims to upgrade K8s client to 5.3.0 to support K8s 1.20 officially.
- https://github.com/fabric8io/kubernetes-client#compatibility-matrix
The following are the notable breaking API changes.
1. Remove Doneable (5.0+):
- https://github.com/fabric8io/kubernetes-client/pull/2571
2. Change Watcher.onClose signature (5.0+):
- https://github.com/fabric8io/kubernetes-client/pull/2616
3. Change Readiness (5.1+)
- https://github.com/fabric8io/kubernetes-client/pull/2796
### Why are the changes needed?
According to the compatibility matrix, this makes Apache Spark and its external cluster manager extension support all K8s 1.20 features officially for Apache Spark 3.2.0.
### Does this PR introduce _any_ user-facing change?
Yes, this is a dev dependency change which affects K8s cluster extension users.
### How was this patch tested?
Pass the CIs.
This is manually tested with K8s IT.
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 17 minutes, 44 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes#32221 from dongjoon-hyun/SPARK-K8S-530.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
SPARK-10498 added the initial Jira client requirement with 1.0.3 five year ago (2016 January). As of today, it causes `dev/merge_spark_pr.py` failure with `Python 3.9.4` due to this old dependency. This PR aims to upgrade it to the latest version, 2.0.0. The latest version is also a little old (2018 July).
- https://pypi.org/project/jira/#history
### Why are the changes needed?
`Jira==2.0.0` works well with both Python 3.8/3.9 while `Jira==1.0.3` fails with Python 3.9.
**BEFORE**
```
$ pyenv global 3.9.4
$ pip freeze | grep jira
jira==1.0.3
$ dev/merge_spark_pr.py
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/dev/merge_spark_pr.py", line 39, in <module>
import jira.client
File "/Users/dongjoon/.pyenv/versions/3.9.4/lib/python3.9/site-packages/jira/__init__.py", line 5, in <module>
from .config import get_jira
File "/Users/dongjoon/.pyenv/versions/3.9.4/lib/python3.9/site-packages/jira/config.py", line 17, in <module>
from .client import JIRA
File "/Users/dongjoon/.pyenv/versions/3.9.4/lib/python3.9/site-packages/jira/client.py", line 165
validate=False, get_server_info=True, async=False, logging=True, max_retries=3):
^
SyntaxError: invalid syntax
```
**AFTER**
```
$ pip install jira==2.0.0
$ dev/merge_spark_pr.py
git rev-parse --abbrev-ref HEAD
Which pull request would you like to merge? (e.g. 34):
```
### Does this PR introduce _any_ user-facing change?
No. This is a committer-only script.
### How was this patch tested?
Manually.
Closes#32215 from dongjoon-hyun/jira.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas Index unit tests to PySpark.
### Why are the changes needed?
Currently, the pandas-on-Spark modules are not tested fully. We should enable the Index unit tests.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Enable Index unit tests.
Closes#32139 from xinrong-databricks/port.indexes_tests.
Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas miscellaneous unit tests to PySpark.
### Why are the changes needed?
Currently, the pandas-on-Spark modules are not tested fully. We should enable miscellaneous unit tests.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Enable miscellaneous unit tests.
Closes#32152 from xinrong-databricks/port.misc_tests.
Lead-authored-by: xinrong-databricks <47337188+xinrong-databricks@users.noreply.github.com>
Co-authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR bumps up the version of pycodestyle from 2.6.0 to 2.7.0 released a month ago.
### Why are the changes needed?
2.7.0 includes three major fixes below (see https://readthedocs.org/projects/pycodestyle/downloads/pdf/latest/):
- Fix physical checks (such as W191) at end of file. PR #961.
- Add --indent-size option (defaulting to 4). PR #970.
- W605: fix escaped crlf false positive on windows. PR #976
The first and third ones could be useful for dev to detect the styles.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Manually tested locally.
Closes#32160 from HyukjinKwon/SPARK-35061.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas internal implementation unit tests to PySpark.
### Why are the changes needed?
Currently, the pandas-on-Spark modules are not tested fully. We should enable the internal implementation unit tests.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Enable internal implementation unit tests.
Closes#32137 from xinrong-databricks/port.test_internal_impl.
Lead-authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Co-authored-by: xinrong-databricks <47337188+xinrong-databricks@users.noreply.github.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to leverage the GitHub Actions resources from the forked repositories instead of using the resources in ASF organisation at GitHub.
This is how it works:
1. "Build and test" (`build_and_test.yml`) triggers a build on any commit on any branch (except `branch-*.*`), which roughly means:
- The original repository will trigger the build on any commits in `master` branch
- The forked repository will trigger the build on any commit in any branch.
2. The build triggered in the forked repository will checkout the original repository's `master` branch locally, and merge the branch from the forked repository into the original repository's `master` branch locally.
Therefore, the tests in the forked repository will run after being sync'ed with the original repository's `master` branch.
3. In the original repository, it triggers a workflow that detects the workflow triggered in the forked repository, and add a comment, to the PR, pointing out the workflow in forked repository.
In short, please see this example HyukjinKwon#34
1. You create a PR and your repository triggers the workflow. Your PR uses the resources allocated to you for testing.
2. Apache Spark repository finds your workflow, and links it in a comment in your PR
**NOTE** that we will still run the tests in the original repository for each commit pushed to `master` branch. This distributes the workflows only in PRs.
### Why are the changes needed?
ASF shares the resources across all the ASF projects, which makes the development slow down.
Please see also:
- Discussion in the buildsa.o mailing list: https://lists.apache.org/x/thread.html/r48d079eeff292254db22705c8ef8618f87ff7adc68d56c4e5d0b4105%3Cbuilds.apache.org%3E
- Infra ticket: https://issues.apache.org/jira/browse/INFRA-21646
By distributing the workflows to use author's resources, we can get around this issue.
### Does this PR introduce _any_ user-facing change?
No, this is a dev-only change.
### How was this patch tested?
Manually tested at https://github.com/HyukjinKwon/spark/pull/34 and https://github.com/HyukjinKwon/spark/pull/33.
Closes#32092 from HyukjinKwon/poc-fork-resources.
Lead-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas plot unit tests to PySpark.
### Why are the changes needed?
Currently, the pandas-on-Spark modules are not tested fully. We should enable the plot unit tests.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Enable plot unit tests.
Closes#32151 from xinrong-databricks/port.plot_tests.
Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas DataFrame-related unit tests to PySpark.
### Why are the changes needed?
Currently, the pandas-on-Spark modules are not fully tested. We should enable the DataFrame-related unit tests first.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Enable DataFrame-related unit tests.
Closes#32131 from xinrong-databricks/port.test_dataframe_related.
Lead-authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Co-authored-by: xinrong-databricks <47337188+xinrong-databricks@users.noreply.github.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas Series related unit tests to PySpark.
### Why are the changes needed?
Currently, the pandas-on-Spark modules are not fully tested. We should enable the Series related unit tests first.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Enable Series-related unit tests.
Closes#32117 from xinrong-databricks/port.test_series_related.
Lead-authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Co-authored-by: xinrong-databricks <47337188+xinrong-databricks@users.noreply.github.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas operations on different frames unit tests to PySpark.
### Why are the changes needed?
Currently, the pandas-on-Spark modules are not tested fully. We should enable the operations on different frames unit tests.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Enable operations on different frames unit tests.
Closes#32133 from xinrong-databricks/port.test_ops_on_diff_frames.
Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas DataFrame unit test to PySpark.
### Why are the changes needed?
Currently, the pandas-on-Spark modules are not tested at all. We should enable the DataFrame unit test first.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Enable the DataFrame unit test.
Closes#32083 from xinrong-databricks/port.test_dataframe.
Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Now that we merged the Koalas main code into PySpark code base (#32036), we should enable doctests on the Spark's infrastructure.
### Why are the changes needed?
Currently the pandas-on-Spark modules are not tested at all.
We should enable doctests first, and we will port other unit tests separately later.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Enabled the whole doctests.
Closes#32069 from ueshin/issues/SPARK-34972/pyspark-pandas_doctests.
Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
As a first step of [SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849), this PR proposes porting the Koalas main code into PySpark.
This PR contains minimal changes to the existing Koalas code as follows:
1. `databricks.koalas` -> `pyspark.pandas`
2. `from databricks import koalas as ks` -> `from pyspark import pandas as pp`
3. `ks.xxx -> pp.xxx`
Other than them:
1. Added a line to `python/mypy.ini` in order to ignore the mypy test. See related issue at [SPARK-34941](https://issues.apache.org/jira/browse/SPARK-34941).
2. Added a comment to several lines in several files to ignore the flake8 F401. See related issue at [SPARK-34943](https://issues.apache.org/jira/browse/SPARK-34943).
When this PR is merged, all the features that were previously used in [Koalas](https://github.com/databricks/koalas) will be available in PySpark as well.
Users can access to the pandas API in PySpark as below:
```python
>>> from pyspark import pandas as pp
>>> ppdf = pp.DataFrame({"A": [1, 2, 3], "B": [15, 20, 25]})
>>> ppdf
A B
0 1 15
1 2 20
2 3 25
```
The existing "options and settings" in Koalas are also available in the same way:
```python
>>> from pyspark.pandas.config import set_option, reset_option, get_option
>>> ppser1 = pp.Series([1, 2, 3])
>>> ppser2 = pp.Series([3, 4, 5])
>>> ppser1 + ppser2
Traceback (most recent call last):
...
ValueError: Cannot combine the series or dataframe because it comes from a different dataframe. In order to allow this operation, enable 'compute.ops_on_diff_frames' option.
>>> set_option("compute.ops_on_diff_frames", True)
>>> ppser1 + ppser2
0 4
1 6
2 8
dtype: int64
```
Please also refer to the [API Reference](https://koalas.readthedocs.io/en/latest/reference/index.html) and [Options and Settings](https://koalas.readthedocs.io/en/latest/user_guide/options.html) for more detail.
**NOTE** that this PR intentionally ports the main codes of Koalas first almost as are with minimal changes because:
- Koalas project is fairly large. Making some changes together for PySpark will make it difficult to review the individual change.
Koalas dev includes multiple Spark committers who will review. By doing this, the committers will be able to more easily and effectively review and drive the development.
- Koalas tests and documentation require major changes to make it look great together with PySpark whereas main codes do not require.
- We lately froze the Koalas codebase, and plan to work together on the initial porting. By porting the main codes first as are, it unblocks the Koalas dev to work on other items in parallel.
I promise and will make sure on:
- Rename Koalas to PySpark pandas APIs and/or pandas-on-Spark accordingly in documentation, and the docstrings and comments in the main codes.
- Triage APIs to remove that don’t make sense when Koalas is in PySpark
The documentation changes will be tracked in [SPARK-34885](https://issues.apache.org/jira/browse/SPARK-34885), the test code changes will be tracked in [SPARK-34886](https://issues.apache.org/jira/browse/SPARK-34886).
### Why are the changes needed?
Please refer to:
- [[DISCUSS] Support pandas API layer on PySpark](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Support-pandas-API-layer-on-PySpark-td30945.html)
- [[VOTE] SPIP: Support pandas API layer on PySpark](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-SPIP-Support-pandas-API-layer-on-PySpark-td30996.html)
### Does this PR introduce _any_ user-facing change?
Yes, now users can use the pandas APIs on Spark
### How was this patch tested?
Manually tested for exposed major APIs and options as described above.
### Koalas contributors
Koalas would not have been possible without the following contributors:
ueshin
HyukjinKwon
rxin
xinrong-databricks
RainFung
charlesdong1991
harupy
floscha
beobest2
thunterdb
garawalid
LucasG0
shril
deepyaman
gioa
fwani
90jam
thoo
AbdealiJK
abishekganesh72
gliptak
DumbMachine
dvgodoy
stbof
nitlev
hjoo
gatorsmile
tomspur
icexelloss
awdavidson
guyao
akhilputhiry
scook12
patryk-oleniuk
tracek
dennyglee
athena15
gstaubli
WeichenXu123
hsubbaraj
lfdversluis
ktksq
shengjh
margaret-databricks
LSturtew
sllynn
manuzhang
jijosg
sadikovi
Closes#32036 from itholic/SPARK-34890.
Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Update the Avro version to 1.10.2
### Why are the changes needed?
To stay up to date with upstream and catch compatibility issues with zstd
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests
Closes#31866 from iemejia/SPARK-27733-upgrade-avro-1.10.2.
Authored-by: Ismaël Mejía <iemejia@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
### What changes were proposed in this pull request?
This pr upgrade Jackson to 2.12.2.
Jackson Release 2.12: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.12
### Why are the changes needed?
Make it easy to upgrade Avro 1.10.2.
```
[error] Caused by: sbt.ForkMain$ForkError: com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.11.4 requires Jackson Databind version >= 2.11.0 and < 2.12.0
[error] at com.fasterxml.jackson.module.scala.JacksonModule.setupModule(JacksonModule.scala:61)
[error] at com.fasterxml.jackson.module.scala.JacksonModule.setupModule$(JacksonModule.scala:46)
[error] at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:17)
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tested with Avro 1.10.2 and Parquet 1.12.0: https://github.com/apache/spark/runs/2157735537Closes#31878 from xclyfe/SPARK-34784.
Authored-by: Zhang, Xingchao <xingczhang@ebay.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
### What changes were proposed in this pull request?
This PR fixes the build failure with Scala 2.13 which is related to `commons-cli`.
The last few days, build with Scala 2.13 on GA continues to fail and the error message says like as follows.
```
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1: error: package org.apache.commons.cli does not exist
1278[error] import org.apache.commons.cli.GnuParser;
```
The reason is that `mvn help` in `change-scala-version.sh` downloads the POM file of `commons-cli` but doesn't download the JAR file, leading the build failure.
This PR also adds `commons-cli` to the dependencies explicitly because HiveThriftServer depends on it.
### Why are the changes needed?
Expect to fix the build failure with Scala 2.13.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I confirmed that build successfully finishes with Scala 2.13 on my laptop.
```
find ~/.m2 -name commons-cli -exec rm -rf {} \;
find ~/.ivy2 -name commons-cli -exec rm -rf {} \;
find ~/.cache/ -name commons-cli -exec rm -rf {} \; // For Linux
find ~/Library/Caches -name commons-cli -exec rm -rf {} \; // For macOS
dev/change-scala-version 2.13
./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 clean compile test:compile
```
Closes#31862 from sarutak/commons-cli.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR upgrade Py4J from 0.10.9.1 to 0.10.9.2 that contains some bug fixes and improvements.
* expose shell parameter in Popen inside launch_gateway. ([bartdag/py4j220efc3](220efc3716))
* fixed Flake8 errors ([bartdag/py4j6c6ee9a](6c6ee9aedc))
### Why are the changes needed?
To leverage fixes from the upstream in Py4J.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Jenkins build and GitHub Actions will test it out.
Closes#31796 from wankunde/py4j.
Authored-by: wankunde <wankunde@163.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
The `change-scala-version.sh` script updates Scala versions across the build for cross-build purposes. It manually changes `scala.binary.version` but not `scala.version`.
### Why are the changes needed?
It seems that this has always been an oversight, and the cross-built builds of Spark have an incorrect scala.version. See 2.4.5's 2.12 POM for example, which shows a Scala 2.11 version.
https://search.maven.org/artifact/org.apache.spark/spark-core_2.12/2.4.5/pom
More comments in the JIRA.
### Does this PR introduce _any_ user-facing change?
Should be a build-only bug fix.
### How was this patch tested?
Existing tests, but really N/A
Closes#31801 from srowen/SPARK-34507.
Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade ZSTD-JNI to 1.4.9-1.
### Why are the changes needed?
ZStandard 1.4.9 and its corresponding JNI brings the following bug fixes and improvements.
- https://github.com/facebook/zstd/releases/tag/v1.4.9
One of notable improvement of ZStandard 1.4.9 is `2x faster Long Distance Mode`, but we are not using it yet.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs with the existing tests and there is no regression in ZStandardBenchmark.
Closes#31784 from dongjoon-hyun/ZSTD-149.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to use `ZstdInputStreamNoFinalizer` and `ZstdOutputStreamNoFinalizer` classes and upgrade ZSTD JNI to 1.4.8-7.
### Why are the changes needed?
`1.4.8-7` makes `NoFinalizer` classes public again. This improves the performance.
- 57d53a09d2
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#31762 from dongjoon-hyun/SPARK-ZSTD-NOFINALIZER.
Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Cleanup all Zinc standalone server code, and realated coniguration.
### Why are the changes needed?
![image](https://user-images.githubusercontent.com/1736354/109154790-c1d3e580-77a9-11eb-8cde-835deed6e10e.png)
- Zinc is the incremental compiler to speed up builds of compilation.
- The scala-maven-plugin is the mave plugin, which is used by Spark, one of the function is to integrate the Zinc to enable the incremental compiler.
- Since Spark v3.0.0 ([SPARK-28759](https://issues.apache.org/jira/browse/SPARK-28759)), the scala-maven-plugin is upgraded to v4.X, that means Zinc v0.3.13 standalone server is useless anymore.
However, we still download, install, start the standalone Zinc server. we should remove all zinc standalone server code, and all related configuration.
See more in [SPARK-34539](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-34539) or the doc [Zinc standalone server is useless after scala-maven-plugin 4.x](https://docs.google.com/document/d/1u4kCHDx7KjVlHGerfmbcKSB0cZo6AD4cBdHSse-SBsM).
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Run any mvn build:
./build/mvn -DskipTests clean package -pl core
You could see the increamental compilation is still working, the stage of "scala-maven-plugin:4.3.0:compile (scala-compile-first)" with incremental compilation info, like:
```
[INFO] --- scala-maven-plugin:4.3.0:testCompile (scala-test-compile-first) spark-core_2.12 ---
[INFO] Using incremental compilation using Mixed compile order
[INFO] Compiler bridge file: /root/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.12-1.3.1-bin_2.12.10__52.0-1.3.1_20191012T045515.jar
[INFO] compiler plugin: BasicArtifact(com.github.ghik,silencer-plugin_2.12.10,1.6.0,null)
[INFO] Compiling 303 Scala sources and 27 Java sources to /root/spark/core/target/scala-2.12/test-classes ...
```
Closes#31647 from Yikun/cleanup-zinc.
Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR fixes an issue that `bundler exec jekyll` build fails to generate Scala API docs even though after `dev/change-scala-version.sh 2.13` run.
### Why are the changes needed?
The reason of this issue is that `build/sbt` in `copy_api_dirs.rb` runs without `-Pscala-2.13`.
So, it's a bug.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I tested the following patterns manually.
* `dev/change-scala-version 2.13` and then `bundler exec jekyll build`
* `dev/change-scala-version 2.12` to change back to Scala 2.12 and then `bundler exec jekyll build`
* `dev/change-scala-version 2.13` two times to confirm the idempotency and then `bundler exec jekyll build`
* `dev/change-scala-version 2.12` two times to confirm the idempotency and then `bundler exec jekyll build`
Closes#31690 from sarutak/jekyll-scala-2.13.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This adds `hadoop-yarn-server-web-proxy` as dependency for Yarn and Hadoop 3.x profile (it is already a dependency for 2.x). Also excludes some dependencies from the module which are already covered by other Hadoop jars used by Spark.
### Why are the changes needed?
The class `org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter` is used by `ApplicationMaster`:
```scala
private def addAmIpFilter(driver: Option[RpcEndpointRef], proxyBase: String) = {
val amFilter = "org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter"
val params = client.getAmIpFilterParams(yarnConf, proxyBase)
driver match {
case Some(d) =>
d.send(AddWebUIFilter(amFilter, params, proxyBase))
...
```
and will be loaded at runtime. Therefore, without the above jar Spark Yarn app will fail with `ClassNotFoundError`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing unit tests. Also tested manually and it worked with the fix, while was failing previously.
Closes#31642 from sunchao/SPARK-33212-followup-2.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade ZSTD JNI to 1.4.8-6.
### Why are the changes needed?
This fixes the following issue and will unblock SPARK-34479 (Support ZSTD at Avro data source).
- https://github.com/luben/zstd-jni/issues/161
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#31674 from dongjoon-hyun/SPARK-34559.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR adds some more known translations of contributors who contributed multiple times in Spark 3.1.1.
### Why are the changes needed?
To make release process easier.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
N/A (auto-generated)
Closes#31665 from HyukjinKwon/minor-add-known-translations.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to add an alias environment variable `GITHUB_OAUTH_KEY` for `GITHUB_API_TOKEN` in `translate-contributors.py` script.
### Why are the changes needed?
```
dev/github_jira_sync.py:GITHUB_OAUTH_KEY = os.environ.get("GITHUB_OAUTH_KEY")
dev/github_jira_sync.py: request.add_header('Authorization', 'token %s' % GITHUB_OAUTH_KEY)
dev/github_jira_sync.py: request.add_header('Authorization', 'token %s' % GITHUB_OAUTH_KEY)
dev/merge_spark_pr.py:GITHUB_OAUTH_KEY = os.environ.get("GITHUB_OAUTH_KEY")
dev/merge_spark_pr.py: if GITHUB_OAUTH_KEY:
dev/merge_spark_pr.py: request.add_header('Authorization', 'token %s' % GITHUB_OAUTH_KEY)
dev/run-tests-jenkins.py: github_oauth_key = os.environ["GITHUB_OAUTH_KEY"]
```
Spark uses `GITHUB_OAUTH_KEY` for GitHub token, but `translate-contributors.py` script alone uses `GITHUB_API_TOKEN`. We should better match to make it easier to run the script
### Does this PR introduce _any_ user-facing change?
No, it's dev-only.
### How was this patch tested?
I manually tested by running this script.
Closes#31662 from HyukjinKwon/minor-gh-token-name.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to make the scripts working by:
- Recovering credit related scripts that were broken from https://github.com/apache/spark/pull/29563
`raw_input` does not exist in `releaseutils` but only in Python 2
- Dropping Python 2 in these scripts because we dropped Python 2 in https://github.com/apache/spark/pull/28957
- Making these scripts workin with Python 3
### Why are the changes needed?
To unblock the release.
### Does this PR introduce _any_ user-facing change?
No, it's dev-only change.
### How was this patch tested?
I manually tested against Spark 3.1.1 RC3.
Closes#31660 from HyukjinKwon/SPARK-34551.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade ZSTD-JNI to 1.4.8-5 for better API compatibility.
### Why are the changes needed?
Previously, we upgrade for ZSTD-JNI performance improvement.
And, `Apache Spark`/`Apache Parquet`/`Apache Avro` master branches are using JZSTD-JNI 1.4.8-x.
This PR aims to upgrade a minor version for a better API compatibility.
- def1860c6f
- 188c803044
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs
Closes#31609 from dongjoon-hyun/SPARK-34496.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade `kubernetes-client` library from 4.12.0 to 4.13.2 for Apache Spark 3.2.0.
### Why are the changes needed?
This will bring [K8s 1.19.1](https://github.com/fabric8io/kubernetes-client/pull/2541) models officially and the latest bug fixes.
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.13.0
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.13.1
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.13.2
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass the K8s IT and UT.
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 19 minutes, 25 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes#31602 from dongjoon-hyun/SPARK-34486.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade Zstd-JNI library to 1.4.8-4 to bring JNI side optimization.
`ZStandardBenchmark` shows that there is no regression in terms of performance and show some improvements.
### Why are the changes needed?
https://github.com/luben/zstd-jni/commits/v1.4.8-4
- be9be47fae
- be51ebade1
- 44ff8b6f95
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#31585 from dongjoon-hyun/SPARK-ZSTD-1.4.8-4.
Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Improving the documentation and release process by pinning Jekyll version by Gemfile and Bundler.
Some files and their responsibilities within this PR:
- `docs/.bundle/config` is used to specify a directory "docs/.local_ruby_bundle" which will be used as destination to install the ruby packages into instead of the global one which requires root access
- `docs/Gemfile` is specifying the required Jekyll version and other top level gem versions
- `docs/Gemfile.lock` is generated by the "bundle install". This file contains the exact resolved versions of all the gems including the top level gems and all the direct and transitive dependencies of those gems. When this file is generated it contains a platform related section "PLATFORMS" (in my case after the generation it was "universal-darwin-19"). Still this file must be under version control as when the version of a gem does not fit to the one specified in `Gemfile` an error comes (i.e. if the `Gemfile.lock` was generated for Jekyll 4.1.0 and its version is updated in the `Gemfile` to 4.2.0 then it triggers the error: "The bundle currently has jekyll locked at 4.1.0."). This is solution is also suggested officially in [its documentation](https://bundler.io/rationale.html#checking-your-code-into-version-control). To get rid of the specific platform (like "universal-darwin-19") first we have to add "ruby" as platform [which means this should work on every platform where Ruby runs](https://guides.rubygems.org/what-is-a-gem/)) by running "bundle lock --add-platform ruby" then the specific platform can be removed by "bundle lock --remove-platform universal-darwin-19".
After this the correct process to update Jekyll version is the following:
1. update the version in `Gemfile`
2. run "bundle update" which updates the `Gemfile.lock`
3. commit both files
This process for version update is tested for details please check the testing section.
### Why are the changes needed?
Using different Jekyll versions can generate different output documents.
This PR standardize the process.
### Does this PR introduce _any_ user-facing change?
No, assuming the release was done via docker by using `do-release-docker.sh`.
In that case there should be no difference at all as the same Jekyll version is specified in the Gemfile.
### How was this patch tested?
#### Testing document generation
Doc generation step was triggered via the docker release:
```
$ ./do-release-docker.sh -d ~/working -n -s docs
...
========================
= Building documentation...
Command: /opt/spark-rm/release-build.sh docs
Log file: docs.log
Skipping publish step.
```
The docs.log contains the followings:
```
Building Spark docs
Fetching gem metadata from https://rubygems.org/.........
Using bundler 2.2.9
Fetching rb-fsevent 0.10.4
Fetching forwardable-extended 2.6.0
Fetching public_suffix 4.0.6
Fetching colorator 1.1.0
Fetching eventmachine 1.2.7
Fetching http_parser.rb 0.6.0
Fetching ffi 1.14.2
Fetching concurrent-ruby 1.1.8
Installing colorator 1.1.0
Installing forwardable-extended 2.6.0
Installing rb-fsevent 0.10.4
Installing public_suffix 4.0.6
Installing http_parser.rb 0.6.0 with native extensions
Installing eventmachine 1.2.7 with native extensions
Installing concurrent-ruby 1.1.8
Fetching rexml 3.2.4
Fetching liquid 4.0.3
Installing ffi 1.14.2 with native extensions
Installing rexml 3.2.4
Installing liquid 4.0.3
Fetching mercenary 0.4.0
Installing mercenary 0.4.0
Fetching rouge 3.26.0
Installing rouge 3.26.0
Fetching safe_yaml 1.0.5
Installing safe_yaml 1.0.5
Fetching unicode-display_width 1.7.0
Installing unicode-display_width 1.7.0
Fetching webrick 1.7.0
Installing webrick 1.7.0
Fetching pathutil 0.16.2
Fetching kramdown 2.3.0
Fetching terminal-table 2.0.0
Fetching addressable 2.7.0
Fetching i18n 1.8.9
Installing terminal-table 2.0.0
Installing pathutil 0.16.2
Installing i18n 1.8.9
Installing addressable 2.7.0
Installing kramdown 2.3.0
Fetching kramdown-parser-gfm 1.1.0
Installing kramdown-parser-gfm 1.1.0
Fetching rb-inotify 0.10.1
Fetching sassc 2.4.0
Fetching em-websocket 0.5.2
Installing rb-inotify 0.10.1
Installing em-websocket 0.5.2
Installing sassc 2.4.0 with native extensions
Fetching listen 3.4.1
Installing listen 3.4.1
Fetching jekyll-watch 2.2.1
Installing jekyll-watch 2.2.1
Fetching jekyll-sass-converter 2.1.0
Installing jekyll-sass-converter 2.1.0
Fetching jekyll 4.2.0
Installing jekyll 4.2.0
Fetching jekyll-redirect-from 0.16.0
Installing jekyll-redirect-from 0.16.0
Bundle complete! 4 Gemfile dependencies, 30 gems now installed.
Bundled gems are installed into `./.local_ruby_bundle`
```
#### Testing Jekyll (or other gem) update
First locally I reverted Jekyll to 4.1.0:
```
$ rm Gemfile.lock
$ rm -rf .local_ruby_bundle
# edited Gemfile to use version 4.1.0
$ cat Gemfile
source "https://rubygems.org"
gem "jekyll", "4.1.0"
gem "rouge", "3.26.0"
gem "jekyll-redirect-from", "0.16.0"
gem "webrick", "1.7"
$ bundle install
...
```
Testing Jekyll version before the update:
```
$ bundle exec jekyll --version
jekyll 4.1.0
```
Imitating Jekyll update coming from git by reverting my local changes:
```
$ git checkout Gemfile
Updated 1 path from the index
$ cat Gemfile
source "https://rubygems.org"
gem "jekyll", "4.2.0"
gem "rouge", "3.26.0"
gem "jekyll-redirect-from", "0.16.0"
gem "webrick", "1.7"
$ git checkout Gemfile.lock
Updated 1 path from the index
```
Run the install:
```
$ bundle install
...
```
Checking the updated Jekyll version:
```
$ bundle exec jekyll --version
jekyll 4.2.0
```
Closes#31559 from attilapiros/pin-jekyll-version.
Lead-authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to add a note about pip installation test in RC for release vote template.
### Why are the changes needed?
To promote PySpark users to test PyPi distribution and pip installation.
### Does this PR introduce _any_ user-facing change?
No. It will be used for release vote.
### How was this patch tested?
N/A
Closes#31527 from HyukjinKwon/minor-update-vote-templ.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade zstd-jni to 1.4.8-3.
### Why are the changes needed?
This will bring the latest improvements and bug fixes.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs with the existing tests.
Closes#31430 from williamhyun/zstd-148.
Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
This changeset is published into the public domain.
### What changes were proposed in this pull request?
Some typos and syntax issues in docstrings and the output of `dev/lint-python` have been fixed.
### Why are the changes needed?
In some places, the documentation did not refer to parameters or classes by the full and correct name, potentially causing uncertainty in the reader or rendering issues in Sphinx. Also, a typo in the standard output of `dev/lint-python` was fixed.
### Does this PR introduce _any_ user-facing change?
Slight improvements in documentation, and in standard output of `dev/lint-python`.
### How was this patch tested?
Manual testing and `dev/lint-python` run. No new Sphinx warnings arise due to this change.
Closes#31401 from DavidToneian/SPARK-34300.
Authored-by: David Toneian <david@toneian.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This is a follow-up of #31311 and fixes a typo in Scala 2.13 profile section in `publish-snapshot` command.
### Why are the changes needed?
To fix snapshot publishing for Scala 2.13.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual.
Closes#31338 from dongjoon-hyun/SPARK-34218-2.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to add `Scala 2.13` packaging and publishing.
### Why are the changes needed?
To support Scala 2.13 officially in Apache Spark 3.2.0, we need to publish the artifacts.
### Does this PR introduce _any_ user-facing change?
Yes, this will provide additional artifacts.
### How was this patch tested?
Manual.
Closes#31311 from dongjoon-hyun/SPARK-34218.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to fix the Scala 2.12 release profile in `release-build.sh`.
### Why are the changes needed?
Since 3.0.0 (SPARK-26132), the release script is using `SCALA_2_11_PROFILES` to publish Scala 2.12 artifacts.
After looking at the code, this is not a blocker because `-Pscala-2.11` is no-op in `branch-3.x`.
In addition `scala-2.12` profile is enabled by default and it's an empty profile without any configuration technically.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This is used by release manager only.
Manually. This should land at `master/3.1/3.0`.
Closes#31310 from dongjoon-hyun/SPARK-34217.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to upgrade Apache ORC from 1.6.6 to 1.6.7.
### Why are the changes needed?
Apache ORC 1.6.7 has the following fixes including [ORC-711 Support CryptoExtension in create/decryptLocalKey](https://issues.apache.org/jira/browse/ORC-711).
- https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318320&version=12349470
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs with the existing tests.
Closes#31301 from dongjoon-hyun/SPARK-34208.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Update Avro dependency to version 1.10.1
### Why are the changes needed?
To catch up multiple improvements of Avro as well as fix security issues on transitive dependencies.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Since there were no API changes required we just run the tests
Closes#31232 from iemejia/SPARK-27733-avro-upgrade.
Authored-by: Ismaël Mejía <iemejia@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR is the 3rd try to upgrade Scala 2.12.x in order to see the feasibility.
- https://github.com/apache/spark/pull/27929 (Upgrade Scala to 2.12.11, wangyum )
- https://github.com/apache/spark/pull/30940 (Upgrade Scala to 2.12.12, viirya )
`silencer` library is updated accordingly. And, Kafka version upgrade is required because it fails like the following.
```
[info] KafkaDataConsumerSuite:
[info] org.apache.spark.streaming.kafka010.KafkaDataConsumerSuite *** ABORTED *** (1 second, 580 milliseconds)
[info] java.lang.NoClassDefFoundError: scala/math/Ordering$$anon$7
[info] at kafka.api.ApiVersion$.orderingByVersion(ApiVersion.scala:45)
```
### Why are the changes needed?
Apache Spark was stuck to 2.12.10 due to the regression in Scala 2.12.11 and 2.12.12. This will bring all the bug fixes.
- https://github.com/scala/scala/releases/tag/v2.12.13
- https://github.com/scala/scala/releases/tag/v2.12.12
- https://github.com/scala/scala/releases/tag/v2.12.11
### Does this PR introduce _any_ user-facing change?
Yes, but this is a bug-fixed version.
### How was this patch tested?
Pass the CIs.
Closes#31223 from dongjoon-hyun/SPARK-31168.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Hive 2.3.8 changes:
HIVE-19662: Upgrade Avro to 1.8.2
HIVE-24324: Remove deprecated API usage from Avro
HIVE-23980: Shade Guava from hive-exec in Hive 2.3
HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue
HIVE-24512: Exclude calcite in packaging.
HIVE-22708: Fix for HttpTransport to replace String.equals
HIVE-24551: Hive should include transitive dependencies from calcite after shading it
HIVE-24553: Exclude calcite from test-jar dependency of hive-exec
### Why are the changes needed?
Upgrade Avro and Parquet to latest version.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing test add test try to upgrade Parquet to 1.11.1 and Avro to 1.10.1: https://github.com/apache/spark/pull/30517Closes#30657 from wangyum/SPARK-33696.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR upgrade Zookeeper to 3.6.2.
### Why are the changes needed?
To make Spark running on jdk 14, otherwise:
```
21/01/13 20:25:32,533 WARN [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] zookeeper.ClientCnxn:1164 : Session 0x0 for server apache-spark-zk-3.vip.hadoop.com/<unresolved>:2181, unexpected error, closing socket connection and attempting reconnect
java.lang.IllegalArgumentException: Unable to canonicalize address apache-spark-zk-3.vip.hadoop.com/<unresolved>:2181 because it's not resolvable
at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
```
Please see [ZOOKEEPER-3779](https://issues.apache.org/jira/browse/ZOOKEEPER-3779) for more details.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test:
1. Replace zookeeper-3.4.14.jar with zookeeper-3.6.2.jar and zookeeper-jute-3.6.2.jar
2. Run Spark on jdk 14. Hadoop 2.7 with HADOOP-12760, Hive 1.2.1 and Zookeeper server version is 3.4.6.
Some key configurations:
```
# spark-defaults.conf
spark.yarn.appMasterEnv.JAVA_HOME /apache/releases/jdk-14.0.2
spark.executorEnv.JAVA_HOME /apache/releases/jdk-14.0.2
# spark-env.sh
export JAVA_HOME=/apache/releases/jdk-14.0.2
```
Jenkins Tests
- Hadoop 3.2: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134048/testReport
- Hadoop 2.7: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134063/testReportCloses#31177 from wangyum/SPARK-34110.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>