Commit graph

290 commits

Author SHA1 Message Date
Dongjoon Hyun 425dc58c02 [SPARK-35125][K8S] Upgrade K8s client to 5.3.0 to support K8s 1.20
### What changes were proposed in this pull request?

Although AS-IS master branch already works with K8s 1.20, this PR aims to upgrade K8s client to 5.3.0 to support K8s 1.20 officially.
- https://github.com/fabric8io/kubernetes-client#compatibility-matrix

The following are the notable breaking API changes.

1. Remove Doneable (5.0+):
    - https://github.com/fabric8io/kubernetes-client/pull/2571
2. Change Watcher.onClose signature (5.0+):
    - https://github.com/fabric8io/kubernetes-client/pull/2616
3. Change Readiness (5.1+)
    - https://github.com/fabric8io/kubernetes-client/pull/2796

### Why are the changes needed?

According to the compatibility matrix, this makes Apache Spark and its external cluster manager extension support all K8s 1.20 features officially for Apache Spark 3.2.0.

### Does this PR introduce _any_ user-facing change?

Yes, this is a dev dependency change which affects K8s cluster extension users.

### How was this patch tested?

Pass the CIs.

This is manually tested with K8s IT.
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 17 minutes, 44 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #32221 from dongjoon-hyun/SPARK-K8S-530.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-04-19 07:39:38 -07:00
Yuming Wang cbffc12f90 [SPARK-34542][BUILD] Upgrade Parquet to 1.12.0
### What changes were proposed in this pull request?

Parquet 1.12.0 New Feature
- PARQUET-41 - Add bloom filters to parquet statistics
- PARQUET-1373 - Encryption key management tools
- PARQUET-1396 - Example of using EncryptionPropertiesFactory and DecryptionPropertiesFactory
- PARQUET-1622 - Add BYTE_STREAM_SPLIT encoding
- PARQUET-1784 - Column-wise configuration
- PARQUET-1817 - Crypto Properties Factory
- PARQUET-1854 - Properties-Driven Interface to Parquet Encryption

Parquet 1.12.0 release notes:
https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.0/CHANGES.md

### Why are the changes needed?

- Bloom filters to improve filter performance
- ZSTD enhancement

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit test.

Closes #31649 from wangyum/SPARK-34542.

Lead-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Yuming Wang <yumwang@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-03-27 07:56:29 -07:00
Ismaël Mejía 8a552bfc76 [SPARK-34778][BUILD] Upgrade to Avro 1.10.2
### What changes were proposed in this pull request?
Update the  Avro version to 1.10.2

### Why are the changes needed?
To stay up to date with upstream and catch compatibility issues with zstd

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests

Closes #31866 from iemejia/SPARK-27733-upgrade-avro-1.10.2.

Authored-by: Ismaël Mejía <iemejia@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
2021-03-22 19:30:14 +08:00
Zhang, Xingchao 2888d1883e [SPARK-34784][BUILD] Upgrade Jackson to 2.12.2
### What changes were proposed in this pull request?

This pr upgrade Jackson to 2.12.2.
Jackson Release 2.12: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.12

### Why are the changes needed?

Make it easy to upgrade Avro 1.10.2.

```
[error] Caused by: sbt.ForkMain$ForkError: com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.11.4 requires Jackson Databind version >= 2.11.0 and < 2.12.0
[error] 	at com.fasterxml.jackson.module.scala.JacksonModule.setupModule(JacksonModule.scala:61)
[error] 	at com.fasterxml.jackson.module.scala.JacksonModule.setupModule$(JacksonModule.scala:46)
[error] 	at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:17)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested with Avro 1.10.2 and Parquet 1.12.0: https://github.com/apache/spark/runs/2157735537

Closes #31878 from xclyfe/SPARK-34784.

Authored-by: Zhang, Xingchao <xingczhang@ebay.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
2021-03-21 15:36:38 +08:00
wankunde 60e324aa9f [SPARK-34688][PYTHON] Upgrade to Py4J 0.10.9.2
### What changes were proposed in this pull request?
This PR upgrade Py4J from 0.10.9.1 to 0.10.9.2 that contains some bug fixes and improvements.

* expose shell parameter in Popen inside launch_gateway. ([bartdag/py4j220efc3](220efc3716))
* fixed Flake8 errors ([bartdag/py4j6c6ee9a](6c6ee9aedc))

### Why are the changes needed?
To leverage fixes from the upstream in Py4J.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Jenkins build and GitHub Actions will test it out.

Closes #31796 from wankunde/py4j.

Authored-by: wankunde <wankunde@163.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-03-11 09:51:41 -06:00
Dongjoon Hyun ba7e525a11 [SPARK-34670][BUILD] Upgrade ZSTD-JNI to 1.4.9-1
### What changes were proposed in this pull request?

This PR aims to upgrade ZSTD-JNI to 1.4.9-1.

### Why are the changes needed?

ZStandard 1.4.9 and its corresponding JNI brings the following bug fixes and improvements.
- https://github.com/facebook/zstd/releases/tag/v1.4.9

One of notable improvement of ZStandard 1.4.9 is `2x faster Long Distance Mode`, but we are not using it yet.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with the existing tests and there is no regression in ZStandardBenchmark.

Closes #31784 from dongjoon-hyun/ZSTD-149.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-03-08 22:40:49 -08:00
Dongjoon Hyun 1f6089b165 [SPARK-34647][CORE] Use ZSTD JNI NoFinalizer classes and bump to 1.4.8-7
### What changes were proposed in this pull request?

This PR aims to use `ZstdInputStreamNoFinalizer` and `ZstdOutputStreamNoFinalizer` classes and upgrade ZSTD JNI to 1.4.8-7.

### Why are the changes needed?

`1.4.8-7` makes `NoFinalizer` classes public again. This improves the performance.
- 57d53a09d2

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #31762 from dongjoon-hyun/SPARK-ZSTD-NOFINALIZER.

Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-03-06 10:32:27 -08:00
Chao Sun f494c5cff9 [SPARK-33212][FOLLOWUP] Add hadoop-yarn-server-web-proxy for Hadoop 3.x profile
### What changes were proposed in this pull request?

This adds `hadoop-yarn-server-web-proxy` as dependency for Yarn and Hadoop 3.x profile (it is already a dependency for 2.x). Also excludes some dependencies from the module which are already covered by other Hadoop jars used by Spark.

### Why are the changes needed?

The class `org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter` is used by `ApplicationMaster`:
```scala
  private def addAmIpFilter(driver: Option[RpcEndpointRef], proxyBase: String) = {
    val amFilter = "org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter"
    val params = client.getAmIpFilterParams(yarnConf, proxyBase)
    driver match {
      case Some(d) =>
        d.send(AddWebUIFilter(amFilter, params, proxyBase))
   ...
```
and will be loaded at runtime. Therefore, without the above jar Spark Yarn app will fail with `ClassNotFoundError`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests. Also tested manually and it worked with the fix, while was failing previously.

Closes #31642 from sunchao/SPARK-33212-followup-2.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-02-28 16:37:49 -08:00
Dongjoon Hyun 1aeafb4852 [SPARK-34559][BUILD] Upgrade to ZSTD JNI 1.4.8-6
### What changes were proposed in this pull request?

This PR aims to upgrade ZSTD JNI to 1.4.8-6.

### Why are the changes needed?

This fixes the following issue and will unblock SPARK-34479 (Support ZSTD at Avro data source).
- https://github.com/luben/zstd-jni/issues/161

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #31674 from dongjoon-hyun/SPARK-34559.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-02-27 03:04:12 -08:00
Dongjoon Hyun 0bccf1664f [SPARK-34496][BUILD] Upgrade ZSTD-JNI to 1.4.8-5 for better API compatibility
### What changes were proposed in this pull request?

This PR aims to upgrade ZSTD-JNI to 1.4.8-5 for better API compatibility.

### Why are the changes needed?

Previously, we upgrade for ZSTD-JNI performance improvement.
And, `Apache Spark`/`Apache Parquet`/`Apache Avro` master branches are using JZSTD-JNI 1.4.8-x.

This PR aims to upgrade a minor version for a better API compatibility.
- def1860c6f
- 188c803044

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs

Closes #31609 from dongjoon-hyun/SPARK-34496.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-02-22 22:27:39 +09:00
Dongjoon Hyun 020e84e92f [SPARK-34486][K8S] Upgrade kubernetes-client to 4.13.2
### What changes were proposed in this pull request?

This PR aims to upgrade `kubernetes-client` library from 4.12.0 to 4.13.2 for Apache Spark 3.2.0.

### Why are the changes needed?

This will bring [K8s 1.19.1](https://github.com/fabric8io/kubernetes-client/pull/2541) models officially and the latest bug fixes.

- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.13.0
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.13.1
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.13.2

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass the K8s IT and UT.

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 19 minutes, 25 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #31602 from dongjoon-hyun/SPARK-34486.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-02-21 18:35:38 +09:00
Dongjoon Hyun 331c6fd4ef [SPARK-34467][BUILD] Upgrade Zstd-jni to 1.4.8-4
### What changes were proposed in this pull request?

This PR aims to upgrade Zstd-JNI library to 1.4.8-4 to bring JNI side optimization.
`ZStandardBenchmark` shows that there is no regression in terms of performance and show some improvements.

### Why are the changes needed?

https://github.com/luben/zstd-jni/commits/v1.4.8-4
- be9be47fae
- be51ebade1
- 44ff8b6f95

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #31585 from dongjoon-hyun/SPARK-ZSTD-1.4.8-4.

Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-02-18 13:35:49 -08:00
Dongjoon Hyun 329f945534 [SPARK-34391][BUILD] Upgrade commons-io to 2.8.0
### What changes were proposed in this pull request?

This PR aims to upgrade `commons-io` from 2.5 to 2.8.0 for Apache Spark 3.2.0.

### Why are the changes needed?

`2.5` was released on 2016-04-22. This will bring the latest bug fixes.
- [2020-09-05: 2.8.0](https://commons.apache.org/proper/commons-io/changes-report.html#a2.8.0)
- [2020-05-24: 2.7](https://commons.apache.org/proper/commons-io/changes-report.html#a2.7)
- [2017-10-15: 2.6](https://commons.apache.org/proper/commons-io/changes-report.html#a2.6)

### Does this PR introduce _any_ user-facing change?

Yes, but this is a compatible dependency change.

### How was this patch tested?

Pass the CIs.

Closes #31503 from dongjoon-hyun/SPARK-34391.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-02-07 21:53:42 -08:00
William Hyun 5acc5b8f1e [SPARK-34323][BUILD] Upgrade zstd-jni to 1.4.8-3
### What changes were proposed in this pull request?

This PR aims to upgrade zstd-jni to 1.4.8-3.

### Why are the changes needed?

This will bring the latest improvements and bug fixes.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs with the existing tests.

Closes #31430 from williamhyun/zstd-148.

Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-02-02 00:39:05 -08:00
Yuming Wang a7683afdf4 [SPARK-26346][BUILD][SQL] Upgrade Parquet to 1.11.1
### What changes were proposed in this pull request?

This PR upgrade Parquet to 1.11.1.

Parquet 1.11.1 new features:

- [PARQUET-1201](https://issues.apache.org/jira/browse/PARQUET-1201) - Column indexes
- [PARQUET-1253](https://issues.apache.org/jira/browse/PARQUET-1253) - Support for new logical type representation
- [PARQUET-1388](https://issues.apache.org/jira/browse/PARQUET-1388) - Nanosecond precision time and timestamp - parquet-mr

More details:
https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.1/CHANGES.md

### Why are the changes needed?
Support column indexes to improve query performance.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing test.

Closes #26804 from wangyum/SPARK-26346.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
2021-01-29 08:07:49 +08:00
HyukjinKwon 1217c8b418 Revert "[SPARK-31168][SPARK-33913][BUILD] Upgrade Scala to 2.12.13 and Kafka to 2.7.0"
This reverts commit a65e86a65e.
2021-01-27 17:03:15 +09:00
Dongjoon Hyun d5d1c84bf4 [SPARK-34208][BUILD] Upgrade ORC to 1.6.7
### What changes were proposed in this pull request?

This PR aims to upgrade Apache ORC from 1.6.6 to 1.6.7.

### Why are the changes needed?

Apache ORC 1.6.7 has the following fixes including [ORC-711 Support CryptoExtension in create/decryptLocalKey](https://issues.apache.org/jira/browse/ORC-711).
- https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318320&version=12349470

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with the existing tests.

Closes #31301 from dongjoon-hyun/SPARK-34208.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-22 17:06:18 -08:00
Ismaël Mejía e9e81f798f [SPARK-27733][CORE] Upgrade Avro to version 1.10.1
### What changes were proposed in this pull request?

Update Avro dependency to version 1.10.1

### Why are the changes needed?

To catch up multiple improvements of Avro as well as fix security issues on transitive dependencies.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Since there were no API changes required we just run the tests

Closes #31232 from iemejia/SPARK-27733-avro-upgrade.

Authored-by: Ismaël Mejía <iemejia@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-20 15:42:27 -08:00
CodingCat 7f3e952c23 [SPARK-33940][BUILD] Upgrade univocity to 2.9.1
### What changes were proposed in this pull request?

upgrade univocity

### Why are the changes needed?

csv writer actually has an implicit limit on column name length due to univocity-parser 2.9.0,

when we initialize a writer e09114c687/src/main/java/com/univocity/parsers/common/AbstractWriter.java (L211), it calls toIdentifierGroupArray which calls valueOf in NormalizedString.java eventually (e09114c687/src/main/java/com/univocity/parsers/common/NormalizedString.java (L205-L209))

in that stringCache.get, it has a maxStringLength cap e09114c687/src/main/java/com/univocity/parsers/common/StringCache.java (L104) which is 1024 by default

more details at https://github.com/apache/spark/pull/30972 and https://github.com/uniVocity/univocity-parsers/issues/438

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?
existing UT

Closes #31246 from CodingCat/upgrade_univocity.

Authored-by: CodingCat <zhunansjtu@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-01-20 11:40:37 +09:00
Dongjoon Hyun a65e86a65e [SPARK-31168][SPARK-33913][BUILD] Upgrade Scala to 2.12.13 and Kafka to 2.7.0
### What changes were proposed in this pull request?

This PR is the 3rd try to upgrade Scala 2.12.x in order to see the feasibility.
- https://github.com/apache/spark/pull/27929 (Upgrade Scala to 2.12.11, wangyum )
- https://github.com/apache/spark/pull/30940 (Upgrade Scala to 2.12.12, viirya )

`silencer` library is updated accordingly. And, Kafka version upgrade is required because it fails like the following.
```
[info] KafkaDataConsumerSuite:
[info] org.apache.spark.streaming.kafka010.KafkaDataConsumerSuite *** ABORTED *** (1 second, 580 milliseconds)
[info]   java.lang.NoClassDefFoundError: scala/math/Ordering$$anon$7
[info]   at kafka.api.ApiVersion$.orderingByVersion(ApiVersion.scala:45)
```

### Why are the changes needed?

Apache Spark was stuck to 2.12.10 due to the regression in Scala 2.12.11 and 2.12.12. This will bring all the bug fixes.
- https://github.com/scala/scala/releases/tag/v2.12.13
- https://github.com/scala/scala/releases/tag/v2.12.12
- https://github.com/scala/scala/releases/tag/v2.12.11

### Does this PR introduce _any_ user-facing change?

Yes, but this is a bug-fixed version.

### How was this patch tested?

Pass the CIs.

Closes #31223 from dongjoon-hyun/SPARK-31168.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-18 13:45:06 -08:00
Yuming Wang c87b0085c9 [SPARK-33696][BUILD][SQL] Upgrade built-in Hive to 2.3.8
### What changes were proposed in this pull request?

Hive 2.3.8 changes:
HIVE-19662: Upgrade Avro to 1.8.2
HIVE-24324: Remove deprecated API usage from Avro
HIVE-23980: Shade Guava from hive-exec in Hive 2.3
HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue
HIVE-24512: Exclude calcite in packaging.
HIVE-22708: Fix for HttpTransport to replace String.equals
HIVE-24551: Hive should include transitive dependencies from calcite after shading it
HIVE-24553: Exclude calcite from test-jar dependency of hive-exec

### Why are the changes needed?

Upgrade Avro and Parquet to latest version.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test add test try to upgrade Parquet to 1.11.1 and Avro to 1.10.1: https://github.com/apache/spark/pull/30517

Closes #30657 from wangyum/SPARK-33696.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-17 21:54:35 -08:00
Yuming Wang d6906b3b76 [SPARK-34110][BUILD] Upgrade Zookeeper to 3.6.2
### What changes were proposed in this pull request?

This PR upgrade Zookeeper to 3.6.2.

### Why are the changes needed?

To make Spark running on jdk 14, otherwise:
```
21/01/13 20:25:32,533 WARN [Driver-SendThread(apache-spark-zk-3.vip.hadoop.com:2181)] zookeeper.ClientCnxn:1164 : Session 0x0 for server apache-spark-zk-3.vip.hadoop.com/<unresolved>:2181, unexpected error, closing socket connection and attempting reconnect
java.lang.IllegalArgumentException: Unable to canonicalize address apache-spark-zk-3.vip.hadoop.com/<unresolved>:2181 because it's not resolvable
	at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
	at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
	at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
```

Please see [ZOOKEEPER-3779](https://issues.apache.org/jira/browse/ZOOKEEPER-3779) for more details.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test:
1. Replace zookeeper-3.4.14.jar with zookeeper-3.6.2.jar and zookeeper-jute-3.6.2.jar
2. Run Spark on jdk 14. Hadoop 2.7 with HADOOP-12760, Hive 1.2.1 and Zookeeper server version is 3.4.6.
    Some key configurations:
    ```
    # spark-defaults.conf
    spark.yarn.appMasterEnv.JAVA_HOME              /apache/releases/jdk-14.0.2
    spark.executorEnv.JAVA_HOME                    /apache/releases/jdk-14.0.2
    # spark-env.sh
    export JAVA_HOME=/apache/releases/jdk-14.0.2
    ```

Jenkins Tests
- Hadoop 3.2: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134048/testReport
- Hadoop 2.7: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134063/testReport

Closes #31177 from wangyum/SPARK-34110.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-15 21:12:41 -08:00
Chao Sun b6f46ca297 [SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
### What changes were proposed in this pull request?

This:
1. switches Spark to use shaded Hadoop clients, namely hadoop-client-api and hadoop-client-runtime, for Hadoop 3.x.
2. upgrade built-in version for Hadoop 3.x to Hadoop 3.2.2

Note that for Hadoop 2.7, we'll still use the same modules such as hadoop-client.

In order to still keep default Hadoop profile to be hadoop-3.2, this defines the following Maven properties:

```
hadoop-client-api.artifact
hadoop-client-runtime.artifact
hadoop-client-minicluster.artifact
```

which default to:
```
hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster
```
but all switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side affect from this is we'll import the same dependency multiple times. For this I have to disable Maven enforcer `banDuplicatePomDependencyVersions`.

Besides above, there are the following changes:
- explicitly add a few dependencies which are imported via transitive dependencies from Hadoop jars, but are removed from the shaded client jars.
- removed the use of `ProxyUriUtils.getPath` from `ApplicationMaster` which is a server-side/private API.
- modified `IsolatedClientLoader` to exclude `hadoop-auth` jars when Hadoop version is 3.x. This change should only matter when we're not sharing Hadoop classes with Spark (which is _mostly_ used in tests).

### Why are the changes needed?

Hadoop 3.2.2 is released with new features and bug fixes, so it's good for the Spark community to adopt it. However, latest Hadoop versions starting from Hadoop 3.2.1 have upgraded to use Guava 27+. In order to resolve Guava conflicts, this takes the approach by switching to shaded client jars provided by Hadoop. This also has the benefits of avoid pulling other 3rd party dependencies from Hadoop side so as to avoid more potential future conflicts.

### Does this PR introduce _any_ user-facing change?

When people use Spark with `hadoop-provided` option, they should make sure class path contains `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before other Hadoop jars in the order. Otherwise, classes may be loaded from the other non-shaded Hadoop jars and cause potential conflicts.

### How was this patch tested?

Relying on existing tests.

Closes #30701 from sunchao/test-hadoop-3.2.2.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-15 14:06:50 -08:00
Kousuke Saruta b1c4fc7fc7 [SPARK-34008][BUILD] Upgrade derby to 10.14.2.0
### What changes were proposed in this pull request?

This PR upgrades `derby` to `10.14.2.0`.

You can check the major changes from the following URLs.

* 10.13.1.1 http://svn.apache.org/repos/asf/db/derby/code/tags/10.13.1.1/RELEASE-NOTES.html
* 10.14.1.0 http://svn.apache.org/repos/asf/db/derby/code/tags/10.14.1.0/RELEASE-NOTES.html
* 10.14.2.0 http://svn.apache.org/repos/asf/db/derby/code/tags/10.14.2.0/RELEASE-NOTES.html

### Why are the changes needed?

It seems to be the final release which supports `JDK8` as the minimum required version.
After `10.15.1.3`, the minimum required version is `JDK9`.
https://db.apache.org/derby/

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #31032 from sarutak/upgrade-derby.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-05 21:50:16 -08:00
HyukjinKwon 6b86aa0b52 [SPARK-33984][PYTHON] Upgrade to Py4J 0.10.9.1
### What changes were proposed in this pull request?

This PR upgrade Py4J from 0.10.9 to 0.10.9.1 that contains some bug fixes and improvements.
It contains one bug fix (4152353ac1).

### Why are the changes needed?

To leverage fixes from the upstream in Py4J.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Jenkins build and GitHub Actions will test it out.

Closes #31009 from HyukjinKwon/SPARK-33984.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-04 10:23:38 -08:00
William Hyun bd346f4a2d [SPARK-33957][BUILD] Update commons-lang3 to 3.11
### What changes were proposed in this pull request?

This PR aims to update commons-lang3 to 3.11 to support Java 16+ better.

### Why are the changes needed?

commons-lang3 has the following bug fixes and Java 16 support.
- https://commons.apache.org/proper/commons-lang/changes-report.html#a3.11

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?
Pass the CIs.

Closes #30990 from williamhyun/Commons-lang3.

Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-01 19:59:17 -08:00
Dongjoon Hyun 00642ee19e
[SPARK-33843][BUILD] Upgrade to Zstd 1.4.8
### What changes were proposed in this pull request?

This PR aims to upgrade Zstd library to 1.4.8.

### Why are the changes needed?

This will bring Zstd 1.4.7 and 1.4.8 improvement and bug fixes and the following from `zstd-jni`.
- https://github.com/facebook/zstd/releases/tag/v1.4.7
- https://github.com/facebook/zstd/releases/tag/v1.4.8
- https://github.com/luben/zstd-jni/issues/153 (Apple M1 architecture)

### Does this PR introduce _any_ user-facing change?

This will unblock Apple Silicon usage.

### How was this patch tested?

Pass the CIs.

Closes #30848 from dongjoon-hyun/SPARK-33843.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-19 06:59:44 -08:00
Kent Yao 4d47ac4b4b [SPARK-33705][SQL][TEST] Fix HiveThriftHttpServerSuite flakiness
### What changes were proposed in this pull request?
TO FIX flaky tests:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132345/testReport/
```
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.JDBC query execution
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.Checks Hive version
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.SPARK-24829 Checks cast as float
```

The root cause here is a jar conflict issue.
`NewCookie.isHttpOnly` is not defined in the `jsr311-api.jar` which conflicts
The transitive artifact `jsr311-api.jar` of `hadoop-client` is excluded at the maven side. See https://issues.apache.org/jira/browse/SPARK-27179.

The Jenkins PR builder and Github Action use `SBT` as the compiler tool.

First, the exclusion rule from maven is not followed by sbt, so I was able to see `jsr311-api.jar` from maven cache to be added to the classpath directly. **This seems to be a  bug of `sbt-pom-reader` plugin but I'm not that sure.**

Then I added an `ExcludeRule` for the `hive-thriftserver` module at the SBT side and did see the `jsr311-api.jar` gone, but the CI jobs still failed with the same error.

I added a trace log in ThriftHttpServlet

```s
ERROR ThriftHttpServlet: !!!!!!!!! Suspect???????? --->
file:/home/jenkins/workspace/SparkPullRequestBuilder/assembly/target/scala-2.12/jars/jsr311-api-1.1.1.jar
```
And the log pointed out that the assembly phase copied it to `assembly/target/scala-2.12/jars/` which will be added to the classpath too. With the help of SBT `dependencyTree` tool, I saw the `jsr311-api` again as a transitive of `jersery-core` from `yarn` module with a `test` scope. So **This seems to be another bug from the SBT side of the `sbt-assembly` plugin.**  It copied a test scope transitive artifact to the assembly output.

In this PR, I defined some rules in SparkBuild.scala to bypass the potential bugs from the SBT side.

First, exclude the `jsr311` from all over the project and then add it back separately to the YARN module for SBT.

Additionally, the HiveThriftServerSuites was reflected for reducing flakiness too, but not related to the bugs I have found so far.

### Why are the changes needed?

fix test here

### Does this PR introduce _any_ user-facing change?

NO
### How was this patch tested?

passing jenkins and ga

Closes #30643 from yaooqinn/HiveThriftHttpServerSuite.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-12-14 05:14:38 +00:00
Yuming Wang 01b73ae638
[SPARK-33766][BUILD] Upgrade Jackson to 2.11.4
### What changes were proposed in this pull request?

This pr upgrade Jackson to 2.11.4.
Jackson Release 2.11: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11

### Why are the changes needed?

Make it easy to upgrade dependency because Jackson 2.10 is not compatible with 2.11:
```
com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 requires Jackson Databind version >= 2.10.0 and < 2.11.0
```
[Avro](https://issues.apache.org/jira/browse/AVRO-2967) has upgraded Jackson to 2.11.3.
[Parquet](https://issues.apache.org/jira/browse/PARQUET-1895) has upgraded Jackson to 2.11.2.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test.

Closes #30746 from wangyum/SPARK-33766.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-13 14:40:55 -08:00
Nicholas Marion 99848e530f
[SPARK-33762][BUILD] Upgrade commons-codec to 1.15
### What changes were proposed in this pull request?

### Why are the changes needed?

Open Source scans are reporting a potential encoding/decoding issue related to versions of commons-codec prior to 1.13. Commit referenced: 48b615756d

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #30740 from n-marion/SPARK-33762_upgrade-commons-codec.

Authored-by: Nicholas Marion <nmarion@us.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-13 14:36:54 -08:00
Dongjoon Hyun 1ba1732beb [SPARK-33295][BUILD] Upgrade ORC to 1.6.6
### What changes were proposed in this pull request?

This PR aims to upgrade Apache ORC to 1.6.6 for Apache Spark 3.2.0.

### Why are the changes needed?

This brings the latest bug fixes and features.
Apache Iceberg is already using Apache ORC 1.6.6.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #30715 from dongjoon-hyun/SPARK-33295.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-10 19:15:01 -08:00
Liang-Chi Hsieh 667f64f447
[SPARK-33725][BUILD] Upgrade snappy-java to 1.1.8.2
### What changes were proposed in this pull request?

This upgrades snappy-java to 1.1.8.2.

### Why are the changes needed?

Minor version upgrade that includes:

- [Fixed](https://github.com/xerial/snappy-java/pull/265) an initialization issue when using a recent Mac OS X version
- Support Apple Silicon (M1, Mac-aarch64)
- Fixed the pure-java Snappy fallback logic when no native library for your platform is found.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test.

Closes #30690 from viirya/upgrade-snappy.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-09 14:26:53 -08:00
Nicholas Marion 3ac70f169d
[SPARK-33695][BUILD] Upgrade to jackson to 2.10.5 and jackson-databind to 2.10.5.1
### What changes were proposed in this pull request?

Upgrade the jackson dependencies to 2.10.5 and jackson-databind to 2.10.5.1

### Why are the changes needed?

Jackson dependency has vulnerability CVE-2020-25649.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.

Closes #30656 from n-marion/SPARK-33695_upgrade-jackson.

Authored-by: Nicholas Marion <nmarion@us.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-08 12:11:06 -08:00
Kousuke Saruta d48ef34911
[SPARK-33684][BUILD] Upgrade httpclient from 4.5.6 to 4.5.13
### What changes were proposed in this pull request?

This PR upgrades `commons.httpclient` from `4.5.6` to `4.5.13`.
4.5.6 is released over 2 years ago and now we can use more stable `4.5.13`.
https://archive.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.5.x.txt

### Why are the changes needed?

To follow the more stable release.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Should be done by the existing tests.

Closes #30634 from sarutak/upgrade-httpclient.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-06 23:02:36 -08:00
Dongjoon Hyun 290aa02179 [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work
### What changes were proposed in this pull request?

This reverts commit SPARK-33212 (cb3fa6c936) mostly with three exceptions:
1. `SparkSubmitUtils` was updated recently by SPARK-33580
2. `resource-managers/yarn/pom.xml` was updated recently by SPARK-33104 to add `hadoop-yarn-server-resourcemanager` test dependency.
3. Adjust `com.fasterxml.jackson.module:jackson-module-jaxb-annotations` dependency in K8s module which is updated recently by SPARK-33471.

### Why are the changes needed?

According to [HADOOP-16080](https://issues.apache.org/jira/browse/HADOOP-16080) since Apache Hadoop 3.1.1, `hadoop-aws` doesn't work with `hadoop-client-api`. It fails at write operation like the following.

**1. Spark distribution with `-Phadoop-cloud`**

```scala
$ bin/spark-shell --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY
20/11/30 23:01:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context available as 'sc' (master = local[*], app id = local-1606806088715).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_272)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.parquet("s3a://dongjoon/users.parquet").show
20/11/30 23:01:34 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
+------+--------------+----------------+
|  name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa|          null|  [3, 9, 15, 20]|
|   Ben|           red|              []|
+------+--------------+----------------+

scala> Seq(1).toDF.write.parquet("s3a://dongjoon/out.parquet")
20/11/30 23:02:14 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)/ 1]
java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
```

**2. Spark distribution without `-Phadoop-cloud`**
```scala
$ bin/spark-shell --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY -c spark.eventLog.enabled=true -c spark.eventLog.dir=s3a://dongjoon/spark-events/ --packages org.apache.hadoop:hadoop-aws:3.2.0,org.apache.hadoop:hadoop-common:3.2.0
...
java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
  at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:772)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CI.

Closes #30508 from dongjoon-hyun/SPARK-33212-REVERT.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-12-02 18:23:48 +09:00
Yuming Wang 1de3fc4282 [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2
### What changes were proposed in this pull request?

We supported Hive metastore are 0.12.0 through 3.1.2, but we supported hive-jdbc are 0.12.0 through 2.3.7. It will throw `TProtocolException` if we use hive-jdbc 3.x:

```
[rootspark-3267648 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://localhost:10000/default> create table t1(id int) using parquet;
Unexpected end of file when reading from HS2 server. The root cause might be too many concurrent connections. Please ask the administrator to check the number of active connections, and adjust hive.server2.thrift.max.worker.threads if applicable.
Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)
```
```
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:234)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)
```

This pr upgrade hive-service-rpc to 3.1.2 to fix this issue.

### Why are the changes needed?

To support hive-jdbc 3.x.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test:
```
[rootspark-3267648 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://localhost:10000/default> create table t1(id int) using parquet;
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.051 seconds)
0: jdbc:hive2://localhost:10000/default> insert into t1 values(1);
+---------+
| Result  |
+---------+
+---------+
No rows selected (2.08 seconds)
0: jdbc:hive2://localhost:10000/default> select * from t1;
+-----+
| id  |
+-----+
| 1   |
+-----+
1 row selected (0.605 seconds)
```

Closes #30478 from wangyum/SPARK-33525.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-25 12:37:59 -08:00
Takeshi Yamamuro 74bd046d17 [SPARK-33475][BUILD] Bump ANTLR runtime version to 4.8-1
### What changes were proposed in this pull request?

This PR intends to upgrade ANTLR runtime from 4.7.1 to 4.8-1.

### Why are the changes needed?

Release note of v4.8 and v4.7.2 (the v4.7.2 release has a few minor bug fixes for java targets):
 - v4.8: https://github.com/antlr/antlr4/releases/tag/4.8
 - v4.7.2: https://github.com/antlr/antlr4/releases/tag/4.7.2

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA tests.

Closes #30404 from maropu/UpgradeAntlr.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-11-18 21:20:28 +09:00
Rameshkrishnan Muthusamy 5e8549973d
[SPARK-33471][K8S][BUILD] Upgrade kubernetes-client to 4.12.0
### What changes were proposed in this pull request?

This PR aims to upgrade Kubernetes-client from 4.11.1 to 4.12.0

### Why are the changes needed?

This upgrades the dependency for Apache Spark 3.1.0.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #30401 from ramesh-muthusamy/SPARK-33471-k8s-clientupgrade.

Authored-by: Rameshkrishnan Muthusamy <rameshkrishnan_muthusamy@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-17 13:41:58 -08:00
Chao Sun c2caf2522b [SPARK-33213][BUILD] Upgrade Apache Arrow to 2.0.0
### What changes were proposed in this pull request?

This upgrade Apache Arrow version from 1.0.1 to 2.0.0

### Why are the changes needed?

Apache Arrow 2.0.0 was released with some improvements from Java side, so it's better to upgrade Spark to the new version.
Note that the format version in Arrow 2.0.0 is still 1.0.0 so API should still be compatible between 1.0.1 and 2.0.0.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.

Closes #30306 from sunchao/SPARK-33213.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-11-09 19:07:16 -08:00
Dongjoon Hyun 35ac314181 [SPARK-33405][BUILD] Upgrade commons-compress to 1.20
### What changes were proposed in this pull request?

This PR aims to upgrade `commons-compress` from 1.8 to 1.20.

### Why are the changes needed?

- https://commons.apache.org/proper/commons-compress/security-reports.html

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #30304 from dongjoon-hyun/SPARK-33405.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-11-10 11:08:55 +09:00
huangtianhua 83a80796aa [SPARK-32691][BUILD] Update commons-crypto to v1.1.0
### What changes were proposed in this pull request?
Update the package commons-crypto to v1.1.0 to support aarch64 platform
- https://issues.apache.org/jira/browse/CRYPTO-139

### Why are the changes needed?

The package commons-crypto-1.0.0 available in the Maven repository
doesn't support aarch64 platform. It costs long time in
CryptoRandomFactory.getCryptoRandom(properties).nextBytes(iv) when NettyBlockRpcSever
receive block data from client,  if the time more than the default value 120s, IOException raised and client
will retry replicate the block data to other executors. But in fact the replication is complete,
it makes the replication number incorrect.
This makes DistributedSuite tests pass.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Pass the CIs.

Closes #30275 from huangtianhua/SPARK-32691.

Authored-by: huangtianhua <huangtianhua223@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-09 14:33:27 -08:00
Dongjoon Hyun 27d8136934 [SPARK-33324][K8S][BUILD] Upgrade kubernetes-client to 4.11.1
### What changes were proposed in this pull request?

This PR aims to upgrade `Kubernetes-client` from 4.10.3 to 4.11.1.

### Why are the changes needed?

This upgrades the dependency for Apache Spark 3.1.0.
Since 4.12.0 is still new and has a breaking API changes, this PR chooses the latest compatible one.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the all CIs including K8s IT.

Closes #30233 from dongjoon-hyun/SPARK-33324.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-11-02 22:23:26 -08:00
Chao Sun cb3fa6c936 [SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile
### What changes were proposed in this pull request?

This switches Spark to use shaded Hadoop clients, namely hadoop-client-api and hadoop-client-runtime, for Hadoop 3.x. For Hadoop 2.7, we'll still use the same modules such as hadoop-client.

In order to still keep default Hadoop profile to be hadoop-3.2, this defines the following Maven properties:

```
hadoop-client-api.artifact
hadoop-client-runtime.artifact
hadoop-client-minicluster.artifact
```

which default to:
```
hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster
```
but all switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side affect from this is we'll import the same dependency multiple times. For this I have to disable Maven enforcer `banDuplicatePomDependencyVersions`.

Besides above, there are the following changes:
- explicitly add a few dependencies which are imported via transitive dependencies from Hadoop jars, but are removed from the shaded client jars.
- removed the use of `ProxyUriUtils.getPath` from `ApplicationMaster` which is a server-side/private API.
- modified `IsolatedClientLoader` to exclude `hadoop-auth` jars when Hadoop version is 3.x. This change should only matter when we're not sharing Hadoop classes with Spark (which is _mostly_ used in tests).

### Why are the changes needed?

This serves two purposes:
- to unblock Spark from upgrading to Hadoop 3.2.2/3.3.0+. Latest Hadoop versions have upgraded to use Guava 27+ and in order to adopt the latest Hadoop versions in Spark, we'll need to resolve the Guava conflicts. This takes the approach by switching to shaded client jars provided by Hadoop.
- avoid pulling 3rd party dependencies from Hadoop and avoid potential future conflicts.

### Does this PR introduce _any_ user-facing change?

When people use Spark with `hadoop-provided` option, they should make sure class path contains `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before other Hadoop jars in the order. Otherwise, classes may be loaded from the other non-shaded Hadoop jars and cause potential conflicts.

### How was this patch tested?

Relying on existing tests.

Closes #29843 from sunchao/SPARK-29250.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
2020-10-22 03:21:34 +00:00
Takeshi Yamamuro 1b7367ccd7 [SPARK-33205][BUILD] Bump snappy-java version to 1.1.8
### What changes were proposed in this pull request?

This PR intends to upgrade snappy-java from 1.1.7.5 to 1.1.8.

### Why are the changes needed?

For performance improvements; the released `snappy-java` bundles the latest `Snappy` v1.1.8 binaries with small performance improvements.
 - snappy-java release note: https://github.com/xerial/snappy-java/releases/tag/1.1.8
 - snappy release note: https://github.com/google/snappy/releases/tag/1.1.8

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA tests.

Closes #30120 from maropu/Snappy1.1.8.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
2020-10-21 13:04:39 -07:00
Kent Yao 2507301705 [SPARK-33159][SQL] Use hive-service-rpc as dependency instead of inlining the generated code
### What changes were proposed in this pull request?

Hive's `hive-service-rpc` module started since hive-2.1.0 and it contains only the thrift IDL file and the code generated by it.

Removing the inlined code will help maintain and upgrade builtin hive versions

### Why are the changes needed?

to simply the code.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

passing CI

Closes #30055 from yaooqinn/SPARK-33159.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-10-16 09:37:54 -07:00
Dongjoon Hyun 9896288b88 [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6
### What changes were proposed in this pull request?

This PR aims to upgrade ZStandard library for Apache Spark 3.1.0.

### Why are the changes needed?

This will bring the latest bug fixes.
- 2662fbdc32
- bbe140b758

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CI.

Closes #30010 from dongjoon-hyun/SPARK-33117.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-12 00:27:53 -07:00
HyukjinKwon 2e07ed3041 [SPARK-33082][SPARK-20202][BUILD][SQL][FOLLOW-UP] Remove Hive 1.2 workarounds and Hive 1.2 profile in Jenkins script
### What changes were proposed in this pull request?

This PR removes the leftover of Hive 1.2 workarounds and Hive 1.2 profile in Jenkins script.

- `test-hive1.2` title is not used anymore in Jenkins
- Remove some comments related to Hive 1.2
- Remove unused codes in `OrcFilters.scala`  Hive
- Test `spark.sql.hive.convertMetastoreOrc` disabled case for the tests added at SPARK-19809 and SPARK-22267

### Why are the changes needed?

To remove unused codes & improve test coverage

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually ran the unit tests. Also It will be tested in CI in this PR.

Closes #29973 from HyukjinKwon/SPARK-33082-SPARK-20202.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-09 03:04:26 -07:00
Dongjoon Hyun aa6657981a [SPARK-33050][BUILD] Upgrade Apache ORC to 1.5.12
### What changes were proposed in this pull request?

This PR aims to upgrade Apache ORC to 1.5.12.

### Why are the changes needed?

This brings us the latest bug patches like the followings.
- ORC-644 nested struct evolution does not respect to orc.force.positional.evolution
- ORC-667 Positional mapping for nested struct types should not applied by default

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CI.

Closes #29930 from dongjoon-hyun/SPARK-ORC-1.5.12.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-02 00:06:03 -07:00
Dongjoon Hyun 9c618b3308 [SPARK-33047][BUILD] Upgrade hive-storage-api to 2.7.2
### What changes were proposed in this pull request?

This PR aims to upgrade Apache Hive `hive-storage-api` library from 2.7.1 to 2.7.2.

### Why are the changes needed?

[storage-api 2.7.2](https://github.com/apache/hive/commits/rel/storage-release-2.7.2/storage-api) has the following extension and can be used when users uses a provided orc dependency.

[HIVE-22959](dade9919d9 (diff-ccfc9dd7584117f531322cda3a29f3c3)) : Extend storage-api to expose FilterContext
[HIVE-23215](361925d2f3 (diff-ccfc9dd7584117f531322cda3a29f3c3)) : Make FilterContext and MutableFilterContext interfaces

### Does this PR introduce _any_ user-facing change?

Yes. This is a dependency change.

### How was this patch tested?

Pass the existing tests.

Closes #29923 from dongjoon-hyun/SPARK-33047.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-01 12:41:40 -07:00
jlafleche d75222dd1b [SPARK-33012][BUILD][K8S] Upgrade fabric8 to 4.10.3
### What changes were proposed in this pull request?

This PR aims to upgrade `kubernetes-client` library to track fabric8's declared compatibility for k8s 1.18.0:
https://github.com/fabric8io/kubernetes-client#compatibility-matrix

### Why are the changes needed?
According to fabric8, 4.9.2 is incompatible with k8s 1.18.0.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Not tested yet.

Closes #29888 from laflechejonathan/jlf/fabric8Ugprade.

Authored-by: jlafleche <jlafleche@palantir.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-09-30 19:00:18 -07:00