[SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile

### What changes were proposed in this pull request?

This PR:
1. switches Spark to the shaded Hadoop clients, namely hadoop-client-api and hadoop-client-runtime, for Hadoop 3.x.
2. upgrades the built-in Hadoop 3.x version to Hadoop 3.2.2.

Note that for Hadoop 2.7, we'll still use the same modules such as hadoop-client.

To keep hadoop-3.2 as the default Hadoop profile, this defines the following Maven properties:

```
hadoop-client-api.artifact
hadoop-client-runtime.artifact
hadoop-client-minicluster.artifact
```

which default to:
```
hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster
```
but all switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side effect of this is that the same dependency is imported multiple times under hadoop-2.7, so I had to disable the Maven enforcer rule `banDuplicatePomDependencyVersions`.
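
The property switch described above can be modelled with a small sketch. This is not Spark build code: `HadoopArtifacts`, `resolve` and `duplicates` are hypothetical names used only for illustration, while the property names and values are the ones listed in this description. It shows why the three properties collapse to one artifact under hadoop-2.7, which is what the duplicate-dependency rule would reject.

```scala
// Hypothetical model of the three Maven properties; not part of the Spark build.
object HadoopArtifacts {
  // Property names and default values as listed in the PR description.
  private val shadedDefaults: Map[String, String] = Map(
    "hadoop-client-api.artifact" -> "hadoop-client-api",
    "hadoop-client-runtime.artifact" -> "hadoop-client-runtime",
    "hadoop-client-minicluster.artifact" -> "hadoop-client-minicluster")

  /** Resolves the properties for a profile id such as "hadoop-3.2" or "hadoop-2.7". */
  def resolve(profile: String): Map[String, String] =
    if (profile == "hadoop-2.7") {
      // Every property points at the same non-shaded artifact.
      shadedDefaults.map { case (key, _) => key -> "hadoop-client" }
    } else {
      shadedDefaults
    }

  /** Artifact ids that more than one property resolves to. */
  def duplicates(profile: String): Set[String] =
    resolve(profile).values
      .groupBy(identity)
      .collect { case (artifact, uses) if uses.size > 1 => artifact }
      .toSet
}
```

Under this model, `HadoopArtifacts.duplicates("hadoop-2.7")` contains only `hadoop-client`, and the default profile yields no duplicates.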

Besides the above, there are the following changes:
- explicitly added a few dependencies that were previously imported transitively from Hadoop jars but are removed from the shaded client jars.
- removed the use of `ProxyUriUtils.getPath` from `ApplicationMaster`, since it is a server-side/private API.
- modified `IsolatedClientLoader` to exclude `hadoop-auth` jars when the Hadoop version is 3.x. This change should only matter when we're not sharing Hadoop classes with Spark (which is _mostly_ used in tests).
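
The last two items can be sketched in isolation as follows. This is a standalone illustration that mirrors the logic in the diff, not the actual Spark classes: `ShadedClientSketch` and its method names are hypothetical, and the application id is passed as a plain string so that no Hadoop classes are needed.

```scala
import java.net.URLEncoder

// Hypothetical standalone sketch; the real logic lives in ApplicationMaster
// and IsolatedClientLoader.
object ShadedClientSketch {
  /** Builds the AM proxy path from an application id without ProxyUriUtils. */
  def proxyPath(appId: String): String =
    s"/proxy/${URLEncoder.encode(appId, "UTF-8")}"

  /** Hadoop Maven coordinates to resolve for the given Hadoop version. */
  def hadoopCoordinates(hadoopVersion: String): Seq[String] =
    if (hadoopVersion.startsWith("3")) {
      Seq(
        s"org.apache.hadoop:hadoop-client-api:$hadoopVersion",
        s"org.apache.hadoop:hadoop-client-runtime:$hadoopVersion")
    } else {
      Seq(s"org.apache.hadoop:hadoop-client:$hadoopVersion")
    }

  /** Extra exclusions applied only for Hadoop 3.x. */
  def extraExclusions(hadoopVersion: String): Seq[String] =
    if (hadoopVersion.startsWith("3")) Seq("org.apache.hadoop:hadoop-auth")
    else Seq.empty
}
```

For example, `ShadedClientSketch.hadoopCoordinates("2.7.4")` returns the single `hadoop-client` coordinate, while a 3.x version returns the two shaded client coordinates.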

### Why are the changes needed?

Hadoop 3.2.2 has been released with new features and bug fixes, so it's good for the Spark community to adopt it. However, Hadoop versions starting from 3.2.1 have upgraded to Guava 27+. To resolve the resulting Guava conflicts, this switches to the shaded client jars provided by Hadoop. This also avoids pulling in other third-party dependencies from the Hadoop side, which reduces the potential for future conflicts.

### Does this PR introduce _any_ user-facing change?

When people use Spark with the `hadoop-provided` option, they should make sure the class path contains the `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before other Hadoop jars on the class path. Otherwise, classes may be loaded from the other, non-shaded Hadoop jars and cause conflicts.
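
One way to check the class path ordering is to ask the JVM where a given class was loaded from. The helper below is a minimal sketch, not part of this patch: `ClassOrigin` and `locationOf` are hypothetical names, and the default class name assumes that `org.apache.hadoop.conf.Configuration` is a class you would expect to come from the `hadoop-client-api` jar.

```scala
import scala.util.Try

// Hypothetical diagnostic helper; not part of Spark or Hadoop.
object ClassOrigin {
  /**
   * Returns the location (usually a jar URL) a class was loaded from.
   * Returns None if the class cannot be found or has no code source,
   * as is the case for JDK bootstrap classes.
   */
  def locationOf(className: String): Option[String] =
    for {
      // Load without initializing, so static initializers do not run.
      cls <- Try(Class.forName(className, false, getClass.getClassLoader)).toOption
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource)
      location <- Option(source.getLocation)
    } yield location.toString

  def main(args: Array[String]): Unit = {
    // Assumed default: a Hadoop class expected to live in hadoop-client-api.
    val className = args.headOption.getOrElse("org.apache.hadoop.conf.Configuration")
    val origin = locationOf(className).getOrElse("<not found or no code source>")
    println(s"$className -> $origin")
  }
}
```

Run with the same class path as the Spark application. If the reported location is a non-shaded jar such as `hadoop-common`, the shaded client jars are likely missing or placed too late on the class path.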

### How was this patch tested?

Relying on existing tests.

Closes #30701 from sunchao/test-hadoop-3.2.2.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Chao Sun 2021-01-15 14:06:50 -08:00 committed by Dongjoon Hyun
parent a235c3b254
commit b6f46ca297
18 changed files with 191 additions and 109 deletions


@ -65,7 +65,13 @@
<!-- Provided dependencies -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>


@ -66,7 +66,13 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
@ -177,6 +183,14 @@
<groupId>org.apache.commons</groupId>
<artifactId>commons-text</artifactId>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
</dependency>
<dependency>
<groupId>commons-collections</groupId>
<artifactId>commons-collections</artifactId>
</dependency>
<dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>


@ -128,7 +128,7 @@ javassist/3.25.0-GA//javassist-3.25.0-GA.jar
javax.inject/1//javax.inject-1.jar
javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
javolution/5.5.1//javolution-5.5.1.jar
jaxb-api/2.2.2//jaxb-api-2.2.2.jar
jaxb-api/2.2.11//jaxb-api-2.2.11.jar
jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar
jdo-api/3.0.1//jdo-api-3.0.1.jar
@ -227,7 +227,6 @@ spire-macros_2.12/0.17.0-M1//spire-macros_2.12-0.17.0-M1.jar
spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar
spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar
spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar
stax-api/1.0-2//stax-api-1.0-2.jar
stax-api/1.0.1//stax-api-1.0.1.jar
stream/2.9.6//stream-2.9.6.jar
super-csv/2.2.0//super-csv-2.2.0.jar


@ -3,7 +3,6 @@ JLargeArrays/1.5//JLargeArrays-1.5.jar
JTransforms/3.1//JTransforms-3.1.jar
RoaringBitmap/0.9.0//RoaringBitmap-0.9.0.jar
ST4/4.0.4//ST4-4.0.4.jar
accessors-smart/1.2//accessors-smart-1.2.jar
activation/1.1.1//activation-1.1.1.jar
aircompressor/0.16//aircompressor-0.16.jar
algebra_2.12/2.0.0-M2//algebra_2.12-2.0.0-M2.jar
@ -11,7 +10,6 @@ annotations/17.0.0//annotations-17.0.0.jar
antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
antlr4-runtime/4.8-1//antlr4-runtime-4.8-1.jar
aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
aopalliance/1.0//aopalliance-1.0.jar
arpack_combined_all/0.1//arpack_combined_all-0.1.jar
arrow-format/2.0.0//arrow-format-2.0.0.jar
arrow-memory-core/2.0.0//arrow-memory-core-2.0.0.jar
@ -28,15 +26,12 @@ breeze_2.12/1.0//breeze_2.12-1.0.jar
cats-kernel_2.12/2.0.0-M4//cats-kernel_2.12-2.0.0-M4.jar
chill-java/0.9.5//chill-java-0.9.5.jar
chill_2.12/0.9.5//chill_2.12-0.9.5.jar
commons-beanutils/1.9.4//commons-beanutils-1.9.4.jar
commons-cli/1.2//commons-cli-1.2.jar
commons-codec/1.15//commons-codec-1.15.jar
commons-collections/3.2.2//commons-collections-3.2.2.jar
commons-compiler/3.0.16//commons-compiler-3.0.16.jar
commons-compress/1.20//commons-compress-1.20.jar
commons-configuration2/2.1.1//commons-configuration2-2.1.1.jar
commons-crypto/1.1.0//commons-crypto-1.1.0.jar
commons-daemon/1.0.13//commons-daemon-1.0.13.jar
commons-dbcp/1.4//commons-dbcp-1.4.jar
commons-httpclient/3.1//commons-httpclient-3.1.jar
commons-io/2.5//commons-io-2.5.jar
@ -56,30 +51,13 @@ datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar
datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar
datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
derby/10.14.2.0//derby-10.14.2.0.jar
dnsjava/2.1.7//dnsjava-2.1.7.jar
dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
ehcache/3.3.1//ehcache-3.3.1.jar
flatbuffers-java/1.9.0//flatbuffers-java-1.9.0.jar
generex/1.0.2//generex-1.0.2.jar
geronimo-jcache_1.0_spec/1.0-alpha-1//geronimo-jcache_1.0_spec-1.0-alpha-1.jar
gson/2.2.4//gson-2.2.4.jar
guava/14.0.1//guava-14.0.1.jar
guice-servlet/4.0//guice-servlet-4.0.jar
guice/4.0//guice-4.0.jar
hadoop-annotations/3.2.0//hadoop-annotations-3.2.0.jar
hadoop-auth/3.2.0//hadoop-auth-3.2.0.jar
hadoop-client/3.2.0//hadoop-client-3.2.0.jar
hadoop-common/3.2.0//hadoop-common-3.2.0.jar
hadoop-hdfs-client/3.2.0//hadoop-hdfs-client-3.2.0.jar
hadoop-mapreduce-client-common/3.2.0//hadoop-mapreduce-client-common-3.2.0.jar
hadoop-mapreduce-client-core/3.2.0//hadoop-mapreduce-client-core-3.2.0.jar
hadoop-mapreduce-client-jobclient/3.2.0//hadoop-mapreduce-client-jobclient-3.2.0.jar
hadoop-yarn-api/3.2.0//hadoop-yarn-api-3.2.0.jar
hadoop-yarn-client/3.2.0//hadoop-yarn-client-3.2.0.jar
hadoop-yarn-common/3.2.0//hadoop-yarn-common-3.2.0.jar
hadoop-yarn-registry/3.2.0//hadoop-yarn-registry-3.2.0.jar
hadoop-yarn-server-common/3.2.0//hadoop-yarn-server-common-3.2.0.jar
hadoop-yarn-server-web-proxy/3.2.0//hadoop-yarn-server-web-proxy-3.2.0.jar
hadoop-client-api/3.2.2//hadoop-client-api-3.2.2.jar
hadoop-client-runtime/3.2.2//hadoop-client-runtime-3.2.2.jar
hive-beeline/2.3.7//hive-beeline-2.3.7.jar
hive-cli/2.3.7//hive-cli-2.3.7.jar
hive-common/2.3.7//hive-common-2.3.7.jar
@ -109,8 +87,6 @@ jackson-core/2.11.4//jackson-core-2.11.4.jar
jackson-databind/2.11.4//jackson-databind-2.11.4.jar
jackson-dataformat-yaml/2.11.4//jackson-dataformat-yaml-2.11.4.jar
jackson-datatype-jsr310/2.11.2//jackson-datatype-jsr310-2.11.2.jar
jackson-jaxrs-base/2.9.5//jackson-jaxrs-base-2.9.5.jar
jackson-jaxrs-json-provider/2.9.5//jackson-jaxrs-json-provider-2.9.5.jar
jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
jackson-module-jaxb-annotations/2.11.4//jackson-module-jaxb-annotations-2.11.4.jar
jackson-module-paranamer/2.11.4//jackson-module-paranamer-2.11.4.jar
@ -124,13 +100,10 @@ jakarta.ws.rs-api/2.1.6//jakarta.ws.rs-api-2.1.6.jar
jakarta.xml.bind-api/2.3.2//jakarta.xml.bind-api-2.3.2.jar
janino/3.0.16//janino-3.0.16.jar
javassist/3.25.0-GA//javassist-3.25.0-GA.jar
javax.inject/1//javax.inject-1.jar
javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
javolution/5.5.1//javolution-5.5.1.jar
jaxb-api/2.2.11//jaxb-api-2.2.11.jar
jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
jcip-annotations/1.0-1//jcip-annotations-1.0-1.jar
jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar
jdo-api/3.0.1//jdo-api-3.0.1.jar
jersey-client/2.30//jersey-client-2.30.jar
@ -144,30 +117,14 @@ jline/2.14.6//jline-2.14.6.jar
joda-time/2.10.5//joda-time-2.10.5.jar
jodd-core/3.5.2//jodd-core-3.5.2.jar
jpam/1.1//jpam-1.1.jar
json-smart/2.3//json-smart-2.3.jar
json/1.8//json-1.8.jar
json4s-ast_2.12/3.7.0-M5//json4s-ast_2.12-3.7.0-M5.jar
json4s-core_2.12/3.7.0-M5//json4s-core_2.12-3.7.0-M5.jar
json4s-jackson_2.12/3.7.0-M5//json4s-jackson_2.12-3.7.0-M5.jar
json4s-scalap_2.12/3.7.0-M5//json4s-scalap_2.12-3.7.0-M5.jar
jsp-api/2.1//jsp-api-2.1.jar
jsr305/3.0.0//jsr305-3.0.0.jar
jta/1.1//jta-1.1.jar
jul-to-slf4j/1.7.30//jul-to-slf4j-1.7.30.jar
kerb-admin/1.0.1//kerb-admin-1.0.1.jar
kerb-client/1.0.1//kerb-client-1.0.1.jar
kerb-common/1.0.1//kerb-common-1.0.1.jar
kerb-core/1.0.1//kerb-core-1.0.1.jar
kerb-crypto/1.0.1//kerb-crypto-1.0.1.jar
kerb-identity/1.0.1//kerb-identity-1.0.1.jar
kerb-server/1.0.1//kerb-server-1.0.1.jar
kerb-simplekdc/1.0.1//kerb-simplekdc-1.0.1.jar
kerb-util/1.0.1//kerb-util-1.0.1.jar
kerby-asn1/1.0.1//kerby-asn1-1.0.1.jar
kerby-config/1.0.1//kerby-config-1.0.1.jar
kerby-pkix/1.0.1//kerby-pkix-1.0.1.jar
kerby-util/1.0.1//kerby-util-1.0.1.jar
kerby-xdr/1.0.1//kerby-xdr-1.0.1.jar
kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar
kubernetes-client/4.12.0//kubernetes-client-4.12.0.jar
kubernetes-model-admissionregistration/4.12.0//kubernetes-model-admissionregistration-4.12.0.jar
@ -205,9 +162,7 @@ metrics-json/4.1.1//metrics-json-4.1.1.jar
metrics-jvm/4.1.1//metrics-jvm-4.1.1.jar
minlog/1.3.0//minlog-1.3.0.jar
netty-all/4.1.51.Final//netty-all-4.1.51.Final.jar
nimbus-jose-jwt/4.41.1//nimbus-jose-jwt-4.41.1.jar
objenesis/2.6//objenesis-2.6.jar
okhttp/2.7.5//okhttp-2.7.5.jar
okhttp/3.12.12//okhttp-3.12.12.jar
okio/1.14.0//okio-1.14.0.jar
opencsv/2.3//opencsv-2.3.jar
@ -226,7 +181,6 @@ parquet-jackson/1.10.1//parquet-jackson-1.10.1.jar
protobuf-java/2.5.0//protobuf-java-2.5.0.jar
py4j/0.10.9.1//py4j-0.10.9.1.jar
pyrolite/4.30//pyrolite-4.30.jar
re2j/1.1//re2j-1.1.jar
scala-collection-compat_2.12/2.1.1//scala-collection-compat_2.12-2.1.1.jar
scala-compiler/2.12.10//scala-compiler-2.12.10.jar
scala-library/2.12.10//scala-library-2.12.10.jar
@ -244,15 +198,12 @@ spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar
spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar
spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar
stax-api/1.0.1//stax-api-1.0.1.jar
stax2-api/3.1.4//stax2-api-3.1.4.jar
stream/2.9.6//stream-2.9.6.jar
super-csv/2.2.0//super-csv-2.2.0.jar
threeten-extra/1.5.0//threeten-extra-1.5.0.jar
token-provider/1.0.1//token-provider-1.0.1.jar
transaction-api/1.1//transaction-api-1.1.jar
univocity-parsers/2.9.0//univocity-parsers-2.9.0.jar
velocity/1.5//velocity-1.5.jar
woodstox-core/5.0.3//woodstox-core-5.0.3.jar
xbean-asm7-shaded/4.15//xbean-asm7-shaded-4.15.jar
xz/1.5//xz-1.5.jar
zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar


@ -71,9 +71,15 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-mapred</artifactId>


@ -79,6 +79,10 @@
<artifactId>kafka-clients</artifactId>
<version>${kafka.version}</version>
</dependency>
<dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-pool2</artifactId>


@ -58,6 +58,11 @@
<artifactId>mockito-core</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<scope>${hadoop.deps.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-tags_${scala.binary.version}</artifactId>


@ -91,9 +91,15 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-ipc</artifactId>


@ -58,10 +58,15 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<!--
the AWS module pulls in jackson; its transitive dependencies can create
intra-jackson-module version problems.


@ -81,7 +81,14 @@
<!-- Not needed by the test code, but referenced by SparkSubmit which is used by the tests. -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>test</scope>
</dependency>
</dependencies>

pom.xml

@ -120,7 +120,7 @@
<sbt.project.name>spark</sbt.project.name>
<slf4j.version>1.7.30</slf4j.version>
<log4j.version>1.2.17</log4j.version>
<hadoop.version>3.2.0</hadoop.version>
<hadoop.version>3.2.2</hadoop.version>
<protobuf.version>2.5.0</protobuf.version>
<yarn.version>${hadoop.version}</yarn.version>
<zookeeper.version>3.4.14</zookeeper.version>
@ -245,6 +245,15 @@
<parquet.deps.scope>compile</parquet.deps.scope>
<parquet.test.deps.scope>test</parquet.test.deps.scope>
<!--
These default to Hadoop 3.x shaded client/minicluster jars, but are switched to hadoop-client
when the Hadoop profile is hadoop-2.7, because these are only available in 3.x. Note that,
as result we have to include the same hadoop-client dependency multiple times in hadoop-2.7.
-->
<hadoop-client-api.artifact>hadoop-client-api</hadoop-client-api.artifact>
<hadoop-client-runtime.artifact>hadoop-client-runtime</hadoop-client-runtime.artifact>
<hadoop-client-minicluster.artifact>hadoop-client-minicluster</hadoop-client-minicluster.artifact>
<!--
Overridable test home. So that you can call individual pom files directly without
things breaking.
@ -859,6 +868,11 @@
<artifactId>javax.ws.rs-api</artifactId>
<version>2.0.1</version>
</dependency>
<dependency>
<groupId>javax.xml.bind</groupId>
<artifactId>jaxb-api</artifactId>
<version>2.2.11</version>
</dependency>
<dependency>
<groupId>org.scalanlp</groupId>
<artifactId>breeze_${scala.binary.version}</artifactId>
@ -1067,6 +1081,26 @@
<version>${curator.version}</version>
<scope>test</scope>
</dependency>
<!-- Hadoop 3.x dependencies -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client-api</artifactId>
<version>${hadoop.version}</version>
<scope>${hadoop.deps.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client-runtime</artifactId>
<version>${hadoop.version}</version>
<scope>${hadoop.deps.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client-minicluster</artifactId>
<version>${yarn.version}</version>
<scope>test</scope>
</dependency>
<!-- End of Hadoop 3.x dependencies -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
@ -1656,6 +1690,14 @@
<groupId>org.apache.ant</groupId>
<artifactId>ant</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-auth</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
@ -2420,17 +2462,6 @@
</rules>
</configuration>
</execution>
<execution>
<id>enforce-no-duplicate-dependencies</id>
<goals>
<goal>enforce</goal>
</goals>
<configuration>
<rules>
<banDuplicatePomDependencyVersions/>
</rules>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
@ -2901,6 +2932,7 @@
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<shadedArtifactAttached>false</shadedArtifactAttached>
<createDependencyReducedPom>false</createDependencyReducedPom>
<artifactSet>
<includes>
<include>org.spark-project.spark:unused</include>
@ -3162,6 +3194,9 @@
<hadoop.version>2.7.4</hadoop.version>
<curator.version>2.7.1</curator.version>
<commons-io.version>2.4</commons-io.version>
<hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
<hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact>
<hadoop-client-minicluster.artifact>hadoop-client</hadoop-client-minicluster.artifact>
</properties>
</profile>


@ -63,10 +63,6 @@
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>*</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-jaxb-annotations</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-yaml</artifactId>
@ -85,11 +81,6 @@
<artifactId>jackson-dataformat-yaml</artifactId>
<version>${fasterxml.jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-jaxb-annotations</artifactId>
<version>${fasterxml.jackson.version}</version>
</dependency>
<!-- Explicitly depend on shaded dependencies from the parent, since shaded deps aren't transitive -->
<dependency>


@ -40,6 +40,42 @@
<spark.yarn.isHadoopProvided>true</spark.yarn.isHadoopProvided>
</properties>
</profile>
<profile>
<id>hadoop-2.7</id>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-api</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-common</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-web-proxy</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-client</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-tests</artifactId>
<classifier>tests</classifier>
<scope>test</scope>
</dependency>
<!--
Hack to exclude org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:tests.
See the parent pom.xml for more details.
-->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-resourcemanager</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
</profile>
</profiles>
<dependencies>
@ -69,23 +105,20 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-api</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-common</artifactId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>${hadoop.deps.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-web-proxy</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-client</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-minicluster.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>test</scope>
</dependency>
<!-- Explicit listing of transitive deps that are shaded. Otherwise, odd compiler crashes. -->
@ -135,18 +168,6 @@
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-tests</artifactId>
<classifier>tests</classifier>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-resourcemanager</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>


@ -19,7 +19,7 @@ package org.apache.spark.deploy.yarn
import java.io.{File, IOException}
import java.lang.reflect.{InvocationTargetException, Modifier}
import java.net.{URI, URL}
import java.net.{URI, URL, URLEncoder}
import java.security.PrivilegedExceptionAction
import java.util.concurrent.{TimeoutException, TimeUnit}
@ -36,7 +36,6 @@ import org.apache.hadoop.yarn.api._
import org.apache.hadoop.yarn.api.records._
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException
import org.apache.hadoop.yarn.server.webproxy.ProxyUriUtils
import org.apache.hadoop.yarn.util.{ConverterUtils, Records}
import org.apache.spark._
@ -308,7 +307,8 @@ private[spark] class ApplicationMaster(
// The client-mode AM doesn't listen for incoming connections, so report an invalid port.
registerAM(Utils.localHostName, -1, sparkConf,
sparkConf.getOption("spark.driver.appUIAddress"), appAttemptId)
addAmIpFilter(Some(driverRef), ProxyUriUtils.getPath(appAttemptId.getApplicationId))
val encodedAppId = URLEncoder.encode(appAttemptId.getApplicationId.toString, "UTF-8")
addAmIpFilter(Some(driverRef), s"/proxy/$encodedAppId")
createAllocator(driverRef, sparkConf, clientRpcEnv, appAttemptId, cachedResourcesConf)
reporterThread.join()
} catch {


@ -80,6 +80,16 @@ abstract class BaseYarnClusterSuite
yarnConf.set("yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage",
"100.0")
// capacity-scheduler.xml is missing in hadoop-client-minicluster so this is a workaround
yarnConf.set("yarn.scheduler.capacity.root.queues", "default")
yarnConf.setInt("yarn.scheduler.capacity.root.default.capacity", 100)
yarnConf.setFloat("yarn.scheduler.capacity.root.default.user-limit-factor", 1)
yarnConf.setInt("yarn.scheduler.capacity.root.default.maximum-capacity", 100)
yarnConf.set("yarn.scheduler.capacity.root.default.state", "RUNNING")
yarnConf.set("yarn.scheduler.capacity.root.default.acl_submit_applications", "*")
yarnConf.set("yarn.scheduler.capacity.root.default.acl_administer_queue", "*")
yarnConf.setInt("yarn.scheduler.capacity.node-locality-delay", -1)
yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
yarnCluster.init(yarnConf)
yarnCluster.start()


@ -104,6 +104,10 @@
<groupId>org.antlr</groupId>
<artifactId>antlr4-runtime</artifactId>
</dependency>
<dependency>
<groupId>javax.xml.bind</groupId>
<artifactId>jaxb-api</artifactId>
</dependency>
<dependency>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>


@ -162,6 +162,11 @@
<groupId>org.datanucleus</groupId>
<artifactId>datanucleus-core</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<scope>${hadoop.deps.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>


@ -112,11 +112,24 @@ private[hive] object IsolatedClientLoader extends Logging {
hadoopVersion: String,
ivyPath: Option[String],
remoteRepos: String): Seq[URL] = {
val hadoopJarNames = if (hadoopVersion.startsWith("3")) {
Seq(s"org.apache.hadoop:hadoop-client-api:$hadoopVersion",
s"org.apache.hadoop:hadoop-client-runtime:$hadoopVersion")
} else {
Seq(s"org.apache.hadoop:hadoop-client:$hadoopVersion")
}
val hiveArtifacts = version.extraDeps ++
Seq("hive-metastore", "hive-exec", "hive-common", "hive-serde")
.map(a => s"org.apache.hive:$a:${version.fullVersion}") ++
Seq("com.google.guava:guava:14.0.1",
s"org.apache.hadoop:hadoop-client:$hadoopVersion")
Seq("com.google.guava:guava:14.0.1") ++ hadoopJarNames
val extraExclusions = if (hadoopVersion.startsWith("3")) {
// this introduced from lower version of Hive could conflict with jars in Hadoop 3.2+, so
// exclude here in favor of the ones in Hadoop 3.2+
Seq("org.apache.hadoop:hadoop-auth")
} else {
Seq.empty
}
val classpaths = quietly {
SparkSubmitUtils.resolveMavenCoordinates(
@ -125,7 +138,7 @@ private[hive] object IsolatedClientLoader extends Logging {
Some(remoteRepos),
ivyPath),
transitive = true,
exclusions = version.exclusions)
exclusions = version.exclusions ++ extraExclusions)
}
val allFiles = classpaths.map(new File(_)).toSet