[SPARK-36835][FOLLOWUP][BUILD][TEST-HADOOP2.7] Fix maven issue for Hadoop 2.7 profile after enabling dependency reduced pom

### What changes were proposed in this pull request?

Fix an issue where Maven may get stuck in an infinite loop when building Spark with the Hadoop 2.7 profile.

### Why are the changes needed?

After re-enabling `createDependencyReducedPom` for `maven-shade-plugin`, the Spark build stopped working for the Hadoop 2.7 profile and gets stuck in an infinite loop, likely due to a Maven Shade plugin bug similar to https://issues.apache.org/jira/browse/MSHADE-148. This seems to be caused by the fact that, under the `hadoop-2.7` profile, the variables `hadoop-client-runtime.artifact` and `hadoop-client-api.artifact` are both `hadoop-client`, which triggers the issue.
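
As an illustration of the suspected trigger, the fragment below sketches how two dependency declarations in a module POM would resolve under `hadoop-2.7` before this change. The resolved values come from the profile's properties (`hadoop.version` is 2.7.4 there); the exact module and the surrounding declarations are not shown, and the scope is left as the unresolved `${hadoop.deps.scope}` placeholder.

```xml
<!-- Sketch: effective declarations under -Phadoop-2.7 BEFORE this change. -->
<!-- Both properties resolve to the same artifact, hadoop-client. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <!-- resolved from ${hadoop-client-api.artifact} -->
  <artifactId>hadoop-client</artifactId>
  <version>2.7.4</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <!-- resolved from ${hadoop-client-runtime.artifact} -->
  <artifactId>hadoop-client</artifactId>
  <version>2.7.4</version>
  <scope>${hadoop.deps.scope}</scope>
</dependency>
```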

As a workaround, this changes `hadoop-client-runtime.artifact` to be `hadoop-yarn-api` when using `hadoop-2.7`. Since `hadoop-yarn-api` is a dependency of `hadoop-client`, this essentially moves the former to the same level as the latter. It should have no effect as both are dependencies of Spark.
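
For reference, a trimmed sketch of the relevant properties in the `hadoop-2.7` profile after this change. Only the three artifact properties are shown; the other properties in the profile are omitted here.

```xml
<profile>
  <id>hadoop-2.7</id>
  <properties>
    <!-- Non-shaded client stands in for the 3.x shaded API jar. -->
    <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
    <!-- Bound to a dependency of hadoop-client so the two properties no longer resolve to the same artifact. -->
    <hadoop-client-runtime.artifact>hadoop-yarn-api</hadoop-client-runtime.artifact>
    <!-- Test scope only; still hadoop-client. -->
    <hadoop-client-minicluster.artifact>hadoop-client</hadoop-client-minicluster.artifact>
  </properties>
</profile>
```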

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #34100 from sunchao/SPARK-36835-followup.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
Chao Sun 2021-09-26 13:39:36 +08:00 committed by Gengliang Wang
parent 7d496fb361
commit 937a74e6e7
2 changed files with 32 additions and 16 deletions

pom.xml

@@ -249,9 +249,22 @@
<parquet.test.deps.scope>test</parquet.test.deps.scope>
<!--
-These default to Hadoop 3.x shaded client/minicluster jars, but are switched to hadoop-client
-when the Hadoop profile is hadoop-2.7, because these are only available in 3.x. Note that,
-as result we have to include the same hadoop-client dependency multiple times in hadoop-2.7.
+These default to Hadoop 3.x shaded client/minicluster jars, but since the shaded jars are
+only available in 3.x, we have to switch to non-shaded Hadoop client jars when the active
+profile is hadoop-2.7.
+To make the above work, in the hadoop-2.7 profile section we bind hadoop-client-api to
+hadoop-client and hadoop-client-runtime to hadoop-yarn-api, which is a dependency of the
+former. This effectively moves the hadoop-yarn-api to the same level as hadoop-client, but
+should be fine since both dependencies are required. We cannot use hadoop-client for both
+of these because the maven enforcer plugin bans duplicate dependencies.
+We still leave hadoop-client-minicluster to use hadoop-client because it is of test scope,
+and is only used by spark-yarn module. To avoid the duplicate dependency issue we only
+declare the dependency when we're using hadoop-3.2 profile, in
+resource-managers/yarn/pom.xml.
+Please check SPARK-36835 for more details.
-->
<hadoop-client-api.artifact>hadoop-client-api</hadoop-client-api.artifact>
<hadoop-client-runtime.artifact>hadoop-client-runtime</hadoop-client-runtime.artifact>
@@ -3272,8 +3285,11 @@
<hadoop.version>2.7.4</hadoop.version>
<curator.version>2.7.1</curator.version>
<commons-io.version>2.4</commons-io.version>
+<!--
+the declaration site above of these variables explains why we need to re-assign them here
+-->
<hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
-<hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact>
+<hadoop-client-runtime.artifact>hadoop-yarn-api</hadoop-client-runtime.artifact>
<hadoop-client-minicluster.artifact>hadoop-client</hadoop-client-minicluster.artifact>
</properties>
</profile>


@@ -82,6 +82,18 @@
<activeByDefault>true</activeByDefault>
</activation>
<dependencies>
+<dependency>
+<groupId>org.apache.hadoop</groupId>
+<artifactId>${hadoop-client-runtime.artifact}</artifactId>
+<version>${hadoop.version}</version>
+<scope>${hadoop.deps.scope}</scope>
+</dependency>
+<dependency>
+<groupId>org.apache.hadoop</groupId>
+<artifactId>${hadoop-client-minicluster.artifact}</artifactId>
+<version>${hadoop.version}</version>
+<scope>test</scope>
+</dependency>
<!-- Used by MiniYARNCluster -->
<dependency>
<groupId>org.bouncycastle</groupId>
@@ -127,18 +139,6 @@
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>${hadoop-client-runtime.artifact}</artifactId>
-<version>${hadoop.version}</version>
-<scope>${hadoop.deps.scope}</scope>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>${hadoop-client-minicluster.artifact}</artifactId>
-<version>${hadoop.version}</version>
-<scope>test</scope>
-</dependency>
<!-- Explicit listing of transitive deps that are shaded. Otherwise, odd compiler crashes. -->
<dependency>