290aa02179
### What changes were proposed in this pull request?
This reverts commit SPARK-33212 (cb3fa6c936
) mostly with three exceptions:
1. `SparkSubmitUtils` was updated recently by SPARK-33580
2. `resource-managers/yarn/pom.xml` was updated recently by SPARK-33104 to add `hadoop-yarn-server-resourcemanager` test dependency.
3. Adjust `com.fasterxml.jackson.module:jackson-module-jaxb-annotations` dependency in K8s module which is updated recently by SPARK-33471.
### Why are the changes needed?
According to [HADOOP-16080](https://issues.apache.org/jira/browse/HADOOP-16080) since Apache Hadoop 3.1.1, `hadoop-aws` doesn't work with `hadoop-client-api`. It fails at write operation like the following.
**1. Spark distribution with `-Phadoop-cloud`**
```scala
$ bin/spark-shell --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY
20/11/30 23:01:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context available as 'sc' (master = local[*], app id = local-1606806088715).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT
/_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_272)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.read.parquet("s3a://dongjoon/users.parquet").show
20/11/30 23:01:34 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
+------+--------------+----------------+
| name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa| null| [3, 9, 15, 20]|
| Ben| red| []|
+------+--------------+----------------+
scala> Seq(1).toDF.write.parquet("s3a://dongjoon/out.parquet")
20/11/30 23:02:14 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)/ 1]
java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
```
**2. Spark distribution without `-Phadoop-cloud`**
```scala
$ bin/spark-shell --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY -c spark.eventLog.enabled=true -c spark.eventLog.dir=s3a://dongjoon/spark-events/ --packages org.apache.hadoop:hadoop-aws:3.2.0,org.apache.hadoop:hadoop-common:3.2.0
...
java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:772)
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CI.
Closes #30508 from dongjoon-hyun/SPARK-33212-REVERT.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
298 lines
10 KiB
XML
298 lines
10 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
~ Licensed to the Apache Software Foundation (ASF) under one or more
|
|
~ contributor license agreements. See the NOTICE file distributed with
|
|
~ this work for additional information regarding copyright ownership.
|
|
~ The ASF licenses this file to You under the Apache License, Version 2.0
|
|
~ (the "License"); you may not use this file except in compliance with
|
|
~ the License. You may obtain a copy of the License at
|
|
~
|
|
~ http://www.apache.org/licenses/LICENSE-2.0
|
|
~
|
|
~ Unless required by applicable law or agreed to in writing, software
|
|
~ distributed under the License is distributed on an "AS IS" BASIS,
|
|
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
~ See the License for the specific language governing permissions and
|
|
~ limitations under the License.
|
|
-->
|
|
<project xmlns="http://maven.apache.org/POM/4.0.0"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
|
|
<modelVersion>4.0.0</modelVersion>
|
|
<parent>
|
|
<groupId>org.apache.spark</groupId>
|
|
<artifactId>spark-parent_2.12</artifactId>
|
|
<version>3.1.0-SNAPSHOT</version>
|
|
<relativePath>../pom.xml</relativePath>
|
|
</parent>
|
|
|
|
<artifactId>spark-hadoop-cloud_2.12</artifactId>
|
|
<packaging>jar</packaging>
|
|
<name>Spark Project Hadoop Cloud Integration</name>
|
|
<description>
|
|
Contains Hadoop JARs and transitive dependencies needed to interact with cloud infrastructures.
|
|
</description>
|
|
<properties>
|
|
<sbt.project.name>hadoop-cloud</sbt.project.name>
|
|
</properties>
|
|
|
|
<build>
|
|
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
|
|
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
|
|
</build>
|
|
|
|
<dependencies>
|
|
<!--used during compilation but not exported as transitive dependencies-->
|
|
<dependency>
|
|
<groupId>org.apache.spark</groupId>
|
|
<artifactId>spark-sql_${scala.binary.version}</artifactId>
|
|
<version>${project.version}</version>
|
|
<scope>provided</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.apache.spark</groupId>
|
|
<artifactId>spark-core_${scala.binary.version}</artifactId>
|
|
<version>${project.version}</version>
|
|
<type>test-jar</type>
|
|
<scope>test</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.apache.hadoop</groupId>
|
|
<artifactId>hadoop-client</artifactId>
|
|
<version>${hadoop.version}</version>
|
|
<scope>provided</scope>
|
|
</dependency>
|
|
<!--
|
|
the AWS module pulls in jackson; its transitive dependencies can create
|
|
intra-jackson-module version problems.
|
|
-->
|
|
<dependency>
|
|
<groupId>org.apache.hadoop</groupId>
|
|
<artifactId>hadoop-aws</artifactId>
|
|
<version>${hadoop.version}</version>
|
|
<scope>${hadoop.deps.scope}</scope>
|
|
<exclusions>
|
|
<exclusion>
|
|
<groupId>org.apache.hadoop</groupId>
|
|
<artifactId>hadoop-common</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>commons-logging</groupId>
|
|
<artifactId>commons-logging</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>org.codehaus.jackson</groupId>
|
|
<artifactId>jackson-mapper-asl</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>org.codehaus.jackson</groupId>
|
|
<artifactId>jackson-core-asl</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>com.fasterxml.jackson.core</groupId>
|
|
<artifactId>jackson-core</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>com.fasterxml.jackson.core</groupId>
|
|
<artifactId>jackson-databind</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>com.fasterxml.jackson.core</groupId>
|
|
<artifactId>jackson-annotations</artifactId>
|
|
</exclusion>
|
|
<!-- Keep old SDK out of the assembly to avoid conflict with Kinesis module -->
|
|
<exclusion>
|
|
<groupId>com.amazonaws</groupId>
|
|
<artifactId>aws-java-sdk</artifactId>
|
|
</exclusion>
|
|
</exclusions>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.apache.hadoop</groupId>
|
|
<artifactId>hadoop-openstack</artifactId>
|
|
<version>${hadoop.version}</version>
|
|
<scope>${hadoop.deps.scope}</scope>
|
|
<exclusions>
|
|
<exclusion>
|
|
<groupId>org.apache.hadoop</groupId>
|
|
<artifactId>hadoop-common</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>commons-logging</groupId>
|
|
<artifactId>commons-logging</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>junit</groupId>
|
|
<artifactId>junit</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>org.mockito</groupId>
|
|
<artifactId>mockito-all</artifactId>
|
|
</exclusion>
|
|
</exclusions>
|
|
</dependency>
|
|
|
|
<!--
|
|
Add joda time to ensure that anything downstream which doesn't pull in spark-hive
|
|
gets the correct joda time artifact, so doesn't have auth failures on later Java 8 JVMs
|
|
-->
|
|
<dependency>
|
|
<groupId>joda-time</groupId>
|
|
<artifactId>joda-time</artifactId>
|
|
<scope>${hadoop.deps.scope}</scope>
|
|
</dependency>
|
|
<!-- explicitly declare the jackson artifacts desired -->
|
|
<dependency>
|
|
<groupId>com.fasterxml.jackson.core</groupId>
|
|
<artifactId>jackson-databind</artifactId>
|
|
<scope>${hadoop.deps.scope}</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>com.fasterxml.jackson.core</groupId>
|
|
<artifactId>jackson-annotations</artifactId>
|
|
<scope>${hadoop.deps.scope}</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>com.fasterxml.jackson.dataformat</groupId>
|
|
<artifactId>jackson-dataformat-cbor</artifactId>
|
|
<version>${fasterxml.jackson.version}</version>
|
|
</dependency>
|
|
<!--Explicit declaration to force in Spark version into transitive dependencies -->
|
|
<dependency>
|
|
<groupId>org.apache.httpcomponents</groupId>
|
|
<artifactId>httpclient</artifactId>
|
|
<scope>${hadoop.deps.scope}</scope>
|
|
</dependency>
|
|
<!--Explicit declaration to force in Spark version into transitive dependencies -->
|
|
<dependency>
|
|
<groupId>org.apache.httpcomponents</groupId>
|
|
<artifactId>httpcore</artifactId>
|
|
<scope>${hadoop.deps.scope}</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.apache.hadoop</groupId>
|
|
<artifactId>hadoop-azure</artifactId>
|
|
<version>${hadoop.version}</version>
|
|
<scope>${hadoop.deps.scope}</scope>
|
|
<exclusions>
|
|
<exclusion>
|
|
<groupId>org.apache.hadoop</groupId>
|
|
<artifactId>hadoop-common</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>org.codehaus.jackson</groupId>
|
|
<artifactId>jackson-mapper-asl</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>com.fasterxml.jackson.core</groupId>
|
|
<artifactId>jackson-core</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>com.google.guava</groupId>
|
|
<artifactId>guava</artifactId>
|
|
</exclusion>
|
|
</exclusions>
|
|
</dependency>
|
|
</dependencies>
|
|
|
|
<profiles>
|
|
|
|
<!--
|
|
Hadoop 3 simplifies the classpath, and adds a new committer base class which
|
|
enables store-specific committers.
|
|
-->
|
|
<profile>
|
|
<id>hadoop-3.2</id>
|
|
<properties>
|
|
<extra.source.dir>src/hadoop-3/main/scala</extra.source.dir>
|
|
<extra.testsource.dir>src/hadoop-3/test/scala</extra.testsource.dir>
|
|
</properties>
|
|
|
|
<build>
|
|
<plugins>
|
|
<plugin>
|
|
<groupId>org.codehaus.mojo</groupId>
|
|
<artifactId>build-helper-maven-plugin</artifactId>
|
|
<executions>
|
|
<execution>
|
|
<id>add-scala-sources</id>
|
|
<phase>generate-sources</phase>
|
|
<goals>
|
|
<goal>add-source</goal>
|
|
</goals>
|
|
<configuration>
|
|
<sources>
|
|
<source>${extra.source.dir}</source>
|
|
</sources>
|
|
</configuration>
|
|
</execution>
|
|
<execution>
|
|
<id>add-scala-test-sources</id>
|
|
<phase>generate-test-sources</phase>
|
|
<goals>
|
|
<goal>add-test-source</goal>
|
|
</goals>
|
|
<configuration>
|
|
<sources>
|
|
<source>${extra.testsource.dir}</source>
|
|
</sources>
|
|
</configuration>
|
|
</execution>
|
|
</executions>
|
|
</plugin>
|
|
</plugins>
|
|
</build>
|
|
<dependencies>
|
|
<!--
|
|
There's now a hadoop-cloud-storage which transitively pulls in the store JARs,
|
|
but it still needs some selective exclusion across versions, especially 3.0.x.
|
|
-->
|
|
<dependency>
|
|
<groupId>org.apache.hadoop</groupId>
|
|
<artifactId>hadoop-cloud-storage</artifactId>
|
|
<version>${hadoop.version}</version>
|
|
<scope>${hadoop.deps.scope}</scope>
|
|
<exclusions>
|
|
<exclusion>
|
|
<groupId>org.apache.hadoop</groupId>
|
|
<artifactId>hadoop-common</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>org.codehaus.jackson</groupId>
|
|
<artifactId>jackson-mapper-asl</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>com.fasterxml.jackson.core</groupId>
|
|
<artifactId>jackson-core</artifactId>
|
|
</exclusion>
|
|
<exclusion>
|
|
<groupId>com.google.guava</groupId>
|
|
<artifactId>guava</artifactId>
|
|
</exclusion>
|
|
</exclusions>
|
|
</dependency>
|
|
<!--
|
|
The jetty declarations are made
|
|
(a) to keep that jetty-util-ajax version in sync with the rest of Spark.
|
|
(b) to minimise the effects which Spark's jetty shading has on the
|
|
availability of the jetty JARs on for hadoop-azure, which depends
|
|
on them.
|
|
-->
|
|
<dependency>
|
|
<groupId>org.eclipse.jetty</groupId>
|
|
<artifactId>jetty-util</artifactId>
|
|
<scope>${hadoop.deps.scope}</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.eclipse.jetty</groupId>
|
|
<artifactId>jetty-util-ajax</artifactId>
|
|
<version>${jetty.version}</version>
|
|
<scope>${hadoop.deps.scope}</scope>
|
|
</dependency>
|
|
</dependencies>
|
|
</profile>
|
|
|
|
</profiles>
|
|
|
|
</project>
|