spark-instrumented-optimizer/common/network-yarn/pom.xml
Mark Grover 3630911004 [SPARK-20756][YARN] yarn-shuffle jar references unshaded guava
and contains scala classes

## What changes were proposed in this pull request?
This change ensures that all references to guava from within the yarn shuffle jar pointed to the shaded guava class already provided in the jar.

Also, it explicitly excludes scala classes from being added to the jar.

## How was this patch tested?
Ran unit tests on the module and they passed.
javap now returns the expected result - reference to the shaded guava under `org/spark_project` (previously this was referring to `com.google...`
```
javap -cp common/network-yarn/target/scala-2.11/spark-2.3.0-SNAPSHOT-yarn-shuffle.jar -c org/apache/spark/network/yarn/YarnShuffleService | grep Lists
      57: invokestatic  #138                // Method org/spark_project/guava/collect/Lists.newArrayList:()Ljava/util/ArrayList;
```

Guava is still shaded in the jar:
```
jar -tf common/network-yarn/target/scala-2.11/spark-2.3.0-SNAPSHOT-yarn-shuffle.jar | grep guava | head
META-INF/maven/com.google.guava/
META-INF/maven/com.google.guava/guava/
META-INF/maven/com.google.guava/guava/pom.properties
META-INF/maven/com.google.guava/guava/pom.xml
org/spark_project/guava/
org/spark_project/guava/annotations/
org/spark_project/guava/annotations/Beta.class
org/spark_project/guava/annotations/GwtCompatible.class
org/spark_project/guava/annotations/GwtIncompatible.class
org/spark_project/guava/annotations/VisibleForTesting.class
```
(not sure if the above META-INF/* is a problem or not)

I took this jar, deployed it on a yarn cluster with shuffle service enabled, and made sure the YARN node managers came up. An application with a shuffle was run and it succeeded.

Author: Mark Grover <mark@apache.org>

Closes #17990 from markgrover/spark-20756.
2017-05-22 10:10:41 -07:00

169 lines
6.2 KiB
XML

<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.11</artifactId>
<version>2.3.0-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>
<artifactId>spark-network-yarn_2.11</artifactId>
<packaging>jar</packaging>
<name>Spark Project YARN Shuffle Service</name>
<url>http://spark.apache.org/</url>
<properties>
<sbt.project.name>network-yarn</sbt.project.name>
<!-- Make sure all Hadoop dependencies are provided to avoid repackaging. -->
<hadoop.deps.scope>provided</hadoop.deps.scope>
<shuffle.jar>${project.build.directory}/scala-${scala.binary.version}/spark-${project.version}-yarn-shuffle.jar</shuffle.jar>
<shade>org/spark_project/</shade>
</properties>
<dependencies>
<!-- Core dependencies -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-network-shuffle_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-tags_${scala.binary.version}</artifactId>
</dependency>
<!--
This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
them will yield errors.
-->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-tags_${scala.binary.version}</artifactId>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<!-- Provided dependencies -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<scope>provided</scope>
</dependency>
</dependencies>
<build>
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<shadedArtifactAttached>false</shadedArtifactAttached>
<outputFile>${shuffle.jar}</outputFile>
<artifactSet>
<includes>
<include>*:*</include>
</includes>
<excludes>
<exclude>org.scala-lang:scala-library</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<relocations combine.children="append">
<relocation>
<pattern>com.fasterxml.jackson</pattern>
<shadedPattern>${spark.shade.packageName}.com.fasterxml.jackson</shadedPattern>
<includes>
<include>com.fasterxml.jackson.**</include>
</includes>
</relocation>
<relocation>
<pattern>io.netty</pattern>
<shadedPattern>${spark.shade.packageName}.io.netty</shadedPattern>
<includes>
<include>io.netty.**</include>
</includes>
</relocation>
</relocations>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
<!-- probes to validate that those dependencies which must be shaded are -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<executions>
<execution>
<phase>verify</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<target>
<macrodef name="shaded">
<attribute name="resource"/>
<sequential>
<fail message="Not found ${shade}@{resource}">
<condition>
<not>
<resourceexists>
<zipentry zipfile="${shuffle.jar}" name="${shade}@{resource}"/>
</resourceexists>
</not>
</condition>
</fail>
</sequential>
</macrodef>
<echo>Verifying dependency shading</echo>
<shaded resource="com/fasterxml/jackson/core/JsonParser.class" />
<shaded resource="com/fasterxml/jackson/annotation/JacksonAnnotation.class" />
<shaded resource="com/fasterxml/jackson/databind/JsonSerializer.class" />
</target>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>