spark-instrumented-optimizer/resource-managers/kubernetes/integration-tests/pom.xml
Stavros Kontopoulos 5e74570c8f [SPARK-23153][K8S] Support client dependencies with a Hadoop Compatible File System
## What changes were proposed in this pull request?
- solves the current issue with --packages in cluster mode (there is no ticket for it). Also note of some [issues](https://issues.apache.org/jira/browse/SPARK-22657) of the past here when hadoop libs are used at the spark submit side.
- supports spark.jars, spark.files, app jar.

It works as follows:
Spark submit uploads the deps to the HCFS. Then the driver serves the deps via the Spark file server.
No hcfs uris are propagated.

The related design document is [here](https://docs.google.com/document/d/1peg_qVhLaAl4weo5C51jQicPwLclApBsdR1To2fgc48/edit). the next option to add is the RSS but has to be improved given the discussion in the past about it (Spark 2.3).
## How was this patch tested?

- Run integration test suite.
- Run an example using S3:

```
 ./bin/spark-submit \
...
 --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.6 \
 --deploy-mode cluster \
 --name spark-pi \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.executor.memory=1G \
 --conf spark.kubernetes.namespace=spark \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
 --conf spark.driver.memory=1G \
 --conf spark.executor.instances=2 \
 --conf spark.sql.streaming.metricsEnabled=true \
 --conf "spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp" \
 --conf spark.kubernetes.container.image.pullPolicy=Always \
 --conf spark.kubernetes.container.image=skonto/spark:k8s-3.0.0 \
 --conf spark.kubernetes.file.upload.path=s3a://fdp-stavros-test \
 --conf spark.hadoop.fs.s3a.access.key=... \
 --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
 --conf spark.hadoop.fs.s3a.fast.upload=true \
 --conf spark.kubernetes.executor.deleteOnTermination=false \
 --conf spark.hadoop.fs.s3a.secret.key=... \
 --conf spark.files=client:///...resolv.conf \
file:///my.jar **
```
Added integration tests based on [Ceph nano](https://github.com/ceph/cn). Looks very [active](http://www.sebastien-han.fr/blog/2019/02/24/Ceph-nano-is-getting-better-and-better/).
Unfortunately minio needs hadoop >= 2.8.

Closes #23546 from skonto/support-client-deps.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Erik Erlandson <eerlands@redhat.com>
2019-05-22 16:15:42 -07:00

186 lines
8.2 KiB
XML

<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.12</artifactId>
<version>3.0.0-SNAPSHOT</version>
<relativePath>../../../pom.xml</relativePath>
</parent>
<artifactId>spark-kubernetes-integration-tests_2.12</artifactId>
<properties>
<download-maven-plugin.version>1.3.0</download-maven-plugin.version>
<exec-maven-plugin.version>1.4.0</exec-maven-plugin.version>
<extraScalaTestArgs></extraScalaTestArgs>
<kubernetes-client.version>4.1.2</kubernetes-client.version>
<scala-maven-plugin.version>3.2.2</scala-maven-plugin.version>
<scalatest-maven-plugin.version>1.0</scalatest-maven-plugin.version>
<sbt.project.name>kubernetes-integration-tests</sbt.project.name>
<!-- Integration Test Configuration Properties -->
<!-- Please see README.md in this directory for explanation of these -->
<spark.kubernetes.test.sparkTgz></spark.kubernetes.test.sparkTgz>
<spark.kubernetes.test.unpackSparkDir>${project.build.directory}/spark-dist-unpacked</spark.kubernetes.test.unpackSparkDir>
<spark.kubernetes.test.imageTag>N/A</spark.kubernetes.test.imageTag>
<spark.kubernetes.test.imageTagFile>${project.build.directory}/imageTag.txt</spark.kubernetes.test.imageTagFile>
<spark.kubernetes.test.deployMode>minikube</spark.kubernetes.test.deployMode>
<spark.kubernetes.test.imageRepo>docker.io/kubespark</spark.kubernetes.test.imageRepo>
<spark.kubernetes.test.kubeConfigContext></spark.kubernetes.test.kubeConfigContext>
<spark.kubernetes.test.master></spark.kubernetes.test.master>
<spark.kubernetes.test.namespace></spark.kubernetes.test.namespace>
<spark.kubernetes.test.serviceAccountName></spark.kubernetes.test.serviceAccountName>
<test.exclude.tags></test.exclude.tags>
<test.include.tags></test.include.tags>
</properties>
<packaging>jar</packaging>
<name>Spark Project Kubernetes Integration Tests</name>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.fabric8</groupId>
<artifactId>kubernetes-client</artifactId>
<version>${kubernetes-client.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-tags_${scala.binary.version}</artifactId>
<type>test-jar</type>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.7.4</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>${exec-maven-plugin.version}</version>
<executions>
<execution>
<id>setup-integration-test-env</id>
<phase>pre-integration-test</phase>
<goals>
<goal>exec</goal>
</goals>
<configuration>
<executable>scripts/setup-integration-test-env.sh</executable>
<arguments>
<argument>--unpacked-spark-tgz</argument>
<argument>${spark.kubernetes.test.unpackSparkDir}</argument>
<argument>--image-repo</argument>
<argument>${spark.kubernetes.test.imageRepo}</argument>
<argument>--image-tag</argument>
<argument>${spark.kubernetes.test.imageTag}</argument>
<argument>--image-tag-output-file</argument>
<argument>${spark.kubernetes.test.imageTagFile}</argument>
<argument>--deploy-mode</argument>
<argument>${spark.kubernetes.test.deployMode}</argument>
<argument>--spark-tgz</argument>
<argument>${spark.kubernetes.test.sparkTgz}</argument>
</arguments>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<skipTests>true</skipTests>
</configuration>
</plugin>
<plugin>
<!-- Triggers scalatest plugin in the integration-test phase instead of
the test phase. -->
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
<version>${scalatest-maven-plugin.version}</version>
<configuration>
<reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
<junitxml>.</junitxml>
<filereports>SparkTestSuite.txt</filereports>
<argLine>-ea -Xmx4g -XX:ReservedCodeCacheSize=512m ${extraScalaTestArgs}</argLine>
<stderr/>
<systemProperties>
<log4j.configuration>file:src/test/resources/log4j.properties</log4j.configuration>
<java.awt.headless>true</java.awt.headless>
<spark.kubernetes.test.imageTagFile>${spark.kubernetes.test.imageTagFile}</spark.kubernetes.test.imageTagFile>
<spark.kubernetes.test.unpackSparkDir>${spark.kubernetes.test.unpackSparkDir}</spark.kubernetes.test.unpackSparkDir>
<spark.kubernetes.test.imageRepo>${spark.kubernetes.test.imageRepo}</spark.kubernetes.test.imageRepo>
<spark.kubernetes.test.deployMode>${spark.kubernetes.test.deployMode}</spark.kubernetes.test.deployMode>
<spark.kubernetes.test.kubeConfigContext>${spark.kubernetes.test.kubeConfigContext}</spark.kubernetes.test.kubeConfigContext>
<spark.kubernetes.test.master>${spark.kubernetes.test.master}</spark.kubernetes.test.master>
<spark.kubernetes.test.namespace>${spark.kubernetes.test.namespace}</spark.kubernetes.test.namespace>
<spark.kubernetes.test.serviceAccountName>${spark.kubernetes.test.serviceAccountName}</spark.kubernetes.test.serviceAccountName>
<spark.kubernetes.test.jvmImage>${spark.kubernetes.test.jvmImage}</spark.kubernetes.test.jvmImage>
<spark.kubernetes.test.pythonImage>${spark.kubernetes.test.pythonImage}</spark.kubernetes.test.pythonImage>
<spark.kubernetes.test.rImage>${spark.kubernetes.test.rImage}</spark.kubernetes.test.rImage>
</systemProperties>
<tagsToExclude>${test.exclude.tags}</tagsToExclude>
<tagsToInclude>${test.include.tags}</tagsToInclude>
</configuration>
<executions>
<execution>
<id>test</id>
<phase>none</phase>
<goals>
<goal>test</goal>
</goals>
</execution>
<execution>
<id>integration-test</id>
<phase>integration-test</phase>
<goals>
<goal>test</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>