spark-instrumented-optimizer/resource-managers/kubernetes/integration-tests/README.md
“attilapiros” 8b94eff1ca [SPARK-34736][K8S][TESTS] Kubernetes and Minikube version upgrade for integration tests
### What changes were proposed in this pull request?

This PR upgrades Kubernetes and Minikube version for integration tests and removes/updates the old code for this new version.

Details of the changes:

- As [discussed in the mailing list](http://apache-spark-developers-list.1001551.n3.nabble.com/minikube-and-kubernetes-cluster-versions-for-integration-testing-td30856.html): updating Minikube version from v0.34.1 to v1.7.3 and kubernetes version from v1.15.12 to v1.17.3.
- checking the Minikube version and failing with an explanation when the tests are started on a version < v1.7.3.
- removing minikube status checking code related to old Minikube versions
- in the Minikube backend, using fabric8's `Config.autoConfigure()` method to configure the Kubernetes client to use the `minikube` k8s context (as in [one of fabric8's examples](https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/kubectl/equivalents/ConfigUseContext.java#L36))
- introducing the `persistentVolume` test tag: this is a temporary change to allow skipping the PVC tests in the Kubernetes integration tests, as currently the PVC tests are blocking the move to Docker as Minikube's driver (for details please check https://issues.apache.org/jira/browse/SPARK-34738); a usage sketch follows this list.
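
As a rough sketch of how the new tag could be used to skip the PVC tests, assuming the dev script's tag-exclusion option is named `--exclude-tags` (an assumption, not verified here):

```
# Hypothetical usage: skip the suites carrying the new persistentVolume tag
# (--exclude-tags is assumed to be the dev script's tag-exclusion option)
./dev/dev-run-integration-tests.sh --exclude-tags persistentVolume
```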

### Why are the changes needed?

With the currently suggested versions one can run into several problems without noticing that the Minikube/Kubernetes version is the cause.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

It was tested on Mac with [this script](https://gist.github.com/attilapiros/cd58a16bdde833c80c5803c337fffa94#file-check_minikube_versions-zsh) which installs each Minikube version starting from v1.7.2 (that version is included to test the negative case of the version check) and runs the integration tests.

It was started with:
```
./check_minikube_versions.zsh > test_log 2>&1
```

There was only one build failure; the rest were successful:

```
$ grep "BUILD SUCCESS" test_log | wc -l
      26
$ grep "BUILD FAILURE" test_log | wc -l
       1
```

It was for Minikube v1.7.2, and the log is:

```
KubernetesSuite:
*** RUN ABORTED ***
  java.lang.AssertionError: assertion failed: Unsupported Minikube version is detected: minikube version: v1.7.2.For integration testing Minikube version 1.7.3 or greater is expected.
  at scala.Predef$.assert(Predef.scala:223)
  at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.getKubernetesClient(Minikube.scala:52)
  at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.MinikubeTestBackend$.initialize(MinikubeTestBackend.scala:33)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:163)
  at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
  at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
  at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.org$scalatest$BeforeAndAfter$$super$run(KubernetesSuite.scala:43)
  at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:273)
  at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:271)
  ...
```

Moreover, I also tested with multiple k8s cluster contexts.

Closes #31829 from attilapiros/SPARK-34736.

Lead-authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Co-authored-by: attilapiros <piros.attila.zsolt@gmail.com>
Signed-off-by: attilapiros <piros.attila.zsolt@gmail.com>
2021-05-10 18:56:52 +02:00


---
layout: global
title: Spark on Kubernetes Integration Tests
---
# Running the Kubernetes Integration Tests

Note that the integration test framework is currently being heavily revised and
is subject to change.

The simplest way to run the integration tests is to install and run Minikube, then run the following from this
directory:

    ./dev/dev-run-integration-tests.sh

To run tests with Java 11 instead of Java 8, use `--java-image-tag` to specify the base image.

    ./dev/dev-run-integration-tests.sh --java-image-tag 11-jre-slim

To run tests with Hadoop 2.7 instead of Hadoop 3.2, use `--hadoop-profile`.

    ./dev/dev-run-integration-tests.sh --hadoop-profile hadoop-2.7

The minimum tested version of Minikube is 1.7.3. The kube-dns addon must be enabled. Minikube should
run with a minimum of 4 CPUs and 6G of memory:

    minikube start --cpus 4 --memory 6144

You can download Minikube [here](https://github.com/kubernetes/minikube/releases).
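
As an optional sanity check before launching the tests, something like the following can verify the Minikube version, that the cluster is running, and that a DNS addon is enabled; the exact addon name reported for DNS varies between Minikube versions:

```
# Illustrative pre-flight checks (output format differs between Minikube versions)
minikube version                    # should report v1.7.3 or newer
minikube status                     # the cluster must already be running
minikube addons list | grep -i dns  # a DNS addon (kube-dns/CoreDNS) should be enabled
```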
# Integration test customization

Configuration of the integration test runtime is done through passing different arguments to the test script.
The main useful options are outlined below.

## Using a different backend

The integration test backend, i.e. the K8S cluster used for testing, is controlled by the `--deploy-mode` option. By
default this is set to `minikube`; the available backends and their prerequisites are as follows.
### `minikube`

Uses the local `minikube` cluster. This requires that `minikube` 1.7.3 or greater be installed and that it be allocated
at least 4 CPUs and 6GB memory (some users have reported success with as few as 3 CPUs and 4GB memory). The tests will
check if `minikube` is started and abort early if it isn't currently running.
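
Since `minikube` is the default backend, passing `--deploy-mode minikube` explicitly is expected to behave the same as omitting the flag; a minimal sketch, assuming Minikube has already been started with sufficient resources:

```
# Explicitly select the (default) minikube backend
./dev/dev-run-integration-tests.sh --deploy-mode minikube
```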
### `docker-for-desktop`

Since July 2018 Docker for Desktop provides an optional Kubernetes cluster that can be enabled as described in this
[blog post](https://blog.docker.com/2018/07/kubernetes-is-now-available-in-docker-desktop-stable-channel/). Assuming
this is enabled, using this backend will auto-configure itself from the `docker-for-desktop` context that Docker creates
in your `~/.kube/config` file. If your config file is in a different location you should set the `KUBECONFIG`
environment variable appropriately.
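
A minimal sketch of running against the Docker for Desktop cluster; the `KUBECONFIG` export is only needed if your config file is not at `~/.kube/config`, and the path shown is a placeholder:

```
# Run against the Kubernetes cluster bundled with Docker for Desktop
export KUBECONFIG="$HOME/somewhere/else/config"  # placeholder; only needed if not at ~/.kube/config
./dev/dev-run-integration-tests.sh --deploy-mode docker-for-desktop
```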
### `cloud`

The cloud backend configures the tests to use an arbitrary Kubernetes cluster running in the cloud or otherwise.

The `cloud` backend auto-configures the cluster to use from your K8S config file; this is assumed to be `~/.kube/config`
unless the `KUBECONFIG` environment variable is set to override this location. By default this will use whatever your
current context is in the config file; to use an alternative context from your config file you can specify the
`--context <context>` flag with the desired context.

You can optionally use a different K8S master URL than the one your K8S config file specifies; this should be supplied
via the `--spark-master <master-url>` flag.
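
For illustration, a run against an arbitrary cluster selecting a specific context and master URL might look like the following; the context name and master URL are placeholders:

```
# Run against an arbitrary cluster using a named context and an explicit master URL (placeholders)
./dev/dev-run-integration-tests.sh \
  --deploy-mode cloud \
  --context my-test-cluster \
  --spark-master https://192.168.99.100:8443
```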
## Re-using Docker Images

By default, the test framework will build new Docker images on every test execution. A unique image tag is generated,
and it is written to file at `target/imageTag.txt`. To reuse the images built in a previous run, or to use a Docker
image tag that you have built by other means already, pass the tag to the test script:

    ./dev/dev-run-integration-tests.sh --image-tag <tag>

For example, to reuse the images that were built before by the test framework:

    ./dev/dev-run-integration-tests.sh --image-tag $(cat target/imageTag.txt)
### Customising the Image Names

If your image names do not follow the standard Spark naming convention - `spark`, `spark-py` and `spark-r` - then you can customise the names using several options.

If you use the same basic pattern but a different prefix for the name e.g. `apache-spark` you can just set `--base-image-name <base-name>` e.g.

    ./dev/dev-run-integration-tests.sh --base-image-name apache-spark

Alternatively if you use completely custom names then you can set each individually via the `--jvm-image-name <name>`, `--python-image-name <name>` and `--r-image-name <name>` arguments e.g.

    ./dev/dev-run-integration-tests.sh --jvm-image-name jvm-spark --python-image-name pyspark --r-image-name sparkr
## Spark Distribution Under Test

The Spark code to test is handed to the integration test system via a tarball. Here is the option that is used to
specify the tarball:

* `--spark-tgz <path-to-tgz>` - set `<path-to-tgz>` to point to a tarball containing the Spark distribution to test.

This tarball should be created by first running `dev/make-distribution.sh` passing the `--tgz` flag and `-Pkubernetes`
as one of the options to ensure that Kubernetes support is included in the distribution. For more details on building a
runnable distribution please see the
[Building Spark](https://spark.apache.org/docs/latest/building-spark.html#building-a-runnable-distribution)
documentation.
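
A sketch of producing such a tarball from the root of the Spark source tree and handing it to the test script; the distribution name and the resulting tarball filename (which also embeds the Spark version) are examples only:

```
# Build a runnable distribution with Kubernetes support, then point the tests at it
./dev/make-distribution.sh --name k8s-test --tgz -Pkubernetes
./resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh \
  --spark-tgz "$PWD/spark-3.0.0-SNAPSHOT-bin-k8s-test.tgz"
```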
**TODO:** Don't require the packaging of the built Spark artifacts into this tarball, just read them out of the current
tree.
## Customizing the Namespace and Service Account

If no namespace is specified then a temporary namespace will be created and deleted during the test run. Similarly if
no service account is specified then the `default` service account for the namespace will be used.

Using the `--namespace <namespace>` flag sets `<namespace>` to the namespace in which the tests should be run. If this
is supplied then the tests assume this namespace exists in the K8S cluster and will not attempt to create it.
Additionally this namespace must have an appropriately authorized service account which can be customised via the
`--service-account` flag.

The `--service-account <service account name>` flag sets `<service account name>` to the name of the Kubernetes service
account to use in the namespace specified by the `--namespace` flag. The service account is expected to have permissions
to get, list, watch, and create pods. For clusters with RBAC turned on, it's important that the right permissions are
granted to the service account in the namespace through an appropriate role and role binding. A reference RBAC
configuration is provided in `dev/spark-rbac.yaml`.
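
For illustration, a run against a pre-created namespace and service account might look like the following; the namespace and service account names are placeholders, and `dev/spark-rbac.yaml` may need to be adapted to match them:

```
# Use an existing namespace and an appropriately authorized service account (placeholder names)
kubectl create namespace spark-int-tests
kubectl apply -f dev/spark-rbac.yaml   # reference RBAC config; adapt namespace/account names as needed
./dev/dev-run-integration-tests.sh \
  --namespace spark-int-tests \
  --service-account spark-sa
```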
# Running the Test Directly

If you prefer to run just the integration tests directly, then you can customise the behaviour via passing system
properties to Maven. For example:

    mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \
        -Pkubernetes -Pkubernetes-integration-tests \
        -Phadoop-2.7 -Dhadoop.version=2.7.4 \
        -Dspark.kubernetes.test.sparkTgz=spark-3.0.0-SNAPSHOT-bin-example.tgz \
        -Dspark.kubernetes.test.imageTag=sometag \
        -Dspark.kubernetes.test.imageRepo=docker.io/somerepo \
        -Dspark.kubernetes.test.namespace=spark-int-tests \
        -Dspark.kubernetes.test.deployMode=docker-for-desktop \
        -Dtest.include.tags=k8s
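
Tags can presumably be excluded the same way; the property name `test.exclude.tags` below is an assumption mirroring `test.include.tags`, and could for instance be used to skip the suites tagged `persistentVolume`:

```
# Exclude specific test tags (test.exclude.tags is assumed to mirror test.include.tags)
mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \
    -Pkubernetes -Pkubernetes-integration-tests \
    -Dtest.include.tags=k8s \
    -Dtest.exclude.tags=persistentVolume
```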
## Available Maven Properties
The following are the available Maven properties that can be passed. For the most part these correspond to flags passed
to the wrapper scripts and using the wrapper scripts will simply set these appropriately behind the scenes.
<table>
<tr>
<th>Property</th>
<th>Description</th>
<th>Default</th>
</tr>
<tr>
<td><code>spark.kubernetes.test.sparkTgz</code></td>
<td>
A runnable Spark distribution to test.
</td>
<td></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.unpackSparkDir</code></td>
<td>
The directory where the runnable Spark distribution will be unpacked.
</td>
<td><code>${project.build.directory}/spark-dist-unpacked</code></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.deployMode</code></td>
<td>
The integration test backend to use. Acceptable values are <code>minikube</code>,
<code>docker-for-desktop</code> and <code>cloud</code>.
</td>
<td><code>minikube</code></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.kubeConfigContext</code></td>
<td>
When using the <code>cloud</code> backend specifies the context from the user's K8S config file that should be used
as the target cluster for integration testing. If not set and using the <code>cloud</code> backend then your
current context will be used.
</td>
<td></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.master</code></td>
<td>
When using the <code>cloud</code> backend this must be specified to indicate the K8S master URL to communicate
with.
</td>
<td></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.imageTag</code></td>
<td>
A specific image tag to use; when set it is assumed that images with those tags are already built and available in the
specified image repository. When set to <code>N/A</code> (the default) fresh images will be built.
</td>
<td><code>N/A</code></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.javaImageTag</code></td>
<td>
A specific OpenJDK base image tag to use; when set it is used instead of <code>8-jre-slim</code>.
</td>
<td><code>8-jre-slim</code></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.imageTagFile</code></td>
<td>
A file containing the image tag to use. If no specific image tag is set then fresh images will be built with a
generated tag and that tag will be written to this file.
</td>
<td><code>${project.build.directory}/imageTag.txt</code></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.imageRepo</code></td>
<td>
The Docker image repository that contains the images to be used if a specific image tag is set, or to which the
images will be pushed if fresh images are being built.
</td>
<td><code>docker.io/kubespark</code></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.jvmImage</code></td>
<td>
The image name for the JVM-based Spark image to test.
</td>
<td><code>spark</code></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.pythonImage</code></td>
<td>
The image name for the Python-based Spark image to test.
</td>
<td><code>spark-py</code></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.rImage</code></td>
<td>
The image name for the R-based Spark image to test.
</td>
<td><code>spark-r</code></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.namespace</code></td>
<td>
A specific Kubernetes namespace to run the tests in. If specified then the tests assume that this namespace
already exists. When not specified a temporary namespace for the tests will be created and deleted as part of the
test run.
</td>
<td></td>
</tr>
<tr>
<td><code>spark.kubernetes.test.serviceAccountName</code></td>
<td>
A specific Kubernetes service account to use for running the tests. If not specified then the namespace's default
service account will be used and that must have sufficient permissions or the tests will fail.
</td>
<td></td>
</tr>
</table>