Commit graph

128 commits

Author SHA1 Message Date
Dongjoon Hyun 020e84e92f [SPARK-34486][K8S] Upgrade kubernetes-client to 4.13.2
### What changes were proposed in this pull request?

This PR aims to upgrade `kubernetes-client` library from 4.12.0 to 4.13.2 for Apache Spark 3.2.0.

### Why are the changes needed?

This will bring [K8s 1.19.1](https://github.com/fabric8io/kubernetes-client/pull/2541) models officially and the latest bug fixes.

- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.13.0
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.13.1
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.13.2

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass the K8s IT and UT.

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 19 minutes, 25 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #31602 from dongjoon-hyun/SPARK-34486.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-02-21 18:35:38 +09:00
“attilapiros” 76e5d75e36 [SPARK-33763] Add metrics for better tracking of dynamic allocation
### What changes were proposed in this pull request?

This PR adds the following metrics to track executor remove reasons during dynamic allocation:
- `numberExecutorsGracefullyDecommissioned`: number of executors that reached the finished decommissioning state and shut themselves down cleanly
- `numberExecutorsDecommissionUnfinished`: executors that were asked to decommission but stopped without reaching the finished decommissioning state
- `numberExecutorsKilledByDriver`: executors killed by the driver (requested to stop)
- `numberExecutorsExitedUnexpectedly`: executors that exited without a driver request
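
To illustrate how the four counters partition executor removals, here is a minimal Python sketch. Only the metric names come from the list above; the `classify_removal` function, its boolean inputs, and the order in which the conditions are checked are assumptions made for illustration and are not Spark's implementation.

```python
from collections import Counter


def classify_removal(decommission_requested: bool,
                     decommission_finished: bool,
                     killed_by_driver: bool) -> str:
    """Map one executor removal to exactly one metric name (illustrative only)."""
    if decommission_requested:
        # Assumption: decommissioning is checked before a driver kill.
        if decommission_finished:
            return "numberExecutorsGracefullyDecommissioned"
        return "numberExecutorsDecommissionUnfinished"
    if killed_by_driver:
        return "numberExecutorsKilledByDriver"
    return "numberExecutorsExitedUnexpectedly"


# Hypothetical removal events: (requested, finished, killed_by_driver).
removals = [
    (True, True, False),
    (True, False, False),
    (False, False, True),
    (False, False, False),
    (False, False, False),
]
metrics = Counter(classify_removal(*event) for event in removals)
```

Each removal lands in exactly one bucket, so the four counters sum to the total number of removed executors.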

### Why are the changes needed?

To better support monitoring of dynamic allocation with these metrics.

### Does this PR introduce _any_ user-facing change?

Yes. The new metrics will be available for monitoring.

### How was this patch tested?

With unit and integration tests.

Finally, I manually checked the new metrics in jconsole:
<img width="1054" alt="jmx" src="https://user-images.githubusercontent.com/2017933/107458686-de8adf00-6b54-11eb-86f7-41faf2fb638f.png">

Closes #31450 from attilapiros/SPARK-33763-final.

Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2021-02-17 13:44:36 -08:00
“attilapiros” 5f91245cc2 [SPARK-34426][K8S][TESTS] Add driver and executors POD logs to integration tests log when the test fails
### What changes were proposed in this pull request?

This PR introduces a new protected method in `SparkFunSuite` that is called only when a test fails and can be used to collect logs for the failed test. In this PR it is implemented for the Kubernetes tests by the `KubernetesSuite` class, which collects all the POD logs and writes them to the test log.

Unfortunately, this cannot be done with a simple "after" method, because the test outcome is not available in the "after" method.
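
The general pattern can be sketched in Python: wrapping the test body makes the outcome visible, so a hook can run only on failure. This is an analogy rather than the Scala code from the PR; the class name `FailureLoggingRunner` and the method `log_for_failed_test` are hypothetical.

```python
from typing import Callable, List


class FailureLoggingRunner:
    """Runs a test body and calls a log-collection hook only if it fails."""

    def __init__(self) -> None:
        self.collected: List[str] = []

    def log_for_failed_test(self, test_name: str) -> None:
        # Hypothetical hook: a Kubernetes suite would fetch pod logs here.
        self.collected.append(f"EXTRA LOGS FOR THE FAILED TEST: {test_name}")

    def run(self, test_name: str, body: Callable[[], None]) -> bool:
        try:
            body()
        except AssertionError:
            # The outcome is known at this point, unlike in an "after" hook.
            self.log_for_failed_test(test_name)
            return False
        return True


def passing_body() -> None:
    assert 1 + 1 == 2


def failing_body() -> None:
    assert 1 + 1 == 3, "deliberately failing"


runner = FailureLoggingRunner()
passed_ok = runner.run("passing", passing_body)
passed_bad = runner.run("failing", failing_body)
```

Only the failing test triggers the hook, so passing runs add nothing to the log.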

Moreover, this PR removes `appLocator` as a method argument, since `appLocator` is already available as a member variable.

### Why are the changes needed?

Currently, both the driver and executor logs are lost.

In [developer-tools](https://spark.apache.org/developer-tools.html) there is a hint:
"Getting logs from the pods and containers directly is an exercise left to the reader."

But when a test is run by Jenkins and fails, we really need the POD logs to analyze the problem.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By integration testing. I have checked what would happen if one test fails, the output would be:

```
21/02/14 11:05:34.261 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite:

===== EXTRA LOGS FOR THE FAILED TEST

21/02/14 11:05:34.278 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite: BEGIN driver POD log
++ id -u
+ myuid=185
++ id -g
+ mygid=0
+ set +e
++ getent passwd 185
+ uidentry=
+ set -e
+ '[' -z '' ']'
+ '[' -w /etc/passwd ']'
+ echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.3 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///opt/spark/tests/decommissioning.py
21/02/14 10:02:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting decom test
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/02/14 10:02:29 INFO SparkContext: Running Spark version 3.2.0-SNAPSHOT
21/02/14 10:02:29 INFO ResourceUtils: ==============================================================
21/02/14 10:02:29 INFO ResourceUtils: No custom resources configured for spark.driver.
21/02/14 10:02:29 INFO ResourceUtils: ==============================================================
...
21/02/14 10:03:17 INFO ShutdownHookManager: Deleting directory /var/data/spark-fa6961ed-a2c1-444c-bfeb-20e63ba0b5cf/spark-ab4b0287-6e24-4b39-837e-9b0b62c1f26f
21/02/14 10:03:17 INFO ShutdownHookManager: Deleting directory /tmp/spark-d6b11e7d-6a03-4a1d-8559-37cb853319bf

21/02/14 11:05:34.279 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite: END driver POD log
```

Closes #31561 from attilapiros/SPARK-34426.

Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
2021-02-17 05:49:16 +09:00
Holden Karau 5248ecb5ab [SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & allow decommissioning for excludes
### What changes were proposed in this pull request?

Allow users to have Spark attempt to decommission excluded executors.
Since excluded executors may be flaky, this also adds the ability for users to specify a time limit after which a decommissioning executor will be killed by Spark.
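
As a rough illustration of the time limit described above, the following Python sketch selects decommissioning executors that have exceeded a deadline. The function name, the `kill_after_seconds` parameter, and the data layout are hypothetical; the actual configuration flag names are not given here and are not assumed.

```python
from typing import Dict, List


def executors_past_deadline(started_at: Dict[str, float],
                            now: float,
                            kill_after_seconds: float) -> List[str]:
    """Return IDs of decommissioning executors that exceeded the time limit."""
    return sorted(
        exec_id
        for exec_id, start in started_at.items()
        if now - start >= kill_after_seconds
    )


# Example: executors "1" and "2" began decommissioning at t=0s and t=50s.
started = {"1": 0.0, "2": 50.0}
overdue = executors_past_deadline(started, now=70.0, kill_after_seconds=60.0)
```

With a 60-second limit at t=70s, only executor "1" is overdue; executor "2" still has time to finish decommissioning.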

### Why are the changes needed?

This may help prevent fetch failures from excluded executors, and also handle the situation in which executors

### Does this PR introduce _any_ user-facing change?

Yes, two new configuration flags for the behaviour.

### How was this patch tested?

Extended unit and integration tests.

Closes #31539 from holdenk/re=enable-SPARK-34104-SPARK-34105.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2021-02-09 18:16:09 -08:00
HyukjinKwon c8628c943c Revert "[SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & allow decommissioning for excludes"
This reverts commit 50641d2e3d.
2021-02-10 08:00:03 +09:00
Holden Karau 50641d2e3d [SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & allow decommissioning for excludes
### What changes were proposed in this pull request?

Allow users to have Spark attempt to decommission excluded executors.
Since excluded executors may be flaky, this also adds the ability for users to specify a time limit after which a decommissioning executor will be killed by Spark.

### Why are the changes needed?

This may help prevent fetch failures from excluded executors, and also handle the situation in which executors

### Does this PR introduce _any_ user-facing change?

Yes, two new configuration flags for the behaviour.

### How was this patch tested?

Extended unit and integration tests.

Closes #31249 from holdenk/configure-inaccessibleList-kill-to-use-decommissioning.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Holden Karau <hkarau@apple.com>
2021-02-09 14:21:24 -08:00
HyukjinKwon a99a47ca1d [SPARK-33748][K8S] Respect environment variables and configurations for Python executables
### What changes were proposed in this pull request?

This PR proposes:

- Respect `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, or `spark.pyspark.python` and `spark.pyspark.driver.python` configurations, in Kubernetes just like other cluster types in Spark.

- Deprecate `spark.kubernetes.pyspark.pythonVersion` and guide users to set the environment variables and configurations for Python executables.
    NOTE that `spark.kubernetes.pyspark.pythonVersion` is already a no-op configuration without this PR. Default is `3` and other values are disallowed.

- In order for Python executable settings to be used consistently, fix the `spark.archives` option to unpack into the current working directory in the driver in Kubernetes cluster mode. This behaviour is identical to YARN's cluster mode. By doing this, users can leverage Conda or virtualenv in cluster mode as below:

   ```bash
    conda create -y -n pyspark_conda_env -c conda-forge pyarrow pandas conda-pack
    conda activate pyspark_conda_env
    conda pack -f -o pyspark_conda_env.tar.gz
    PYSPARK_PYTHON=./environment/bin/python spark-submit --archives pyspark_conda_env.tar.gz#environment app.py
   ```

- Remove unused code such as `extractS3Key` and `renameResourcesToLocalFS`

### Why are the changes needed?

- To provide consistent support for PySpark by using `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, or `spark.pyspark.python` and `spark.pyspark.driver.python` configurations.
- To provide Conda and virtualenv support via `spark.archives` options.

### Does this PR introduce _any_ user-facing change?

Yes:

- `spark.kubernetes.pyspark.pythonVersion` is deprecated.
- `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, and `spark.pyspark.python` and `spark.pyspark.driver.python` configurations are respected.

### How was this patch tested?

Manually tested via:

```bash
minikube delete
minikube start --cpus 12 --memory 16384
kubectl create namespace spark-integration-test
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark-integration-test
EOF
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark-integration-test:spark --namespace=spark-integration-test
dev/make-distribution.sh --pip --tgz -Pkubernetes
resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz `pwd`/spark-3.2.0-SNAPSHOT-bin-3.2.0.tgz  --service-account spark --namespace spark-integration-test
```

Unit tests were also added.

Closes #30735 from HyukjinKwon/SPARK-33748.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-12-15 08:56:45 +09:00
Kousuke Saruta d662b95535 [SPARK-33754][K8S][DOCS] Update kubernetes/integration-tests/README.md to follow the default Hadoop profile updated
### What changes were proposed in this pull request?

This PR updates `kubernetes/integration-tests/README.md`.

### Why are the changes needed?

To follow the current Hadoop profile (hadoop-3.2).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I have confirmed that the integration tests pass with the following command for both Hadoop 3.2 and 2.7.
```
build/mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \
  -Pkubernetes \
  -Pkubernetes-integration-tests \
  -Dspark.kubernetes.test.imageTag=${IMAGE_TAG} \
  -Dspark.kubernetes.test.imageRepo=docker.io/kubespark \
  -Dspark.kubernetes.test.namespace=default \
  -Dspark.kubernetes.test.deployMode=minikube \
  -Dtest.include.tags=k8s
```

Closes #30726 from sarutak/update-kube-integ-readme.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-11 01:52:13 -08:00
Kousuke Saruta 795db05bf6
[SPARK-33732][K8S][TESTS] Kubernetes integration tests doesn't work with Minikube 1.9+
### What changes were proposed in this pull request?

This PR changes `Minikube.scala` for Kubernetes integration tests to work with Minikube 1.9+.
`Minikube.scala` assumes that `apiserver.key` and `apiserver.crt` are in `~/.minikube/`.
But as of Minikube 1.9, they are in `~/.minikube/profiles/<profile>`.
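
The path change can be sketched as follows. This is an illustrative Python helper, not the logic in `Minikube.scala`: the function names, the version-based switch, and the default profile name `minikube` are assumptions.

```python
from pathlib import Path
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Parse 'v1.15.1' or '1.9.0' into (major, minor)."""
    parts = text.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def apiserver_cert_dir(minikube_home: Path,
                       version: str,
                       profile: str = "minikube") -> Path:
    """Directory expected to hold apiserver.key and apiserver.crt."""
    if parse_version(version) >= (1, 9):
        # As of Minikube 1.9 the files live under profiles/<profile>.
        return minikube_home / "profiles" / profile
    return minikube_home


home = Path("/home/user/.minikube")
new_dir = apiserver_cert_dir(home, "v1.15.1")
old_dir = apiserver_cert_dir(home, "1.8.2")
```

Comparing versions as integer tuples avoids the string-comparison pitfall where `"1.10"` would sort before `"1.9"`.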

### Why are the changes needed?

Currently, Kubernetes integration tests don't work with Minikube 1.9+.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed the following test passes.
```
$ build/sbt -Pkubernetes -Pkubernetes-integration-tests package 'kubernetes-integration-tests/testOnly -- -z "SparkPi with no"'
```

Closes #30700 from sarutak/minikube-1.9.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-09 22:04:09 -08:00
Prashant Sharma 6317ba29a1
[SPARK-33668][K8S][TEST] Fix flaky test "Verify logging configuration is picked from the provided
### What changes were proposed in this pull request?
Fix flaky test "Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties."
The test is flaking, with multiple flaked instances - the reason for the failure has been similar to:

```

The code passed to eventually never returned normally. Attempted 109 times over 3.0079882413999997 minutes. Last failure message: Failure executing: GET at:
https://192.168.39.167:8443/api/v1/namespaces/b37fc72a991b49baa68a2eaaa1516463/pods/spark-pi-97a9bc76308e7fe3-exec-1/log?pretty=false. Message: pods "spark-pi-97a9bc76308e7fe3-exec-1" not found. Received status: Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, kind=pods, name=spark-pi-97a9bc76308e7fe3-exec-1, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "spark-pi-97a9bc76308e7fe3-exec-1" not found, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=NotFound, status=Failure, additionalProperties={}).. (KubernetesSuite.scala:402)

```
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36854/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36852/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36850/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36848/console
From the above failures, it seems that the executor finishes too quickly and is removed by Spark before the test can complete.
So, one way to mitigate this situation is to use the flag
   "spark.kubernetes.executor.deleteOnTermination"

### Why are the changes needed?

Fixes a flaky test.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests.
A few runs of the Jenkins integration tests may reveal whether the problem is resolved.

Closes #30616 from ScrapCodes/SPARK-33668/fix-flaky-k8s-integration-test.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-05 23:04:55 -08:00
Dongjoon Hyun de9818f043
[SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT
### What changes were proposed in this pull request?

This PR aims to update `master` branch version to 3.2.0-SNAPSHOT.

### Why are the changes needed?

Start to prepare Apache Spark 3.2.0.

### Does this PR introduce _any_ user-facing change?

N/A.

### How was this patch tested?

Pass the CIs.

Closes #30606 from dongjoon-hyun/SPARK-3.2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-04 14:10:42 -08:00
HyukjinKwon 990bee9c58 [SPARK-33615][K8S] Make 'spark.archives' working in Kubernates
### What changes were proposed in this pull request?

This PR proposes to make the `spark.archives` configuration work in Kubernetes.
It works without a problem in a standalone cluster, but there seems to be a bug in Kubernetes.
It fails to fetch the file on the driver side as below:

```
20/12/03 13:33:53 INFO SparkContext: Added JAR file:/tmp/spark-75004286-c83a-4369-b624-14c5d2d2a748/spark-examples_2.12-3.1.0-SNAPSHOT.jar at spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/jars/spark-examples_2.12-3.1.0-SNAPSHOT.jar with timestamp 1607002432558
20/12/03 13:33:53 INFO SparkContext: Added archive file:///tmp/tmp4542734800151332666.txt.tar.gz#test_tar_gz at spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/files/tmp4542734800151332666.txt.tar.gz with timestamp 1607002432558
20/12/03 13:33:53 INFO TransportClientFactory: Successfully created connection to spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc/172.17.0.4:7078 after 83 ms (47 ms spent in bootstraps)
20/12/03 13:33:53 INFO Utils: Fetching spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/files/tmp4542734800151332666.txt.tar.gz to /tmp/spark-66573e24-27a3-427c-99f4-36f06d9e9cd5/fetchFileTemp2665785666227461849.tmp
20/12/03 13:33:53 ERROR SparkContext: Error initializing SparkContext.
java.lang.RuntimeException: Stream '/files/tmp4542734800151332666.txt.tar.gz' was not found.
	at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:242)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
```

This is because `spark.archives` was not actually added on the driver side correctly. The changes here fix it by adding and resolving URIs correctly.

### Why are the changes needed?

The `spark.archives` feature can be leveraged for many things such as Conda support. We should make it work in Kubernetes as well.
This is a bug fix too.

### Does this PR introduce _any_ user-facing change?

No, this feature is not out yet.

### How was this patch tested?

I manually tested with Minikube 1.15.1. For an environment issue (?), I had to use a custom namespace, service account and roles. `default` service account does not work for me and complains it doesn't have permissions to get/list pods, etc.

```bash
minikube delete
minikube start --cpus 12 --memory 16384
kubectl create namespace spark-integration-test
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark-integration-test
EOF
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark-integration-test:spark --namespace=spark-integration-test
dev/make-distribution.sh --pip --tgz -Pkubernetes
resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz `pwd`/spark-3.1.0-SNAPSHOT-bin-3.2.0.tgz  --service-account spark --namespace spark-integration-test
```

Closes #30581 from HyukjinKwon/SPARK-33615.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-12-04 19:37:03 +09:00
Prashant Sharma 91182d6cce
[SPARK-33626][K8S][TEST] Allow k8s integration tests to assert both driver and executor logs for expected log(s)
### What changes were proposed in this pull request?

Allow k8s integration tests to assert both driver and executor logs for expected log(s)

### Why are the changes needed?

Some of the tests will be able to provide full coverage of the use case, by asserting both driver and executor logs.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

TBD

Closes #30568 from ScrapCodes/expectedDriverLogChanges.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-02 08:43:30 -08:00
yangjie01 e3058ba17c [SPARK-33441][BUILD] Add unused-imports compilation check and remove all unused-imports
### What changes were proposed in this pull request?
This PR adds new Scala compiler arguments to `pom.xml` to guard against new unused imports:

- `-Ywarn-unused-import` for Scala 2.12
- `-Wconf:cat=unused-imports:e` for Scala 2.13

The other file changes remove all unused imports in the Spark code.

### Why are the changes needed?
Clean up the code and guard against new unused imports.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Action

Closes #30351 from LuciferYang/remove-imports-core-module.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-11-19 14:20:39 +09:00
Stavros Kontopoulos dcac78e12b
[SPARK-27936][K8S] Support python deps
Supports Python client deps from the launcher fs.
This is a feature that was added for Java deps. This PR adds support for Python as well.

yes

Manually running different scenarios and via examining the driver & executors logs. Also there is an integration test added.
I verified that the Python resources are added to the Spark file server and that they are named properly so they don't fail the executors. Note that, as before, the following will not work:
primary resource `A.py` uses a closure defined in submitted pyfile `B.py`; context.py only adds to the pythonpath files with certain extensions, e.g. zip, egg, jar.
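
The extension filter mentioned in the note can be illustrated with a short Python sketch. The extension tuple mirrors the ones named above and is treated as the full list, which is an assumption; `goes_on_pythonpath` is a hypothetical helper, not a function from `context.py`.

```python
# Extensions named in the note above; treated here as the full list (an assumption).
PACKAGE_EXTENSIONS = (".zip", ".egg", ".jar")


def goes_on_pythonpath(filename: str) -> bool:
    """True if a submitted file would be added to the Python path."""
    return filename.lower().endswith(PACKAGE_EXTENSIONS)


submitted = ["deps.zip", "B.py", "lib.egg", "A.py"]
on_path = [name for name in submitted if goes_on_pythonpath(name)]
```

Under this filter a bare `B.py` is not added to the path, which is consistent with the closure case above not working, while the same module packaged inside a zip file would be picked up.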

Closes #25870 from skonto/python-deps.

Lead-authored-by: Stavros Kontopoulos <skontopo@redhat.com>
Co-authored-by: Stavros Kontopoulos <st.kontopoulos@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-18 10:43:41 -08:00
Rameshkrishnan Muthusamy 5e8549973d
[SPARK-33471][K8S][BUILD] Upgrade kubernetes-client to 4.12.0
### What changes were proposed in this pull request?

This PR aims to upgrade Kubernetes-client from 4.11.1 to 4.12.0

### Why are the changes needed?

This upgrades the dependency for Apache Spark 3.1.0.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #30401 from ramesh-muthusamy/SPARK-33471-k8s-clientupgrade.

Authored-by: Rameshkrishnan Muthusamy <rameshkrishnan_muthusamy@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-17 13:41:58 -08:00
Prashant Sharma 2a8e253cdb
[SPARK-32222][K8S][TESTS] Add K8s IT for conf propagation
### What changes were proposed in this pull request?

Added an integration test that configures a log4j.properties and checks whether it is the one picked up by the driver.

### Why are the changes needed?

Improved test coverage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

By running integration tests.

Closes #30388 from ScrapCodes/SPARK-32222/k8s-it-spark-conf-propagate.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-17 08:47:04 -08:00
Dongjoon Hyun 22baf05a9e [SPARK-33408][SPARK-32354][K8S][R] Use R 3.6.3 in K8s R image and re-enable RTestsSuite
### What changes were proposed in this pull request?

This PR aims to use R 3.6.3 in K8s R image and re-enable `RTestsSuite`.

### Why are the changes needed?

Jenkins Server is using `R 3.6.3`.
```
+ SPARK_HOME=/home/jenkins/workspace/SparkPullRequestBuilder-K8s
+ /usr/bin/R CMD check --as-cran --no-tests SparkR_3.1.0.tar.gz
* using log directory ‘/home/jenkins/workspace/SparkPullRequestBuilder-K8s/R/SparkR.Rcheck’
* using R version 3.6.3 (2020-02-29)
```

OpenJDK docker image is using `R 3.5.2 (2018-12-20)` which is old and currently `spark-3.0.1` fails to run SparkR.
```
$ cd spark-3.0.1-bin-hadoop3.2

$ bin/docker-image-tool.sh -R kubernetes/dockerfiles/spark/bindings/R/Dockerfile -n build
...
	 exit code: 1
	 termination reason: Error
...

$ bin/spark-submit --master k8s://https://192.168.64.49:8443 --deploy-mode cluster --conf spark.kubernetes.container.image=spark-r:latest local:///opt/spark/examples/src/main/r/dataframe.R

$ k logs dataframe-r-b1c14b75b0c09eeb-driver
...
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.RRunner local:///opt/spark/examples/src/main/r/dataframe.R
20/11/10 06:03:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Error: package or namespace load failed for ‘SparkR’ in rbind(info, getNamespaceInfo(env, "S3methods")):
 number of columns of matrices must match (see arg 2)
In addition: Warning message:
package ‘SparkR’ was built under R version 4.0.2
Execution halted
```

In addition, this PR aims to recover the test coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass K8S IT Jenkins job.

Closes #30130 from dongjoon-hyun/SPARK-32354.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-11-12 15:36:31 +09:00
yangjie01 02fd52cfbc [SPARK-33352][CORE][SQL][SS][MLLIB][AVRO][K8S] Fix procedure-like declaration compilation warnings in Scala 2.13
### What changes were proposed in this pull request?
There are two similar compilation warnings about procedure-like declaration in Scala 2.13:

```
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: procedure syntax is deprecated for constructors: add `=`, as in method definition
```
and

```
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `run`'s return type
```

This PR is the first part of resolving SPARK-33352:

- For constructor definitions, add `=` to convert to function syntax

- For method definitions without a return type, add `: Unit =` to convert to function syntax

### Why are the changes needed?
Eliminate compilation warnings in Scala 2.13. This change should be compatible with Scala 2.12.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Action

Closes #30255 from LuciferYang/SPARK-29392-FOLLOWUP.1.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-11-08 12:51:48 -06:00
Dongjoon Hyun 27d8136934 [SPARK-33324][K8S][BUILD] Upgrade kubernetes-client to 4.11.1
### What changes were proposed in this pull request?

This PR aims to upgrade `Kubernetes-client` from 4.10.3 to 4.11.1.

### Why are the changes needed?

This upgrades the dependency for Apache Spark 3.1.0.
Since 4.12.0 is still new and has breaking API changes, this PR chooses the latest compatible one.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass all the CIs including K8s IT.

Closes #30233 from dongjoon-hyun/SPARK-33324.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-11-02 22:23:26 -08:00
Dongjoon Hyun afa6aee4f5 [SPARK-33237][K8S][TESTS] Use default Hadoop-3.2 profile from K8s IT Jenkins job
### What changes were proposed in this pull request?

This PR aims to use `hadoop-3.2` profile in K8s IT Jenkins jobs.
- [x] Switch the default value of `HADOOP_PROFILE` from `hadoop-2.7` to `hadoop-3.2`.
- [x] Remove `-Phadoop-2.7` from Jenkins K8s IT job.
    - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/configure

**BEFORE**
```
./dev/make-distribution.sh --name ${DATE}-${REVISION} --r --pip --tgz -DzincPort=${ZINC_PORT} \
     -Phadoop-2.7 -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver
```

**AFTER**
```
./dev/make-distribution.sh --name ${DATE}-${REVISION} --r --pip --tgz -DzincPort=${ZINC_PORT} \
     -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver
```

### Why are the changes needed?

Since Apache Spark 3.1.0, Hadoop 3 is the default.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Check the Jenkins K8s IT log and result.
- https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34899/
```
+ /home/jenkins/workspace/SparkPullRequestBuilder-K8s/build/mvn clean package -DskipTests -DzincPort=4021 -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver
Using `mvn` from path: /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.6.3/bin/mvn
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
```

Closes #30153 from dongjoon-hyun/SPARK-33237.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-26 15:29:12 -07:00
yi.wu edeecada66 [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission
### What changes were proposed in this pull request?

This PR cleans up the RPC message flow among the multiple decommission use cases. It includes the following changes:

* Keep `Worker`'s decommission status consistent between the case where decommission starts from `Worker` and the case where decommission starts from the `MasterWebUI`: send `DecommissionWorker` from `Master` to `Worker` in the latter case.

* Change from two-way communication to one-way communication when notifying decommission between driver and executor: it is unnecessary for the executor to acknowledge the decommission status to the driver since the decommission request comes from the driver. The same holds in reverse.

* Send only one message instead of two (`DecommissionSelf`/`DecommissionBlockManager`) when decommissioning the executor: the executor and `BlockManager` are in the same JVM.

* Clean up the surrounding code.

### Why are the changes needed?

Before:

<img width="1948" alt="WeChat56c00cc34d9785a67a544dca036d49da" src="https://user-images.githubusercontent.com/16397174/92850308-dc461c80-f41e-11ea-8ac0-287825f4e0c4.png">

After:
<img width="1968" alt="WeChat05f7afb017e3f0132394c5e54245e49e" src="https://user-images.githubusercontent.com/16397174/93189571-de88dd80-f774-11ea-9300-1943920aa27d.png">

(Note that the diagrams only count the RPC calls that need to go through the network. Local RPC calls are not counted here.)

After this change, we removed 6 of the original RPC calls and added one RPC call to keep the decommission status consistent for the Worker. The RPC flow also becomes clearer.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated existing tests.

Closes #29817 from Ngone51/simplify-decommission-rpc.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-10-23 13:58:44 +09:00
Holden Karau ce6180c8c3 [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration
### What changes were proposed in this pull request?

If a block is removed between discovery and transfer of the block, we short-circuit that block, remove it from the list to transfer, and increment the count of transferred blocks. This is complicated since both RPC errors and local read errors may be reported with the same exception class.
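
A minimal sketch of this short-circuit, under stated assumptions: `migrate_blocks`, `block_exists`, and `send` are hypothetical names, and `IOError` stands in for the single exception class that may hide either a remote or a local failure. It is not the code from this PR.

```python
def migrate_blocks(pending, block_exists, send):
    """Return (number of blocks counted as transferred, blocks to retry)."""
    transferred = 0
    retry = []
    for block_id in pending:
        try:
            send(block_id)
            transferred += 1
        except IOError:
            # The same exception type may come from the remote side (RPC)
            # or from a failed local read, so re-check the local state.
            if not block_exists(block_id):
                # The block was cleaned up after discovery: nothing is left
                # to move, so count it as transferred and drop it.
                transferred += 1
            else:
                retry.append(block_id)
    return transferred, retry


# Hypothetical scenario: block "b" was deleted after discovery,
# block "c" still exists but its transfer failed remotely.
local_blocks = {"a", "c"}


def flaky_send(block_id):
    if block_id != "a":
        raise IOError("transfer failed for " + block_id)


transferred, retry = migrate_blocks(
    ["a", "b", "c"], lambda b: b in local_blocks, flaky_send
)
print(transferred, retry)  # 2 ['c']
```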

### Why are the changes needed?

Slow shuffle refreshes could waste time when decommissioning has already finished. Decommissioning might avoid transferring some blocks to an otherwise live host which is marked as "full" if a deleted block fails to transfer to that host.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit and integration tests.

Closes #30046 from holdenk/handle-cleaned-shuffles-during0migration.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-16 14:47:46 -07:00
Dongjoon Hyun 8e7c39089f [SPARK-33155][K8S] spark.kubernetes.pyspark.pythonVersion allows only '3'
### What changes were proposed in this pull request?

This PR makes `spark.kubernetes.pyspark.pythonVersion` allow only `3`. In other words, it will reject `2` for `Python 2`.
- [x] Configuration description and check are updated.
- [x] Documentation is updated.
- [x] Unit test cases are updated.
- [x] Docker image script is updated.
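
The restriction can be pictured as a simple allow-list check. The sketch below is illustrative only: `check_python_version` and `ALLOWED_VERSIONS` are hypothetical names, and Spark's own config validation is not reproduced here.

```python
CONFIG_KEY = "spark.kubernetes.pyspark.pythonVersion"
ALLOWED_VERSIONS = {"3"}  # "2" is no longer accepted


def check_python_version(value):
    """Return the value if it is allowed, otherwise raise ValueError."""
    if value not in ALLOWED_VERSIONS:
        raise ValueError(
            f"{CONFIG_KEY} must be one of {sorted(ALLOWED_VERSIONS)}, got {value!r}"
        )
    return value
```

With this check, `check_python_version("3")` returns `"3"`, while `check_python_version("2")` raises `ValueError`.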

### Why are the changes needed?

After SPARK-32138, Apache Spark 3.1 dropped Python 2 support.

### Does this PR introduce _any_ user-facing change?

Yes, but Python 2 support is already dropped officially.

### How was this patch tested?

Pass the CI.

Closes #30049 from dongjoon-hyun/SPARK-DROP-PYTHON2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-15 01:51:01 -07:00
jlafleche d75222dd1b [SPARK-33012][BUILD][K8S] Upgrade fabric8 to 4.10.3
### What changes were proposed in this pull request?

This PR aims to upgrade `kubernetes-client` library to track fabric8's declared compatibility for k8s 1.18.0:
https://github.com/fabric8io/kubernetes-client#compatibility-matrix

### Why are the changes needed?
According to fabric8, 4.9.2 is incompatible with k8s 1.18.0.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Not tested yet.

Closes #29888 from laflechejonathan/jlf/fabric8Ugprade.

Authored-by: jlafleche <jlafleche@palantir.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-09-30 19:00:18 -07:00
Holden Karau 27f6b5a103 [SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage
### What changes were proposed in this pull request?

Fix the log strings that the decom integration tests look for, and add comments reminding people to run the K8s integration tests when changing those code paths.

### Why are the changes needed?

The strings it looks for have been changed.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

WIP: Verify that the K8s jenkins job succeeds

Closes #29854 from holdenk/SPARK-32979-spark-k8s-decom-test-is-broken.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-09-23 15:39:31 -07:00
Holden Karau 059fb6571e [SPARK-32657][K8S] Update the log strings we check for & imports in decommission K8s
### What changes were proposed in this pull request?

Update the log strings to match the new log messages.

### Why are the changes needed?

Tests are failing

### Does this PR introduce _any_ user-facing change?

No, test only change.

### How was this patch tested?
WIP: Make sure the DecommissionSuite passes in Jenkins.

Closes #29479 from holdenk/SPARK-32657-Decommissioning-tests-update-log-string.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-08-19 18:28:21 -07:00
Holden Karau 548ac7c4af [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
### What changes were proposed in this pull request?

If graceful decommissioning is enabled, Spark's dynamic scaling uses this instead of directly killing executors.
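
As a rough sketch of that decision (the names `scale_down`, `decommission`, and `kill` are hypothetical and do not correspond to Spark's allocation manager API):

```python
def scale_down(executor_ids, decommission_enabled, decommission, kill):
    """Remove executors, preferring graceful decommissioning when enabled."""
    action = decommission if decommission_enabled else kill
    for executor_id in executor_ids:
        action(executor_id)


decommissioned, killed = [], []
scale_down(["1", "2"], True, decommissioned.append, killed.append)
scale_down(["3"], False, decommissioned.append, killed.append)
print(decommissioned, killed)  # ['1', '2'] ['3']
```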

### Why are the changes needed?

When scaling down Spark we should avoid triggering recomputes as much as possible.

### Does this PR introduce _any_ user-facing change?

Hopefully their jobs run faster or at the same speed. It also enables experimental shuffle-service-free dynamic scaling when graceful decommissioning is enabled (using the same code as the shuffle tracking dynamic scaling).

### How was this patch tested?

For now I've extended the ExecutorAllocationManagerSuite for both core & streaming.

Closes #29367 from holdenk/SPARK-31198-use-graceful-decommissioning-as-part-of-dynamic-scaling.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-08-12 17:07:18 -07:00
Dongjoon Hyun 13c64c2980 [SPARK-32448][K8S][TESTS] Use single version for exec-maven-plugin/scalatest-maven-plugin
### What changes were proposed in this pull request?

Two different versions are used for the same artifacts, `exec-maven-plugin` and `scalatest-maven-plugin`. This PR aims to use the same versions for `exec-maven-plugin` and `scalatest-maven-plugin`. In addition, this PR removes `scala-maven-plugin.version` from `K8s` integration suite because it's unused.

### Why are the changes needed?

This prevents the mistake of upgrading the version in only one place and forgetting the others.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins K8S IT.

Closes #29248 from dongjoon-hyun/SPARK-32448.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-26 19:25:41 -07:00
Sean Owen be2eca22e9 [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
### What changes were proposed in this pull request?

Updates to scalatest 3.2.0. Though it looks large, it is 99% changes to the new location of scalatest classes.

### Why are the changes needed?

3.2.0+ has a fix that is required for Scala 2.13.3+ compatibility.

### Does this PR introduce _any_ user-facing change?

No, only affects tests.

### How was this patch tested?

Existing tests.

Closes #29196 from srowen/SPARK-32398.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-23 16:20:17 -07:00
maruilei ffdca8285e [SPARK-32367][K8S][TESTS] Correct the spelling of parameter in KubernetesTestComponents
### What changes were proposed in this pull request?

Correct the spelling of parameter 'spark.executor.instances' in KubernetesTestComponents

### Why are the changes needed?

Parameter spelling error

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Test is not needed.

Closes #29164 from merrily01/SPARK-32367.

Authored-by: maruilei <maruilei@jd.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-20 13:48:57 -07:00
Holden Karau a4ca355af8 [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
### What is changed?

This pull request adds the ability to migrate shuffle files during Spark's decommissioning. The design document associated with this change is at https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE .

To allow this change the `MapOutputTracker` has been extended to allow the location of shuffle files to be updated with `updateMapOutput`. When a shuffle block is put, a block update message will be sent which triggers the `updateMapOutput`.

Instead of rejecting remote puts of shuffle blocks, `BlockManager` delegates the storage of shuffle blocks to its shuffle manager's resolver (if supported). A new, experimental trait is added for shuffle resolvers to indicate that they handle remote putting of blocks.

The existing block migration code is moved out into a separate file, and a producer/consumer model is introduced for migrating shuffle files from the host as quickly as possible while not overwhelming other executors.
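
The producer/consumer idea can be sketched with a bounded queue: one producer enqueues shuffle blocks, and one consumer per peer sends a single block at a time, so no peer is flooded. This is a simplified model under stated assumptions; `migrate_with_queue`, `max_queued`, and the peer names are hypothetical, and "sending" is replaced by appending to a list.

```python
import queue
import threading


def migrate_with_queue(shuffle_blocks, peers, max_queued=4):
    """Distribute blocks to peers through a bounded work queue."""
    work = queue.Queue(maxsize=max_queued)
    received = {peer: [] for peer in peers}
    done = object()  # sentinel telling a consumer to stop

    def producer():
        for block in shuffle_blocks:
            work.put(block)  # blocks while the queue is full
        for _ in peers:
            work.put(done)  # one sentinel per consumer

    def consumer(peer):
        while True:
            item = work.get()
            if item is done:
                return
            # Stand-in for a network transfer to this peer.
            received[peer].append(item)

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer, args=(p,)) for p in peers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


blocks = [f"shuffle_0_{i}" for i in range(10)]
result = migrate_with_queue(blocks, ["exec-1", "exec-2"])
migrated = sorted(b for sent in result.values() for b in sent)
```

Which peer receives which block depends on thread scheduling, but every block is migrated exactly once.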

### Why are the changes needed?

Recomputing shuffle blocks can be expensive, so we should take advantage of our decommissioning time to migrate these blocks.

### Does this PR introduce any user-facing change?

This PR introduces two new config parameters, `spark.storage.decommission.shuffleBlocks.enabled` & `spark.storage.decommission.rddBlocks.enabled`, that control which blocks should be migrated during storage decommissioning.
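
To illustrate how two such flags could select what gets migrated, here is a small sketch. The config keys come from this commit message; the helper `blocks_to_migrate`, the plain-dict config, and the `"false"` default are assumptions made for the example.

```python
SHUFFLE_KEY = "spark.storage.decommission.shuffleBlocks.enabled"
RDD_KEY = "spark.storage.decommission.rddBlocks.enabled"


def blocks_to_migrate(conf):
    """Return the kinds of blocks to migrate for a dict of string settings."""

    def enabled(key):
        # Assumed default: a missing key counts as disabled.
        return conf.get(key, "false").lower() == "true"

    kinds = []
    if enabled(SHUFFLE_KEY):
        kinds.append("shuffle")
    if enabled(RDD_KEY):
        kinds.append("rdd")
    return kinds


print(blocks_to_migrate({SHUFFLE_KEY: "true"}))  # ['shuffle']
```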

### How was this patch tested?

New unit test & expansion of the Spark on K8s decom test to assert that decommissioning with shuffle block migration means that the results are not recomputed even when the original executor is terminated.

This PR is a cleaned-up version of the previous WIP PR I made https://github.com/apache/spark/pull/28331 (thanks to attilapiros for his very helpful reviewing on it :)).

Closes #28708 from holdenk/SPARK-20629-copy-shuffle-data-when-nodes-are-being-shutdown-cleaned-up.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Co-authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Co-authored-by: Attila Zsolt Piros <attilazsoltpiros@apiros-mbp16.lan>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-07-19 21:33:13 -07:00
Sean Owen ee624821a9 [SPARK-29292][YARN][K8S][MESOS] Fix Scala 2.13 compilation for remaining modules
### What changes were proposed in this pull request?

See again the related PRs like https://github.com/apache/spark/pull/28971
This completes fixing compilation for 2.13 for all but `repl`, which is a separate task.

### Why are the changes needed?

Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests. (2.13 was not tested; this is about getting it to compile without breaking 2.12)

Closes #29147 from srowen/SPARK-29292.4.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-18 15:08:00 -07:00
Dongjoon Hyun fb51925123 [SPARK-32335][K8S][TESTS] Remove Python2 test from K8s IT
### What changes were proposed in this pull request?

This PR aims to remove Python 2 test case from K8s IT.

### Why are the changes needed?

Since Apache Spark 3.1.0 dropped Python 2.7, 3.4 and 3.5 support officially via SPARK-32138, K8s IT fails.

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example *** FAILED ***
  The code passed to eventually never returned normally. Attempted 113 times over 2.0014854648999996 minutes. Last failure message: false was not true. (KubernetesSuite.scala:370)
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Test basic decommissioning
- Run SparkR on simple dataframe.R example
Run completed in 11 minutes, 15 seconds.
Total number of tests run: 20
Suites: completed 2, aborted 0
Tests: succeeded 19, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED ***
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass Jenkins K8s IT.

Closes #29136 from dongjoon-hyun/SPARK-32335.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-16 11:21:14 -07:00
HyukjinKwon 4ad9bfd53b [SPARK-32138] Drop Python 2.7, 3.4 and 3.5
### What changes were proposed in this pull request?

This PR aims to drop Python 2.7, 3.4 and 3.5.

Roughly speaking, it removes all the widely known Python 2 compatibility workarounds such as `sys.version` comparisons and `__future__` imports. It also removes Python 2-specific code such as `ArrayConstructor` in Spark.

### Why are the changes needed?

 1. Drop support for EOL Python versions.
 2. Reduce maintenance overhead and remove some legacy code and hacks for Python 2.
 3. PyPy2 has a critical bug that causes a flaky test (SPARK-28358), based on my testing and investigation.
 4. Users can use Python type hints with Pandas UDFs without thinking about the Python version.
 5. Users can leverage one latest cloudpickle, https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also leverage C pickle.

### Does this PR introduce _any_ user-facing change?

Yes, users cannot use Python 2.7, 3.4 and 3.5 in the upcoming Spark version.

### How was this patch tested?

Manually tested and also tested in Jenkins.

Closes #28957 from HyukjinKwon/SPARK-32138.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-07-14 11:22:44 +09:00
Dongjoon Hyun 9c134b57bf [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default
### What changes were proposed in this pull request?

According to the dev mailing list discussion, this PR aims to switch the default Apache Hadoop dependency from 2.7.4 to 3.2.0 for Apache Spark 3.1.0 in December 2020.

| Item | Default Hadoop Dependency |
|------|-----------------------------|
| Apache Spark Website | 3.2.0 |
| Apache Download Site | 3.2.0 |
| Apache Snapshot | 3.2.0 |
| Maven Central | 3.2.0 |
| PyPI | 2.7.4 (We will switch later) |
| CRAN | 2.7.4 (We will switch later) |
| Homebrew | 3.2.0 (already) |

In the Apache Spark 3.0.0 release, we focused on other features. This PR targets [Apache Spark 3.1.0, scheduled for December 2020](https://spark.apache.org/versioning-policy.html).

### Why are the changes needed?

Apache Hadoop 3.2 has many fixes and new cloud-friendly features.

**Reference**
- 2017-08-04: https://hadoop.apache.org/release/2.7.4.html
- 2019-01-16: https://hadoop.apache.org/release/3.2.0.html

### Does this PR introduce _any_ user-facing change?

Since the default Hadoop dependency changes, users will get better support in cloud environments.

### How was this patch tested?

Pass the Jenkins.

Closes #28897 from dongjoon-hyun/SPARK-32058.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-06-26 19:43:29 -07:00
Dongjoon Hyun e5b9b862e6 [SPARK-31881][K8S][TESTS][FOLLOWUP] Activate hadoop-2.7 by default in K8S IT
### What changes were proposed in this pull request?

This PR aims to activate `hadoop-2.7` profile by default in Kubernetes IT module.

### Why are the changes needed?

While SPARK-31881 added Hadoop 3.2 support, one default test dependency was moved to `hadoop-2.7` profile. It works when we give one of `hadoop-2.7` and `hadoop-3.2`, but it fails when we don't give any profile.

**BEFORE**
```
$ mvn test-compile -pl resource-managers/kubernetes/integration-tests -Pkubernetes-integration-tests
...
[ERROR] [Error] /APACHE/spark-merge/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:23:
object amazonaws is not a member of package com
```

**AFTER**
```
$ mvn test-compile -pl resource-managers/kubernetes/integration-tests -Pkubernetes-integration-tests
..
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
```

The default activated profile will be overridden when we give `hadoop-3.2`.
```
$ mvn help:active-profiles -Pkubernetes-integration-tests
...
Active Profiles for Project 'org.apache.spark:spark-kubernetes-integration-tests_2.12:jar:3.1.0-SNAPSHOT':

The following profiles are active:

 - hadoop-2.7 (source: org.apache.spark:spark-kubernetes-integration-tests_2.12:3.1.0-SNAPSHOT)
 - kubernetes-integration-tests (source: org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT)
 - test-java-home (source: org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT)
```
```
$ mvn help:active-profiles -Pkubernetes-integration-tests -Phadoop-3.2
...
Active Profiles for Project 'org.apache.spark:spark-kubernetes-integration-tests_2.12:jar:3.1.0-SNAPSHOT':

The following profiles are active:

 - hadoop-3.2 (source: org.apache.spark:spark-kubernetes-integration-tests_2.12:3.1.0-SNAPSHOT)
 - hadoop-3.2 (source: org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT)
 - kubernetes-integration-tests (source: org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT)
 - test-java-home (source: org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins UT and IT.

Currently, all Jenkins builds and tests (UT & IT) pass without this patch, so this should be tested manually with the above commands.

`hadoop-3.2` K8s IT also passed like the following.
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Test basic decommissioning
Run completed in 8 minutes, 33 seconds.
Total number of tests run: 19
Suites: completed 2, aborted 0
Tests: succeeded 19, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #28716 from dongjoon-hyun/SPARK-31881-2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-06-03 02:17:25 -07:00
Dongjoon Hyun 17586f9ed2 [SPARK-31881][K8S][TESTS] Support Hadoop 3.2 K8s integration tests
### What changes were proposed in this pull request?

This PR aims to support Hadoop 3.2 K8s integration tests.

### Why are the changes needed?

Currently, K8s integration suite assumes Hadoop 2.7 and has hard-coded parts.

### Does this PR introduce _any_ user-facing change?

No. This is a dev-only change.

### How was this patch tested?

Pass the Jenkins K8s IT (with Hadoop 2.7) and do the manual testing for Hadoop 3.2 as described in `README.md`.

```
./dev/dev-run-integration-tests.sh --hadoop-profile hadoop-3.2
```

I verified this manually like the following.
```
$ resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh \
--spark-tgz .../spark-3.1.0-SNAPSHOT-bin-3.2.0.tgz \
--exclude-tags r \
--hadoop-profile hadoop-3.2
...
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Test basic decommissioning
Run completed in 8 minutes, 49 seconds.
Total number of tests run: 19
Suites: completed 2, aborted 0
Tests: succeeded 19, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #28689 from dongjoon-hyun/SPARK-31881.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-06-01 11:19:42 -07:00
Dongjoon Hyun 64ffc66496 [SPARK-31786][K8S][BUILD] Upgrade kubernetes-client to 4.9.2
### What changes were proposed in this pull request?

This PR aims to upgrade `kubernetes-client` library to bring the JDK8 related fixes. Please note that JDK11 works fine without any problem.
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.9.2
  - JDK8 always uses http/1.1 protocol (Prevent OkHttp from wrongly enabling http/2)

### Why are the changes needed?

OkHttp "wrongly" detects the Platform as Jdk9Platform on JDK 8u251.
- https://github.com/fabric8io/kubernetes-client/issues/2212
- https://stackoverflow.com/questions/61565751/why-am-i-not-able-to-run-sparkpi-example-on-a-kubernetes-k8s-cluster

Although there are workarounds (`export HTTP2_DISABLE=true`, or `Downgrade JDK or K8s`), it is better to avoid this problematic situation.

### Does this PR introduce _any_ user-facing change?

No. This will recover the failures on JDK 8u252.

### How was this patch tested?

- [x] Pass the Jenkins UT (https://github.com/apache/spark/pull/28601#issuecomment-632474270)
- [x] Pass the Jenkins K8S IT with the K8s 1.13 (https://github.com/apache/spark/pull/28601#issuecomment-632438452)
- [x] Manual testing with K8s 1.17.3. (Below)

**v1.17.6 result (on Minikube)**
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Test basic decommissioning
Run completed in 8 minutes, 27 seconds.
Total number of tests run: 19
Suites: completed 2, aborted 0
Tests: succeeded 19, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #28601 from dongjoon-hyun/SPARK-K8S-CLIENT.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-23 11:07:45 -07:00
Dongjoon Hyun a06768ec4d [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
### What changes were proposed in this pull request?

This PR aims to skip R image building and one R test during integration tests by using `--exclude-tags r`.
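
Tag-based exclusion can be pictured as a filter over tagged tests. The sketch below is only an illustration: `select_tests` is a hypothetical helper, the `k8s` tag is assumed for the example, and the way the integration test scripts apply `--exclude-tags` is not reproduced here.

```python
def select_tests(tests, exclude_tags):
    """Keep the names of tests that carry none of the excluded tags."""
    excluded = set(exclude_tags)
    return [name for name, tags in tests if not excluded & set(tags)]


tests = [
    ("Run SparkPi with no resources", ["k8s"]),
    ("Run SparkR on simple dataframe.R example", ["k8s", "r"]),
]
print(select_tests(tests, ["r"]))  # ['Run SparkPi with no resources']
```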

### Why are the changes needed?

We have only one R integration test case, `Run SparkR on simple dataframe.R example`, for submission test coverage. Since this is rarely changed, we can skip it and save the effort required for building the whole R image and running the single test.
```
KubernetesSuite:
...
- Run SparkR on simple dataframe.R example
Run completed in 10 minutes, 20 seconds.
Total number of tests run: 20
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the K8S integration test and do the following manually. (Note that R test is skipped)
```
$ resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --deploy-mode docker-for-desktop --exclude-tags r --spark-tgz $PWD/spark-*.tgz
...
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Test basic decommissioning
Run completed in 10 minutes, 23 seconds.
Total number of tests run: 19
Suites: completed 2, aborted 0
Tests: succeeded 19, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #28594 from dongjoon-hyun/SPARK-31780.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-20 18:33:38 -07:00
Dongjoon Hyun b7947e0285 [SPARK-31766][K8S][TESTS] Add Spark version prefix to K8s UUID test image tag
### What changes were proposed in this pull request?

This PR aims to add a Spark version prefix when generating the test image tag for K8s integration testing.

### Why are the changes needed?

This helps to distinguish the images by version.

**BEFORE**
```
$ docker images | grep kubespark
kubespark/spark-py  F7188CBD-AE08-4705-9C8A-D0DD3DC8B86F  ...
kubespark/spark     F7188CBD-AE08-4705-9C8A-D0DD3DC8B86F  ...
```

**AFTER**
```
$ docker images | grep kubespark
kubespark/spark-py  3.1.0-SNAPSHOT_F7188CBD-AE08-4705-9C8A-D0DD3DC8B86F ...
kubespark/spark     3.1.0-SNAPSHOT_F7188CBD-AE08-4705-9C8A-D0DD3DC8B86F ...
```
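
The tag format shown above can be reproduced with a short sketch. `make_image_tag` is a hypothetical helper; only the `<version>_<UUID>` shape is taken from the output above.

```python
import uuid


def make_image_tag(spark_version, unique_id=None):
    """Build a test image tag of the form <spark version>_<UUID>."""
    if unique_id is None:
        unique_id = str(uuid.uuid4())
    return f"{spark_version}_{unique_id}"


tag = make_image_tag("3.1.0-SNAPSHOT", "F7188CBD-AE08-4705-9C8A-D0DD3DC8B86F")
print(tag)  # 3.1.0-SNAPSHOT_F7188CBD-AE08-4705-9C8A-D0DD3DC8B86F
```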

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the K8s integration test.

```
...
Successfully tagged kubespark/spark:3.1.0-SNAPSHOT_688b46c8-c119-404d-aadb-d05a14262db7
...
Successfully tagged kubespark/spark-py:3.1.0-SNAPSHOT_688b46c8-c119-404d-aadb-d05a14262db7
...
Successfully tagged kubespark/spark-r:3.1.0-SNAPSHOT_688b46c8-c119-404d-aadb-d05a14262db7
```

Closes #28587 from dongjoon-hyun/SPARK-31766.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-05-20 14:55:23 +09:00
William Hyun 5bb1a09b5f [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
This PR aims to use GitHub URL instead of a broken link in `BasicTestsSuite.scala`.

Currently, K8s integration test is broken:

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20K8s%20Builds/job/spark-master-test-k8s/534/console

```
- Run SparkRemoteFileTest using a remote data file *** FAILED ***
  The code passed to eventually never returned normally. Attempted 130 times over 2.00109555135 minutes. Last failure message: false was not true. (KubernetesSuite.scala:370)
```

No.

Pass the K8s integration test.

Closes #28561 from williamhyun/williamhyun-patch-1.

Authored-by: williamhyun <62487364+williamhyun@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-17 22:13:16 -07:00
Dongjoon Hyun dba525c997 [SPARK-31313][K8S][TEST] Add m01 node name to support Minikube 1.8.x
### What changes were proposed in this pull request?

This PR aims to add `m01` as an additional node name in `PVTestsSuite`.

### Why are the changes needed?

minikube 1.8.0 ~ 1.8.2 generates a cluster with the node name `m01`, while all other versions use `minikube`. This causes `PVTestsSuite` to fail.
```
$ minikube --vm-driver=hyperkit start --memory 6000 --cpus 8
* minikube v1.8.2 on Darwin 10.15.3
  - MINIKUBE_ACTIVE_DOCKERD=minikube
* Using the hyperkit driver based on user configuration
* Creating hyperkit VM (CPUs=8, Memory=6000MB, Disk=20000MB) ...
* Preparing Kubernetes v1.18.0 on Docker 19.03.6 ...
* Launching Kubernetes ...
* Enabling addons: default-storageclass, storage-provisioner
* Waiting for cluster to come online ...
* Done! kubectl is now configured to use "minikube"

$ kubectl get nodes
NAME   STATUS   ROLES    AGE   VERSION
m01    Ready    master   22s   v1.17.3
```
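
One way to accept both node names is to list them as values of a host-name match expression in a persistent volume's node affinity. The sketch below builds such a structure as a plain dictionary; `hostname_affinity` and `MINIKUBE_NODE_NAMES` are hypothetical names, and it is an assumption that the suite matches nodes this way.

```python
MINIKUBE_NODE_NAMES = ["minikube", "m01"]


def hostname_affinity(node_names):
    """Build a node-affinity term that matches any of the given host names."""
    return {
        "required": {
            "nodeSelectorTerms": [
                {
                    "matchExpressions": [
                        {
                            "key": "kubernetes.io/hostname",
                            "operator": "In",
                            "values": list(node_names),
                        }
                    ]
                }
            ]
        }
    }


affinity = hostname_affinity(MINIKUBE_NODE_NAMES)
```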

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This only adds a new node name, so the K8S Jenkins job should pass.
In addition, `K8s` integration test suite should be tested on `minikube 1.8.2` manually.

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Test basic decommissioning
- Run SparkR on simple dataframe.R example
Run completed in 10 minutes, 23 seconds.
Total number of tests run: 20
Suites: completed 2, aborted 0
Tests: succeeded 20, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

For the above test, Minikube 1.8.2 and K8s v1.18.0 were used.
```
$ minikube version
minikube version: v1.8.2
commit: eb13446e786c9ef70cb0a9f85a633194e62396a1

$ kubectl version --short
Client Version: v1.18.0
Server Version: v1.18.0
```

Closes #28080 from dongjoon-hyun/SPARK-31313.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: DB Tsai <d_tsai@apple.com>
2020-04-01 03:42:26 +00:00
Dongjoon Hyun f206bbde3a [SPARK-31244][K8S][TEST] Use Minio instead of Ceph in K8S DepsTestsSuite
### What changes were proposed in this pull request?

This PR (SPARK-31244) replaces `Ceph` with `Minio` in K8S `DepsTestsSuite`.

### Why are the changes needed?

Currently, `DepsTestsSuite` is using `ceph` for S3 storage. However, the version in use and all newer releases are broken on new `minikube` releases. We should use a smaller, more robust alternative.

```
$ minikube version
minikube version: v1.8.2

$ minikube -p minikube docker-env | source

$ docker run -it --rm -e NETWORK_AUTO_DETECT=4 -e RGW_FRONTEND_PORT=8000 -e SREE_PORT=5001 -e CEPH_DEMO_UID=nano -e CEPH_DAEMON=demo ceph/daemon:v4.0.3-stable-4.0-nautilus-centos-7-x86_64 /bin/sh
2020-03-25 04:26:21  /opt/ceph-container/bin/entrypoint.sh: ERROR- it looks like we have not been able to discover the network settings

$ docker run -it --rm -e NETWORK_AUTO_DETECT=4 -e RGW_FRONTEND_PORT=8000 -e SREE_PORT=5001 -e CEPH_DEMO_UID=nano -e CEPH_DAEMON=demo ceph/daemon:v4.0.11-stable-4.0-nautilus-centos-7 /bin/sh
2020-03-25 04:20:30  /opt/ceph-container/bin/entrypoint.sh: ERROR- it looks like we have not been able to discover the network settings
```

Also, the image size is unnecessarily big (almost `1GB`) and growing while `minio` is `55.8MB` with the same features.
```
$ docker images | grep ceph
ceph/daemon v4.0.3-stable-4.0-nautilus-centos-7-x86_64 a6a05ccdf924 6 months ago 852MB
ceph/daemon v4.0.11-stable-4.0-nautilus-centos-7       87f695550d8e 12 hours ago 901MB

$ docker images | grep minio
minio/minio latest                                     95c226551ea6 5 days ago   55.8MB
```

### Does this PR introduce any user-facing change?

No. (This is a test case change)

### How was this patch tested?

Pass the existing Jenkins K8s integration test job and test with the latest minikube.
```
$ minikube version
minikube version: v1.8.2

$ kubectl version --short
Client Version: v1.17.4
Server Version: v1.17.4

$ NO_MANUAL=1 ./dev/make-distribution.sh --r --pip --tgz -Pkubernetes
$ resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz $PWD/spark-*.tgz
...
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage *** FAILED *** // This is irrelevant to this PR.
- Launcher client dependencies          // This is the fixed test case by this PR.
- Test basic decommissioning
- Run SparkR on simple dataframe.R example
Run completed in 12 minutes, 4 seconds.
...
```

The following is a snapshot of the working `DepsTestsSuite` test.
```
$ kubectl get all -ncf9438dd8a65436686b1196a6b73000f
NAME                                                  READY   STATUS    RESTARTS   AGE
pod/minio-0                                           1/1     Running   0          70s
pod/spark-test-app-8494bddca3754390b9e59a2ef47584eb   1/1     Running   0          55s

NAME                                                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/minio-s3                                     NodePort    10.109.54.180   <none>        9000:30678/TCP               70s
service/spark-test-app-fd916b711061c7b8-driver-svc   ClusterIP   None            <none>        7078/TCP,7079/TCP,4040/TCP   55s

NAME                     READY   AGE
statefulset.apps/minio   1/1     70s
```

Closes #28015 from dongjoon-hyun/SPARK-31244.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-25 12:38:15 -07:00
Prashant Sharma 3799d2b9d8 [SPARK-30715][K8S][TESTS][FOLLOWUP] Update k8s client version in IT as well
### What changes were proposed in this pull request?
This is a follow-up to SPARK-30715. It brings the Kubernetes client version back in sync between integration-tests and kubernetes/core.

### Why are the changes needed?
The Kubernetes client version has gone out of sync between the integration tests and kubernetes/core more than once. This PR brings them back in sync and adds a comment so that this kind of follow-up is not needed in the future.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Manually.

Closes #27948 from ScrapCodes/follow-up-spark-30715.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-21 18:26:53 -07:00
Holden Karau 2825237448 [SPARK-31062][K8S][TESTS] Improve spark decommissioning k8s test reliability
### What changes were proposed in this pull request?

Replace a sleep with a wait for the first collect to happen, to make the K8s test code more reliable.
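
The general idea of waiting on a condition instead of sleeping for a fixed time can be sketched as follows. This is only an illustration in Python, not the code changed by this PR; `wait_until` and `first_collect_done` are hypothetical names introduced here.

```python
import threading
import time


def wait_until(condition, timeout_s=30.0, interval_s=0.1):
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval_s)
    # Final check in case the condition became true during the last sleep.
    return condition()


# Hypothetical signal standing in for "the first collect has happened".
first_collect_done = threading.Event()
threading.Timer(0.05, first_collect_done.set).start()

# Proceeds as soon as the event is set, rather than after a fixed delay.
collected = wait_until(first_collect_done.is_set, timeout_s=5.0, interval_s=0.01)
```

A fixed sleep has to be long enough for the slowest machine, while a poll with a timeout returns as soon as the condition holds and still fails deterministically if it never does.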

### Why are the changes needed?

Currently the Decommissioning test appears to be flaky in Jenkins.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Ran the K8s test suite in a loop on minikube on my desktop for 10 iterations without this test failing on any of the runs.

Closes #27858 from holdenk/SPARK-31062-Improve-Spark-Decommissioning-K8s-test-teliability.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-03-11 14:42:31 -07:00
gatorsmile 28b8713036 [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT
### What changes were proposed in this pull request?
This patch is to bump the master branch version to 3.1.0-SNAPSHOT.

### Why are the changes needed?
N/A

### Does this PR introduce any user-facing change?
N/A

### How was this patch tested?
N/A

Closes #27698 from gatorsmile/updateVersion.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-25 19:44:31 -08:00
Holden Karau d273a2bb0f [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
This PR is based on an existing/previous PR - https://github.com/apache/spark/pull/19045

### What changes were proposed in this pull request?

This change adds a decommissioning state that we can enter when the cloud provider/scheduler lets us know that we aren't going to be removed immediately but will be removed soon. This concept fits nicely in K8s and also with spot instances on AWS and preemptible instances, all of which can give us notice that our host is going away. For now we simply stop scheduling jobs; in the future we could perform some kind of migration of data during scale-down, or at least stop accepting new blocks to cache.
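
The "stop scheduling jobs" behaviour can be pictured with a minimal sketch. It is written in Python for illustration only and does not reflect Spark's actual classes; `ExecutorState`, `Executor`, `on_removal_notice`, and `schedulable` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutorState(Enum):
    RUNNING = "running"
    DECOMMISSIONING = "decommissioning"


@dataclass
class Executor:
    executor_id: str
    state: ExecutorState = ExecutorState.RUNNING


def on_removal_notice(executor):
    """Called when the provider/scheduler signals the host will go away soon."""
    executor.state = ExecutorState.DECOMMISSIONING


def schedulable(executors):
    """Only executors that are still RUNNING receive new work."""
    return [e for e in executors if e.state is ExecutorState.RUNNING]


executors = [Executor("exec-1"), Executor("exec-2")]
on_removal_notice(executors[0])
remaining = [e.executor_id for e in schedulable(executors)]
```

In this sketch a decommissioning executor keeps whatever it is already running; it is only excluded from new assignments, which matches the "for now" scope described above.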

There is a design document at https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE/edit?usp=sharing

### Why are the changes needed?

With the growing move to preemptible multi-tenancy, serverless environments, and spot instances, better handling of node scale-down is required.

### Does this PR introduce any user-facing change?

There is no API change; however, an additional configuration flag is added to enable/disable this behaviour.

### How was this patch tested?

New integration tests in the Spark K8s integration testing. Extension of the AppClientSuite to test decommissioning separately from K8s.

Closes #26440 from holdenk/SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r4.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-02-14 12:36:52 -08:00
Dongjoon Hyun 74cd46eb69 [SPARK-30816][K8S][TESTS] Fix dev-run-integration-tests.sh to ignore empty params
### What changes were proposed in this pull request?

This PR aims to fix `dev-run-integration-tests.sh` to ignore empty params correctly.

### Why are the changes needed?

The script currently runs the `mvn` integration test as follows.
```
$ resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
...
build/mvn integration-test
-f /Users/dongjoon/APACHE/spark/pom.xml
-pl resource-managers/kubernetes/integration-tests
-am
-Pscala-2.12
-Pkubernetes
-Pkubernetes-integration-tests
-Djava.version=8
-Dspark.kubernetes.test.sparkTgz=N/A
-Dspark.kubernetes.test.imageTag=N/A
-Dspark.kubernetes.test.imageRepo=docker.io/kubespark
-Dspark.kubernetes.test.deployMode=minikube
-Dtest.include.tags=k8s
-Dspark.kubernetes.test.namespace=
-Dspark.kubernetes.test.serviceAccountName=
-Dspark.kubernetes.test.kubeConfigContext=
-Dspark.kubernetes.test.master=
-Dtest.exclude.tags=
-Dspark.kubernetes.test.jvmImage=spark
-Dspark.kubernetes.test.pythonImage=spark-py
-Dspark.kubernetes.test.rImage=spark-r
```

After this PR, empty parameters like the following are skipped, as originally designed.
```
-Dspark.kubernetes.test.namespace=
-Dspark.kubernetes.test.serviceAccountName=
-Dspark.kubernetes.test.kubeConfigContext=
-Dspark.kubernetes.test.master=
-Dtest.exclude.tags=
```
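
The intended behaviour, emitting a `-D` flag only for parameters that have a value, can be sketched as below. This is an illustration in Python, not the shell code of `dev-run-integration-tests.sh`; `build_properties` is a hypothetical helper, and the keys are taken from the output above.

```python
def build_properties(params):
    """Return -Dkey=value flags, dropping entries whose value is empty or unset."""
    return [f"-D{key}={value}" for key, value in params.items() if value]


params = {
    "spark.kubernetes.test.deployMode": "minikube",
    "spark.kubernetes.test.namespace": "",      # empty: skipped
    "spark.kubernetes.test.master": "",         # empty: skipped
    "test.exclude.tags": "",                    # empty: skipped
    "spark.kubernetes.test.jvmImage": "spark",
}

flags = build_properties(params)
# flags == ["-Dspark.kubernetes.test.deployMode=minikube",
#           "-Dspark.kubernetes.test.jvmImage=spark"]
```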

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins K8S integration test.

Closes #27566 from dongjoon-hyun/SPARK-30816.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-13 11:42:00 -08:00
Dongjoon Hyun 859699135c [SPARK-30807][K8S][TESTS] Support Java 11 in K8S integration tests
### What changes were proposed in this pull request?

This PR aims to support JDK11 testing in the K8S integration tests.
- This is an update to the testing framework rather than to individual tests.
- This enables JDK11 runtime tests even when JDK11 is not installed on your local system.

### Why are the changes needed?

Apache Spark 3.0.0 adds JDK11 support, but the K8s integration tests have used JDK8 until now.

### Does this PR introduce any user-facing change?

No. This is a dev-only test-related PR.

### How was this patch tested?

This is irrelevant to Jenkins UT, but Jenkins K8S IT (JDK8) should pass.
- https://github.com/apache/spark/pull/27559#issuecomment-585903489 (JDK8 Passed)

In addition, run the following manually for the JDK11 test.
```
$ NO_MANUAL=1 ./dev/make-distribution.sh --r --pip --tgz -Phadoop-3.2 -Pkubernetes
$ resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --java-image-tag 11-jre-slim --spark-tgz $PWD/spark-*.tgz
```

```
$ docker run -it --rm kubespark/spark:1318DD8A-2B15-4A00-BC69-D0E90CED235B /usr/local/openjdk-11/bin/java --version | tail -n1
OpenJDK 64-Bit Server VM 18.9 (build 11.0.6+10, mixed mode)
```

Closes #27559 from dongjoon-hyun/SPARK-30807.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-13 11:17:27 -08:00