Commit graph

178 commits

Author SHA1 Message Date
Dongjoon Hyun 64ffc66496
[SPARK-31786][K8S][BUILD] Upgrade kubernetes-client to 4.9.2
### What changes were proposed in this pull request?

This PR aims to upgrade the `kubernetes-client` library to bring in the JDK8-related fixes. Please note that JDK11 works fine without any problem.
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.9.2
  - JDK8 always uses http/1.1 protocol (Prevent OkHttp from wrongly enabling http/2)

### Why are the changes needed?

OkHttp "wrongly" detects the Platform as Jdk9Platform on JDK 8u251.
- https://github.com/fabric8io/kubernetes-client/issues/2212
- https://stackoverflow.com/questions/61565751/why-am-i-not-able-to-run-sparkpi-example-on-a-kubernetes-k8s-cluster

Although there are workarounds (`export HTTP2_DISABLE=true`, or downgrading the JDK or K8s), we had better avoid this problematic situation.
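For reference, a minimal sketch of the documented workaround (master URL and image are placeholders; passing the variable to the driver via `spark.kubernetes.driverEnv.*` is an assumption, since the driver runs its own fabric8 client):

```
# Workaround only: keep OkHttp on HTTP/1.1 on the affected JDK 8 builds.
# The variable must be visible wherever the kubernetes-client runs.
export HTTP2_DISABLE=true

bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.driverEnv.HTTP2_DISABLE=true \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.0-SNAPSHOT.jar
```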

### Does this PR introduce _any_ user-facing change?

No. This will recover the failures on JDK 8u252.

### How was this patch tested?

- [x] Pass the Jenkins UT (https://github.com/apache/spark/pull/28601#issuecomment-632474270)
- [x] Pass the Jenkins K8S IT with the K8s 1.13 (https://github.com/apache/spark/pull/28601#issuecomment-632438452)
- [x] Manual testing with K8s 1.17.3. (Below)

**v1.17.6 result (on Minikube)**
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Test basic decommissioning
Run completed in 8 minutes, 27 seconds.
Total number of tests run: 19
Suites: completed 2, aborted 0
Tests: succeeded 19, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #28601 from dongjoon-hyun/SPARK-K8S-CLIENT.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-23 11:07:45 -07:00
Thomas Graves b64688ebba [SPARK-29303][WEB UI] Add UI support for stage level scheduling
### What changes were proposed in this pull request?

This adds UI updates to support stage level scheduling and ResourceProfiles. Three main things have been added: the ResourceProfile id was added to the Stage page, the Executors page now has an optional selectable column showing the ResourceProfile id of each executor, and the Environment page now has a section with the ResourceProfile ids. Along with this, the REST API for the Environment page was updated to include the ResourceProfile information.
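For reference, a hedged sketch of querying the Environment endpoint of the REST API that this change extends (driver host, UI port, and application id are placeholders):

```
APP_ID=<application-id>
# With this change the response also carries the ResourceProfile information.
curl -s "http://<driver-host>:4040/api/v1/applications/${APP_ID}/environment"
```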

I debated splitting the resource profile information into its own page, but I wasn't sure it called for a completely separate page. Open to people's thoughts on this.

Screen shots:
![Screen Shot 2020-04-01 at 3 07 46 PM](https://user-images.githubusercontent.com/4563792/78185169-469a7000-7430-11ea-8b0c-d9ede2d41df8.png)
![Screen Shot 2020-04-01 at 3 08 14 PM](https://user-images.githubusercontent.com/4563792/78185175-48fcca00-7430-11ea-8d1d-6b9333700f32.png)
![Screen Shot 2020-04-01 at 3 09 03 PM](https://user-images.githubusercontent.com/4563792/78185176-4a2df700-7430-11ea-92d9-73c382bb0f32.png)
![Screen Shot 2020-04-01 at 11 05 48 AM](https://user-images.githubusercontent.com/4563792/78185186-4dc17e00-7430-11ea-8962-f749dd47ea60.png)

### Why are the changes needed?

For users to be able to know which resource profile was used with which stage and executors. The resource profile information is also available, so a user who is debugging can see exactly what resources were requested with that profile.

### Does this PR introduce any user-facing change?

Yes, UI updates.

### How was this patch tested?

Unit tests, and tested on YARN with both active applications and the history server.

Closes #28094 from tgravescs/SPARK-29303-pr.

Lead-authored-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Thomas Graves <tgraves@apache.org>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-05-21 13:11:35 -05:00
Dongjoon Hyun c8f3bd861d
[SPARK-31696][K8S] Support driver service annotation in K8S
### What changes were proposed in this pull request?

This PR aims to add `spark.kubernetes.driver.service.annotation`, like the existing `spark.kubernetes.driver.annotation`.

### Why are the changes needed?

Annotations are used in many ways. One example is that the Prometheus monitoring system discovers metric endpoints via annotations (see the sketch below).
- https://github.com/helm/charts/tree/master/stable/prometheus#scraping-pod-metrics-via-annotations
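A sketch of the intended usage, assuming the new conf follows the `spark.kubernetes.driver.service.annotation.[AnnotationName]` pattern; the `prometheus.io/*` keys are the usual Prometheus scraping conventions, not something introduced by this PR:

```
bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf "spark.kubernetes.driver.service.annotation.prometheus.io/scrape=true" \
  --conf "spark.kubernetes.driver.service.annotation.prometheus.io/port=4040" \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.0-SNAPSHOT.jar
```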

### Does this PR introduce _any_ user-facing change?

Yes. The documentation is added.

### How was this patch tested?

Pass Jenkins with the updated unit tests.

Closes #28518 from dongjoon-hyun/SPARK-31696.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-13 13:59:42 -07:00
Dongjoon Hyun 85dad37f69 [SPARK-31601][K8S] Fix spark.kubernetes.executor.podNamePrefix to work
### What changes were proposed in this pull request?

This PR aims to fix `spark.kubernetes.executor.podNamePrefix` to work.

### Why are the changes needed?

Currently, the configuration is broken like the following.
```
bin/spark-submit \
--master k8s://$K8S_MASTER \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
-c spark.kubernetes.container.image=spark:pr \
-c spark.kubernetes.driver.pod.name=mypod \
-c spark.kubernetes.executor.podNamePrefix=mypod \
local:///opt/spark/examples/jars/spark-examples_2.12-3.1.0-SNAPSHOT.jar
```

**BEFORE SPARK-31601**
```
pod/mypod                              1/1     Running     0          9s
pod/spark-pi-7469dd71c499fafb-exec-1   1/1     Running     0          4s
pod/spark-pi-7469dd71c499fafb-exec-2   1/1     Running     0          4s
```

**AFTER SPARK-31601**
```
pod/mypod                              1/1     Running     0          8s
pod/mypod-exec-1                       1/1     Running     0          3s
pod/mypod-exec-2                       1/1     Running     0          3s
```

### Does this PR introduce any user-facing change?

Yes. This is a bug fix. The conf will work as described in the documentation.

### How was this patch tested?

Pass the Jenkins and run the above command manually.

Closes #28401 from dongjoon-hyun/SPARK-31601.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Prashant Sharma <prashsh1@in.ibm.com>
2020-04-30 09:15:12 +05:30
Marcelo Vanzin b8ccd75524
[SPARK-29905][K8S] Improve pod lifecycle manager behavior with dynamic allocation
This issue mainly shows up when you enable dynamic allocation:
because there are many executor state changes (because of executors
being requested and starting to run, and later stopped), the lifecycle
manager class could end up logging information about the same executor
multiple times, since the different events would cause the same
executor update to be present in multiple pod snapshots. On top of that,
it could end up making multiple redundant calls into the API server
for the same pod.

Another issue was when the config was set to not delete executor
pods; with dynamic allocation, that means pods keep accumulating
in the API server, and every time the full sync is done by the
polling source, all executors, even the finished ones that Spark
technically does not care about anymore, would be processed.

The change modifies the lifecycle monitor so that it:

- logs executor updates a single time, even if it shows up in
  multiple snapshots, by checking whether the state change
  happened before.
- marks finished-but-not-deleted-in-k8s executors with a label
  so that they can be easily filtered out.

This reduces the amount of logging done by the lifecycle manager,
which is a minor thing in general since the logs are at debug level.
But it also reduces the amount of data that needs to be fetched
from the API server under certain configurations, and overall
reduces interaction with the API server when dynamic allocation is on.

There's also a change in the snapshot store to ensure that the
same subscriber is not called concurrently. That is kind of a bug,
since it means subscribers could be processing snapshots out of order,
or even that they could block multiple threads (e.g. the allocator
callback was synchronized). I actually ran into the "concurrent calls"
situation in the lifecycle manager during testing, and while it did not
seem to cause problems, it did make for some head scratching while
looking at the logs. It seemed safer to fix that.

Unit tests were updated to check for the changes. Also tested in real
cluster with dynamic allocation on.

Closes #26535 from vanzin/SPARK-29905.

Lead-authored-by: Marcelo Vanzin <vanzin@apache.org>
Co-authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-04-16 14:15:10 -07:00
Seongjin Cho 7699f765f5
[SPARK-31394][K8S] Adds support for Kubernetes NFS volume mounts
### What changes were proposed in this pull request?
This PR (SPARK-31394) aims to add a new feature that enables mounting of Kubernetes NFS volumes. Most of the code is just a slight modification of the existing code for the EmptyDir/HostPath/PVC support.

### Why are the changes needed?
Kubernetes supports various kinds of volumes, but Spark on Kubernetes supports only EmptyDir/HostPath/PVC. By adding support for NFS, we can use Spark on Kubernetes with NFS storage.

In order to use NFS with the current Spark using PVC, the user needs to first create an empty new PVC with NFS. Kubernetes' NFS provisioner will create a new empty dir in NFS under some pre-configured dir for this PVC, for example, `/nfs/k8s/sjcho-my-notebook-pvc-dce84888-7a9d-11e6-b1ee-5254001e0c1b`. Then the user should add files to process in the newly created PVC using some file-copying job, and then run the desired Spark job using that populated PVC. And then to get the final results out, the user should run another file-copying job.

This in theory works, but for data analysis tasks, is quite cumbersome. With this change, one could simply use existing files in NFS, say `/nfs/home/sjcho/myfiles10.sstable` from the Spark job directly, and also write the results directly to some existing dir under NFS such as `/nfs/home/sjcho/output`.

This PR doesn't use any features other than the features already provided by Kubernetes itself, so there should be no compatibility issues (other than limited by k8s) between the wide variety of NFS choices. This PR merely enables an existing volume type `nfs` supported officially by Kubernetes, just like Spark is currently supporting `hostPath` and `persistentVolumeClaim` right now.

### Does this PR introduce any user-facing change?
Users can now mount NFS volumes by running commands like:
```
spark-submit \
--conf spark.kubernetes.driver.volumes.nfs.myshare.mount.path=/myshare \
--conf spark.kubernetes.driver.volumes.nfs.myshare.mount.readOnly=false \
--conf spark.kubernetes.driver.volumes.nfs.myshare.options.server=nfs.example.com \
--conf spark.kubernetes.driver.volumes.nfs.myshare.options.path=/storage/myshare \
...
```
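The executor-side confs mirror the driver-side ones; a sketch using the same placeholder share:

```
spark-submit \
--conf spark.kubernetes.executor.volumes.nfs.myshare.mount.path=/myshare \
--conf spark.kubernetes.executor.volumes.nfs.myshare.mount.readOnly=false \
--conf spark.kubernetes.executor.volumes.nfs.myshare.options.server=nfs.example.com \
--conf spark.kubernetes.executor.volumes.nfs.myshare.options.path=/storage/myshare \
...
```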

### How was this patch tested?
Test cases were added just like the existing EmptyDir support.

The code was tested on minikube using the following script:
https://gist.github.com/w4-sjcho/4ba48f8c35a9685f5307fbd46b2c0656#file-run-test-sh

The script creates a new minikube cluster, launches an NFS server inside the cluster, copies the `README.md` file to the NFS share, and runs the `JavaWordCount` example against the file located on NFS.

Closes #27364 from w4-sjcho/master.

Authored-by: Seongjin Cho <sjcho@wisefour.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-04-15 03:45:39 -07:00
Prashant Sharma 3799d2b9d8
[SPARK-30715][K8S][TESTS][FOLLOWUP] Update k8s client version in IT as well
### What changes were proposed in this pull request?
This is a follow-up for SPARK-30715. It brings the Kubernetes client version in the integration tests in sync with kubernetes/core.

### Why are the changes needed?
More than once, the Kubernetes client version has gone out of sync between the integration tests and kubernetes/core. So this brings them back in sync and adds a comment to save us from needing this kind of follow-up again.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Manually.

Closes #27948 from ScrapCodes/follow-up-spark-30715.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-21 18:26:53 -07:00
Holden Karau 57d27e900f
[SPARK-31125][K8S] Terminating pods have a deletion timestamp but they are not yet dead
### What changes were proposed in this pull request?

Change what we consider a deleted pod to not include "Terminating"

### Why are the changes needed?

If we get a new snapshot while a pod is in the process of being cleaned up we shouldn't delete the executor until it is fully terminated.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

This should be covered by the decommissioning tests: they are currently flaky because we sometimes delete the executor instead of allowing it to decommission all the way.

I also ran this in a loop locally ~80 times with the only failures being the PV suite because of unrelated minikube mount issues.

Closes #27905 from holdenk/SPARK-31125-Processing-state-snapshots-incorrect.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-17 12:04:06 -07:00
Pedro Rossi ed06d98044
[SPARK-25355][K8S] Add proxy user to driver if present on spark-submit
### What changes were proposed in this pull request?

This PR adds the proxy user from the spark-submit command to the childArgs, so the proxy user can be retrieved and used in the KubernetesApplication to add the proxy user to the driver container args.

### Why are the changes needed?

The proxy user, when used with spark-submit, doesn't work in the Kubernetes environment since the `--proxy-user` argument isn't added to the driver container; when I added it manually to the Pod definition it worked just fine.
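A minimal sketch of the submission this fixes (master URL, image, and user name are placeholders); with this change the `--proxy-user` value is forwarded to the driver container args:

```
bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --proxy-user <proxied-user> \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.0-SNAPSHOT.jar
```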

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Tests were added

Closes #27422 from PedroRossi/SPARK-25355.

Authored-by: Pedro Rossi <pgrr@cin.ufpe.br>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-16 21:53:58 -07:00
beliefer 1254c88034 [SPARK-31118][K8S][DOC] Add version information to the configuration of K8S
### What changes were proposed in this pull request?
Add version information to the configuration of `K8S`.

I sorted out some information show below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.kubernetes.context | 3.0.0 | SPARK-25887 | c542c247bbfe1214c0bf81076451718a9e8931dc#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.master | 3.0.0 | SPARK-30371 | f14061c6a4729ad419902193aa23575d8f17f597#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.namespace | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.container.image | 2.3.0 | SPARK-22994 | b94debd2b01b87ef1d2a34d48877e38ade0969e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.container.image | 2.3.0 | SPARK-22807 | fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.container.image | 2.3.0 | SPARK-22807 | fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.container.image.pullPolicy | 2.3.0 | SPARK-22807 | fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.container.image.pullSecrets | 2.4.0 | SPARK-23668 | cccaaa14ad775fb981e501452ba2cc06ff5c0f0a#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submission.requestTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submission.connectionTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.requestTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.connectionTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
KUBERNETES_AUTH_DRIVER_CONF_PREFIX.serviceAccountName | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 | spark.kubernetes.authenticate.driver
KUBERNETES_AUTH_EXECUTOR_CONF_PREFIX.serviceAccountName | 3.1.0 | SPARK-30122 | f9f06eee9853ad4b6458ac9d31233e729a1ca226#diff-6e882d5561424e7e6651eb46f10104b8 | spark.kubernetes.authenticate.executor
spark.kubernetes.driver.limit.cores | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.request.cores | 3.0.0 | SPARK-27754 | 1a8c09334db87b0e938c38cd6b59d326bdcab3c3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submitInDriver | 2.4.0 | SPARK-22839 | f15906da153f139b698e192ec6f82f078f896f1e#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.limit.cores | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.scheduler.name | 3.0.0 | SPARK-29436 | f800fa383131559c4e841bf062c9775d09190935#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.request.cores | 2.4.0 | SPARK-23285 | fe2b7a4568d65a62da6e6eb00fff05f248b4332c#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.pod.name | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.resourceNamePrefix | 3.0.0 | SPARK-25876 | 6be272b75b4ae3149869e19df193675cc4117763#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.podNamePrefix | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.allocation.batch.size | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.allocation.batch.delay | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.lostCheck.maxAttempts | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submission.waitAppCompletion | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.report.interval | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.apiPollingInterval | 2.4.0 | SPARK-24248 | 270a9a3cac25f3e799460320d0fc94ccd7ecfaea#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.eventProcessingInterval | 2.4.0 | SPARK-24248 | 270a9a3cac25f3e799460320d0fc94ccd7ecfaea#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.memoryOverheadFactor | 2.4.0 | SPARK-23984 | 1a644afbac35c204f9ad55f86999319a9ab458c6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.pyspark.pythonVersion | 2.4.0 | SPARK-23984 | a791c29bd824adadfb2d85594bc8dad4424df936#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.krb5.path | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.krb5.configMapName | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.hadoop.configMapName | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.tokenSecret.name | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.tokenSecret.itemKey | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.resource.type | 2.4.1 | SPARK-25021 | 9031c784847353051bc0978f63ef4146ae9095ff#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.local.dirs.tmpfs | 3.0.0 | SPARK-25262 | da6fa3828bb824b65f50122a8a0a0d4741551257#diff-6e882d5561424e7e6651eb46f10104b8 | It exists in branch-3.0, but in pom.xml it is 2.4.0-snapshot
spark.kubernetes.driver.podTemplateFile | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.podTemplateFile | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.podTemplateContainerName | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.podTemplateContainerName | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.deleteOnTermination | 3.0.0 | SPARK-25515 | 0c2935b01def8a5f631851999d9c2d57b63763e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.dynamicAllocation.deleteGracePeriod | 3.0.0 | SPARK-28487 | 0343854f54b48b206ca434accec99355011560c2#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.appKillPodDeletionGracePeriod | 3.0.0 | SPARK-24793 | 05168e725d2a17c4164ee5f9aa068801ec2454f4#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.file.upload.path | 3.0.0 | SPARK-23153 | 5e74570c8f5e7dfc1ca1c53c177827c5cea57bf1#diff-6e882d5561424e7e6651eb46f10104b8 |  
The following appears in the document |   |   |   |  
spark.kubernetes.authenticate.submission.caCertFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.clientKeyFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.clientCertFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.oauthToken | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.oauthTokenFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.caCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.clientKeyFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.clientCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.oauthToken | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.oauthTokenFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.caCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.clientKeyFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.clientCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.oauthTokenFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.caCertFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.clientKeyFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.clientCertFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.oauthToken | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.oauthTokenFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.label.[LabelName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.annotation.[AnnotationName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.label.[LabelName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.annotation.[AnnotationName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.node.selector.[labelKey] | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driverEnv.[EnvironmentVariableName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.secrets.[SecretName] | 2.3.0 | SPARK-22757 | 171f6ddadc6185ffcc6ad82e5f48952fb49095b2#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.secrets.[SecretName] | 2.3.0 | SPARK-22757 | 171f6ddadc6185ffcc6ad82e5f48952fb49095b2#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.secretKeyRef.[EnvName] | 2.4.0 | SPARK-24232 | 21e1fc7d4aed688d7b685be6ce93f76752159c98#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.secretKeyRef.[EnvName] | 2.4.0 | SPARK-24232 | 21e1fc7d4aed688d7b685be6ce93f76752159c98#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.subPath | 3.0.0 | SPARK-25960 | 3df307aa515b3564686e75d1b71754bbcaaf2dec#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.readOnly | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].options.[OptionName] | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-b5527f236b253e0d9f5db5164bdb43e9 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.path | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.subPath | 3.0.0 | SPARK-25960 | 3df307aa515b3564686e75d1b71754bbcaaf2dec#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.readOnly | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].options.[OptionName] | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-b5527f236b253e0d9f5db5164bdb43e9 |  

### Why are the changes needed?
Supplements the configuration with version information.

### Does this PR introduce any user-facing change?
'No'

### How was this patch tested?
Existing UTs

Closes #27875 from beliefer/add-version-to-k8s-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-12 09:54:08 +09:00
gatorsmile 28b8713036 [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT
### What changes were proposed in this pull request?
This patch is to bump the master branch version to 3.1.0-SNAPSHOT.

### Why are the changes needed?
N/A

### Does this PR introduce any user-facing change?
N/A

### How was this patch tested?
N/A

Closes #27698 from gatorsmile/updateVersion.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-25 19:44:31 -08:00
Holden Karau d273a2bb0f [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
This PR is based on an existing/previous PR - https://github.com/apache/spark/pull/19045

### What changes were proposed in this pull request?

This change adds a decommissioning state that we can enter when the cloud provider/scheduler lets us know we aren't going to be removed immediately but instead will be removed soon. This concept fits nicely in K8s and also with spot instances on AWS / preemptible instances, all of which can give us notice that our host is going away. For now we simply stop scheduling jobs; in the future we could perform some kind of migration of data during scale-down, or at least stop accepting new blocks to cache.

There is a design document at https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE/edit?usp=sharing

### Why are the changes needed?

With more move to preemptible multi-tenancy, serverless environments, and spot-instances better handling of node scale down is required.

### Does this PR introduce any user-facing change?

There is no API change; however, an additional configuration flag is added to enable/disable this behaviour.
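A hedged sketch of turning this on; the flag name below is my reading of this change and should be treated as an assumption rather than a confirmed config key:

```
# Assumed flag name for the decommissioning behaviour added here; off by default.
bin/spark-submit \
  --conf spark.worker.decommission.enabled=true \
  ...
```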

### How was this patch tested?

New integration tests in the Spark K8s integration testing. Extension of the AppClientSuite to test decommissioning separately from K8s.

Closes #26440 from holdenk/SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r4.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-02-14 12:36:52 -08:00
Thomas Graves 496f6ac860 [SPARK-29148][CORE] Add stage level scheduling dynamic allocation and scheduler backend changes
### What changes were proposed in this pull request?

This is another PR for stage level scheduling. In particular this adds changes to the dynamic allocation manager and the scheduler backend to be able to track what executors are needed per ResourceProfile.  Note the API is still private to Spark until the entire feature gets in, so this functionality will be there but only usable by tests for profiles other than the DefaultProfile.

The main changes here are simply tracking things on a ResourceProfile basis as well as sending the executor requests to the scheduler backend for all ResourceProfiles.

I introduce a ResourceProfileManager in this PR that tracks all the actual ResourceProfile objects, so that we can keep them all in a single place and just pass around, and store in data structures, the resource profile id. The resource profile id can be used with the ResourceProfileManager to get the actual ResourceProfile contents.

There are various places in the code that use executor "slots" for things.  The ResourceProfile adds functionality to keep that calculation in it.   This logic is more complex than it should be because standalone mode and Mesos coarse-grained mode do not set the executor cores config; they default to all cores on the worker, so calculating slots is harder there.
This PR keeps the functionality to make cores the limiting resource because the scheduler still uses that for "slots" for a few things.

This PR also adds the resource profile id to the Stage and StageInfo classes to be able to test things more easily.   The full set of changes will come with the scheduler PR that follows this one.

The PR stops at the scheduler backend pieces for the cluster manager; the real YARN support hasn't been added in this PR (that again will be in a separate PR), so this has a few of the API changes up to the cluster manager and then just uses the default profile requests to continue.

The code for the entire feature is here for reference: https://github.com/apache/spark/pull/27053/files although it needs to be upmerged again as well.

### Why are the changes needed?

Needed for stage level scheduling feature.

### Does this PR introduce any user-facing change?

No user facing api changes added here.

### How was this patch tested?

Lots of unit tests and manually testing. I tested on yarn, k8s, standalone, local modes. Ran both failure and success cases.

Closes #27313 from tgravescs/SPARK-29148.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-02-12 16:45:42 -06:00
yudovin f9f06eee98 [SPARK-30122][K8S] Support spark.kubernetes.authenticate.executor.serviceAccountName
### What changes were proposed in this pull request?

Currently, it doesn't seem to be possible to have Spark Driver set the serviceAccountName for executor pods it launches.

### Why are the changes needed?

It will allow setting the serviceAccountName for executor pods.
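A sketch of the new conf alongside the existing driver-side one (the service account names are placeholders):

```
bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-driver \
  --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-executor \
  ...
```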

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

It is covered by a unit test.

Closes #27034 from ayudovin/srevice-account-name-for-executor-pods.

Authored-by: yudovin <artsiom.yudovin@profitero.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-05 14:16:59 -08:00
Onur Satici 86fdb818bf [SPARK-30715][K8S] Bump fabric8 to 4.7.1
### What changes were proposed in this pull request?
Bump fabric8 kubernetes-client to 4.7.1

### Why are the changes needed?
New fabric8 version brings support for Kubernetes 1.17 clusters.
Full release notes:
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.7.0
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.7.1

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing unit and integration tests cover the creation of K8S objects. They were adjusted to work with the new fabric8 version.

Closes #27443 from onursatici/os/bump-fabric8.

Authored-by: Onur Satici <onursatici@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-05 01:17:30 -08:00
Thomas Graves 878094f972 [SPARK-30689][CORE][YARN] Add resource discovery plugin api to support YARN versions with resource scheduling
### What changes were proposed in this pull request?

This change allows custom resource (GPUs, FPGAs, etc.) discovery for resource scheduling to be more flexible. Users are asking for it to work with Hadoop 2.x versions that do not support resource scheduling in YARN, and/or they may not run in an isolated environment.
This change creates a plugin API so that users can write their own resource discovery class, which allows a lot more flexibility. The user can chain plugins for different resource types. The user-specified plugins execute in the order specified and fall back to the discovery script plugin if they don't return information for a particular resource.

I had to open up a few of the classes to be public, change them to not be case classes, and make them developer API in order for the plugin to get the information it needs.

I also relaxed the YARN side so that if YARN isn't configured for resource scheduling we just warn and go on. This helps users that have YARN 3.1 but haven't configured the resource scheduling side on their cluster yet, or aren't running in an isolated environment.

The user would configure this like:
--conf spark.resources.discovery.plugin="org.apache.spark.resource.ResourceDiscoveryFPGAPlugin, org.apache.spark.resource.ResourceDiscoveryGPUPlugin"

Note the executor side had to be wrapped with a classloader to make sure we include the user classpath for jars they specified on submission.

Note this is more flexible because the discovery script has limitations such as spawning it in a separate process. This means if you are trying to allocate resources in that process they might be released when the script returns. Other things are the class makes it more flexible to be able to integrate with existing systems and solutions for assigning resources.

### Why are the changes needed?

To more easily use Spark resource scheduling with older versions of Hadoop or in non-isolated environments.

### Does this PR introduce any user-facing change?

Yes, a plugin API.

### How was this patch tested?

Unit tests added and manual testing done on yarn and standalone modes.

Closes #27410 from tgravescs/hadoop27spark3.

Lead-authored-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Thomas Graves <tgraves@apache.org>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-01-31 22:20:28 -06:00
Jiaxin Shan f86a1b9590 [SPARK-30626][K8S] Add SPARK_APPLICATION_ID into driver pod env
### What changes were proposed in this pull request?
Add the SPARK_APPLICATION_ID environment variable when Spark configures the driver pod.

### Why are the changes needed?
Currently, the driver doesn't have this in its environment and it's not convenient to retrieve the Spark application id.
The use case is that we want to look up the Spark application id, create an application folder, and redirect driver logs to that folder.
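A sketch of that use case from inside the driver container; the log root below is a placeholder:

```
# SPARK_APPLICATION_ID is now available in the driver pod's environment.
LOG_DIR="/var/log/spark/${SPARK_APPLICATION_ID}"
mkdir -p "${LOG_DIR}"
echo "driver logs for ${SPARK_APPLICATION_ID} can be redirected under ${LOG_DIR}"
```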

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
Unit tested. I also built a new distribution and container image to kick off a job in Kubernetes, and I do see SPARK_APPLICATION_ID added there.

Closes #27347 from Jeffwan/SPARK-30626.

Authored-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-24 12:00:30 -08:00
xushiwei 00425595 f14061c6a4 [SPARK-30371][K8S] Add spark.kubernetes.driver.master conf
### What changes were proposed in this pull request?

make KUBERNETES_MASTER_INTERNAL_URL configurable

### Why are the changes needed?

We do not always use the default port number 443 to access our kube-apiserver, and in some multi-tenant clusters people do not use the service `kubernetes.default.svc` to access the kube-apiserver, so making the internal master configurable is necessary.

### Does this PR introduce any user-facing change?

Users can configure the internal master URL with
```
--conf spark.kubernetes.driver.master=https://kubernetes.default.svc:6443
```

### How was this patch tested?

Ran in a multi-tenant cluster that does not use https://kubernetes.default.svc to access the kube-apiserver.

Closes #27029 from wackxu/internalmaster.

Authored-by: xushiwei 00425595 <xushiwei5@huawei.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-19 14:14:45 -08:00
Marcelo Vanzin dca838058f [SPARK-29950][K8S] Blacklist deleted executors in K8S with dynamic allocation
The issue here is that when Spark is downscaling the application and deletes
a few pod requests that aren't needed anymore, it may actually race with the
K8S scheduler, which may be bringing up those executors. So they may have enough
time to connect back to the driver, register, to just be deleted soon after.
This wastes resources and causes misleading entries in the driver log.

The change (ab)uses the blacklisting mechanism to consider the deleted excess
pods as blacklisted, so that if they try to connect back, the driver will deny
it.

It also changes the executor registration slightly, since even with the above
change there were misleading logs. That was because the executor registration
message was an RPC that always succeeded (bar network issues), so the executor
would always try to send an unregistration message to the driver, which would
then log several messages about not knowing anything about the executor. The
change makes the registration RPC succeed or fail directly, instead of using
the separate failure message that would lead to this issue.

Note the last change required some changes in a standalone test suite related
to dynamic allocation, since it relied on the driver not throwing exceptions
when a duplicate executor registration happened.

Tested with existing unit tests, and with live cluster with dyn alloc on.

Closes #26586 from vanzin/SPARK-29950.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2020-01-16 13:37:11 -08:00
Yuming Wang 696288f623 [INFRA] Reverts commit 56dcd79 and c216ef1
### What changes were proposed in this pull request?
1. Revert "Preparing development version 3.0.1-SNAPSHOT": 56dcd79

2. Revert "Preparing Spark release v3.0.0-preview2-rc2": c216ef1

### Why are the changes needed?
Shouldn't change master.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
manual test:
https://github.com/apache/spark/compare/5de5e46..wangyum:revert-master

Closes #26915 from wangyum/revert-master.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-12-16 19:57:44 -07:00
Yuming Wang 56dcd79992 Preparing development version 3.0.1-SNAPSHOT 2019-12-17 01:57:27 +00:00
Yuming Wang c216ef1d03 Preparing Spark release v3.0.0-preview2-rc2 2019-12-17 01:57:21 +00:00
Dongjoon Hyun cc276f8a6e [SPARK-30243][BUILD][K8S] Upgrade K8s client dependency to 4.6.4
### What changes were proposed in this pull request?

This PR aims to upgrade K8s client library from 4.6.1 to 4.6.4 for `3.0.0-preview2`.

### Why are the changes needed?

This will bring the latest bug fixes.
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.4
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.3
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.2

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with K8s integration test.

Closes #26874 from dongjoon-hyun/SPARK-30243.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-13 08:25:51 -08:00
Marcelo Vanzin b095232f63 [SPARK-29865][K8S] Ensure client-mode executors have same name prefix
This basically does what BasicDriverFeatureStep already does to achieve the
same thing in cluster mode; but since that class (or any other feature) is
not invoked in client mode, it needs to be done elsewhere.

I also modified the client mode integration test to check the executor name
prefix; while there I had to fix the minikube backend to parse the output
from newer minikube versions (I have 1.5.2).

Closes #26488 from vanzin/SPARK-29865.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Erik Erlandson <eerlands@redhat.com>
2019-11-14 15:52:39 -07:00
Xingbo Jiang 8207c835b4 Revert "Prepare Spark release v3.0.0-preview-rc2"
This reverts commit 007c873ae3.
2019-10-30 17:45:44 -07:00
Xingbo Jiang 007c873ae3 Prepare Spark release v3.0.0-preview-rc2
### What changes were proposed in this pull request?

To push the built jars to maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.

Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the sparkR version number check logic to allow jvm version like `3.0.0-preview`

**Please note those changes were generated by the release script in the past, but this time since we manually add tags on master branch, we need to manually apply those changes too.**

We shall revert the changes after 3.0.0-preview release passed.

### Why are the changes needed?

To make the maven release repository to accept the built jars.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A
2019-10-30 17:42:59 -07:00
Xingbo Jiang b33a58c0c6 Revert "Prepare Spark release v3.0.0-preview-rc1"
This reverts commit 5eddbb5f1d.
2019-10-28 22:32:34 -07:00
Xingbo Jiang 5eddbb5f1d Prepare Spark release v3.0.0-preview-rc1
### What changes were proposed in this pull request?

To push the built jars to maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.

Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the PySpark version from `3.0.0.dev0` to `3.0.0`

**Please note those changes were generated by the release script in the past, but this time since we manually add tags on master branch, we need to manually apply those changes too.**

We shall revert the changes after 3.0.0-preview release passed.

### Why are the changes needed?

To make the maven release repository to accept the built jars.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A

Closes #26243 from jiangxb1987/3.0.0-preview-prepare.

Lead-authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
2019-10-28 22:31:29 -07:00
igor.calabria 78bdcfade1 [SPARK-27812][K8S] Bump K8S client version to 4.6.1
### What changes were proposed in this pull request?

Updated kubernetes client.

### Why are the changes needed?

https://issues.apache.org/jira/browse/SPARK-27812
https://issues.apache.org/jira/browse/SPARK-27927

We need this fix https://github.com/fabric8io/kubernetes-client/pull/1768 that was released on version 4.6 of the client. The root cause of the problem is better explained in https://github.com/apache/spark/pull/25785

### Does this PR introduce any user-facing change?

Nope, it should be transparent to users

### How was this patch tested?

This patch was tested manually using a simple pyspark job

```python
from pyspark.sql import SparkSession

if __name__ == '__main__':
    spark = SparkSession.builder.getOrCreate()
```

The expected behaviour of this "job" is that both python's and jvm's process exit automatically after the main runs. This is the case for spark versions <= 2.4. On version 2.4.3, the jvm process hangs because there's a non daemon thread running

```
"OkHttp WebSocket https://10.96.0.1/..." #121 prio=5 os_prio=0 tid=0x00007fb27c005800 nid=0x24b waiting on condition [0x00007fb300847000]
"OkHttp WebSocket https://10.96.0.1/..." #117 prio=5 os_prio=0 tid=0x00007fb28c004000 nid=0x247 waiting on condition [0x00007fb300e4b000]
```
This is caused by a bug on `kubernetes-client` library, which is fixed on the version that we are upgrading to.

When the mentioned job is run with this patch applied, the behaviour from spark <= 2.4.3 is restored and both processes terminate successfully

Closes #26093 from igorcalabria/k8s-client-update.

Authored-by: igor.calabria <igor.calabria@ubee.in>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-17 12:23:24 -07:00
maruilei f800fa3831 [SPARK-29436][K8S] Support executor for selecting scheduler through scheduler name in the case of k8s multi-scheduler scenario
### What changes were proposed in this pull request?

Support executors selecting a scheduler through the scheduler name, for the k8s multi-scheduler scenario.

### Why are the changes needed?

Without this, Spark cannot support the k8s multi-scheduler scenario.

### Does this PR introduce any user-facing change?

Yes, users can set the scheduler name through configuration.
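A sketch of the new conf; `my-scheduler` is a placeholder for whatever non-default scheduler is deployed in the cluster:

```
bin/spark-submit \
  --conf spark.kubernetes.executor.scheduler.name=my-scheduler \
  ...
```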

### How was this patch tested?

Manually tested with spark + k8s cluster

Closes #26088 from merrily01/SPARK-29436.

Authored-by: maruilei <maruilei@jd.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-17 07:24:13 -07:00
Kent Yao 02c5b4f763 [SPARK-28947][K8S] Status logging not happens at an interval for liveness
### What changes were proposed in this pull request?

This PR invokes the start method of `LoggingPodStatusWatcherImpl` so that status logging happens at intervals.

### Why are the changes needed?

The start method of `LoggingPodStatusWatcherImpl` was declared but never called.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

Manually tested.

Closes #25648 from yaooqinn/SPARK-28947.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-10-15 12:34:39 -07:00
maruilei 77510c602a [SPARK-29233][K8S] Add regex expression checks for executorEnv…
### What changes were proposed in this pull request?

In kubernetes, there are some naming regular expression requirements and restrictions on environment variable names, such as:

- In kubernetes version release-1.7 and earlier, the names of pod environment variables must match the regular expression [`[A-Za-z_][A-Za-z0-9_]*`](https://github.com/kubernetes/kubernetes/blob/release-1.7/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L169)
- In kubernetes version release-1.8 and later, the names of pod environment variables must match the regular expression [`[-._a-zA-Z][-._a-zA-Z0-9]*`](https://github.com/kubernetes/kubernetes/blob/release-1.8/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L305)

However, in spark on k8s mode, spark should add restrictions on environment variable names when creating executorEnv.

In addition, we need to use the regular expression adapted to the newer versions of k8s to tighten the restrictions on the names of environment variables.

Otherwise, the pod will not be created properly and the spark application will be suspended.

To solve the problem above, a regex validation for executorEnv names is added.
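A sketch of the conf this validation applies to; the variable names are placeholders, and the exact failure mode (error vs. warning) for an invalid name is an assumption:

```
# A name matching the K8s env-var regex is accepted as before.
bin/spark-submit \
  --conf spark.executorEnv.MY_SETTING=foo \
  ...
# A name such as "1BAD NAME" is now caught by the new validation on the Spark side
# instead of producing a pod spec that the API server rejects.
```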

### Why are the changes needed?

If no validation rules are added, the environment variable names that don't meet the requirements will cause the pod to not be created properly and the application will be suspended.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added unit tests and ran manually.

Closes #25920 from merrily01/SPARK-29233.

Authored-by: maruilei <maruilei@jd.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-06 09:41:11 -05:00
Dongjoon Hyun bd031c2173 [SPARK-29307][BUILD][TESTS] Remove scalatest deprecation warnings
### What changes were proposed in this pull request?

This PR aims to remove `scalatest` deprecation warnings with the following changes.
- `org.scalatest.mockito.MockitoSugar` -> `org.scalatestplus.mockito.MockitoSugar`
- `org.scalatest.selenium.WebBrowser` -> `org.scalatestplus.selenium.WebBrowser`
- `org.scalatest.prop.Checkers` -> `org.scalatestplus.scalacheck.Checkers`
- `org.scalatest.prop.GeneratorDrivenPropertyChecks` -> `org.scalatestplus.scalacheck.ScalaCheckDrivenPropertyChecks`

### Why are the changes needed?

According to the Jenkins logs, there are 118 warnings about this.
```
 grep "is deprecated" ~/consoleText | grep scalatest | wc -l
     118
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

After Jenkins passes, we need to check the Jenkins log.

Closes #25982 from dongjoon-hyun/SPARK-29307.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-30 21:00:11 -07:00
Sean Owen e1ea806b30 [SPARK-29291][CORE][SQL][STREAMING][MLLIB] Change procedure-like declaration to function + Unit for 2.13
### What changes were proposed in this pull request?

Scala 2.13 emits a deprecation warning for procedure-like declarations:

```
def foo() {
 ...
```

This is equivalent to the following, so should be changed to avoid a warning:

```
def foo(): Unit = {
  ...
```

### Why are the changes needed?

It will avoid about a thousand compiler warnings when we start to support Scala 2.13. I wanted to make the change in 3.0 as there are less likely to be back-ports from 3.0 to 2.4 than 3.1 to 3.0, for example, minimizing that downside to touching so many files.

Unfortunately, that makes this quite a big change.

### Does this PR introduce any user-facing change?

No behavior change at all.

### How was this patch tested?

Existing tests.

Closes #25968 from srowen/SPARK-29291.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-30 10:03:23 -07:00
Andy Grove 35d4edffa2 [SPARK-28921][BUILD][K8S] Upgrade kubernetes client to 4.4.2
### What changes were proposed in this pull request?

Upgrade kubernetes client from 4.1.2 to 4.4.2

### Why are the changes needed?

To fix a compatibility issue with EKS, since Amazon rolled out some security patches over the past week: 1.15.3, 1.14.6, 1.13.10, 1.12.10, and 1.11.10.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Pass the Jenkins and manually test on EKS.

Closes #25640 from andygrove/SPARK-28921.

Authored-by: Andy Grove <andygrove73@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-02 16:50:58 -07:00
Dongjoon Hyun f7c9de9035 [SPARK-28765][BUILD] Add explict exclusions to avoid JDK11 dependency issue
### What changes were proposed in this pull request?

This PR adds explicit exclusions to avoid Maven `JDK11` dependency issues.

### Why are the changes needed?

Maven/Ivy seems to be confused during dependency generation on `JDK11` environment.
This is not only wrong, but also causes a Jenkins failure during dependency manifest check on `JDK11` environment.

**JDK8**
```
$ cd core
$ mvn -X dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api
...
[DEBUG]       org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version managed from 2.22.2)
[DEBUG]          org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]          javax.validation:validation-api:jar:2.0.1.Final:compile
```

**JDK11**
```
[DEBUG]       org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version managed from 2.22.2)
[DEBUG]          org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]          javax.validation:validation-api:jar:2.0.1.Final:compile
[DEBUG]          jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
[DEBUG]             jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Do the following in both `JDK8` and `JDK11` environment. The dependency manifest should not be changed. In the current `master` branch, `JDK11` changes the dependency manifest.
```
$ dev/test-dependencies.sh --replace-manifest
```

Closes #25481 from dongjoon-hyun/SPARK-28765.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-17 10:16:22 -07:00
Marcelo Vanzin 0343854f54 [SPARK-28487][K8S] More responsive dynamic allocation with K8S
This change implements a few changes to the k8s pod allocator so
that it behaves a little better when dynamic allocation is on.

(i) Allow the application to ramp up immediately when there's a
change in the target number of executors. Without this change,
scaling would only trigger when a change happened in the state of
the cluster, e.g. an executor going down, or when the periodical
snapshot was taken (default every 30s).

(ii) Get rid of pending pod requests, both acknowledged (i.e. Spark
knows that a pod is pending resource allocation) and unacknowledged
(i.e. Spark has requested the pod but the API server hasn't created it
yet), when they're not needed anymore. This avoids starting those
executors to just remove them after the idle timeout, wasting resources
in the meantime.

(iii) Re-work some of the code to avoid unnecessary logging. While not
bad without dynamic allocation, the existing logging was very chatty
when dynamic allocation was on. With the changes, all the useful
information is still there, but only when interesting changes happen.

(iv) Gracefully shut down executors when they become idle. Just deleting
the pod causes a lot of ugly logs to show up, so it's better to ask pods
to exit nicely. That also allows Spark to respect the "don't delete
pods" option when dynamic allocation is on.

Tested on a small k8s cluster running different TPC-DS workloads.

Closes #25236 from vanzin/SPARK-28487.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-08-13 17:29:54 -07:00
Junjie Chen 780d176136 [SPARK-28042][K8S] Support using volume mount as local storage
## What changes were proposed in this pull request?

This PR adds support for using hostPath/PV volume mounts as local storage. In KubernetesExecutorBuilder.scala, the LocalDirsFeatureStep is built before MountVolumesFeatureStep, which means no volume mounts are available to it. This PR adjusts the order of the feature-building steps, moving LocalDirsFeatureStep to the end, so that we can check whether the directories in SPARK_LOCAL_DIRS are backed by mounted volumes such as hostPath, PV, or others.
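
As a rough illustration of the kind of setup this ordering change enables (a sketch only; the volume name, paths, and exact keys are assumptions, not part of this PR):

```scala
import org.apache.spark.SparkConf

// Illustrative settings: mount a hostPath volume on executors and point
// local storage at it. The volume name and paths are placeholders.
val conf = new SparkConf()
  .set("spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path", "/data/spark-local")
  .set("spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path", "/data/spark-local")
  // Local storage (spark.local.dir / SPARK_LOCAL_DIRS) then resolves to the mounted directory.
  .set("spark.local.dir", "/data/spark-local")
```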

## How was this patch tested?
Unit tests

Closes #24879 from chenjunjiedada/SPARK-28042.

Lead-authored-by: Junjie Chen <jimmyjchen@tencent.com>
Co-authored-by: Junjie Chen <cjjnjust@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-07-29 10:44:17 -07:00
Onur Satici e7c97a3d86 [SPARK-28145][K8S] safe runnable in polling executor source
## What changes were proposed in this pull request?

Add error handling to `ExecutorPodsPollingSnapshotSource`

Closes #24952 from onursatici/os/polling-source.

Authored-by: Onur Satici <onursatici@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-06-28 09:38:43 -05:00
Xiangrui Meng 7056e004ee [SPARK-27823][CORE] Refactor resource handling code
## What changes were proposed in this pull request?

Continue the work from https://github.com/apache/spark/pull/24821. Refactor resource handling code to make the code more readable. Major changes:

* Moved resource-related classes to `spark.resource` from `spark`.
* Added ResourceUtils and helper classes so we don't need to directly deal with Spark conf.
 * ResourceID: resource identifier and it provides conf keys
 * ResourceRequest/Allocation: abstraction for requested and allocated resources
* Added `TestResourceIDs` to reference commonly used resource IDs in tests like `spark.executor.resource.gpu`.

cc: tgravescs jiangxb1987 Ngone51

## How was this patch tested?

Unit tests for added utils and existing unit tests.

Closes #24856 from mengxr/SPARK-27823.

Lead-authored-by: Xiangrui Meng <meng@databricks.com>
Co-authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
2019-06-18 17:18:17 -07:00
Stavros Kontopoulos 7912ab85a6 [SPARK-27872][K8S] Fix executor service account inconsistency
## What changes were proposed in this pull request?

Fixes the service account inconsistency that breaks pull secrets. It gives the user the option to set up a specific service account for the executors if needed (via `spark.kubernetes.authenticate.executor.serviceAccountName`), defaulting to the driver's service account.
We are not supporting special authentication credentials for the executors with this PR.
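
A minimal sketch of the new setting next to the existing driver-side one (the service account names are placeholders):

```scala
import org.apache.spark.SparkConf

// Placeholder service account names; only the executor key is new in this PR.
val conf = new SparkConf()
  .set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark-driver-sa")
  .set("spark.kubernetes.authenticate.executor.serviceAccountName", "spark-executor-sa")
```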

## How was this patch tested?

Tested manually by launching a Spark job exercising the introduced settings.
Added a new integration test for this fix.

Closes #24748 from skonto/fix_executor_sa.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-06-09 16:28:37 -05:00
Thomas Graves d30284b5a5 [SPARK-27760][CORE] Spark resources - change user resource config from .count to .amount
## What changes were proposed in this pull request?

Change the resource config spark.{executor/driver}.resource.{resourceName}.count to .amount so that it can eventually carry both a count and a unit. Right now we only support counts - # of gpus, for instance - but in the future we may want to support units for things like memory - 25G. Making the user specify a single .amount config is better than making them specify two separate .count and .unit configs. Change it now since it's a user-facing config.

Amount also matches how the spark on yarn configs are set up.
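
For example, after this rename a GPU request might be configured as in the sketch below (the values are arbitrary):

```scala
import org.apache.spark.SparkConf

// After this change the user-facing key is ...resource.<name>.amount
// rather than ...resource.<name>.count.
val conf = new SparkConf()
  .set("spark.driver.resource.gpu.amount", "1")
  .set("spark.executor.resource.gpu.amount", "2")
```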

## How was this patch tested?

Unit tests and manually verified on yarn and local cluster mode

Closes #24810 from tgravescs/SPARK-27760-amount.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2019-06-06 14:16:05 -05:00
Thomas Graves 1277f8fa92 [SPARK-27362][K8S] Resource Scheduling support for k8s
## What changes were proposed in this pull request?

Add the ability to map the spark resource configs spark.{executor/driver}.resource.{resourceName} to the kubernetes Container builder so that we request resources (gpus/fpgas/etc.) from kubernetes.
Note that the spark configs will overwrite any resource configs users put into a pod template.
I added a generic vendor config which is only used by kubernetes right now.  I intentionally didn't put it into the kubernetes config namespace just to avoid adding more config prefixes.
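
A hedged sketch of combining the resource and vendor configs for a K8S GPU request; the `nvidia.com` value is an assumption used only for illustration:

```scala
import org.apache.spark.SparkConf

// The amount config maps to a Kubernetes container resource request/limit;
// the vendor config supplies the resource domain (assumed nvidia.com here),
// yielding a request such as nvidia.com/gpu.
val conf = new SparkConf()
  .set("spark.executor.resource.gpu.amount", "1")
  .set("spark.executor.resource.gpu.vendor", "nvidia.com")
```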

I will add more documentation for this under jira SPARK-27492. I think it will be easier to do it all at once to get a cohesive story.

## How was this patch tested?

Unit tests and manually testing on k8s cluster.

Closes #24703 from tgravescs/SPARK-27362.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2019-05-31 15:26:14 -05:00
Yuming Wang db3e746b64 [SPARK-27875][CORE][SQL][ML][K8S] Wrap all PrintWriter with Utils.tryWithResource
## What changes were proposed in this pull request?

This PR wraps all `PrintWriter` usages with `Utils.tryWithResource` to prevent resource leaks.
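
A self-contained sketch of the pattern; the helper below mirrors the shape of `Utils.tryWithResource` rather than reusing Spark's internal utility:

```scala
import java.io.{Closeable, File, PrintWriter}

// Minimal stand-in for Utils.tryWithResource: always close the resource,
// even if the body throws.
def tryWithResource[R <: Closeable, T](createResource: => R)(f: R => T): T = {
  val resource = createResource
  try f(resource) finally resource.close()
}

// Usage: the PrintWriter is closed whether or not writing succeeds.
tryWithResource(new PrintWriter(new File("/tmp/example.txt"))) { writer =>
  writer.println("hello")
}
```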

## How was this patch tested?

Existing test

Closes #24739 from wangyum/SPARK-27875.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-05-30 19:54:32 +09:00
Stavros Kontopoulos 5e74570c8f [SPARK-23153][K8S] Support client dependencies with a Hadoop Compatible File System
## What changes were proposed in this pull request?
- solves the current issue with --packages in cluster mode (there is no ticket for it). Also note some past [issues](https://issues.apache.org/jira/browse/SPARK-22657) that arise when hadoop libs are used on the spark-submit side.
- supports spark.jars, spark.files, app jar.

It works as follows:
Spark submit uploads the deps to the HCFS. Then the driver serves the deps via the Spark file server.
No hcfs uris are propagated.

The related design document is [here](https://docs.google.com/document/d/1peg_qVhLaAl4weo5C51jQicPwLclApBsdR1To2fgc48/edit). The next option to add is the RSS, but it has to be improved given the past discussion about it (Spark 2.3).
## How was this patch tested?

- Run integration test suite.
- Run an example using S3:

```
 ./bin/spark-submit \
...
 --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.6 \
 --deploy-mode cluster \
 --name spark-pi \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.executor.memory=1G \
 --conf spark.kubernetes.namespace=spark \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
 --conf spark.driver.memory=1G \
 --conf spark.executor.instances=2 \
 --conf spark.sql.streaming.metricsEnabled=true \
 --conf "spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp" \
 --conf spark.kubernetes.container.image.pullPolicy=Always \
 --conf spark.kubernetes.container.image=skonto/spark:k8s-3.0.0 \
 --conf spark.kubernetes.file.upload.path=s3a://fdp-stavros-test \
 --conf spark.hadoop.fs.s3a.access.key=... \
 --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
 --conf spark.hadoop.fs.s3a.fast.upload=true \
 --conf spark.kubernetes.executor.deleteOnTermination=false \
 --conf spark.hadoop.fs.s3a.secret.key=... \
 --conf spark.files=client:///...resolv.conf \
file:///my.jar **
```
Added integration tests based on [Ceph nano](https://github.com/ceph/cn), which looks very [active](http://www.sebastien-han.fr/blog/2019/02/24/Ceph-nano-is-getting-better-and-better/).
Unfortunately minio needs hadoop >= 2.8.

Closes #23546 from skonto/support-client-deps.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Erik Erlandson <eerlands@redhat.com>
2019-05-22 16:15:42 -07:00
Arun Mahadevan 1a8c09334d [SPARK-27754][K8S] Introduce additional config (spark.kubernetes.driver.request.cores) for driver request cores for spark on k8s
## What changes were proposed in this pull request?

Spark on k8s supports a config for specifying the executor cpu requests
(spark.kubernetes.executor.request.cores), but a similar config is missing
for the driver. Instead, the `spark.driver.cores` value is currently used, and it only accepts an integer value.

Although `pod spec` can have `cpu` for the fine-grained control like the following, this PR proposes additional configuration `spark.kubernetes.driver.request.cores` for driver request cores.
```
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
```
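
For comparison, a minimal sketch of the proposed config next to the existing executor one (the fractional values are placeholders):

```scala
import org.apache.spark.SparkConf

// Fine-grained cpu requests for both pods; "0.5" is a placeholder value.
// Without the new driver key, spark.driver.cores (an integer) is what ends
// up in the driver pod's cpu request.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.request.cores", "0.5")
  .set("spark.kubernetes.executor.request.cores", "0.5")
```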

## How was this patch tested?

Unit tests

Closes #24630 from arunmahadevan/SPARK-27754.

Authored-by: Arun Mahadevan <arunm@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-18 21:28:46 -07:00
Adi Muraru 8ef4da753d [SPARK-27610][YARN] Shade netty native libraries
## What changes were proposed in this pull request?

Fixed the `spark-<version>-yarn-shuffle.jar` artifact packaging to shade the native netty libraries:
- shade the `META-INF/native/libnetty_*` native libraries when packaging the yarn shuffle service jar. This is required as the netty library loader derives the native library name from the shaded package name.
- updated the `org/spark_project` shade package prefix to `org/sparkproject`
(i.e. removed underscore) as the former breaks the netty native lib loading.

This was causing the yarn external shuffle service to fail
when spark.shuffle.io.mode=EPOLL

## How was this patch tested?
Manual tests

Closes #24502 from amuraru/SPARK-27610_master.

Authored-by: Adi Muraru <amuraru@adobe.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-05-07 10:47:36 -07:00
Sean Owen d4420b455a [SPARK-27323][CORE][SQL][STREAMING] Use Single-Abstract-Method support in Scala 2.12 to simplify code
## What changes were proposed in this pull request?

Use Single Abstract Method syntax where possible (and minor related cleanup). Comments below. No logic should change here.

## How was this patch tested?

Existing tests.

Closes #24241 from srowen/SPARK-27323.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-02 07:37:05 -07:00
Yuming Wang b670f39fc6 [SPARK-24793][FOLLOW-UP][K8S] Remove duplicate declaration of mockito-core
## What changes were proposed in this pull request?

```
[WARNING] Some problems were encountered while building the effective model for org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.mockito:mockito-core:jar -> duplicate declaration of version (?)  org.apache.spark:spark-kubernetes_2.12:[unknown-version], /Users/yumwang/spark/resource-managers/kubernetes/core/pom.xml, line 98, column 17
```
This pr remove duplicate declaration of `mockito-core`.

## How was this patch tested?

N/A

Closes #24256 from wangyum/SPARK-24793-FOLLOW-UP.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-03-30 21:29:32 -07:00
Stavros Kontopoulos 05168e725d [SPARK-24793][K8S] Enhance spark-submit for app management
- supports `--kill` & `--status` flags.
- supports globs, which is useful in general; see this long-standing [issue](https://github.com/kubernetes/kubernetes/issues/17144#issuecomment-272052461) for kubectl.

Manually against running apps. Example output:

Submission Id reported at launch time:

```
2019-01-20 23:47:56 INFO  Client:58 - Waiting for application spark-pi with submissionId spark:spark-pi-1548020873671-driver to finish...
```

Killing the app:

```
./bin/spark-submit --kill spark:spark-pi-1548020873671-driver --master  k8s://https://192.168.2.8:8443
2019-01-20 23:48:07 WARN  Utils:70 - Your hostname, universe resolves to a loopback address: 127.0.0.1; using 192.168.2.8 instead (on interface wlp2s0)
2019-01-20 23:48:07 WARN  Utils:70 - Set SPARK_LOCAL_IP if you need to bind to another address

```

App terminates with 143 (SIGTERM; since we use tini this should lead to [graceful shutdown](https://cloud.google.com/solutions/best-practices-for-building-containers)):

```
2019-01-20 23:48:08 INFO  LoggingPodStatusWatcherImpl:58 - State changed, new state:
	 pod name: spark-pi-1548020873671-driver
	 namespace: spark
	 labels: spark-app-selector -> spark-e4730c80e1014b72aa77915a2203ae05, spark-role -> driver
	 pod uid: 0ba9a794-1cfd-11e9-8215-a434d9270a65
	 creation time: 2019-01-20T21:47:55Z
	 service account name: spark-sa
	 volumes: spark-local-dir-1, spark-conf-volume, spark-sa-token-b7wcm
	 node name: minikube
	 start time: 2019-01-20T21:47:55Z
	 phase: Running
	 container status:
		 container name: spark-kubernetes-driver
		 container image: skonto/spark:k8s-3.0.0
		 container state: running
		 container started at: 2019-01-20T21:48:00Z
2019-01-20 23:48:09 INFO  LoggingPodStatusWatcherImpl:58 - State changed, new state:
	 pod name: spark-pi-1548020873671-driver
	 namespace: spark
	 labels: spark-app-selector -> spark-e4730c80e1014b72aa77915a2203ae05, spark-role -> driver
	 pod uid: 0ba9a794-1cfd-11e9-8215-a434d9270a65
	 creation time: 2019-01-20T21:47:55Z
	 service account name: spark-sa
	 volumes: spark-local-dir-1, spark-conf-volume, spark-sa-token-b7wcm
	 node name: minikube
	 start time: 2019-01-20T21:47:55Z
	 phase: Failed
	 container status:
		 container name: spark-kubernetes-driver
		 container image: skonto/spark:k8s-3.0.0
		 container state: terminated
		 container started at: 2019-01-20T21:48:00Z
		 container finished at: 2019-01-20T21:48:08Z
		 exit code: 143
		 termination reason: Error
2019-01-20 23:48:09 INFO  LoggingPodStatusWatcherImpl:58 - Container final statuses:
	 container name: spark-kubernetes-driver
	 container image: skonto/spark:k8s-3.0.0
	 container state: terminated
	 container started at: 2019-01-20T21:48:00Z
	 container finished at: 2019-01-20T21:48:08Z
	 exit code: 143
	 termination reason: Error
2019-01-20 23:48:09 INFO  Client:58 - Application spark-pi with submissionId spark:spark-pi-1548020873671-driver finished.
2019-01-20 23:48:09 INFO  ShutdownHookManager:58 - Shutdown hook called
2019-01-20 23:48:09 INFO  ShutdownHookManager:58 - Deleting directory /tmp/spark-f114b2e0-5605-4083-9203-a4b1c1f6059e

```

Glob scenario:

```
./bin/spark-submit --status spark:spark-pi* --master  k8s://https://192.168.2.8:8443
2019-01-20 22:27:44 WARN  Utils:70 - Your hostname, universe resolves to a loopback address: 127.0.0.1; using 192.168.2.8 instead (on interface wlp2s0)
2019-01-20 22:27:44 WARN  Utils:70 - Set SPARK_LOCAL_IP if you need to bind to another address
Application status (driver):
	 pod name: spark-pi-1547948600328-driver
	 namespace: spark
	 labels: spark-app-selector -> spark-f13f01702f0b4503975ce98252d59b94, spark-role -> driver
	 pod uid: c576e1c6-1c54-11e9-8215-a434d9270a65
	 creation time: 2019-01-20T01:43:22Z
	 service account name: spark-sa
	 volumes: spark-local-dir-1, spark-conf-volume, spark-sa-token-b7wcm
	 node name: minikube
	 start time: 2019-01-20T01:43:22Z
	 phase: Running
	 container status:
		 container name: spark-kubernetes-driver
		 container image: skonto/spark:k8s-3.0.0
		 container state: running
		 container started at: 2019-01-20T01:43:27Z
Application status (driver):
	 pod name: spark-pi-1547948792539-driver
	 namespace: spark
	 labels: spark-app-selector -> spark-006d252db9b24f25b5069df357c30264, spark-role -> driver
	 pod uid: 38375b4b-1c55-11e9-8215-a434d9270a65
	 creation time: 2019-01-20T01:46:35Z
	 service account name: spark-sa
	 volumes: spark-local-dir-1, spark-conf-volume, spark-sa-token-b7wcm
	 node name: minikube
	 start time: 2019-01-20T01:46:35Z
	 phase: Succeeded
	 container status:
		 container name: spark-kubernetes-driver
		 container image: skonto/spark:k8s-3.0.0
		 container state: terminated
		 container started at: 2019-01-20T01:46:39Z
		 container finished at: 2019-01-20T01:46:56Z
		 exit code: 0
		 termination reason: Completed

```

Closes #23599 from skonto/submit_ops_extension.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-03-26 11:55:03 -07:00
Jiaxin Shan 2d0b7cfe44 [SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2
## What changes were proposed in this pull request?
https://github.com/apache/spark/pull/23814 was reverted because of a Jenkins integration test failure. After the minikube upgrade, Kubernetes client SDK v4.1.2 works with kubernetes v1.13. We can bring this change back.

Reference:
[Bump Kubernetes Client Version to 4.1.2](https://issues.apache.org/jira/browse/SPARK-26742)
[Original PR against master](https://github.com/apache/spark/pull/23814)
[Kubernetes client upgrade for Spark 2.4](https://github.com/apache/spark/pull/23993)

## How was this patch tested?
Unit Tests:
```
All tests passed.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [  2.343 s]
[INFO] Spark Project Tags ................................. SUCCESS [  2.039 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 12.714 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  2.185 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 38.154 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  7.989 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [  2.297 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  2.813 s]
[INFO] Spark Project Core ................................. SUCCESS [38:03 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [  3.848 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 56.084 s]
[INFO] Spark Project Streaming ............................ SUCCESS [04:58 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [06:39 min]
[INFO] Spark Project SQL .................................. SUCCESS [37:12 min]
[INFO] Spark Project ML Library ........................... SUCCESS [18:59 min]
[INFO] Spark Project Tools ................................ SUCCESS [  0.767 s]
[INFO] Spark Project Hive ................................. SUCCESS [33:45 min]
[INFO] Spark Project REPL ................................. SUCCESS [01:14 min]
[INFO] Spark Project Assembly ............................. SUCCESS [  1.444 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:12 min]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [  6.719 s]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [07:00 min]
[INFO] Spark Project Examples ............................. SUCCESS [ 21.805 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  0.906 s]
[INFO] Spark Avro ......................................... SUCCESS [ 50.486 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  02:32 h
[INFO] Finished at: 2019-03-07T08:39:34Z
[INFO] ------------------------------------------------------------------------

```

Closes #24002 from Jeffwan/update_k8s_sdk_master.

Authored-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-03-13 15:04:27 -07:00
chandulal.kavar d4542a8ba8 [SPARK-27061][K8S] Expose Driver UI port on driver service to access …
## What changes were proposed in this pull request?

Expose Spark UI port on driver service to access logs from service.

## How was this patch tested?

The patch was tested using unit tests being contributed as a part of the PR

Closes #23990 from chandulal/SPARK-27061.

Authored-by: chandulal.kavar <cckavar@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-03-11 10:41:31 -07:00
Onur Satici e9e8bb33ef [SPARK-27023][K8S] Make k8s client timeouts configurable
## What changes were proposed in this pull request?

Make k8s client timeouts configurable. No test suite exists for the client factory class, happy to add one if needed
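
A sketch of what the configurable timeouts could look like; the key names below are assumptions for illustration and are not verified against the final patch:

```scala
import org.apache.spark.SparkConf

// Assumed key names, values in milliseconds; shown only to illustrate the
// kind of knobs this change introduces.
val conf = new SparkConf()
  .set("spark.kubernetes.submission.connectionTimeout", "10000")
  .set("spark.kubernetes.submission.requestTimeout", "10000")
```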

Closes #23928 from onursatici/os/k8s-client-timeouts.

Lead-authored-by: Onur Satici <osatici@palantir.com>
Co-authored-by: Onur Satici <onursatici@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-03-06 11:14:39 -08:00
Marcelo Vanzin 14f714fb30 [SPARK-26420][K8S] Generate more unique IDs when creating k8s resource names.
Using the current time as an ID is more prone to clashes than people generally
realize, so try to make things a bit more unique without necessarily using a
UUID, which would eat too much space in the names otherwise.

The implemented approach uses some bits from the current time, plus some random
bits, which should be more resistant to clashes.
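
A rough, non-authoritative sketch of that approach (the exact bit counts and encoding in Spark may differ):

```scala
import java.util.Locale
import scala.util.Random

// Not Spark's actual implementation: combine the low bits of the current time
// with random bits so two apps launched in the same instant still differ,
// and base-36 encode the result to keep the resource name short.
def uniqueSuffix(): String = {
  val timeBits   = System.currentTimeMillis() & 0xFFFFFFFL   // 28 bits of time
  val randomBits = Random.nextInt(1 << 20).toLong            // 20 random bits
  (java.lang.Long.toString(timeBits, 36) +
    java.lang.Long.toString(randomBits, 36)).toLowerCase(Locale.ROOT)
}
```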

Closes #23805 from vanzin/SPARK-26420.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-02-28 20:39:13 -08:00
Marcelo Vanzin a6ddc9d083 [SPARK-24736][K8S] Let spark-submit handle dependency resolution.
Before this change, there was some code in the k8s backend to deal
with how to resolve dependencies and make them available to the
Spark application. It turns out that none of that code is necessary,
since spark-submit already handles all that for applications started
in client mode - like the k8s driver that is run inside a Spark-created
pod.

For that reason, specifically for pyspark, there's no need for the
k8s backend to deal with PYTHONPATH; or, in general, to change the URIs
provided by the user at all. spark-submit takes care of that.

For testing, I created a pyspark script that depends on another module
that is shipped with --py-files. Then I used:

- --py-files http://.../dep.py http://.../test.py
- --py-files http://.../dep.zip http://.../test.py
- --py-files local:/.../dep.py local:/.../test.py
- --py-files local:/.../dep.zip local:/.../test.py

Without this change, all of the above commands fail. With the change, the
driver is able to see the dependencies in all the above cases; but executors
don't see the dependencies in the last two. That's a bug in shared Spark code
that deals with local: dependencies in pyspark (SPARK-26934).

I also tested a Scala app using the main jar from an http server.

Closes #23793 from vanzin/SPARK-24736.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-02-27 09:49:31 -08:00
liuxian 7912dbb88f [MINOR] Simplify boolean expression
## What changes were proposed in this pull request?

Comparing whether Boolean expression is equal to true is redundant
For example:
The datatype of `a` is boolean.
Before:
if (a == true)
After:
if (a)

## How was this patch tested?
N/A

Closes #23884 from 10110346/simplifyboolean.

Authored-by: liuxian <liu.xian3@zte.com.cn>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-02-27 08:38:00 -06:00
Marcelo Vanzin afbff6446f Revert "[SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2"
This reverts commit a3192d966a.
2019-02-26 13:42:07 -08:00
Jiaxin Shan a3192d966a [SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2
## What changes were proposed in this pull request?
Changed the `kubernetes-client` version to 4.1.2. The latest version fixes an error with exec credentials (used by AWS EKS), which are used to talk to the kubernetes API server. Users can now submit Spark jobs to an EKS API endpoint with this patch.

## How was this patch tested?
unit tests and manual tests.

Closes #23814 from Jeffwan/update_k8s_sdk.

Authored-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-02-25 04:56:04 -06:00
Marcelo Vanzin 61c3cdc706 [SPARK-24894][K8S] Make sure valid host names are created for executors.
Since the host name is derived from the app name, which can contain arbitrary
characters, it needs to be sanitized so that only valid characters are allowed.

On top of that, take extra care that truncation doesn't leave a leading character
that is valid elsewhere in a host name but not at its start.

Closes #23781 from vanzin/SPARK-24894.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-02-19 15:19:59 -08:00
Marcelo Vanzin 2a67dbfbd3 [SPARK-26595][CORE] Allow credential renewal based on kerberos ticket cache.
This change adds a new mode for credential renewal that does not require
a keytab; it uses the local ticket cache instead, so it works while the
user keeps the cache valid.

This can be useful for, e.g., people running long spark-shell sessions where
their kerberos login is kept up-to-date.

The main change to enable this behavior is in HadoopDelegationTokenManager,
with a small change in the HDFS token provider. The other changes are to avoid
creating duplicate tokens when submitting the application to YARN; they allow
the tokens from the scheduler to be sent to the YARN AM, reducing the round trips
to HDFS.

For that, the scheduler initialization code was changed a little bit so that
the tokens are available when the YARN client is initialized. That basically
takes care of a long-standing TODO that was in the code to clean up configuration
propagation to the driver's RPC endpoint (in CoarseGrainedSchedulerBackend).

Tested with an app designed to stress this functionality, with both keytab and
cache-based logins. Some basic kerberos tests on k8s also.

Closes #23525 from vanzin/SPARK-26595.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-01-28 13:32:34 -08:00
Rob Vesse c542c247bb [SPARK-25887][K8S] Configurable K8S context support
This enhancement allows for specifying the desired context to use for the initial K8S client auto-configuration.  This allows users to more easily access alternative K8S contexts without having to first
explicitly change their current context via kubectl.

Explicitly set my K8S context to a context pointing to a non-existent cluster, then launched Spark jobs with explicitly specified contexts via the new `spark.kubernetes.context` configuration property.

Example Output:

```
> kubectl config current-context
minikube
> minikube status
minikube: Stopped
cluster:
kubectl:
> ./spark-submit --master k8s://https://localhost:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=2 --conf spark.kubernetes.context=docker-for-desktop --conf spark.kubernetes.container.image=rvesse/spark:debian local:///opt/spark/examples/jars/spark-examples_2.11-3.0.0-SNAPSHOT.jar 4
18/10/31 11:57:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/31 11:57:51 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using context docker-for-desktop from users K8S config file
18/10/31 11:57:52 INFO LoggingPodStatusWatcherImpl: State changed, new state:
	 pod name: spark-pi-1540987071845-driver
	 namespace: default
	 labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, spark-role -> driver
	 pod uid: 32462cac-dd04-11e8-b6c6-025000000001
	 creation time: 2018-10-31T11:57:52Z
	 service account name: default
	 volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv
	 node name: N/A
	 start time: N/A
	 phase: Pending
	 container status: N/A
18/10/31 11:57:52 INFO LoggingPodStatusWatcherImpl: State changed, new state:
	 pod name: spark-pi-1540987071845-driver
	 namespace: default
	 labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, spark-role -> driver
	 pod uid: 32462cac-dd04-11e8-b6c6-025000000001
	 creation time: 2018-10-31T11:57:52Z
	 service account name: default
	 volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv
	 node name: docker-for-desktop
	 start time: N/A
	 phase: Pending
	 container status: N/A
...
18/10/31 11:58:03 INFO LoggingPodStatusWatcherImpl: State changed, new state:
	 pod name: spark-pi-1540987071845-driver
	 namespace: default
	 labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, spark-role -> driver
	 pod uid: 32462cac-dd04-11e8-b6c6-025000000001
	 creation time: 2018-10-31T11:57:52Z
	 service account name: default
	 volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv
	 node name: docker-for-desktop
	 start time: 2018-10-31T11:57:52Z
	 phase: Succeeded
	 container status:
		 container name: spark-kubernetes-driver
		 container image: rvesse/spark:debian
		 container state: terminated
		 container started at: 2018-10-31T11:57:54Z
		 container finished at: 2018-10-31T11:58:02Z
		 exit code: 0
		 termination reason: Completed
```

Without the `spark.kubernetes.context` setting this will fail because the current context - `minikube` - is pointing to a non-running cluster e.g.

```
> ./spark-submit --master k8s://https://localhost:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=2 --conf spark.kubernetes.container.image=rvesse/spark:debian local:///opt/spark/examples/jars/spark-examples_2.11-3.0.0-SNAPSHOT.jar 4
18/10/31 12:02:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/31 12:02:30 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
18/10/31 12:02:31 WARN WatchConnectionManager: Exec Failure
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
	at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
	at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
	at sun.security.ssl.Handshaker.process_record(Handshaker.java:914)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
	at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281)
	at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:66)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:109)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
	at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
	at sun.security.validator.Validator.validate(Validator.java:260)
	at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
	at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
	at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1491)
	... 39 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
	at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
	at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
	... 45 more
Exception in thread "kubernetes-dispatcher-0" Exception in thread "main" java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@611a9c09 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@404819e4[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
	at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
	at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
	at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:632)
	at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678)
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.scheduleReconnect(WatchConnectionManager.java:300)
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$800(WatchConnectionManager.java:48)
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:213)
	at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
	at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:204)
	at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
	at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
	at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
	at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
	at sun.security.ssl.Handshaker.process_record(Handshaker.java:914)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
	at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281)
	at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:66)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:109)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
	... 4 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
	at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
	at sun.security.validator.Validator.validate(Validator.java:260)
	at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
	at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
	at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1491)
	... 39 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
	at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
	at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
	... 45 more
18/10/31 12:02:31 INFO ShutdownHookManager: Shutdown hook called
18/10/31 12:02:31 INFO ShutdownHookManager: Deleting directory /private/var/folders/6b/y1010qp107j9w2dhhy8csvz0000xq3/T/spark-5e649891-8a0f-4f17-bf3a-33b34082eba8
```

Suggested reviewers: mccheah liyinan926 - this is the follow-up fix to the bug discovered while working on SPARK-25809 (PR #22805)

Closes #22904 from rvesse/SPARK-25887.

Authored-by: Rob Vesse <rvesse@dotnetrdf.org>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-01-22 10:25:21 -08:00
Kazuaki Ishizaki 7bf0794651 [SPARK-26463][CORE] Use ConfigEntry for hardcoded configs for scheduler categories.
## What changes were proposed in this pull request?

The PR makes hardcoded `spark.dynamicAllocation`, `spark.scheduler`, `spark.rpc`, `spark.task`, `spark.speculation`, and `spark.cleaner` configs to use `ConfigEntry`.
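
As an illustration of the pattern (a sketch, not a diff from this PR), a hardcoded key becomes a typed entry roughly like this:

```scala
// Sketch: what a migrated entry looks like inside Spark's
// org.apache.spark.internal.config package object, where ConfigBuilder is in
// scope. The entry below is illustrative; the actual names and defaults come
// from the PR itself.
private[spark] val SPECULATION_ENABLED =
  ConfigBuilder("spark.speculation")
    .booleanConf
    .createWithDefault(false)

// Call sites then replace conf.getBoolean("spark.speculation", false)
// with the typed conf.get(SPECULATION_ENABLED).
```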

## How was this patch tested?

Existing tests

Closes #23416 from kiszk/SPARK-26463.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-01-22 07:44:36 -06:00
Jungtaek Lim (HeartSaVioR) 38f030725c [SPARK-26466][CORE] Use ConfigEntry for hardcoded configs for submit categories.
## What changes were proposed in this pull request?

The PR makes hardcoded configs below to use `ConfigEntry`.

* spark.kryo
* spark.kryoserializer
* spark.serializer
* spark.jars
* spark.files
* spark.submit
* spark.deploy
* spark.worker

This patch doesn't change configs which are not relevant to SparkConf (e.g. system properties).

## How was this patch tested?

Existing tests.

Closes #23532 from HeartSaVioR/SPARK-26466-v2.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-01-16 20:57:21 -06:00
Jungtaek Lim (HeartSaVioR) d9e4cf67c0 [SPARK-26482][CORE] Use ConfigEntry for hardcoded configs for ui categories
## What changes were proposed in this pull request?

The PR makes hardcoded configs below to use `ConfigEntry`.

* spark.ui
* spark.ssl
* spark.authenticate
* spark.master.rest
* spark.master.ui
* spark.metrics
* spark.admin
* spark.modify.acl

This patch doesn't change configs which are not relevant to SparkConf (e.g. system properties).

## How was this patch tested?

Existing tests.

Closes #23423 from HeartSaVioR/SPARK-26466.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-01-11 10:18:07 -08:00
Marcelo Vanzin 669e8a1559 [SPARK-25689][YARN] Make driver, not AM, manage delegation tokens.
This change modifies the behavior of the delegation token code when running
on YARN, so that the driver controls the renewal, in both client and cluster
mode. For that, a few different things were changed:

* The AM code only runs code that needs DTs when DTs are available.

In a way, this restores the AM behavior to what it was pre-SPARK-23361, but
keeping the fix added in that bug. Basically, all the AM code is run in a
"UGI.doAs()" block; but code that needs to talk to HDFS (basically the
distributed cache handling code) was delayed to the point where the driver
is up and running, and thus when valid delegation tokens are available.

* SparkSubmit / ApplicationMaster now handle user login, not the token manager.

The previous AM code was relying on the token manager to keep the user
logged in when keytabs are used. This required some odd APIs in the token
manager and the AM so that the right UGI was exposed and used in the right
places.

After this change, the logged in user is handled separately from the token
manager, so the API was cleaned up, and, as explained above, the whole AM
runs under the logged in user, which also helps with simplifying some more code.

* Distributed cache configs are sent separately to the AM.

Because of the delayed initialization of the cached resources in the AM, it
became easier to write the cache config to a separate properties file instead
of bundling it with the rest of the Spark config. This also avoids having
to modify the SparkConf to hide things from the UI.

* Finally, the AM doesn't manage the token manager anymore.

The above changes allow the token manager to be completely handled by the
driver's scheduler backend code also in YARN mode (whether client or cluster),
making it similar to other RMs. To maintain the fix added in SPARK-23361 also
in client mode, the AM now sends an extra message to the driver on initialization
to fetch delegation tokens; and although it might not really be needed, the
driver also keeps the running AM updated when new tokens are created.

Tested in a kerberized cluster with the same tests used to validate SPARK-23361,
in both client and cluster mode. Also tested with a non-kerberized cluster.

Closes #23338 from vanzin/SPARK-25689.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2019-01-07 14:40:08 -06:00
Dongjoon Hyun e15a319ccd
[SPARK-26536][BUILD][TEST] Upgrade Mockito to 2.23.4
## What changes were proposed in this pull request?

This PR upgrades Mockito from 1.10.19 to 2.23.4. The following changes are required.

- Replace `org.mockito.Matchers` with `org.mockito.ArgumentMatchers`
- Replace `anyObject` with `any`
- Replace `getArgumentAt` with `getArgument` and add type annotation.
- Use `isNull` matcher in case of `null` is invoked.
```scala
     saslHandler.channelInactive(null);
-    verify(handler).channelInactive(any(TransportClient.class));
+    verify(handler).channelInactive(isNull());
```

- Make and use `doReturn` wrapper to avoid [SI-4775](https://issues.scala-lang.org/browse/SI-4775)
```scala
private def doReturn(value: Any) = org.mockito.Mockito.doReturn(value, Seq.empty: _*)
```
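
For example (a hypothetical mock, not code from the patch), the wrapper is then used in place of Mockito's overloaded `doReturn`:

```scala
// Hypothetical usage, assuming the doReturn wrapper above is in scope and
// using a made-up trait; avoids the ambiguous varargs overload under Scala.
import org.mockito.Mockito.{mock, verify}

trait Store { def lookup(key: String): Int }

val store = mock(classOf[Store])
doReturn(42).when(store).lookup("answer")
assert(store.lookup("answer") == 42)
verify(store).lookup("answer")
```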

## How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #23452 from dongjoon-hyun/SPARK-26536.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2019-01-04 19:23:38 -08:00
Takuya UESHIN 4419e1daca [SPARK-26445][CORE] Use ConfigEntry for hardcoded configs for driver/executor categories.
## What changes were proposed in this pull request?

The PR makes hardcoded spark.driver, spark.executor, and spark.cores.max configs to use `ConfigEntry`.

Note that some config keys are from `SparkLauncher` instead of defining in the config package object because the string is already defined in it and it does not depend on core module.

## How was this patch tested?

Existing tests.

Closes #23415 from ueshin/issues/SPARK-26445/hardcoded_driver_executor_configs.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2019-01-04 22:12:35 +08:00
Jungtaek Lim (HeartSaVioR) 05372d188a [SPARK-26489][CORE] Use ConfigEntry for hardcoded configs for python/r categories
## What changes were proposed in this pull request?

The PR makes hardcoded configs below to use ConfigEntry.

* spark.pyspark
* spark.python
* spark.r

This patch doesn't change configs which are not relevant to SparkConf (e.g. system properties, python source code)

## How was this patch tested?

Existing tests.

Closes #23428 from HeartSaVioR/SPARK-26489.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-01-03 14:30:27 -08:00
Marcelo Vanzin 4b3fe3a9cc [SPARK-25815][K8S] Support kerberos in client mode, keytab-based token renewal.
This change hooks up the k8s backed to the updated HadoopDelegationTokenManager,
so that delegation tokens are also available in client mode, and keytab-based token
renewal is enabled.

The change re-works the k8s feature steps related to kerberos so
that the driver does all the credential management and provides all
the needed information to executors - so nothing needs to be added
to executor pods. This also makes cluster mode behave a lot more
similarly to client mode, since no driver-related config steps are run
in the latter case.

The main two things that don't need to happen in executors anymore are:

- adding the Hadoop config to the executor pods: this is not needed
  since the Spark driver will serialize the Hadoop config and send
  it to executors when running tasks.

- mounting the kerberos config file in the executor pods: this is
  not needed once you remove the above. The Hadoop conf sent by
  the driver with the tasks is already resolved (i.e. has all the
  kerberos names properly defined), so executors do not need access
  to the kerberos realm information anymore.

The change also avoids creating delegation tokens unnecessarily.
This means that they'll only be created if a secret with tokens
was not provided, and if a keytab is not provided. In either of
those cases, the driver code will handle delegation tokens: in
cluster mode by creating a secret and stashing them, in client
mode by using existing mechanisms to send DTs to executors.

One last feature: the change also allows defining a keytab with
a "local:" URI. This is supported in client mode (although that's
the same as not saying "local:"), and in k8s cluster mode. This
allows the keytab to be mounted onto the image from a pre-existing
secret, for example.

Finally, the new code always sets SPARK_USER in the driver and
executor pods. This is in line with how other resource managers
behave: the submitting user reflects which user will access
Hadoop services in the app. (With kerberos, that's overridden
by the logged in user.) That user is unrelated to the OS user
the app is running as inside the containers.

Tested:
- client and cluster mode with kinit
- cluster mode with keytab
- cluster mode with local: keytab
- YARN cluster with keytab (to make sure it isn't broken)

Closes #22911 from vanzin/SPARK-25815.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2018-12-18 13:30:09 -08:00
suxingfate 114d0de14c [SPARK-25922][K8] Spark Driver/Executor "spark-app-selector" label mismatch
## What changes were proposed in this pull request?

In K8S Cluster mode, the algorithm that generates the spark-app-selector/spark.app.id of the spark driver is different from the one used for the spark executor.
This patch makes sure the spark driver and executor use the same spark-app-selector/spark.app.id when spark.app.id is set; otherwise they will use the superclass applicationId.

In K8S Client mode, spark-app-selector/spark.app.id for executors will use the superclass applicationId.
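
A simplified sketch (not the exact patch) of the id selection described above:

```scala
import org.apache.spark.SparkConf

// Prefer an explicitly configured spark.app.id so driver and executors share
// the same spark-app-selector value; otherwise fall back to the superclass id.
def resolvedAppId(conf: SparkConf, superAppId: => String): String =
  conf.getOption("spark.app.id").getOrElse(superAppId)
```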

## How was this patch tested?

Manually run.

Closes #23322 from suxingfate/SPARK-25922.

Lead-authored-by: suxingfate <suxingfate@163.com>
Co-authored-by: xinglwang <xinglwang@ebay.com>
Signed-off-by: Yinan Li <ynli@google.com>
2018-12-17 13:36:57 -08:00
Marcelo Vanzin a63e7b2a21 [SPARK-25877][K8S] Move all feature logic to feature classes.
This change makes the driver and executor builders a lot simpler
by encapsulating almost all feature logic into the respective
feature classes. The only logic that remains is the creation of
the initial pod, which needs to happen before anything else, so
it is better left in the builder class.

Most feature classes already behave fine when the config has nothing
they should handle, but a few minor tweaks had to be added. Unit
tests were also updated or added to account for these.

The builder suites were simplified a lot and just test the remaining
pod-related code in the builders themselves.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #23220 from vanzin/SPARK-25877.
2018-12-12 12:01:21 -08:00
mcheah 57d6fbfa8c [SPARK-26239] File-based secret key loading for SASL.
This proposes an alternative way to load secret keys into a Spark application that is running on Kubernetes. Instead of automatically generating the secret, the secret key can reside in a file that is shared between both the driver and executor containers.

Unit tests.

Closes #23252 from mccheah/auth-secret-with-file.

Authored-by: mcheah <mcheah@palantir.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2018-12-11 13:50:16 -08:00
Marcelo Vanzin dbd90e5440 [SPARK-26194][K8S] Auto generate auth secret for k8s apps.
This change modifies the logic in the SecurityManager to do two
things:

- generate unique app secrets also when k8s is being used
- only store the secret in the user's UGI on YARN

The latter is needed so that k8s won't unnecessarily create
k8s secrets for the UGI credentials when only the auth token
is stored there.

On the k8s side, the secret is propagated to executors using
an environment variable instead. This ensures it works in both
client and cluster mode.

Security doc was updated to mention the feature and clarify that
proper access control in k8s should be enabled for it to be secure.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #23174 from vanzin/SPARK-26194.
2018-12-06 14:17:13 -08:00
Stavros Kontopoulos a24e1a126c [SPARK-26256][K8S] Fix labels for pod deletion
## What changes were proposed in this pull request?
Adds proper labels when deleting executor pods.

## How was this patch tested?
Manually with tests.

Closes #23209 from skonto/fix-deletion-labels.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2018-12-03 14:57:18 -08:00
Stavros Kontopoulos 0c2935b01d [SPARK-25515][K8S] Adds a config option to keep executor pods for debugging
## What changes were proposed in this pull request?
Keeps K8s executor resources present in case of failure or normal termination.
Introduces a new boolean config option: `spark.kubernetes.deleteExecutors`, with default value set to true.
The idea is to update Spark K8s backend structures but leave the resources around.
The assumption is that since entries are not removed from the `removedExecutorsCache` we are immune to updates that refer to the executor resources previously removed.
The only delete operation not touched is the one in the `doKillExecutors` method.
The reason is that right now we don't support [blacklisting](https://issues.apache.org/jira/browse/SPARK-23485) and dynamic allocation with Spark on K8s. In both cases we might want to handle these scenarios in the future, although it's more complicated.
More tests can be added if approach is approved.
## How was this patch tested?
Manually by running a Spark job and verifying pods are not deleted.

Closes #23136 from skonto/keep_pods.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Yinan Li <ynli@google.com>
2018-12-03 09:02:47 -08:00
Marcelo Vanzin 6be272b75b [SPARK-25876][K8S] Simplify kubernetes configuration types.
There are a few issues with the current configuration types used in
the kubernetes backend:

- they use type parameters for role-specific specialization, which makes
  type signatures really noisy throughout the code base.

- they break encapsulation by forcing the code that creates the config
  object to remove the configuration from SparkConf before creating the
  k8s-specific wrapper.

- they don't provide an easy way for tests to have default values for
  fields they do not use.

This change fixes those problems by:

- creating a base config type with role-specific specialization using
  inheritance

- encapsulating the logic of parsing SparkConf into k8s-specific views
  inside the k8s config classes

- providing some helper code for tests to easily override just the part
  of the configs they want.

Most of the change relates to the above, especially cleaning up the
tests. While doing that, I also made some smaller changes elsewhere:

- removed unnecessary type parameters in KubernetesVolumeSpec

- simplified the error detection logic in KubernetesVolumeUtils; all
  the call sites would just throw the first exception collected by
  that class, since they all called "get" on the "Try" object. Now
  the unnecessary wrapping is gone and the exception is just thrown
  where it occurs.

- removed a lot of unnecessary mocking from tests.

- changed the kerberos-related code so that less logic needs to live
  in the driver builder. In spirit it should be part of the upcoming
  work in this series of cleanups, but it made parts of this change
  simpler.

Tested with existing unit tests and integration tests.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #22959 from vanzin/SPARK-25876.
2018-11-30 16:23:37 -08:00
Nihar Sheth 3df307aa51 [SPARK-25960][K8S] Support subpath mounting with Kubernetes
## What changes were proposed in this pull request?

This PR adds configurations to use subpaths with Spark on k8s. Subpaths (https://kubernetes.io/docs/concepts/storage/volumes/#using-subpath) allow the user to specify a path within a volume to use instead of the volume's root.
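A hedged sketch of declaring such a mount, assuming the new setting follows the existing `spark.kubernetes.*.volumes.*` pattern with a `mount.subPath` option; the volume name, claim name, and paths below are invented for illustration:

```
# Sketch: mount only the "logs" subdirectory of the PVC "checkpoint-pvc"
# at /opt/spark/eventlog inside the driver container.
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.options.claimName=checkpoint-pvc \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.path=/opt/spark/eventlog \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.subPath=logs \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=/opt/spark/eventlog \
  --conf spark.kubernetes.container.image=<spark-image> \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
```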

## How was this patch tested?

Added unit tests. Ran SparkPi on a cluster with event logging pointed at a subpath mount and verified that the driver created and used the subpath.

Closes #23026 from NiharS/k8s_subpath.

Authored-by: Nihar Sheth <niharrsheth@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2018-11-26 11:06:02 -08:00
Sean Owen 32365f8177 [SPARK-26090][CORE][SQL][ML] Resolve most miscellaneous deprecation and build warnings for Spark 3
## What changes were proposed in this pull request?

The build has a lot of deprecation warnings. Some are new in Scala 2.12 and Java 11. We've fixed some, but I wanted to take a pass at fixing lots of easy miscellaneous ones here.

They're too numerous and small to list here; see the pull request. Some highlights:

- `BeanInfo` is deprecated in 2.12, and BeanInfo classes are pretty ancient in Java. Instead, case classes can explicitly declare getters
- Eta expansion of zero-arg methods; `foo()` becomes `() => foo()` in many cases
- Floating-point `Range` is inexact and deprecated, e.g. `0.0 to 100.0 by 1.0`
- `finalize()` is finally deprecated (just needs to be suppressed)
- `StageInfo.attemptId` was deprecated and easiest to remove here

I'm not now going to touch some chunks of deprecation warnings:

- Parquet deprecations
- Hive deprecations (particularly serde2 classes)
- Deprecations in generated code (mostly Thriftserver CLI)
- ProcessingTime deprecations (we may need to revive this class as internal)
- many MLlib deprecations because they concern methods that may be removed anyway
- a few Kinesis deprecations I couldn't figure out
- Mesos get/setRole, which I don't know well
- Kafka/ZK deprecations (e.g. poll())
- Kinesis
- a few other ones that will probably resolve by deleting a deprecated method

## How was this patch tested?

Existing tests, including manual testing with the 2.11 build and Java 11.

Closes #23065 from srowen/SPARK-26090.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-11-19 09:16:42 -06:00
DB Tsai ad853c5678
[SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0
## What changes were proposed in this pull request?

This PR makes Scala 2.12 Spark's default Scala version, with Scala 2.11 as the alternative version. This implies that Scala 2.12 will be used by our CI builds, including pull request builds.

We'll update Jenkins to include a new compile-only job for Scala 2.11 to ensure the code can still be compiled with Scala 2.11.
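For reference, a hedged sketch of what a Scala 2.11 build might look like, assuming the existing `dev/change-scala-version.sh` helper and a `scala-2.11` Maven profile remain available:

```
# Sketch: switch the POMs to Scala 2.11 and build without running tests.
./dev/change-scala-version.sh 2.11
./build/mvn -Pscala-2.11 -DskipTests clean package
```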

## How was this patch tested?

existing tests

Closes #22967 from dbtsai/scala2.12.

Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2018-11-14 16:22:23 -08:00
Marcelo Vanzin 3404a73f4c [SPARK-25875][K8S] Merge code to set up driver command into a single step.
Right now there are 3 different classes dealing with building the driver
command to run inside the pod, one for each "binding" supported by Spark.
This has two main shortcomings:

- the code in the 3 classes is very similar; changing things in one place
  would probably mean making a similar change in the others.

- it gives the false impression that the step implementation is the only
  place where binding-specific logic is needed. That is not true; there
  was code in KubernetesConf that was binding-specific, and there's also
  code in the executor-specific config step. So the 3 classes weren't really
  working as a language-specific abstraction.

On top of that, the current code was propagating command line parameters in
a different way depending on the binding. That doesn't seem necessary, and
in fact using environment variables for command line parameters is in general
a really bad idea, since you can't handle special characters (e.g. spaces)
that way.

This change merges the 3 different code paths for Java, Python and R into
a single step, and also merges the 3 code paths to start the Spark driver
in the k8s entry point script. This increases the amount of shared code,
and also moves more feature logic into the step itself, so it doesn't live
in KubernetesConf.

Note that not all logic related to setting up the driver lives in that
step. For example, the memory overhead calculation still lives separately,
except it now happens in the driver config step instead of outside the
step hierarchy altogether.

Some of the noise in the diff is because of changes to KubernetesConf, which
will be addressed in a separate change.

Tested with new and updated unit tests + integration tests.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #22897 from vanzin/SPARK-25875.
2018-11-02 13:58:08 -07:00
Rob Vesse fc8222298e [SPARK-25809][K8S][TEST] New K8S integration testing backends
## What changes were proposed in this pull request?

Currently K8S integration tests are hardcoded to use a `minikube` based backend. `minikube` is VM based, so it can be resource hungry, and it also doesn't cope well with certain networking setups (for example, with the Cisco AnyConnect software VPN, `minikube` is unusable as it detects its own IP incorrectly).

This PR adds a new K8S integration testing backend that allows for using the Kubernetes support in [Docker for Desktop](https://blog.docker.com/2018/07/kubernetes-is-now-available-in-docker-desktop-stable-channel/). It also generalises the framework to be able to run the integration tests against an arbitrary Kubernetes cluster.
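A hedged sketch of running the tests against the new backend, assuming the backend is selected through the `--deploy-mode` option of the existing integration-test driver script (the script path, option value, and tarball name are assumptions):

```
# Sketch: run the K8S integration tests against Docker for Desktop's cluster
# instead of the default minikube backend.
./resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh \
  --deploy-mode docker-for-desktop \
  --spark-tgz /path/to/spark-dist.tgz
```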

To Do:

- [x] General Kubernetes cluster backend
- [x] Documentation on Kubernetes integration testing
- [x] Testing of general K8S backend
- [x] Check whether change from timestamps being `Time` to `String` in Fabric 8 upgrade needs additional fix up

## How was this patch tested?

Ran integration tests with Docker for Desktop and all passed:

![screen shot 2018-10-23 at 14 19 56](https://user-images.githubusercontent.com/2104864/47363460-c5816a00-d6ce-11e8-9c15-56b34698e797.png)

Suggested Reviewers: ifilonenko srowen

Author: Rob Vesse <rvesse@dotnetrdf.org>

Closes #22805 from rvesse/SPARK-25809.
2018-11-01 09:33:55 -07:00
Marcelo Vanzin 68dde3481e [SPARK-23781][CORE] Merge token renewer functionality into HadoopDelegationTokenManager.
This avoids having two classes to deal with tokens; now the above
class is a one-stop shop for dealing with delegation tokens. The
YARN backend extends that class instead of doing composition like
before, resulting in a bit less code there too.

The renewer functionality is basically the same code that used to
be in YARN's AMCredentialRenewer. That is also the reason why the
public API of HadoopDelegationTokenManager is a little bit odd;
the YARN AM has some odd requirements for how this all should be
initialized, and the weirdness is needed currently to support that.

Tested:
- YARN with stress app for DT renewal
- Mesos and K8S with basic kerberos tests (both tgt and keytab)

Closes #22624 from vanzin/SPARK-23781.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2018-10-31 13:00:10 -05:00
Onur Satici f6cc354d83 [SPARK-24434][K8S] pod template files
## What changes were proposed in this pull request?

New feature to pass podspec files for driver and executor pods.
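A hedged sketch of supplying such templates, assuming the new settings are named `spark.kubernetes.driver.podTemplateFile` and `spark.kubernetes.executor.podTemplateFile` (paths and image are placeholders):

```
# Sketch: start the driver and executor pods from user-provided pod specs;
# Spark's own feature steps are then applied on top of the templates.
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.podTemplateFile=/path/to/driver-template.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=/path/to/executor-template.yaml \
  --conf spark.kubernetes.container.image=<spark-image> \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
```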

## How was this patch tested?
new unit and integration tests

- [x] more overwrites in integration tests
- [ ] invalid template integration test, documentation

Author: Onur Satici <osatici@palantir.com>
Author: Yifei Huang <yifeih@palantir.com>
Author: onursatici <onursatici@gmail.com>

Closes #22146 from onursatici/pod-template.
2018-10-30 13:52:44 -07:00
Ilan Filonenko e9b71c8f01 [SPARK-25828][K8S] Bumping Kubernetes-Client version to 4.1.0
## What changes were proposed in this pull request?

Changed the `kubernetes-client` version and refactored code that broke as a result

## How was this patch tested?

Unit and Integration tests

Closes #22820 from ifilonenko/SPARK-25828.

Authored-by: Ilan Filonenko <ifilondz@gmail.com>
Signed-off-by: Erik Erlandson <eerlands@redhat.com>
2018-10-26 15:59:12 -07:00
Ilan Filonenko 19ada15d1b [SPARK-24516][K8S] Change Python default to Python3
## What changes were proposed in this pull request?

As this is targeted for 3.0.0 and Python2 will be deprecated by Jan 1st, 2020, I feel it is appropriate to change the default to Python3, especially as the projects [found here](https://python3statement.org/) are deprecating their Python2 support.
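For users still on Python 2, a hedged sketch of overriding the new default, assuming the version remains controlled by the `spark.kubernetes.pyspark.pythonVersion` setting (master URL and image are placeholders):

```
# Sketch: the default is now Python 3; Python 2 has to be requested explicitly.
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.pyspark.pythonVersion=2 \
  --conf spark.kubernetes.container.image=<spark-py-image> \
  local:///opt/spark/examples/src/main/python/pi.py
```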

## How was this patch tested?

Unit and Integration tests

Author: Ilan Filonenko <ifilondz@gmail.com>

Closes #22810 from ifilonenko/SPARK-24516.
2018-10-24 23:29:47 -07:00
Mike Kaplinskiy ffe256ce16 [SPARK-25730][K8S] Delete executor pods from kubernetes after figuring out why they died
## What changes were proposed in this pull request?

`removeExecutorFromSpark` tries to fetch the reason the executor exited from Kubernetes, which may be useful if the pod was OOMKilled. However, the code previously deleted the pod from Kubernetes first which made retrieving this status impossible. This fixes the ordering.

On a separate but related note, it would be nice to wait some time before removing the pod, to let the operator examine logs and such.

## How was this patch tested?

Running on my local cluster.

Author: Mike Kaplinskiy <mike.kaplinskiy@gmail.com>

Closes #22720 from mikekap/patch-1.
2018-10-21 11:32:33 -07:00
Ilan Filonenko 6c9c84ffb9 [SPARK-23257][K8S] Kerberos Support for Spark on K8S
## What changes were proposed in this pull request?
This is the work on setting up Secure HDFS interaction with Spark-on-K8S.
The architecture is discussed in this community-wide google [doc](https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg)
This initiative can be broken down into 4 Stages

**STAGE 1**
- [x] Detecting `HADOOP_CONF_DIR` environmental variable and using Config Maps to store all Hadoop config files locally, while also setting `HADOOP_CONF_DIR` locally in the driver / executors

**STAGE 2**
- [x] Grabbing the `TGT` from the `LTC` or using a keytab+principal, and creating a `DT` that will be mounted as a secret or using a pre-populated secret (see the submission sketch after the stage list)

**STAGE 3**
- [x] Driver

**STAGE 4**
- [x] Executor
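A hedged sketch of a Kerberized submission consistent with the stages above; it relies only on the standard `--principal`/`--keytab` options and the `HADOOP_CONF_DIR` detection from Stage 1, and every concrete value is a placeholder:

```
# Sketch: HADOOP_CONF_DIR is detected at submission time and shipped to the
# driver/executors via a ConfigMap (Stage 1); the keytab+principal pair is used
# to obtain delegation tokens (Stage 2).
export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --principal user@EXAMPLE.COM \
  --keytab /path/to/user.keytab \
  --conf spark.kubernetes.container.image=<spark-image> \
  --class org.apache.spark.examples.HdfsTest \
  local:///opt/spark/examples/jars/spark-examples.jar \
  hdfs://<namenode>/path/to/some/file
```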

## How was this patch tested?
Locally tested on a single-node, pseudo-distributed Kerberized Hadoop cluster
- [x] E2E Integration tests https://github.com/apache/spark/pull/22608
- [ ] Unit tests

## Docs and Error Handling?
- [x] Docs
- [x] Error Handling

## Contribution Credit
kimoonkim skonto

Closes #21669 from ifilonenko/secure-hdfs.

Lead-authored-by: Ilan Filonenko <if56@cornell.edu>
Co-authored-by: Ilan Filonenko <ifilondz@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2018-10-15 15:48:51 -07:00
gatorsmile 9bf397c0e4 [SPARK-25592] Setting version to 3.0.0-SNAPSHOT
## What changes were proposed in this pull request?

This patch is to bump the master branch version to 3.0.0-SNAPSHOT.

## How was this patch tested?
N/A

Closes #22606 from gatorsmile/bump3.0.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2018-10-02 08:48:24 -07:00
Prashant Sharma 4da541a5d2
[SPARK-25543][K8S] Print debug message iff execIdsRemovedInThisRound is not empty.
## What changes were proposed in this pull request?

Spurious log messages like the following were emitted about once per second, even when no executors had actually been removed:
2018-09-26 09:33:57 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors with ids  from Spark that were either found to be deleted or non-existent in the cluster.
2018-09-26 09:33:58 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors with ids  from Spark that were either found to be deleted or non-existent in the cluster.
2018-09-26 09:33:59 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors with ids  from Spark that were either found to be deleted or non-existent in the cluster.
2018-09-26 09:34:00 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors with ids  from Spark that were either found to be deleted or non-existent in the cluster.

The fix is simple: first check whether any executors were actually removed, before producing the log message.

## How was this patch tested?

Tested by manually deploying to a minikube cluster.

Closes #22565 from ScrapCodes/spark-25543/k8s/debug-log-spurious-warning.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2018-09-30 14:28:20 -07:00
hyukjinkwon a2f502cf53 [SPARK-25565][BUILD] Add scalastyle rule to check add Locale.ROOT to .toLowerCase and .toUpperCase for internal calls
## What changes were proposed in this pull request?

This PR adds a rule to force `.toLowerCase(Locale.ROOT)` or `toUpperCase(Locale.ROOT)`.

It produces an error as below:

```
[error]       Are you sure that you want to use toUpperCase or toLowerCase without the root locale? In most cases, you
[error]       should use toUpperCase(Locale.ROOT) or toLowerCase(Locale.ROOT) instead.
[error]       If you must use toUpperCase or toLowerCase without the root locale, wrap the code block with
[error]       // scalastyle:off caselocale
[error]       .toUpperCase
[error]       .toLowerCase
[error]       // scalastyle:on caselocale
```

This PR excludes the cases above in the SQL code path for external calls like table names, column names, etc.

For test suites, or when it's clear there's no locale problem (like the Turkish locale problem), it uses `Locale.ROOT`.

One minor problem is that `UTF8String` has both methods, `toLowerCase` and `toUpperCase`, and the new rule detects them as well; they are ignored.

## How was this patch tested?

Manually tested, and Jenkins tests.

Closes #22581 from HyukjinKwon/SPARK-25565.

Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
2018-09-30 14:31:04 +08:00
gatorsmile bb2f069cf2 [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT
## What changes were proposed in this pull request?
On the dev list, we can still discuss whether the next version is 2.5.0 or 3.0.0. Let us first bump the master branch version to `2.5.0-SNAPSHOT`.

## How was this patch tested?
N/A

Closes #22426 from gatorsmile/bumpVersionMaster.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2018-09-15 16:24:02 -07:00
Stavros Kontopoulos 3e75a9fa24 [SPARK-25295][K8S] Fix executor names collision
## What changes were proposed in this pull request?
Fixes the collision issue with spark executor names in client mode, see SPARK-25295 for the details.
It follows the cluster-mode naming convention: the app name is used as the prefix, and if that is not defined we use "spark" as the default prefix. E.g. `spark-pi-1536781360723-exec-1`, where `spark-pi` is the app name passed in the config (or a transformed version of it if it contains illegal characters).

Also fixes the issue with spark app name having spaces in cluster mode.
If you run the Spark Pi test in client mode it passes.
The tricky part is that the user may set the app name:
3030b82c89/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala (L30)
If I do:

```
./bin/spark-submit
...
 --deploy-mode cluster --name "spark pi"
...
```
it will fail, as the app name is used as the prefix of the driver's pod name and it cannot contain spaces (according to k8s naming conventions).

## How was this patch tested?

Manually by running spark job in client mode.
To reproduce do:
```
kubectl create -f service.yaml
kubectl create -f pod.yaml
```
service.yaml:
```
kind: Service
apiVersion: v1
metadata:
  name: spark-test-app-1-svc
spec:
  clusterIP: None
  selector:
    spark-app-selector: spark-test-app-1
  ports:
  - protocol: TCP
    name: driver-port
    port: 7077
    targetPort: 7077
  - protocol: TCP
    name: block-manager
    port: 10000
    targetPort: 10000
```
pod.yaml:

```
apiVersion: v1
kind: Pod
metadata:
  name: spark-test-app-1
  labels:
    spark-app-selector: spark-test-app-1
spec:
  containers:
  - name: spark-test
    image: skonto/spark:k8s-client-fix
    imagePullPolicy: Always
    command:
      - 'sh'
      - '-c'
      -  "/opt/spark/bin/spark-submit
              --verbose
              --master k8s://https://kubernetes.default.svc
              --deploy-mode client
              --class org.apache.spark.examples.SparkPi
              --conf spark.app.name=spark
              --conf spark.executor.instances=1
              --conf spark.kubernetes.container.image=skonto/spark:k8s-client-fix
              --conf spark.kubernetes.container.image.pullPolicy=Always
              --conf spark.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token
              --conf spark.kubernetes.authenticate.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              --conf spark.executor.memory=500m
              --conf spark.executor.cores=1
              --conf spark.executor.instances=1
              --conf spark.driver.host=spark-test-app-1-svc.default.svc
              --conf spark.driver.port=7077
              --conf spark.driver.blockManager.port=10000
              local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar 1000000"
```

Closes #22405 from skonto/fix-k8s-client-mode-executor-names.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Yinan Li <ynli@google.com>
2018-09-12 22:02:59 -07:00
Ilan Filonenko 1cfda44825 [SPARK-25021][K8S] Add spark.executor.pyspark.memory limit for K8S
## What changes were proposed in this pull request?

Add spark.executor.pyspark.memory limit for K8S
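A hedged sketch of using the new limit on K8S (master URL and image are placeholders); `spark.executor.pyspark.memory` is the setting named in the title:

```
# Sketch: cap Python worker memory per executor at 512m; on K8S this amount is
# also reflected in the executor pod's memory request.
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.executor.pyspark.memory=512m \
  --conf spark.kubernetes.container.image=<spark-py-image> \
  local:///opt/spark/examples/src/main/python/pi.py
```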

## How was this patch tested?

Unit and Integration tests

Closes #22298 from ifilonenko/SPARK-25021.

Authored-by: Ilan Filonenko <if56@cornell.edu>
Signed-off-by: Holden Karau <holden@pigscanfly.ca>
2018-09-08 22:18:06 -07:00
Rob Vesse da6fa3828b [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be tmpfs backed on K8S
## What changes were proposed in this pull request?

The default behaviour of Spark on K8S currently is to create `emptyDir` volumes to back `SPARK_LOCAL_DIRS`. In some environments, e.g. diskless compute nodes, this may actually hurt performance because these volumes are backed by the Kubelet's node storage, which on a diskless node will typically be some remote network storage.

Even if this is enterprise-grade storage connected via a high-speed interconnect, the way Spark uses these directories as scratch space (lots of relatively small, short-lived files) has been observed to cause serious performance degradation. Therefore we would like to provide the option to use K8S's ability to instead back these `emptyDir` volumes with `tmpfs`. This PR adds a configuration option that enables `SPARK_LOCAL_DIRS` to be backed by memory-backed `emptyDir` volumes rather than the default.

Documentation is added to describe both the default behaviour and this new option and its implications, one of which is that scratch space then counts towards your pod's memory limits, so users will need to adjust their memory requests accordingly.

*NB* - This is an alternative version of PR #22256 reduced to just the `tmpfs` piece
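A hedged sketch, assuming the new option is a boolean named `spark.kubernetes.local.dirs.tmpfs` (other values are placeholders); note the memory-sizing caveat described above:

```
# Sketch: back the emptyDir volumes for SPARK_LOCAL_DIRS with RAM (tmpfs).
# Scratch space then counts against pod memory, so raise the overhead to match.
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.local.dirs.tmpfs=true \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.kubernetes.container.image=<spark-image> \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
```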

## How was this patch tested?

Ran with this option in our diskless compute environments to verify functionality

Author: Rob Vesse <rvesse@dotnetrdf.org>

Closes #22323 from rvesse/SPARK-25262-tmpfs.
2018-09-06 16:18:59 -07:00
Rob Vesse 27d3b0a51c [SPARK-25222][K8S] Improve container status logging
## What changes were proposed in this pull request?

Currently when running Spark on Kubernetes, a logger is run by the client that watches the K8S API for events related to the Driver pod and logs them. However, for the container status aspect of the logging, this simply dumps the raw object, which is not human readable, e.g.

![screen shot 2018-08-24 at 10 37 46](https://user-images.githubusercontent.com/2104864/44577799-e0486880-a789-11e8-9ae9-fdeddacbbea8.png)
![screen shot 2018-08-24 at 10 38 14](https://user-images.githubusercontent.com/2104864/44577800-e0e0ff00-a789-11e8-81f5-3bb315dbbdb1.png)

This is despite the fact that the logging class in question actually has methods to pretty-print this information, but it only invokes these at the end of a job.

This PR improves the logging to always use the pretty printing methods, additionally modifying them to include further useful information provided by the K8S API.

A similar issue also exists when tasks are lost; that will be addressed by further commits to this PR.

- [x] Improved `LoggingPodStatusWatcher`
- [x] Improved container status on task failure

## How was this patch tested?

Built and launched jobs with the updated Spark client and observed the new human readable output:

![screen shot 2018-08-24 at 11 09 32](https://user-images.githubusercontent.com/2104864/44579429-5353de00-a78e-11e8-9228-c750af8e6311.png)
![screen shot 2018-08-24 at 11 09 42](https://user-images.githubusercontent.com/2104864/44579430-5353de00-a78e-11e8-8fce-d5bb2a3ae65f.png)
![screen shot 2018-08-24 at 11 10 13](https://user-images.githubusercontent.com/2104864/44579431-53ec7480-a78e-11e8-9fa2-aeabc5b28ec4.png)
![screen shot 2018-08-24 at 17 47 44](https://user-images.githubusercontent.com/2104864/44596922-db090f00-a7c5-11e8-910c-bc2339f5a196.png)

Suggested reviewers: liyinan926 mccheah

Author: Rob Vesse <rvesse@dotnetrdf.org>

Closes #22215 from rvesse/SPARK-25222.
2018-09-06 16:15:11 -07:00
Ilan Filonenko e1d72f2c07 [SPARK-25264][K8S] Fix comma-delineated arguments passed into PythonRunner and RRunner
## What changes were proposed in this pull request?

Fixes the issue brought up in https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/273 where the arguments were being comma-delineated, which was incorrect with respect to the PythonRunner and RRunner.

## How was this patch tested?

Modified unit test to test this change.

Author: Ilan Filonenko <if56@cornell.edu>

Closes #22257 from ifilonenko/SPARK-25264.
2018-08-31 15:46:45 -07:00
Ilan Filonenko ba84bcb2c4 [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s
## What changes were proposed in this pull request?

Introducing R bindings for SparkR on K8s

- [x] Running SparkR Job

## How was this patch tested?

This patch was tested with

- [x] Unit Tests
- [x] Integration Tests

## Example:

Commands to run example spark job:
1. `dev/make-distribution.sh --pip --r --tgz -Psparkr -Phadoop-2.7 -Pkubernetes`
2. `bin/docker-image-tool.sh -m -t testing build`
3.
```
bin/spark-submit \
    --master k8s://https://192.168.64.33:8443 \
    --deploy-mode cluster \
    --name spark-r \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.container.image=spark-r:testing \
    local:///opt/spark/examples/src/main/r/dataframe.R
```

The above spark-submit command works given the distribution. (Will include this integration test in the PR once the PRB is ready.)

Author: Ilan Filonenko <if56@cornell.edu>

Closes #21584 from ifilonenko/spark-r.
2018-08-17 16:04:02 -07:00
Ilan Filonenko a791c29bd8 [SPARK-23984][K8S] Changed Python Version config to be camelCase
## What changes were proposed in this pull request?

Small formatting change to make the Python version config camelCase, as requested during PR review.

## How was this patch tested?

Tested with unit and integration tests

Author: Ilan Filonenko <if56@cornell.edu>

Closes #22095 from ifilonenko/spark-py-edits.
2018-08-15 17:52:12 -07:00
Adelbert Chang f5113ea8d7 [SPARK-24960][K8S] explicitly expose ports on driver container
https://issues.apache.org/jira/browse/SPARK-24960
## What changes were proposed in this pull request?

Expose ports explicitly in the driver container. The driver Service created expects to reach the driver Pod at specific ports which, before this change, were not explicitly exposed and would likely cause connection issues (see https://github.com/apache-spark-on-k8s/spark/issues/617).

This is a port of the original PR created in the now-deprecated Kubernetes fork: https://github.com/apache-spark-on-k8s/spark/pull/618

## How was this patch tested?

The failure in https://github.com/apache-spark-on-k8s/spark/issues/617 was reproduced on Kubernetes 1.6.x and 1.8.x. Built the driver image with this patch and observed that https://github.com/apache-spark-on-k8s/spark/issues/617 was fixed on Kubernetes 1.6.x.

Author: Adelbert Chang <Adelbert.Chang@target.com>

Closes #21884 from adelbertc/k8s-expose-driver-ports.
2018-08-01 13:57:33 -07:00
mcheah 571a6f0574 [SPARK-23146][K8S] Support client mode.
## What changes were proposed in this pull request?

Support client mode for the Kubernetes scheduler.

Client mode works more or less identically to cluster mode. However, in client mode, the Spark Context needs to be manually bootstrapped with certain properties which would have otherwise been set up by spark-submit in cluster mode. Specifically:

- If the user doesn't provide a driver pod name, we don't add an owner reference. This is for usage when the driver is not running in a pod in the cluster. In such a case, the driver can only provide a best effort to clean up the executors when the driver exits, but cleaning up the resources is not guaranteed. The executor JVMs should exit if the driver JVM exits, but the pods will still remain in the cluster in a COMPLETED or FAILED state.
- The user must provide a host (spark.driver.host) and port (spark.driver.port) that the executors can connect to. When using spark-submit in cluster mode, spark-submit generates the headless service automatically; in client mode, the user is responsible for setting up their own connectivity.

We also change the authentication configuration prefixes for client mode.
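A hedged sketch of a client-mode submission from a pod inside the cluster, in the same spirit as the headless-service example earlier in this log; the service name, pod name, and ports are placeholders:

```
# Sketch: in client mode the user supplies driver connectivity (host/port) and,
# optionally, the driver pod name so executor pods get an owner reference.
./bin/spark-submit \
  --master k8s://https://kubernetes.default.svc \
  --deploy-mode client \
  --conf spark.driver.host=<driver-headless-svc>.default.svc \
  --conf spark.driver.port=7077 \
  --conf spark.kubernetes.driver.pod.name=<driver-pod-name> \
  --conf spark.kubernetes.container.image=<spark-image> \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
```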

## How was this patch tested?

Adding an integration test to exercise client mode support.

Author: mcheah <mcheah@palantir.com>

Closes #21748 from mccheah/k8s-client-mode.
2018-07-25 11:08:41 -07:00