Commit graph

190 commits

Author SHA1 Message Date
Holden Karau 57d27e900f
[SPARK-31125][K8S] Terminating pods have a deletion timestamp but they are not yet dead
### What changes were proposed in this pull request?

Change what we consider a deleted pod to not include "Terminating"

### Why are the changes needed?

If we get a new snapshot while a pod is in the process of being cleaned up we shouldn't delete the executor until it is fully terminated.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

This should be covered by the decommissioning tests in that they currently are flaky because we sometimes delete the executor instead of allowing it to decommission all the way.

I also ran this in a loop locally ~80 times with the only failures being the PV suite because of unrelated minikube mount issues.

Closes #27905 from holdenk/SPARK-31125-Processing-state-snapshots-incorrect.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-17 12:04:06 -07:00
Pedro Rossi ed06d98044
[SPARK-25355][K8S] Add proxy user to driver if present on spark-submit
### What changes were proposed in this pull request?

This PR adds the proxy user on the spark-submit command to the childArgs, so the proxy user can be retrieved and used in the KubernetesAplication to add the proxy user in the driver container args

### Why are the changes needed?

The proxy user when used on the spark submit doesn't work on the Kubernetes environment since it doesn't add the `--proxy-user` argument on the driver container and when I added it manually to the Pod definition it worked just fine.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Tests were added

Closes #27422 from PedroRossi/SPARK-25355.

Authored-by: Pedro Rossi <pgrr@cin.ufpe.br>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-16 21:53:58 -07:00
beliefer 1254c88034 [SPARK-31118][K8S][DOC] Add version information to the configuration of K8S
### What changes were proposed in this pull request?
Add version information to the configuration of `K8S`.

I sorted out some information show below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.kubernetes.context | 3.0.0 | SPARK-25887 | c542c247bbfe1214c0bf81076451718a9e8931dc#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.master | 3.0.0 | SPARK-30371 | f14061c6a4729ad419902193aa23575d8f17f597#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.namespace | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.container.image | 2.3.0 | SPARK-22994 | b94debd2b01b87ef1d2a34d48877e38ade0969e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.container.image | 2.3.0 | SPARK-22807 | fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.container.image | 2.3.0 | SPARK-22807 | fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.container.image.pullPolicy | 2.3.0 | SPARK-22807 | fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.container.image.pullSecrets | 2.4.0 | SPARK-23668 | cccaaa14ad775fb981e501452ba2cc06ff5c0f0a#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submission.requestTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submission.connectionTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.requestTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.connectionTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
KUBERNETES_AUTH_DRIVER_CONF_PREFIX.serviceAccountName | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 | spark.kubernetes.authenticate.driver
KUBERNETES_AUTH_EXECUTOR_CONF_PREFIX.serviceAccountName | 3.1.0 | SPARK-30122 | f9f06eee9853ad4b6458ac9d31233e729a1ca226#diff-6e882d5561424e7e6651eb46f10104b8 | spark.kubernetes.authenticate.executor
spark.kubernetes.driver.limit.cores | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.request.cores | 3.0.0 | SPARK-27754 | 1a8c09334db87b0e938c38cd6b59d326bdcab3c3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submitInDriver | 2.4.0 | SPARK-22839 | f15906da153f139b698e192ec6f82f078f896f1e#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.limit.cores | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.scheduler.name | 3.0.0 | SPARK-29436 | f800fa383131559c4e841bf062c9775d09190935#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.request.cores | 2.4.0 | SPARK-23285 | fe2b7a4568d65a62da6e6eb00fff05f248b4332c#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.pod.name | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.resourceNamePrefix | 3.0.0 | SPARK-25876 | 6be272b75b4ae3149869e19df193675cc4117763#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.podNamePrefix | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.allocation.batch.size | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.allocation.batch.delay | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.lostCheck.maxAttempts | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submission.waitAppCompletion | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.report.interval | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.apiPollingInterval | 2.4.0 | SPARK-24248 | 270a9a3cac25f3e799460320d0fc94ccd7ecfaea#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.eventProcessingInterval | 2.4.0 | SPARK-24248 | 270a9a3cac25f3e799460320d0fc94ccd7ecfaea#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.memoryOverheadFactor | 2.4.0 | SPARK-23984 | 1a644afbac35c204f9ad55f86999319a9ab458c6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.pyspark.pythonVersion | 2.4.0 | SPARK-23984 | a791c29bd824adadfb2d85594bc8dad4424df936#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.krb5.path | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.krb5.configMapName | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.hadoop.configMapName | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.tokenSecret.name | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.tokenSecret.itemKey | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.resource.type | 2.4.1 | SPARK-25021 | 9031c784847353051bc0978f63ef4146ae9095ff#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.local.dirs.tmpfs | 3.0.0 | SPARK-25262 | da6fa3828bb824b65f50122a8a0a0d4741551257#diff-6e882d5561424e7e6651eb46f10104b8 | It exists in branch-3.0, but in pom.xml it is 2.4.0-snapshot
spark.kubernetes.driver.podTemplateFile | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.podTemplateFile | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.podTemplateContainerName | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.podTemplateContainerName | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.deleteOnTermination | 3.0.0 | SPARK-25515 | 0c2935b01def8a5f631851999d9c2d57b63763e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.dynamicAllocation.deleteGracePeriod | 3.0.0 | SPARK-28487 | 0343854f54b48b206ca434accec99355011560c2#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.appKillPodDeletionGracePeriod | 3.0.0 | SPARK-24793 | 05168e725d2a17c4164ee5f9aa068801ec2454f4#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.file.upload.path | 3.0.0 | SPARK-23153 | 5e74570c8f5e7dfc1ca1c53c177827c5cea57bf1#diff-6e882d5561424e7e6651eb46f10104b8 |  
The following appears in the document |   |   |   |  
spark.kubernetes.authenticate.submission.caCertFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.clientKeyFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.clientCertFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.oauthToken | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.oauthTokenFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.caCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.clientKeyFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.clientCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.oauthToken | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.oauthTokenFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.caCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.clientKeyFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.clientCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.oauthTokenFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.caCertFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.clientKeyFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.clientCertFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.oauthToken | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.oauthTokenFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.label.[LabelName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.annotation.[AnnotationName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.label.[LabelName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.annotation.[AnnotationName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.node.selector.[labelKey] | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driverEnv.[EnvironmentVariableName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.secrets.[SecretName] | 2.3.0 | SPARK-22757 | 171f6ddadc6185ffcc6ad82e5f48952fb49095b2#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.secrets.[SecretName] | 2.3.0 | SPARK-22757 | 171f6ddadc6185ffcc6ad82e5f48952fb49095b2#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.secretKeyRef.[EnvName] | 2.4.0 | SPARK-24232 | 21e1fc7d4aed688d7b685be6ce93f76752159c98#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.secretKeyRef.[EnvName] | 2.4.0 | SPARK-24232 | 21e1fc7d4aed688d7b685be6ce93f76752159c98#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.subPath | 3.0.0 | SPARK-25960 | 3df307aa515b3564686e75d1b71754bbcaaf2dec#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.readOnly | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].options.[OptionName] | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-b5527f236b253e0d9f5db5164bdb43e9 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.path | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.subPath | 3.0.0 | SPARK-25960 | 3df307aa515b3564686e75d1b71754bbcaaf2dec#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.readOnly | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].options.[OptionName] | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-b5527f236b253e0d9f5db5164bdb43e9 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'

### How was this patch tested?
Exists UT

Closes #27875 from beliefer/add-version-to-k8s-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-12 09:54:08 +09:00
Holden Karau 2825237448 [SPARK-31062][K8S][TESTS] Improve spark decommissioning k8s test reliability
### What changes were proposed in this pull request?

Replace a sleep with waiting for the first collect to happen to try and make the K8s test code more reliable.

### Why are the changes needed?

Currently the Decommissioning test appears to be flaky in Jenkins.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Ran K8s test suite in a loop on minikube on my desktop for 10 iterations without this test failing on any of the runs.

Closes #27858 from holdenk/SPARK-31062-Improve-Spark-Decommissioning-K8s-test-teliability.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-03-11 14:42:31 -07:00
gatorsmile 28b8713036 [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT
### What changes were proposed in this pull request?
This patch is to bump the master branch version to 3.1.0-SNAPSHOT.

### Why are the changes needed?
N/A

### Does this PR introduce any user-facing change?
N/A

### How was this patch tested?
N/A

Closes #27698 from gatorsmile/updateVersion.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-25 19:44:31 -08:00
Holden Karau d273a2bb0f [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
This PR is based on an existing/previou PR - https://github.com/apache/spark/pull/19045

### What changes were proposed in this pull request?

This changes adds a decommissioning state that we can enter when the cloud provider/scheduler lets us know we aren't going to be removed immediately but instead will be removed soon. This concept fits nicely in K8s and also with spot-instances on AWS / preemptible instances all of which we can get a notice that our host is going away. For now we simply stop scheduling jobs, in the future we could perform some kind of migration of data during scale-down, or at least stop accepting new blocks to cache.

There is a design document at https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE/edit?usp=sharing

### Why are the changes needed?

With more move to preemptible multi-tenancy, serverless environments, and spot-instances better handling of node scale down is required.

### Does this PR introduce any user-facing change?

There is no API change, however an additional configuration flag is added to enable/disable this behaviour.

### How was this patch tested?

New integration tests in the Spark K8s integration testing. Extension of the AppClientSuite to test decommissioning seperate from the K8s.

Closes #26440 from holdenk/SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r4.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-02-14 12:36:52 -08:00
Dongjoon Hyun 74cd46eb69 [SPARK-30816][K8S][TESTS] Fix dev-run-integration-tests.sh to ignore empty params
### What changes were proposed in this pull request?

This PR aims to fix `dev-run-integration-tests.sh` to ignore empty params correctly.

### Why are the changes needed?

The following script runs `mvn` integration test like the following.
```
$ resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
...
build/mvn integration-test
-f /Users/dongjoon/APACHE/spark/pom.xml
-pl resource-managers/kubernetes/integration-tests
-am
-Pscala-2.12
-Pkubernetes
-Pkubernetes-integration-tests
-Djava.version=8
-Dspark.kubernetes.test.sparkTgz=N/A
-Dspark.kubernetes.test.imageTag=N/A
-Dspark.kubernetes.test.imageRepo=docker.io/kubespark
-Dspark.kubernetes.test.deployMode=minikube
-Dtest.include.tags=k8s
-Dspark.kubernetes.test.namespace=
-Dspark.kubernetes.test.serviceAccountName=
-Dspark.kubernetes.test.kubeConfigContext=
-Dspark.kubernetes.test.master=
-Dtest.exclude.tags=
-Dspark.kubernetes.test.jvmImage=spark
-Dspark.kubernetes.test.pythonImage=spark-py
-Dspark.kubernetes.test.rImage=spark-r
```

After this PR, the empty parameters like the followings will be skipped like the original design.
```
-Dspark.kubernetes.test.namespace=
-Dspark.kubernetes.test.serviceAccountName=
-Dspark.kubernetes.test.kubeConfigContext=
-Dspark.kubernetes.test.master=
-Dtest.exclude.tags=
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins K8S integration test.

Closes #27566 from dongjoon-hyun/SPARK-30816.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-13 11:42:00 -08:00
Dongjoon Hyun 859699135c [SPARK-30807][K8S][TESTS] Support Java 11 in K8S integration tests
### What changes were proposed in this pull request?

This PR aims to support JDK11 test in K8S integration tests.
- This is an update in testing framework instead of individual tests.
- This will enable JDK11 runtime test when you didn't installed JDK11 on your local system.

### Why are the changes needed?

Apache Spark 3.0.0 adds JDK11 support, but K8s integration tests use JDK8 until now.

### Does this PR introduce any user-facing change?

No. This is a dev-only test-related PR.

### How was this patch tested?

This is irrelevant to Jenkins UT, but Jenkins K8S IT (JDK8) should pass.
- https://github.com/apache/spark/pull/27559#issuecomment-585903489 (JDK8 Passed)

And, manually do the following for JDK11 test.
```
$ NO_MANUAL=1 ./dev/make-distribution.sh --r --pip --tgz -Phadoop-3.2 -Pkubernetes
$ resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --java-image-tag 11-jre-slim --spark-tgz $PWD/spark-*.tgz
```

```
$ docker run -it --rm kubespark/spark:1318DD8A-2B15-4A00-BC69-D0E90CED235B /usr/local/openjdk-11/bin/java --version | tail -n1
OpenJDK 64-Bit Server VM 18.9 (build 11.0.6+10, mixed mode)
```

Closes #27559 from dongjoon-hyun/SPARK-30807.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-13 11:17:27 -08:00
Thomas Graves 496f6ac860 [SPARK-29148][CORE] Add stage level scheduling dynamic allocation and scheduler backend changes
### What changes were proposed in this pull request?

This is another PR for stage level scheduling. In particular this adds changes to the dynamic allocation manager and the scheduler backend to be able to track what executors are needed per ResourceProfile.  Note the api is still private to Spark until the entire feature gets in, so this functionality will be there but only usable by tests for profiles other then the DefaultProfile.

The main changes here are simply tracking things on a ResourceProfile basis as well as sending the executor requests to the scheduler backend for all ResourceProfiles.

I introduce a ResourceProfileManager in this PR that will track all the actual ResourceProfile objects so that we can keep them all in a single place and just pass around and use in datastructures the resource profile id. The resource profile id can be used with the ResourceProfileManager to get the actual ResourceProfile contents.

There are various places in the code that use executor "slots" for things.  The ResourceProfile adds functionality to keep that calculation in it.   This logic is more complex then it should due to standalone mode and mesos coarse grained not setting the executor cores config. They default to all cores on the worker, so calculating slots is harder there.
This PR keeps the functionality to make the cores the limiting resource because the scheduler still uses that for "slots" for a few things.

This PR does also add the resource profile id to the Stage and stage info classes to be able to test things easier.   That full set of changes will come with the scheduler PR that will be after this one.

The PR stops at the scheduler backend pieces for the cluster manager and the real YARN support hasn't been added in this PR, that again will be in a separate PR, so this has a few of the API changes up to the cluster manager and then just uses the default profile requests to continue.

The code for the entire feature is here for reference: https://github.com/apache/spark/pull/27053/files although it needs to be upmerged again as well.

### Why are the changes needed?

Needed for stage level scheduling feature.

### Does this PR introduce any user-facing change?

No user facing api changes added here.

### How was this patch tested?

Lots of unit tests and manually testing. I tested on yarn, k8s, standalone, local modes. Ran both failure and success cases.

Closes #27313 from tgravescs/SPARK-29148.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-02-12 16:45:42 -06:00
Dongjoon Hyun 9d907bc84d [SPARK-30743][K8S][TESTS] Use JRE instead of JDK in K8S test docker image
### What changes were proposed in this pull request?

This PR aims to replace JDK to JRE in K8S integration test docker images.

### Why are the changes needed?

This will save some resources and make it sure that we only need JRE at runtime and testing.
- https://lists.apache.org/thread.html/3145150b711d7806a86bcd3ab43e18bcd0e4892ab5f11600689ba087%40%3Cdev.spark.apache.org%3E

### Does this PR introduce any user-facing change?

No. This is a dev-only test environment.

### How was this patch tested?

Pass the Jenkins K8s Integration Test.
- https://github.com/apache/spark/pull/27469#issuecomment-582681125

Closes #27469 from dongjoon-hyun/SPARK-30743.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-05 16:55:45 -08:00
yudovin f9f06eee98 [SPARK-30122][K8S] Support spark.kubernetes.authenticate.executor.serviceAccountName
### What changes were proposed in this pull request?

Currently, it doesn't seem to be possible to have Spark Driver set the serviceAccountName for executor pods it launches.

### Why are the changes needed?

it will allow settings serviceAccountName for executors pods.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

It was covered by unit test.

Closes #27034 from ayudovin/srevice-account-name-for-executor-pods.

Authored-by: yudovin <artsiom.yudovin@profitero.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-05 14:16:59 -08:00
Dongjoon Hyun 9d90c8b898 [SPARK-30738][K8S] Use specific image version in "Launcher client dependencies" test
### What changes were proposed in this pull request?

This PR use a specific version of docker image instead of `latest`. As of today, when I run K8s integration test locally, this test case fails always.

Also, in this PR, I shows two consecutive failures with a dummy change.
- https://github.com/apache/spark/pull/27465#issuecomment-582326614
- https://github.com/apache/spark/pull/27465#issuecomment-582329114
```
- Launcher client dependencies *** FAILED ***
```

After that, I added the patch and K8s Integration test passed.
- https://github.com/apache/spark/pull/27465#issuecomment-582361696

### Why are the changes needed?

[SPARK-28465](https://github.com/apache/spark/pull/25222) switched from `v4.0.0-stable-4.0-master-centos-7-x86_64` to `latest` to catch up the API change. However, the API change seems to occur again. We had better use a specific version to prevent accidental failures.

```scala
- .withImage("ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64")
+ .withImage("ceph/daemon:latest")
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass `Launcher client dependencies` test in Jenkins K8s Integration Suite.
Or, run K8s Integration test locally.

Closes #27465 from dongjoon-hyun/SPARK-K8S-IT.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-05 11:01:53 -08:00
Onur Satici 86fdb818bf [SPARK-30715][K8S] Bump fabric8 to 4.7.1
### What changes were proposed in this pull request?
Bump fabric8 kubernetes-client to 4.7.1

### Why are the changes needed?
New fabric8 version brings support for Kubernetes 1.17 clusters.
Full release notes:
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.7.0
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.7.1

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing unit and integration tests cover creation of K8S objects. Adjusted them to work with the new fabric8 version

Closes #27443 from onursatici/os/bump-fabric8.

Authored-by: Onur Satici <onursatici@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-05 01:17:30 -08:00
Thomas Graves 878094f972 [SPARK-30689][CORE][YARN] Add resource discovery plugin api to support YARN versions with resource scheduling
### What changes were proposed in this pull request?

This change is to allow custom resource scheduler (GPUs,FPGAs,etc) resource discovery to be more flexible. Users are asking for it to work with hadoop 2.x versions that do not support resource scheduling in YARN and/or also they may not run in an isolated environment.
This change creates a plugin api that users can write their own resource discovery class that allows a lot more flexibility. The user can chain plugins for different resource types. The user specified plugins execute in the order specified and will fall back to use the discovery script plugin if they don't return information for a particular resource.

I had to open up a few of the classes to be public and change them to not be case classes and make them developer api in order for the the plugin to get enough information it needs.

I also relaxed the yarn side so that if yarn isn't configured for resource scheduling we just warn and go on. This helps users that have yarn 3.1 but haven't configured the resource scheduling side on their cluster yet, or aren't running in isolated environment.

The user would configured this like:
--conf spark.resources.discovery.plugin="org.apache.spark.resource.ResourceDiscoveryFPGAPlugin, org.apache.spark.resource.ResourceDiscoveryGPUPlugin"

Note the executor side had to be wrapped with a classloader to make sure we include the user classpath for jars they specified on submission.

Note this is more flexible because the discovery script has limitations such as spawning it in a separate process. This means if you are trying to allocate resources in that process they might be released when the script returns. Other things are the class makes it more flexible to be able to integrate with existing systems and solutions for assigning resources.

### Why are the changes needed?

to more easily use spark resource scheduling with older versions of hadoop or in non-isolated enivronments.

### Does this PR introduce any user-facing change?

Yes a plugin api

### How was this patch tested?

Unit tests added and manual testing done on yarn and standalone modes.

Closes #27410 from tgravescs/hadoop27spark3.

Lead-authored-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Thomas Graves <tgraves@apache.org>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-01-31 22:20:28 -06:00
Jiaxin Shan f86a1b9590 [SPARK-30626][K8S] Add SPARK_APPLICATION_ID into driver pod env
### What changes were proposed in this pull request?
Add SPARK_APPLICATION_ID environment when spark configure driver pod.

### Why are the changes needed?
Currently, driver doesn't have this in environments and it's no convenient to retrieve spark id.
The use case is we want to look up spark application id and create application folder and redirect driver logs to application folder.

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
unit tested. I also build new distribution and container image to kick off a job in Kubernetes and I do see SPARK_APPLICATION_ID added there. .

Closes #27347 from Jeffwan/SPARK-30626.

Authored-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-24 12:00:30 -08:00
xushiwei 00425595 f14061c6a4 [SPARK-30371][K8S] Add spark.kubernetes.driver.master conf
### What changes were proposed in this pull request?

make KUBERNETES_MASTER_INTERNAL_URL configurable

### Why are the changes needed?

we do not always use the default port number 443 to access our kube-apiserver, and even in some mulit-tenant cluster,  people do not use the service `kubernetes.default.svc` to access the kube-apiserver, so make the internal master configurable is necessary。

### Does this PR introduce any user-facing change?

user can configure the internal master url by
```
--conf spark.kubernetes.internal.master=https://kubernetes.default.svc:6443
```

### How was this patch tested?

run in multi-cluster that do not use the https://kubernetes.default.svc to access the kube-apiserver

Closes #27029 from wackxu/internalmaster.

Authored-by: xushiwei 00425595 <xushiwei5@huawei.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-19 14:14:45 -08:00
Marcelo Vanzin dca838058f [SPARK-29950][K8S] Blacklist deleted executors in K8S with dynamic allocation
The issue here is that when Spark is downscaling the application and deletes
a few pod requests that aren't needed anymore, it may actually race with the
K8S scheduler, who may be bringing up those executors. So they may have enough
time to connect back to the driver, register, to just be deleted soon after.
This wastes resources and causes misleading entries in the driver log.

The change (ab)uses the blacklisting mechanism to consider the deleted excess
pods as blacklisted, so that if they try to connect back, the driver will deny
it.

It also changes the executor registration slightly, since even with the above
change there were misleading logs. That was because the executor registration
message was an RPC that always succeeded (bar network issues), so the executor
would always try to send an unregistration message to the driver, which would
then log several messages about not knowing anything about the executor. The
change makes the registration RPC succeed or fail directly, instead of using
the separate failure message that would lead to this issue.

Note the last change required some changes in a standalone test suite related
to dynamic allocation, since it relied on the driver not throwing exceptions
when a duplicate executor registration happened.

Tested with existing unit tests, and with live cluster with dyn alloc on.

Closes #26586 from vanzin/SPARK-29950.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2020-01-16 13:37:11 -08:00
Yuming Wang 696288f623 [INFRA] Reverts commit 56dcd79 and c216ef1
### What changes were proposed in this pull request?
1. Revert "Preparing development version 3.0.1-SNAPSHOT": 56dcd79

2. Revert "Preparing Spark release v3.0.0-preview2-rc2": c216ef1

### Why are the changes needed?
Shouldn't change master.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
manual test:
https://github.com/apache/spark/compare/5de5e46..wangyum:revert-master

Closes #26915 from wangyum/revert-master.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-12-16 19:57:44 -07:00
Yuming Wang 56dcd79992 Preparing development version 3.0.1-SNAPSHOT 2019-12-17 01:57:27 +00:00
Yuming Wang c216ef1d03 Preparing Spark release v3.0.0-preview2-rc2 2019-12-17 01:57:21 +00:00
Shahin Shakeri b573f23ed1 [SPARK-29574][K8S] Add SPARK_DIST_CLASSPATH to the executor class path
### What changes were proposed in this pull request?
Include `$SPARK_DIST_CLASSPATH` in class path when launching `CoarseGrainedExecutorBackend` on Kubernetes executors using the provided `entrypoint.sh`

### Why are the changes needed?
For user provided Hadoop, `$SPARK_DIST_CLASSPATH` contains the required jars.

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
Kubernetes 1.14, Spark 2.4.4, Hadoop 3.2.1. Adding $SPARK_DIST_CLASSPATH to  `-cp ` param of entrypoint.sh enables launching the executors correctly.

Closes #26493 from sshakeri/master.

Authored-by: Shahin Shakeri <shahin.shakeri@pwc.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-12-16 10:11:50 -08:00
Dongjoon Hyun cc276f8a6e [SPARK-30243][BUILD][K8S] Upgrade K8s client dependency to 4.6.4
### What changes were proposed in this pull request?

This PR aims to upgrade K8s client library from 4.6.1 to 4.6.4 for `3.0.0-preview2`.

### Why are the changes needed?

This will bring the latest bug fixes.
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.4
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.3
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.2

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with K8s integration test.

Closes #26874 from dongjoon-hyun/SPARK-30243.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-13 08:25:51 -08:00
Ilan Filonenko 708cf16be9 [SPARK-30111][K8S] Apt-get update to fix debian issues
### What changes were proposed in this pull request?
Added apt-get update as per [docker best-practices](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#apt-get)

### Why are the changes needed?
Builder is failing because:
Without doing apt-get update, the APT lists get outdated and begins referring to package versions that no longer exist, hence the 404 trying to download them (Debian does not keep old versions in the archive when a package is updated).

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
k8s builder

Closes #26753 from ifilonenko/SPARK-30111.

Authored-by: Ilan Filonenko <ifilonenko@bloomberg.net>
Signed-off-by: shane knapp <incomplete@gmail.com>
2019-12-03 17:59:02 -08:00
Marcelo Vanzin b095232f63 [SPARK-29865][K8S] Ensure client-mode executors have same name prefix
This basically does what BasicDriverFeatureStep already does to achieve the
same thing in cluster mode; but since that class (or any other feature) is
not invoked in client mode, it needs to be done elsewhere.

I also modified the client mode integration test to check the executor name
prefix; while there I had to fix the minikube backend to parse the output
from newer minikube versions (I have 1.5.2).

Closes #26488 from vanzin/SPARK-29865.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Erik Erlandson <eerlands@redhat.com>
2019-11-14 15:52:39 -07:00
Xingbo Jiang 8207c835b4 Revert "Prepare Spark release v3.0.0-preview-rc2"
This reverts commit 007c873ae3.
2019-10-30 17:45:44 -07:00
Xingbo Jiang 007c873ae3 Prepare Spark release v3.0.0-preview-rc2
### What changes were proposed in this pull request?

To push the built jars to maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.

Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the sparkR version number check logic to allow jvm version like `3.0.0-preview`

**Please note those changes were generated by the release script in the past, but this time since we manually add tags on master branch, we need to manually apply those changes too.**

We shall revert the changes after 3.0.0-preview release passed.

### Why are the changes needed?

To make the maven release repository to accept the built jars.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A
2019-10-30 17:42:59 -07:00
Xingbo Jiang b33a58c0c6 Revert "Prepare Spark release v3.0.0-preview-rc1"
This reverts commit 5eddbb5f1d.
2019-10-28 22:32:34 -07:00
Xingbo Jiang 5eddbb5f1d Prepare Spark release v3.0.0-preview-rc1
### What changes were proposed in this pull request?

To push the built jars to maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.

Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the PySpark version from `3.0.0.dev0` to `3.0.0`

**Please note those changes were generated by the release script in the past, but this time since we manually add tags on master branch, we need to manually apply those changes too.**

We shall revert the changes after 3.0.0-preview release passed.

### Why are the changes needed?

To make the maven release repository to accept the built jars.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A

Closes #26243 from jiangxb1987/3.0.0-preview-prepare.

Lead-authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
2019-10-28 22:31:29 -07:00
igor.calabria 78bdcfade1 [SPARK-27812][K8S] Bump K8S client version to 4.6.1
### What changes were proposed in this pull request?

Updated kubernetes client.

### Why are the changes needed?

https://issues.apache.org/jira/browse/SPARK-27812
https://issues.apache.org/jira/browse/SPARK-27927

We need this fix https://github.com/fabric8io/kubernetes-client/pull/1768 that was released on version 4.6 of the client. The root cause of the problem is better explained in https://github.com/apache/spark/pull/25785

### Does this PR introduce any user-facing change?

Nope, it should be transparent to users

### How was this patch tested?

This patch was tested manually using a simple pyspark job

```python
from pyspark.sql import SparkSession

if __name__ == '__main__':
    spark = SparkSession.builder.getOrCreate()
```

The expected behaviour of this "job" is that both python's and jvm's process exit automatically after the main runs. This is the case for spark versions <= 2.4. On version 2.4.3, the jvm process hangs because there's a non daemon thread running

```
"OkHttp WebSocket https://10.96.0.1/..." #121 prio=5 os_prio=0 tid=0x00007fb27c005800 nid=0x24b waiting on condition [0x00007fb300847000]
"OkHttp WebSocket https://10.96.0.1/..." #117 prio=5 os_prio=0 tid=0x00007fb28c004000 nid=0x247 waiting on condition [0x00007fb300e4b000]
```
This is caused by a bug on `kubernetes-client` library, which is fixed on the version that we are upgrading to.

When the mentioned job is run with this patch applied, the behaviour from spark <= 2.4.3 is restored and both processes terminate successfully

Closes #26093 from igorcalabria/k8s-client-update.

Authored-by: igor.calabria <igor.calabria@ubee.in>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-17 12:23:24 -07:00
maruilei f800fa3831 [SPARK-29436][K8S] Support executor for selecting scheduler through scheduler name in the case of k8s multi-scheduler scenario
### What changes were proposed in this pull request?

Support executor for selecting scheduler through scheduler name in the case of k8s multi-scheduler scenario.

### Why are the changes needed?

If there is no such function, spark can not support the case of k8s multi-scheduler scenario.

### Does this PR introduce any user-facing change?

Yes, users can add scheduler name through configuration.

### How was this patch tested?

Manually tested with spark + k8s cluster

Closes #26088 from merrily01/SPARK-29436.

Authored-by: maruilei <maruilei@jd.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-17 07:24:13 -07:00
Kent Yao 02c5b4f763 [SPARK-28947][K8S] Status logging not happens at an interval for liveness
### What changes were proposed in this pull request?

This pr invoke the start method of `LoggingPodStatusWatcherImpl` for status logging at intervals.

### Why are the changes needed?

This pr invoke the start method of `LoggingPodStatusWatcherImpl` is declared but never called

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

manually test

Closes #25648 from yaooqinn/SPARK-28947.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-10-15 12:34:39 -07:00
Ilan Filonenko 52186afd84 [SPARK-25152][K8S] Enable SparkR Integration Tests for Kubernetes
## What changes were proposed in this pull request?

Re-introduced SparkR integration tests as part of the SparkR on K8S release. This PR awaits Jenkins availability.

## How was this patch tested?

This patch was tested with unit tests and integration tests.

Closes #22145 from ifilonenko/spark-r-with-tests.

Authored-by: Ilan Filonenko <if56@cornell.edu>
Signed-off-by: shane knapp <incomplete@gmail.com>
2019-10-14 13:25:54 -07:00
Liang-Chi Hsieh ea8b5df474 [SPARK-28938][K8S] Move to supported OpenJDK docker image for Kubernetes
### What changes were proposed in this pull request?

The current docker image used by Kubernetes is `openjdk:8-alpine`. It was not supported and  was removed with the commit 3eb0351b20 (diff-f95ffa3d1377774732c33f7b8368e099).

This PR proposes to move to a supported docker image.

### Why are the changes needed?

I think there are at least two reasons:

1. According to the commit, Alpine/musl is not officially supported by the OpenJDK project.
2. As no more OpenJDK 8 Alpine images, new JDK updates including security fixes
, are not applied to it. See below:

```
docker run -it --rm openjdk:8-alpine java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (IcedTea 3.12.0) (Alpine 8.212.04-r0)
OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)
```
```
docker run -it --rm openjdk:8-jdk-slim java -version
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
```

### Does this PR introduce any user-facing change?

Yes. This changes the base docker image of Spark.

### How was this patch tested?

Existing tests.

Closes #26037 from viirya/SPARK-28938.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-07 08:52:35 -07:00
maruilei 77510c602a [SPARK-29233][K8S] Add regex expression checks for executorEnv…
### What changes were proposed in this pull request?

In kubernetes, there are some naming regular expression requirements and restrictions on environment variable names, such as:

- In kubernetes version release-1.7 and earlier, the naming rules of pod environment variable names should meet the requirements of regular expressions: [[A-Za-z_] [A-Za-z0-9_]*](https://github.com/kubernetes/kubernetes/blob/release-1.7/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L169)
- In kubernetes version release-1.8 and later, the naming rules of pod environment variable names should meet the requirements of regular expressions: [[-. _ A-ZA-Z][-. _ A-ZA-Z0-9].*](https://github.com/kubernetes/kubernetes/blob/release-1.8/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L305)

However, in spark on k8s mode, spark should add restrictions on environmental variable names when creating executorEnv.

In addition, we need to use regular expressions adapted to the high version of k8s to increase the restrictions on the names of environmental variables.

Otherwise, the pod will not be created properly and the spark application will be suspended.

To solve the problem above, a regular validation to executorEnv is added and committed. 

### Why are the changes needed?

If no validation rules are added, the environment variable names that don't meet the requirements will cause the pod to not be created properly and the application will be suspended.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Add unit tests and manually run.

Closes #25920 from merrily01/SPARK-29233.

Authored-by: maruilei <maruilei@jd.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-06 09:41:11 -05:00
Dongjoon Hyun bd031c2173 [SPARK-29307][BUILD][TESTS] Remove scalatest deprecation warnings
### What changes were proposed in this pull request?

This PR aims to remove `scalatest` deprecation warnings with the following changes.
- `org.scalatest.mockito.MockitoSugar` -> `org.scalatestplus.mockito.MockitoSugar`
- `org.scalatest.selenium.WebBrowser` -> `org.scalatestplus.selenium.WebBrowser`
- `org.scalatest.prop.Checkers` -> `org.scalatestplus.scalacheck.Checkers`
- `org.scalatest.prop.GeneratorDrivenPropertyChecks` -> `org.scalatestplus.scalacheck.ScalaCheckDrivenPropertyChecks`

### Why are the changes needed?

According to the Jenkins logs, there are 118 warnings about this.
```
 grep "is deprecated" ~/consoleText | grep scalatest | wc -l
     118
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

After Jenkins passes, we need to check the Jenkins log.

Closes #25982 from dongjoon-hyun/SPARK-29307.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-30 21:00:11 -07:00
Sean Owen e1ea806b30 [SPARK-29291][CORE][SQL][STREAMING][MLLIB] Change procedure-like declaration to function + Unit for 2.13
### What changes were proposed in this pull request?

Scala 2.13 emits a deprecation warning for procedure-like declarations:

```
def foo() {
 ...
```

This is equivalent to the following, so should be changed to avoid a warning:

```
def foo(): Unit = {
  ...
```

### Why are the changes needed?

It will avoid about a thousand compiler warnings when we start to support Scala 2.13. I wanted to make the change in 3.0 as there are less likely to be back-ports from 3.0 to 2.4 than 3.1 to 3.0, for example, minimizing that downside to touching so many files.

Unfortunately, that makes this quite a big change.

### Does this PR introduce any user-facing change?

No behavior change at all.

### How was this patch tested?

Existing tests.

Closes #25968 from srowen/SPARK-29291.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-30 10:03:23 -07:00
Holden Karau 4080c4beeb [SPARK-28937][SPARK-28936][KUBERNETES] Reduce test flakyness
### What changes were proposed in this pull request?

Switch from using a Thread sleep for waiting for commands to finish to just waiting for the command to finish with a watcher & improve the error messages in the SecretsTestsSuite.

### Why are the changes needed?
Currently some of the Spark Kubernetes tests have race conditions with command execution, and the frequent use of eventually makes debugging test failures difficult.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests pass after removal of thread.sleep

Closes #25765 from holdenk/SPARK-28937SPARK-28936-improve-kubernetes-integration-tests.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2019-09-20 10:08:16 -07:00
Dongjoon Hyun 3bf43fb60d [SPARK-29159][BUILD] Increase ReservedCodeCacheSize to 1G
### What changes were proposed in this pull request?

This PR aims to increase the JVM CodeCacheSize from 0.5G to 1G.

### Why are the changes needed?

After upgrading to `Scala 2.12.10`, the following is observed during building.
```
2019-09-18T20:49:23.5030586Z OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
2019-09-18T20:49:23.5032920Z OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
2019-09-18T20:49:23.5034959Z CodeCache: size=524288Kb used=521399Kb max_used=521423Kb free=2888Kb
2019-09-18T20:49:23.5035472Z  bounds [0x00007fa62c000000, 0x00007fa64c000000, 0x00007fa64c000000]
2019-09-18T20:49:23.5035781Z  total_blobs=156549 nmethods=155863 adapters=592
2019-09-18T20:49:23.5036090Z  compilation: disabled (not enough contiguous free space left)
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually check the Jenkins or GitHub Action build log (which should not have the above).

Closes #25836 from dongjoon-hyun/SPARK-CODE-CACHE-1G.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-19 00:24:15 -07:00
Holden Karau 0ed9fae457 [SPARK-28886][K8S] Fix the DepsTestsSuite with minikube 1.3.1
### What changes were proposed in this pull request?

Matches the response from minikube service against a regex to extract the URL

### Why are the changes needed?

minikube 1.3.1 on OSX has different formatting than expected

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Ran the existing integration test run on OSX with minikube 1.3.1

Closes #25599 from holdenk/SPARK-28886-fix-deps-tests-with-minikube-1.3.1.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-08 20:04:16 -05:00
Sean Owen ded23f83dd [SPARK-28921][K8S][FOLLOWUP] Also bump K8S client version in integration-tests
### What changes were proposed in this pull request?

Per https://github.com/apache/spark/pull/25640#issuecomment-527397689 also bump K8S client version in integration-tests module.

### Why are the changes needed?

Harmonize the version as intended.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #25664 from srowen/SPARK-28921.2.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-04 19:51:04 -05:00
Andy Grove 35d4edffa2 [SPARK-28921][BUILD][K8S] Upgrade kubernetes client to 4.4.2
### What changes were proposed in this pull request?

Upgrade kubernetes client from 4.1.2 to 4.4.2

### Why are the changes needed?

To fix compatibility issue with EKS since Amazon rolled out some security patches over the past week; 1.15.3, 1.14.6, 1.13.10, 1.12.10, and 1.11.10.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Pass the Jenkins and manually test on EKS.

Closes #25640 from andygrove/SPARK-28921.

Authored-by: Andy Grove <andygrove73@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-02 16:50:58 -07:00
shane knapp 13fd32c9a9 [SPARK-28701][TEST-HADOOP3.2][TEST-JAVA11][K8S] adding java11 support for pull request builds
## What changes were proposed in this pull request?

we need to add the ability to test PRBs against java11.

see comments here:  https://github.com/apache/spark/pull/25405

## How was this patch tested?

the build system will test this.

Closes #25423 from shaneknapp/spark-prb-java11.

Authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-08-27 00:48:01 +09:00
Dongjoon Hyun f7c9de9035 [SPARK-28765][BUILD] Add explict exclusions to avoid JDK11 dependency issue
### What changes were proposed in this pull request?

This PR adds explicit exclusions to avoid Maven `JDK11` dependency issues.

### Why are the changes needed?

Maven/Ivy seems to be confused during dependency generation on `JDK11` environment.
This is not only wrong, but also causes a Jenkins failure during dependency manifest check on `JDK11` environment.

**JDK8**
```
$ cd core
$ mvn -X dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api
...
[DEBUG]       org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version managed from 2.22.2)
[DEBUG]          org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]          javax.validation:validation-api:jar:2.0.1.Final:compile
```

**JDK11**
```
[DEBUG]       org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version managed from 2.22.2)
[DEBUG]          org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]          javax.validation:validation-api:jar:2.0.1.Final:compile
[DEBUG]          jakarta.xml.bind:jakarta.xml.bind-api🫙2.3.2:compile
[DEBUG]             jakarta.activation:jakarta.activation-api🫙1.2.1:compile
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Do the following in both `JDK8` and `JDK11` environment. The dependency manifest should not be changed. In the current `master` branch, `JDK11` changes the dependency manifest.
```
$ dev/test-dependencies.sh --replace-manifest
```

Closes #25481 from dongjoon-hyun/SPARK-28765.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-17 10:16:22 -07:00
Marcelo Vanzin 0343854f54 [SPARK-28487][K8S] More responsive dynamic allocation with K8S
This change implements a few changes to the k8s pod allocator so
that it behaves a little better when dynamic allocation is on.

(i) Allow the application to ramp up immediately when there's a
change in the target number of executors. Without this change,
scaling would only trigger when a change happened in the state of
the cluster, e.g. an executor going down, or when the periodical
snapshot was taken (default every 30s).

(ii) Get rid of pending pod requests, both acknowledged (i.e. Spark
knows that a pod is pending resource allocation) and unacknowledged
(i.e. Spark has requested the pod but the API server hasn't created it
yet), when they're not needed anymore. This avoids starting those
executors to just remove them after the idle timeout, wasting resources
in the meantime.

(iii) Re-work some of the code to avoid unnecessary logging. While not
bad without dynamic allocation, the existing logging was very chatty
when dynamic allocation was on. With the changes, all the useful
information is still there, but only when interesting changes happen.

(iv) Gracefully shut down executors when they become idle. Just deleting
the pod causes a lot of ugly logs to show up, so it's better to ask pods
to exit nicely. That also allows Spark to respect the "don't delete
pods" option when dynamic allocation is on.

Tested on a small k8s cluster running different TPC-DS workloads.

Closes #25236 from vanzin/SPARK-28487.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-08-13 17:29:54 -07:00
Liang-Chi Hsieh 37eedf6149 [SPARK-28652][TESTS][K8S] Add python version check for executor
## What changes were proposed in this pull request?

Current two PySpark version tests in PythonTestsSuite, just test against Python version at driver side. Because the test script doesn't run any spark job requiring python worker, it doesn't actually do version check at worker side. This patch adds pieces of code to the test script, to run a simple job to verify Python version.

## How was this patch tested?

Unit test. Locally manual test.

Closes #25411 from viirya/SPARK-28652.

Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-08-12 14:43:32 +09:00
Jungtaek Lim (HeartSaVioR) 128ea37bda [SPARK-28601][CORE][SQL] Use StandardCharsets.UTF_8 instead of "UTF-8" string representation, and get rid of UnsupportedEncodingException
## What changes were proposed in this pull request?

This patch tries to keep consistency whenever UTF-8 charset is needed, as using `StandardCharsets.UTF_8` instead of using "UTF-8". If the String type is needed, `StandardCharsets.UTF_8.name()` is used.

This change also brings the benefit of getting rid of `UnsupportedEncodingException`, as we're providing `Charset` instead of `String` whenever possible.

This also changes some private Catalyst helper methods to operate on encodings as `Charset` objects rather than strings.

## How was this patch tested?

Existing unit tests.

Closes #25335 from HeartSaVioR/SPARK-28601.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-05 20:45:54 -07:00
HyukjinKwon 946aef0535 [SPARK-28550][K8S][TESTS] Unset SPARK_HOME environment variable in K8S integration preparation
## What changes were proposed in this pull request?

Currently, if we run the Kubernetes integration tests with `SPARK_HOME` already set, it refers the `SPARK_HOME` even when `--spark-tgz` is specified.

This PR proposes to unset `SPARK_HOME` to let the docker-image-tool script detect `SPARK_HOME`. Otherwise, it cannot indicate the unpacked directory as its home.

## How was this patch tested?

```bash
export SPARK_HOME=`pwd`
dev/make-distribution.sh --pip --tgz -Phadoop-2.7 -Pkubernetes
resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --deploy-mode docker-for-desktop --spark-tgz $PWD/spark-*.tgz
```

**Before:**

```
+ /.../spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/bin/docker-image-tool.sh -r docker.io/kubespark -t 650B51C8-BBED-47C9-AEAB-E66FC9A0E64E -p /.../spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
cp: resource-managers/kubernetes/docker/src/main/dockerfiles: No such file or directory
cp: assembly/target/scala-2.12/jars: No such file or directory
cp: resource-managers/kubernetes/integration-tests/tests: No such file or directory
cp: examples/target/scala-2.12/jars/*: No such file or directory
cp: resource-managers/kubernetes/docker/src/main/dockerfiles: No such file or directory
cp: resource-managers/kubernetes/docker/src/main/dockerfiles: No such file or directory
Cannot find docker image. This script must be run from a runnable distribution of Apache Spark.
...
[INFO] Spark Project Kubernetes Integration Tests ......... FAILURE [  4.870 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
```

**After:**

```
+ /.../spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/bin/docker-image-tool.sh -r docker.io/kubespark -t 2BA5883A-A0AC-4D2B-8D00-702D31B59B23 -p /.../spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
Sending build context to Docker daemon  250.2MB
Step 1/15 : FROM openjdk:8-alpine
 ---> a3562aa0b991
...
Successfully built 8614fb5ac279
Successfully tagged kubespark/spark:2BA5883A-A0AC-4D2B-8D00-702D31B59B23
```

Closes #25283 from HyukjinKwon/SPARK-28550.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-07-29 10:47:28 -07:00
Junjie Chen 780d176136 [SPARK-28042][K8S] Support using volume mount as local storage
## What changes were proposed in this pull request?

This pr is used to support using hostpath/PV volume mounts as local storage. In KubernetesExecutorBuilder.scala, the LocalDrisFeatureStep is built before MountVolumesFeatureStep which means we cannot use any volumes mount later. This pr adjust the order of feature building steps which moves localDirsFeature at last so that we can check if directories in SPARK_LOCAL_DIRS are set to volumes mounted such as hostPath, PV, or others.

## How was this patch tested?
Unit tests

Closes #24879 from chenjunjiedada/SPARK-28042.

Lead-authored-by: Junjie Chen <jimmyjchen@tencent.com>
Co-authored-by: Junjie Chen <cjjnjust@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-07-29 10:44:17 -07:00
Dongjoon Hyun 767802500c [SPARK-28534][K8S][TEST] Update node affinity for DockerForDesktop backend in PVTestsSuite
## What changes were proposed in this pull request?

This PR aims to recover our K8s integration test suite by extending node affinity in order to pass `PVTestsSuite` in `DockerForDesktop` environment, too. Previously, `PVTestsSuite` fails at `--deploy-mode docker-for-desktop` option because the node affinity requires `minibase` node.

For `Docker Desktop`, there are two node names like the following. Note that Spark testing needs K8s v1.13 and above. So, this PR should be verified with `Docker Desktop (Edge)` version. This PR adds both because next stable `Docker Desktop` will have K8s v1.14.3.

**Docker Desktop (Stable, K8s v1.10.11)**
```
$ kubectl get node
NAME                 STATUS   ROLES    AGE   VERSION
docker-for-desktop   Ready    master   52s   v1.10.11
```

**Docker Desktop 2.1.0.0 (Edge, K8s v1.14.3, Released 2019-07-26)**
```
$ kubectl get node
NAME             STATUS   ROLES    AGE   VERSION
docker-desktop   Ready    master   16h   v1.14.3
```

## How was this patch tested?

Pass the Jenkins K8s integration test (`minibase`) and install `Docker Desktop 2.1.0.0 (Edge)` and run the integration test in `DockerForDesktop`. Note that this fixes only `PVTestsSuite`.

```
$ dev/make-distribution.sh --pip --tgz -Phadoop-2.7 -Pkubernetes
$ resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --deploy-mode docker-for-desktop --spark-tgz $PWD/spark-*.tgz
...
KubernetesSuite:
...
- PVs with local storage
...
```

Closes #25269 from dongjoon-hyun/SPARK-28534.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-07-29 16:28:56 +09:00
Stavros Kontopoulos 7504eab42f [SPARK-28465][K8S] Fix integration tests which fail due to missing ceph-nano image
## What changes were proposed in this pull request?

Fixes the tests. Follows instructions here: https://github.com/ceph/cn/issues/115#issuecomment-497384369
## How was this patch tested?

Manually by running the tests with minikube.

Closes #25222 from skonto/fix-ceph.

Authored-by: Stavros Kontopoulos <st.kontopoulos@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-07-24 14:58:20 -07:00