### What changes were proposed in this pull request?
This PR aims to remove `internal()` from `spark.kubernetes.executor.podNamePrefix` in order to make the configuration public.
### Why are the changes needed?
In line with K8s GA, this will allow users to control the full executor pod names officially.
This is useful when we want a custom executor pod name pattern independent of the app name.
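For illustration, a submission using the now-public property might look like this (a sketch; the master URL and image are placeholders):
```bash
# Hypothetical usage: executor pods get names like my-prefix-exec-1, my-prefix-exec-2, ...
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.executor.podNamePrefix=my-prefix \
  local:///opt/spark/examples/jars/spark-examples.jar
```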
### Does this PR introduce _any_ user-facing change?
No, this has been there since Apache Spark 2.3.0.
### How was this patch tested?
N/A.
Closes#31386 from dongjoon-hyun/SPARK-34281.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR proposes:
- Respect the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, or the `spark.pyspark.python` and `spark.pyspark.driver.python` configurations, in Kubernetes just like in other cluster types in Spark.
- Deprecate `spark.kubernetes.pyspark.pythonVersion` and guide users to set the environment variables and configurations for Python executables.
NOTE that `spark.kubernetes.pyspark.pythonVersion` is already a no-op configuration without this PR. The default is `3` and other values are disallowed.
- In order for Python executable settings to be used consistently, fix the `spark.archives` option to unpack into the current working directory of the driver in Kubernetes cluster mode. This behaviour is identical to YARN's cluster mode. By doing this, users can leverage Conda or virtualenv in cluster mode as below:
```bash
conda create -y -n pyspark_conda_env -c conda-forge pyarrow pandas conda-pack
conda activate pyspark_conda_env
conda pack -f -o pyspark_conda_env.tar.gz
PYSPARK_PYTHON=./environment/bin/python spark-submit --archives pyspark_conda_env.tar.gz#environment app.py
```
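For reference, the configuration-based equivalent of the environment variable above would be along these lines (a sketch; in cluster mode both the driver and the executors resolve the path relative to the unpacked archive):
```bash
# Hypothetical: point both driver and executors at the Python inside the archive
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --archives pyspark_conda_env.tar.gz#environment \
  --conf spark.pyspark.driver.python=./environment/bin/python \
  --conf spark.pyspark.python=./environment/bin/python \
  app.py
```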
- Removes several unused or dead code paths such as `extractS3Key` and `renameResourcesToLocalFS`
### Why are the changes needed?
- To provide consistent support of PySpark by using the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, or the `spark.pyspark.python` and `spark.pyspark.driver.python` configurations.
- To provide Conda and virtualenv support via the `spark.archives` option.
### Does this PR introduce _any_ user-facing change?
Yes:
- `spark.kubernetes.pyspark.pythonVersion` is deprecated.
- `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, and `spark.pyspark.python` and `spark.pyspark.driver.python` configurations are respected.
### How was this patch tested?
Manually tested via:
```bash
minikube delete
minikube start --cpus 12 --memory 16384
kubectl create namespace spark-integration-test
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark-integration-test
EOF
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark-integration-test:spark --namespace=spark-integration-test
dev/make-distribution.sh --pip --tgz -Pkubernetes
resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz `pwd`/spark-3.2.0-SNAPSHOT-bin-3.2.0.tgz --service-account spark --namespace spark-integration-test
```
Unit tests were also added.
Closes#30735 from HyukjinKwon/SPARK-33748.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to make the `spark.archives` configuration work in Kubernetes.
It works without a problem in standalone clusters, but there seems to be a bug in Kubernetes.
It fails to fetch the file on the driver side as below:
```
20/12/03 13:33:53 INFO SparkContext: Added JAR file:/tmp/spark-75004286-c83a-4369-b624-14c5d2d2a748/spark-examples_2.12-3.1.0-SNAPSHOT.jar at spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/jars/spark-examples_2.12-3.1.0-SNAPSHOT.jar with timestamp 1607002432558
20/12/03 13:33:53 INFO SparkContext: Added archive file:///tmp/tmp4542734800151332666.txt.tar.gz#test_tar_gz at spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/files/tmp4542734800151332666.txt.tar.gz with timestamp 1607002432558
20/12/03 13:33:53 INFO TransportClientFactory: Successfully created connection to spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc/172.17.0.4:7078 after 83 ms (47 ms spent in bootstraps)
20/12/03 13:33:53 INFO Utils: Fetching spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/files/tmp4542734800151332666.txt.tar.gz to /tmp/spark-66573e24-27a3-427c-99f4-36f06d9e9cd5/fetchFileTemp2665785666227461849.tmp
20/12/03 13:33:53 ERROR SparkContext: Error initializing SparkContext.
java.lang.RuntimeException: Stream '/files/tmp4542734800151332666.txt.tar.gz' was not found.
at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:242)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
```
This is because `spark.archives` was not actually added on the driver side correctly. The changes here fix it by adding and resolving URIs correctly.
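For reference, the fixed code path is exercised by a submission of this shape (a sketch; the archive and image names are placeholders):
```bash
# Hypothetical: the archive must now be fetched and unpacked on the driver
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --archives /tmp/my_data.tar.gz#test_tar_gz \
  --conf spark.kubernetes.container.image=<spark-image> \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
```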
### Why are the changes needed?
The `spark.archives` feature can be leveraged for many things such as Conda support. We should make it work in Kubernetes as well.
This is a bug fix too.
### Does this PR introduce _any_ user-facing change?
No, this feature is not out yet.
### How was this patch tested?
I manually tested with Minikube 1.15.1. Due to an environment issue (?), I had to use a custom namespace, service account and roles. The `default` service account does not work for me and complains it doesn't have permissions to get/list pods, etc.
```bash
minikube delete
minikube start --cpus 12 --memory 16384
kubectl create namespace spark-integration-test
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark-integration-test
EOF
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark-integration-test:spark --namespace=spark-integration-test
dev/make-distribution.sh --pip --tgz -Pkubernetes
resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz `pwd`/spark-3.1.0-SNAPSHOT-bin-3.2.0.tgz --service-account spark --namespace spark-integration-test
```
Closes#30581 from HyukjinKwon/SPARK-33615.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR intends to fix typos in the sub-modules:
* `bin`
* `core`
* `docs`
* `external`
* `mllib`
* `repl`
* `pom.xml`
Split per srowen https://github.com/apache/spark/pull/30323#issuecomment-728981618
NOTE: The misspellings have been reported at 706a726f87 (commitcomment-44064356)
### Why are the changes needed?
Misspelled words make it harder to read / understand content.
### Does this PR introduce _any_ user-facing change?
There are various fixes to documentation, etc...
### How was this patch tested?
No testing was performed
Closes#30530 from jsoref/spelling-bin-core-docs-external-mllib-repl.
Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
This adds support for stage-level scheduling to Kubernetes. Kubernetes can support dynamic allocation via the shuffle tracking option, which means we can support stage-level scheduling by acquiring new executors.
The main changes here are having the K8s cluster manager pass the resource profile id into the executors, and then the ExecutorPodsAllocator has to request executors based on the individual resource profiles. I tried to keep code changes here to a minimum. I specifically chose to leave the ExecutorPodsSnapshot the way it was and construct the resource-profile-to-pod states on the fly, with a fast path when not using other resource profiles, to keep the impact to a minimum. As a result, the main change required is just wrapping the allocation logic in a for loop over each profile. The other main change is in the basic feature step: we have to look at the resources in the ResourceProfile to request pods with the correct resources. Much of the other logic, like the executor lifecycle manager, doesn't need to be resource-profile aware.
This also adds support for [SPARK-32661] (Spark executors on K8S should request extra memory for off-heap allocations), because the stage-level scheduling API has support for this and it made sense to be consistent with YARN. This was started with PR https://github.com/apache/spark/pull/29477 but never updated, so I just did it here. To do this I moved a few functions around that are now used by both YARN and Kubernetes, so you will see some changes in Utils.
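As a sketch, the prerequisites on K8s (dynamic allocation with shuffle tracking) together with the off-heap settings that executor pods now account for would be configured roughly like this (values and image are placeholders):
```bash
# Hypothetical configuration: shuffle-tracking-based dynamic allocation, which
# stage-level scheduling relies on, plus off-heap memory that the executor pod
# memory request now includes
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=1g \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples.jar
```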
### Why are the changes needed?
Add the feature to Kubernetes based on customer feedback.
### Does this PR introduce _any_ user-facing change?
Yes, the feature now works with K8s, but there are no underlying API changes.
### How was this patch tested?
Tested manually on kubernetes cluster and with unit tests.
Closes#30204 from tgravescs/stagek8sOrigSnapshotsRebase.
Lead-authored-by: Thomas Graves <tgraves@apache.org>
Co-authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
Handle executor failure with multiple containers
Added a spark property spark.kubernetes.executor.checkAllContainers,
with default being false. When it's true, the executor snapshot will
take all containers in the executor into consideration when deciding
whether the executor is in "Running" state, if the pod restart policy is
"Never". Also, added the new spark property to the doc.
### What changes were proposed in this pull request?
Checking of all containers in the executor pod when reporting executor status, if the `spark.kubernetes.executor.checkAllContainers` property is set to true.
### Why are the changes needed?
Currently, a pod remains "running" as long as there is at least one running container. This prevents Spark from noticing when a container has failed in an executor pod with multiple containers. With this change, users can configure the behavior to be different. Namely, if any container in the executor pod has failed, either the executor process or one of its sidecars, the pod is considered failed, and it will be rescheduled.
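A minimal sketch of opting in:
```bash
# Hypothetical: consider every container in the executor pod (the executor
# process and any sidecars) when deciding whether the executor has failed;
# applies when the pod restart policy is "Never"
--conf spark.kubernetes.executor.checkAllContainers=true
```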
### Does this PR introduce _any_ user-facing change?
Yes, new spark property added.
User is now able to choose whether to turn on this feature using the `spark.kubernetes.executor.checkAllContainers` property.
### How was this patch tested?
Unit test was added and all passed.
I tried to run the integration tests by following the instructions [here](https://spark.apache.org/developer-tools.html) (section "Testing K8S") and also [here](https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md), but I wasn't able to run them smoothly as they fail to talk with the minikube cluster. Maybe it's because my minikube version is too new (I'm using v1.13.1)...? Since I've been trying for two days and still can't make it work, I decided to submit this PR and hopefully the Jenkins test will pass.
Closes#29924 from huskysun/exec-sidecar-failure.
Authored-by: Shiqi Sun <s.sun@salesforce.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
This PR makes `spark.kubernetes.pyspark.pythonVersion` allow only `3`. In other words, it will reject `2` for `Python 2`.
- [x] Configuration description and check is updated.
- [x] Documentation is updated
- [x] Unit test cases are updated.
- [x] Docker image script is updated.
### Why are the changes needed?
After SPARK-32138, Apache Spark 3.1 dropped Python 2 support.
### Does this PR introduce _any_ user-facing change?
Yes, but Python 2 support is already dropped officially.
### How was this patch tested?
Pass the CI.
Closes#30049 from dongjoon-hyun/SPARK-DROP-PYTHON2.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Fix executor container name typo. `executor` should be `spark-kubernetes-executor`.
### Why are the changes needed?
The Executor pod container name the users actually get from their Kubernetes clusters is different from that described in the documentation.
For example, below is what a user gets from an executor pod.
```
Containers:
  spark-kubernetes-executor:
    Container ID:   docker://aaaabbbbccccddddeeeeffff
    Image:          <imagename>
    Image ID:       docker-pullable://0000.dkr.ecr.us-east-0.amazonaws.com/spark
    Port:           7079/TCP
    Host Port:      0/TCP
    Args:
      executor
    State:          Running
      Started:      Thu, 28 May 2020 05:54:04 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  16Gi
```
### Does this PR introduce _any_ user-facing change?
Document change.
### How was this patch tested?
N/A
Closes#28862 from yuj/patch-1.
Authored-by: James Yu <yuj@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR is a follow-up to fix the version in the configuration document.
### Why are the changes needed?
The original PR is backported to branch-3.0.
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
Manual.
Closes#28530 from dongjoon-hyun/SPARK-31696-2.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to add `spark.kubernetes.driver.service.annotation` like the existing `spark.kubernetes.driver.annotation`.
### Why are the changes needed?
Annotations are used in many ways. One example is that the Prometheus monitoring system discovers metric endpoints via annotations.
- https://github.com/helm/charts/tree/master/stable/prometheus#scraping-pod-metrics-via-annotations
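For illustration, attaching Prometheus-style scrape annotations to the driver service could look like this (a sketch; the `prometheus.io/*` keys follow the Helm chart convention linked above, and the port is a placeholder):
```bash
# Hypothetical: annotate the driver service so Prometheus can discover the metrics endpoint
--conf spark.kubernetes.driver.service.annotation.prometheus.io/scrape=true \
--conf spark.kubernetes.driver.service.annotation.prometheus.io/port=8090
```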
### Does this PR introduce _any_ user-facing change?
Yes. The documentation is added.
### How was this patch tested?
Pass Jenkins with the updated unit tests.
Closes#28518 from dongjoon-hyun/SPARK-31696.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This adds newly supported `nfs` volume type description into the document for Apache Spark 3.1.0.
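The documented `nfs` volume type is configured with options of this shape (a sketch; the server, paths, and the volume name `images` are placeholders):
```bash
# Hypothetical: mount an NFS share read-only into the executor pods
--conf spark.kubernetes.executor.volumes.nfs.images.options.server=example.com \
--conf spark.kubernetes.executor.volumes.nfs.images.options.path=/data \
--conf spark.kubernetes.executor.volumes.nfs.images.mount.path=/data \
--conf spark.kubernetes.executor.volumes.nfs.images.mount.readOnly=true
```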
### Why are the changes needed?
To complete the document.
### Does this PR introduce any user-facing change?
Yes. (Doc)
![nfs_screen_shot](https://user-images.githubusercontent.com/9700541/79530887-8f077f80-8025-11ea-8cc1-e0b551802d5d.png)
### How was this patch tested?
Manually generate doc and check it.
```
SKIP_API=1 jekyll build
```
Closes#28236 from dongjoon-hyun/SPARK-NFS-DOC.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
fix kubernetes-client version doc
### Why are the changes needed?
correct doc
### Does this PR introduce any user-facing change?
nah
### How was this patch tested?
nah
Closes#27605 from yaooqinn/k8s-version-update.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
fix a style issue in the k8s document; please go to http://spark.apache.org/docs/3.0.0-preview2/running-on-kubernetes.html and search for the keyword `spark.kubernetes.file.upload.path` to jump to the error context
### Why are the changes needed?
doc correctness
### Does this PR introduce any user-facing change?
Nah
### How was this patch tested?
Nah
Closes#27582 from yaooqinn/k8s-doc.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
make KUBERNETES_MASTER_INTERNAL_URL configurable
### Why are the changes needed?
We do not always use the default port number 443 to access our kube-apiserver, and in some multi-tenant clusters, people do not use the service `kubernetes.default.svc` to access the kube-apiserver, so making the internal master URL configurable is necessary.
### Does this PR introduce any user-facing change?
Users can configure the internal master URL via
```
--conf spark.kubernetes.internal.master=https://kubernetes.default.svc:6443
```
### How was this patch tested?
Ran in a multi-tenant cluster that does not use https://kubernetes.default.svc to access the kube-apiserver
Closes#27029 from wackxu/internalmaster.
Authored-by: xushiwei 00425595 <xushiwei5@huawei.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
It adds a note about the required port of a master URL in Kubernetes.
Currently a port needs to be specified for the Kubernetes API, even when the API is hosted on the standard HTTPS port. Otherwise the driver might fail with https://medium.com/kidane.weldemariam_75349/thanks-james-on-issuing-spark-submit-i-run-into-this-error-cc507d4f8f0d
Yes, a change to the "Running on Kubernetes" guide.
None - Documentation change
Closes#26426 from Tapped/patch-1.
Authored-by: Emil Sandstø <emilalexer@hotmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
### What changes were proposed in this pull request?
This PR fixes a few typos:
1. Sparks => Spark's
2. parallize => parallelize
3. doesnt => doesn't
Closes#26140 from plusplusjiajia/fix-typos.
Authored-by: Jiajia Li <jiajia.li@intel.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
### What changes were proposed in this pull request?
Document the resource scheduling feature - https://issues.apache.org/jira/browse/SPARK-24615
Add general docs, yarn, kubernetes, and standalone cluster specific ones.
### Why are the changes needed?
Help users understand the feature
### Does this PR introduce any user-facing change?
docs
### How was this patch tested?
N/A
Closes#25698 from tgravescs/SPARK-27492-gpu-sched-docs.
Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
## What changes were proposed in this pull request?
This PR is used to support using hostPath/PV volume mounts as local storage. In KubernetesExecutorBuilder.scala, the LocalDirsFeatureStep is built before MountVolumesFeatureStep, which means we cannot use any volume mounts later. This PR adjusts the order of the feature-building steps, moving the local dirs feature step to last, so that we can check whether directories in SPARK_LOCAL_DIRS are set to mounted volumes such as hostPath, PV, or others.
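With that ordering fixed, local storage can be redirected onto a mounted volume; a sketch using a hostPath volume (paths are placeholders; the `spark-local-dir-` name prefix is what marks a volume as local storage):
```bash
# Hypothetical: back the executor's local (scratch) directories with a hostPath volume
--conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path=/mnt/fast-disk \
--conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path=/tmp/spark-local \
--conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.readOnly=false
```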
## How was this patch tested?
Unit tests
Closes#24879 from chenjunjiedada/SPARK-28042.
Lead-authored-by: Junjie Chen <jimmyjchen@tencent.com>
Co-authored-by: Junjie Chen <cjjnjust@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
Change the format of the build command in the README to start with a `./` prefix
./build/mvn -DskipTests clean package
This increases stylistic consistency across the README: all the other commands have a `./` prefix. Having a visible `./` prefix also makes it clear to the user that the shell command requires the current working directory to be at the repository root.
## How was this patch tested?
README.md was reviewed both in raw markdown and in the Github rendered landing page for stylistic consistency.
Closes#25231 from Mister-Meeseeks/master.
Lead-authored-by: Douglas R Colkitt <douglas.colkitt@gmail.com>
Co-authored-by: Mister-Meeseeks <douglas.colkitt@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Add the ability to map the spark resource configs spark.{executor/driver}.resource.{resourceName} to the Kubernetes Container builder so that we request resources (GPUs/FPGAs/etc.) from Kubernetes.
Note that the spark configs will overwrite any resource configs users put into a pod template.
I added a generic vendor config which is only used by kubernetes right now. I intentionally didn't put it into the kubernetes config namespace just to avoid adding more config prefixes.
I will add more documentation for this under jira SPARK-27492. I think it will be easier to do it all at once to get a cohesive story.
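A sketch of the mapping described above (amounts, vendor and script path are placeholders; the generic vendor config is what K8s uses to form the resource name, e.g. `nvidia.com/gpu`):
```bash
# Hypothetical: request one NVIDIA GPU per executor pod from Kubernetes
--conf spark.executor.resource.gpu.amount=1 \
--conf spark.executor.resource.gpu.vendor=nvidia.com \
--conf spark.executor.resource.gpu.discoveryScript=/opt/spark/examples/src/main/scripts/getGpusResources.sh \
--conf spark.task.resource.gpu.amount=1
```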
## How was this patch tested?
Unit tests and manually testing on k8s cluster.
Closes#24703 from tgravescs/SPARK-27362.
Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
## What changes were proposed in this pull request?
- Solves the current issue with --packages in cluster mode (there is no ticket for it). Also note some past [issues](https://issues.apache.org/jira/browse/SPARK-22657) when hadoop libs are used at the spark-submit side.
- supports spark.jars, spark.files, app jar.
It works as follows:
Spark submit uploads the deps to the HCFS. Then the driver serves the deps via the Spark file server.
No HCFS URIs are propagated.
The related design document is [here](https://docs.google.com/document/d/1peg_qVhLaAl4weo5C51jQicPwLclApBsdR1To2fgc48/edit). The next option to add is the RSS, but it has to be improved given the past discussion about it (Spark 2.3).
## How was this patch tested?
- Run integration test suite.
- Run an example using S3:
```
./bin/spark-submit \
...
--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.6 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.memory=1G \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
--conf spark.driver.memory=1G \
--conf spark.executor.instances=2 \
--conf spark.sql.streaming.metricsEnabled=true \
--conf "spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp" \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.container.image=skonto/spark:k8s-3.0.0 \
--conf spark.kubernetes.file.upload.path=s3a://fdp-stavros-test \
--conf spark.hadoop.fs.s3a.access.key=... \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.fast.upload=true \
--conf spark.kubernetes.executor.deleteOnTermination=false \
--conf spark.hadoop.fs.s3a.secret.key=... \
--conf spark.files=client:///...resolv.conf \
file:///my.jar **
```
Added integration tests based on [Ceph nano](https://github.com/ceph/cn). Looks very [active](http://www.sebastien-han.fr/blog/2019/02/24/Ceph-nano-is-getting-better-and-better/).
Unfortunately minio needs hadoop >= 2.8.
Closes#23546 from skonto/support-client-deps.
Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Erik Erlandson <eerlands@redhat.com>
## What changes were proposed in this pull request?
Spark on K8s supports a config for specifying the executor cpu requests
(spark.kubernetes.executor.request.cores) but a similar config is missing
for the driver. Instead, the `spark.driver.cores` value is currently used, which only allows integer values.
Although a `pod spec` can have a fine-grained `cpu` value like the following, this PR proposes an additional configuration, `spark.kubernetes.driver.request.cores`, for the driver's requested cores.
```
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
```
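A sketch of the proposed configuration, mirroring the executor-side one (the value uses the Kubernetes CPU quantity syntax):
```bash
# Hypothetical: request a quarter CPU for the driver pod
--conf spark.kubernetes.driver.request.cores=250m
```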
## How was this patch tested?
Unit tests
Closes#24630 from arunmahadevan/SPARK-27754.
Authored-by: Arun Mahadevan <arunm@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Add AL2 license to metadata of all .md files.
This seemed to be the tidiest way as it will get ignored by .md renderers and other tools. Attempts to write them as markdown comments revealed that there is no such standard thing.
## How was this patch tested?
Doc build
Closes#24243 from srowen/SPARK-26918.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
It would be better to change the explanation for `spark.kubernetes.executor.label.[LabelName]`.
Before:
Note that Spark also adds its own labels to the **driver pod** for bookkeeping purposes.
After modification:
Note that Spark also adds its own labels to the **executor pod** for bookkeeping purposes.
Closes#24054 from hehuiyuan/hehuiyuan-patch-3.
Authored-by: hehuiyuan <hehuiyuan@ZBMAC-C02WD3K5H.local>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Make K8s client timeouts configurable. No test suite exists for the client factory class; happy to add one if needed.
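The resulting properties would be used along these lines (a sketch, assuming millisecond-valued connection/request timeouts for both the submission-side and driver-side clients):
```bash
# Hypothetical: tune the K8s client timeouts (values in milliseconds)
--conf spark.kubernetes.submission.connectionTimeout=10000 \
--conf spark.kubernetes.submission.requestTimeout=10000 \
--conf spark.kubernetes.driver.connectionTimeout=10000 \
--conf spark.kubernetes.driver.requestTimeout=10000
```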
Closes#23928 from onursatici/os/k8s-client-timeouts.
Lead-authored-by: Onur Satici <osatici@palantir.com>
Co-authored-by: Onur Satici <onursatici@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
This enhancement allows for specifying the desired context to use for the initial K8S client auto-configuration. This allows users to more easily access alternative K8S contexts without having to first
explicitly change their current context via kubectl.
Explicitly set my K8S context to a context pointing to a non-existent cluster, then launched Spark jobs with explicitly specified contexts via the new `spark.kubernetes.context` configuration property.
Example Output:
```
> kubectl config current-context
minikube
> minikube status
minikube: Stopped
cluster:
kubectl:
> ./spark-submit --master k8s://https://localhost:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=2 --conf spark.kubernetes.context=docker-for-desktop --conf spark.kubernetes.container.image=rvesse/spark:debian local:///opt/spark/examples/jars/spark-examples_2.11-3.0.0-SNAPSHOT.jar 4
18/10/31 11:57:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/31 11:57:51 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using context docker-for-desktop from users K8S config file
18/10/31 11:57:52 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-1540987071845-driver
namespace: default
labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, spark-role -> driver
pod uid: 32462cac-dd04-11e8-b6c6-025000000001
creation time: 2018-10-31T11:57:52Z
service account name: default
volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv
node name: N/A
start time: N/A
phase: Pending
container status: N/A
18/10/31 11:57:52 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-1540987071845-driver
namespace: default
labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, spark-role -> driver
pod uid: 32462cac-dd04-11e8-b6c6-025000000001
creation time: 2018-10-31T11:57:52Z
service account name: default
volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv
node name: docker-for-desktop
start time: N/A
phase: Pending
container status: N/A
...
18/10/31 11:58:03 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-1540987071845-driver
namespace: default
labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, spark-role -> driver
pod uid: 32462cac-dd04-11e8-b6c6-025000000001
creation time: 2018-10-31T11:57:52Z
service account name: default
volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv
node name: docker-for-desktop
start time: 2018-10-31T11:57:52Z
phase: Succeeded
container status:
container name: spark-kubernetes-driver
container image: rvesse/spark:debian
container state: terminated
container started at: 2018-10-31T11:57:54Z
container finished at: 2018-10-31T11:58:02Z
exit code: 0
termination reason: Completed
```
Without the `spark.kubernetes.context` setting this will fail because the current context - `minikube` - is pointing to a non-running cluster e.g.
```
> ./spark-submit --master k8s://https://localhost:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=2 --conf spark.kubernetes.container.image=rvesse/spark:debian local:///opt/spark/examples/jars/spark-examples_2.11-3.0.0-SNAPSHOT.jar 4
18/10/31 12:02:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/31 12:02:30 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
18/10/31 12:02:31 WARN WatchConnectionManager: Exec Failure
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:914)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281)
at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251)
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151)
at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:66)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:109)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
at sun.security.validator.Validator.validate(Validator.java:260)
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1491)
... 39 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
... 45 more
Exception in thread "kubernetes-dispatcher-0" Exception in thread "main" java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@611a9c09 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@404819e4[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:632)
at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.scheduleReconnect(WatchConnectionManager.java:300)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$800(WatchConnectionManager.java:48)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:213)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:204)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:914)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281)
at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251)
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151)
at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:66)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:109)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
... 4 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
at sun.security.validator.Validator.validate(Validator.java:260)
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1491)
... 39 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
... 45 more
18/10/31 12:02:31 INFO ShutdownHookManager: Shutdown hook called
18/10/31 12:02:31 INFO ShutdownHookManager: Deleting directory /private/var/folders/6b/y1010qp107j9w2dhhy8csvz0000xq3/T/spark-5e649891-8a0f-4f17-bf3a-33b34082eba8
```
Suggested reviewers: mccheah liyinan926 - this is the follow-up fix to the bug discovered while working on SPARK-25809 (PR #22805)
Closes#22904 from rvesse/SPARK-25887.
Authored-by: Rob Vesse <rvesse@dotnetrdf.org>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
Adding docs for an enhancement that came in late in this PR: #22146
Currently the docs state that we're going to use the first container in a pod template, which was the implementation for some time, until it was improved with 2 new properties.
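Assuming the two properties are the container-name selectors, they would be used like this (a sketch; the container names are placeholders and must match names in the pod template):
```bash
# Hypothetical: select which container in the template is the Spark container
--conf spark.kubernetes.driver.podTemplateContainerName=spark-driver \
--conf spark.kubernetes.executor.podTemplateContainerName=spark-executor
```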
## How was this patch tested?
I tested that the properties work by combining pod templates with client-mode and a simple pod template.
Closes#23155 from aditanase/k8s-readme.
Authored-by: Adrian Tanase <atanase@adobe.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Keeps K8s executor resources present in case of failure or normal termination.
Introduces a new boolean config option: `spark.kubernetes.deleteExecutors`, with the default value set to true.
The idea is to update Spark K8s backend structures but leave the resources around.
The assumption is that since entries are not removed from the `removedExecutorsCache` we are immune to updates that refer to the executor resources previously removed.
The only delete operation not touched is the one in the `doKillExecutors` method.
The reason is that right now we don't support [blacklisting](https://issues.apache.org/jira/browse/SPARK-23485) and dynamic allocation with Spark on K8s. In both cases we might want to handle these scenarios in the future, although it's more complicated.
More tests can be added if the approach is approved.
## How was this patch tested?
Manually by running a Spark job and verifying pods are not deleted.
Closes#23136 from skonto/keep_pods.
Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Yinan Li <ynli@google.com>
Adds USER directives to the Dockerfiles, which are configurable via a build argument (`spark_uid`) for easy customisation. A `-u` flag is added to `bin/docker-image-tool.sh` to make it easy to customise this e.g.
```
> bin/docker-image-tool.sh -r rvesse -t uid -u 185 build
> bin/docker-image-tool.sh -r rvesse -t uid push
```
If no UID is explicitly specified it defaults to `185` - this is per skonto's suggestion to align with the OpenShift standard reserved UID for Java apps (
https://lists.openshift.redhat.com/openshift-archives/users/2016-March/msg00283.html)
Notes:
- We have to make the `WORKDIR` writable by the root group or otherwise jobs will fail with `AccessDeniedException`
To Do:
- [x] Debug and resolve issue with client mode test
- [x] Consider whether to always propagate `SPARK_USER_NAME` to environment of driver and executor pods so `entrypoint.sh` can insert that into `/etc/passwd` entry
- [x] Rebase once PR #23013 is merged and update documentation accordingly
Built the Docker images with the new Dockerfiles that include the `USER` directives. Ran the Spark on K8S integration tests against the new images. All pass except client mode which I am currently debugging further.
Also manually dropped myself into the resulting container images via `docker run` and checked `id -u` output to see that UID is as expected.
Tried customising the UID from the default via the new `-u` argument to `docker-image-tool.sh` and again checked the resulting image for the correct runtime UID.
cc felixcheung skonto vanzin
Closes#23017 from rvesse/SPARK-26015.
Authored-by: Rob Vesse <rvesse@dotnetrdf.org>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
This PR adds configurations to use subpaths with Spark on k8s. Subpaths (https://kubernetes.io/docs/concepts/storage/volumes/#using-subpath) allow the user to specify a path within a volume to use instead of the volume's root.
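A sketch of the added `subPath` mount option (the claim, paths and the volume name `data` are placeholders):
```bash
# Hypothetical: mount only the checkpoints/ subdirectory of the PVC rather than its root
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.options.claimName=my-claim \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.path=/checkpoints \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.subPath=checkpoints
```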
## How was this patch tested?
Added unit tests. Ran SparkPi on a cluster with event logging pointed at a subpath-mount and verified the driver host created and used the subpath.
Closes#23026 from NiharS/k8s_subpath.
Authored-by: Nihar Sheth <niharrsheth@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
"Running on Kubernetes" references `spark.driver.pod.name` few places, and it should be `spark.kubernetes.driver.pod.name`.
## How was this patch tested?
See changes
Closes#23133 from Leemoonsoo/fix-driver-pod-name-prop.
Authored-by: Lee moon soo <moon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
## What changes were proposed in this pull request?
bin/docker-image-tool.sh tries to build all docker images (JVM, PySpark
and SparkR) by default. But not all spark distributions are built with
SparkR and hence this script will fail on such distros.
With this change, we make building the alternate language binding docker images (PySpark and SparkR) optional. The user has to specify the Dockerfiles for those language bindings using the -p and -R flags respectively to build the binding docker images.
## How was this patch tested?
Tested following scenarios.
*bin/docker-image-tool.sh -r <repo> -t <tag> build* --> Builds only JVM docker image (default behavior)
*bin/docker-image-tool.sh -r <repo> -t <tag> -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build* --> Builds both JVM and PySpark docker images
*bin/docker-image-tool.sh -r <repo> -t <tag> -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile -R kubernetes/dockerfiles/spark/bindings/R/Dockerfile build* --> Builds JVM, PySpark and SparkR docker images.
Author: Nagaram Prasad Addepally <ram@cloudera.com>
Closes#23053 from ramaddepally/SPARK-25957.
## What changes were proposed in this pull request?
Highlights specific security issues to be aware of with Spark on K8S and recommends K8S mechanisms that should be used to secure clusters.
## How was this patch tested?
N/A - Documentation only
CC felixcheung tgravescs skonto
Closes#23013 from rvesse/SPARK-25023.
Authored-by: Rob Vesse <rvesse@dotnetrdf.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Clarify documentation about security.
## How was this patch tested?
None, just documentation
Closes#22852 from tgravescs/SPARK-25023.
Authored-by: Thomas Graves <tgraves@thirteenroutine.corp.gq1.yahoo.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
## What changes were proposed in this pull request?
New feature to pass podspec files for driver and executor pods.
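The feature surfaces as two properties; a sketch (file paths are placeholders):
```bash
# Hypothetical: build driver and executor pods from user-supplied template files
--conf spark.kubernetes.driver.podTemplateFile=/path/to/driver-template.yaml \
--conf spark.kubernetes.executor.podTemplateFile=/path/to/executor-template.yaml
```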
## How was this patch tested?
new unit and integration tests
- [x] more overwrites in integration tests
- [ ] invalid template integration test, documentation
Author: Onur Satici <osatici@palantir.com>
Author: Yifei Huang <yifeih@palantir.com>
Author: onursatici <onursatici@gmail.com>
Closes#22146 from onursatici/pod-template.
## What changes were proposed in this pull request?
Changed the `kubernetes-client` version and refactored code that broke as a result
## How was this patch tested?
Unit and Integration tests
Closes#22820 from ifilonenko/SPARK-25828.
Authored-by: Ilan Filonenko <ifilondz@gmail.com>
Signed-off-by: Erik Erlandson <eerlands@redhat.com>
## What changes were proposed in this pull request?
As this is targeted for 3.0.0 and Python2 will be deprecated by Jan 1st, 2020, I feel it is appropriate to change the default to Python3. Especially as these projects [found here](https://python3statement.org/) are deprecating their support.
## How was this patch tested?
Unit and Integration tests
Author: Ilan Filonenko <ifilondz@gmail.com>
Closes#22810 from ifilonenko/SPARK-24516.
## What changes were proposed in this pull request?
This is the work on setting up Secure HDFS interaction with Spark-on-K8S.
The architecture is discussed in this community-wide google [doc](https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg)
This initiative can be broken down into 4 Stages
**STAGE 1**
- [x] Detecting `HADOOP_CONF_DIR` environmental variable and using Config Maps to store all Hadoop config files locally, while also setting `HADOOP_CONF_DIR` locally in the driver / executors
**STAGE 2**
- [x] Grabbing `TGT` from `LTC` or using keytabs+principal and creating a `DT` that will be mounted as a secret or using a pre-populated secret
**STAGE 3**
- [x] Driver
**STAGE 4**
- [x] Executor
## How was this patch tested?
Locally tested on a single-node, pseudo-distributed Kerberized Hadoop cluster
- [x] E2E Integration tests https://github.com/apache/spark/pull/22608
- [ ] Unit tests
## Docs and Error Handling?
- [x] Docs
- [x] Error Handling
## Contribution Credit
kimoonkim skonto
Closes#21669 from ifilonenko/secure-hdfs.
Lead-authored-by: Ilan Filonenko <if56@cornell.edu>
Co-authored-by: Ilan Filonenko <ifilondz@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
Markdown links do not work inside an HTML table. We should use HTML link tags instead.
## How was this patch tested?
Verified in IntelliJ IDEA's markdown editor and online markdown editor.
Closes#22588 from viirya/SPARK-25262-followup.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
This adds a missing end markup tag. This should go to the `master` branch only.
## How was this patch tested?
This is a doc-only change. Manual via `SKIP_API=1 jekyll build`.
Closes#22584 from dongjoon-hyun/SPARK-25262.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
This adds a missing markup tag. This should go to `master/branch-2.4`.
## How was this patch tested?
Manual via `SKIP_API=1 jekyll build`.
Closes#22585 from dongjoon-hyun/SPARK-23285.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
## What changes were proposed in this pull request?
The default behaviour of Spark on K8S currently is to create `emptyDir` volumes to back `SPARK_LOCAL_DIRS`. In some environments, e.g. diskless compute nodes, this may actually hurt performance because these are backed by the Kubelet's node storage, which on a diskless node will typically be some remote network storage.
Even if this is enterprise grade storage connected via a high speed interconnect the way Spark uses these directories as scratch space (lots of relatively small short lived files) has been observed to cause serious performance degradation. Therefore we would like to provide the option to use K8S's ability to instead back these `emptyDir` volumes with `tmpfs`. Therefore this PR adds a configuration option that enables `SPARK_LOCAL_DIRS` to be backed by Memory backed `emptyDir` volumes rather than the default.
Documentation is added to describe both the default behaviour and this new option and its implications, one of which is that scratch space then counts towards your pod's memory limits, and therefore users will need to adjust their memory requests accordingly.
*NB* - This is an alternative version of PR #22256 reduced to just the `tmpfs` piece
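The new option is a single boolean; a sketch:
```bash
# Hypothetical: back the emptyDir volumes for SPARK_LOCAL_DIRS with tmpfs (RAM);
# the scratch space then counts against the pod's memory limit
--conf spark.kubernetes.local.dirs.tmpfs=true
```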
## How was this patch tested?
Ran with this option in our diskless compute environments to verify functionality
Author: Rob Vesse <rvesse@dotnetrdf.org>
Closes#22323 from rvesse/SPARK-25262-tmpfs.
## What changes were proposed in this pull request?
Updated documentation for Spark on Kubernetes for the upcoming 2.4.0.
mccheah erikerlandson
Closes#22224 from liyinan926/master.
Authored-by: Yinan Li <ynli@google.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Small formatting change to have Python Version be camelCase as per request during PR review.
## How was this patch tested?
Tested with unit and integration tests
Author: Ilan Filonenko <if56@cornell.edu>
Closes#22095 from ifilonenko/spark-py-edits.
## What changes were proposed in this pull request?
Support client mode for the Kubernetes scheduler.
Client mode works more or less identically to cluster mode. However, in client mode, the Spark Context needs to be manually bootstrapped with certain properties which would have otherwise been set up by spark-submit in cluster mode. Specifically:
- If the user doesn't provide a driver pod name, we don't add an owner reference. This is for usage when the driver is not running in a pod in the cluster. In such a case, the driver can only provide a best effort to clean up the executors when the driver exits, but cleaning up the resources is not guaranteed. The executor JVMs should exit if the driver JVM exits, but the pods will still remain in the cluster in a COMPLETED or FAILED state.
- The user must provide a host (spark.driver.host) and port (spark.driver.port) that the executors can connect to. When using spark-submit in cluster mode, spark-submit generates the headless service automatically; in client mode, the user is responsible for setting up their own connectivity.
We also change the authentication configuration prefixes for client mode.
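A sketch of the client-mode bootstrap described above (the host, port and image are placeholders; the executors must be able to reach the driver at the given address):
```bash
# Hypothetical client-mode launch: the user provides driver connectivity manually
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.driver.host=<routable-host-for-executors> \
  --conf spark.driver.port=7078 \
  --conf spark.kubernetes.container.image=<spark-image> \
  /opt/spark/examples/jars/spark-examples.jar
```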
## How was this patch tested?
Adding an integration test to exercise client mode support.
Author: mcheah <mcheah@palantir.com>
Closes#21748 from mccheah/k8s-client-mode.