### What changes were proposed in this pull request?
Since SPARK-22757, `KubernetesUtils` has been used as an important utility class by all K8s modules and `ExternalClusterManager`s. This PR aims to promote `KubernetesUtils` to `DeveloperApi` in order to maintain it officially in a backward compatible way at Apache Spark 3.2.0.
### Why are the changes needed?
Apache Spark 3.1.1 makes the `Kubernetes` module GA and provides an extensible external cluster manager framework. To implement an `ExternalClusterManager` for the K8s environment, the `KubernetesUtils` class is crucial and needs to be stable. By promoting it to a subset of the K8s developer API, we can maintain it in a more sustainable way and give better, stable functionality to K8s users.
In this PR, the `Since` annotations denote the version of each function's last signature change, because these functions are going to become public in Apache Spark 3.2.0.
| Version | Function Name |
|-|-|
| 2.3.0 | parsePrefixedKeyValuePairs |
| 2.3.0 | requireNandDefined |
| 2.4.0 | parseMasterUrl |
| 3.0.0 | requireBothOrNeitherDefined |
| 3.0.0 | requireSecondIfFirstIsDefined |
| 3.0.0 | selectSparkContainer |
| 3.0.0 | formatPairsBundle |
| 3.0.0 | formatPodState |
| 3.0.0 | containersDescription |
| 3.0.0 | containerStatusDescription |
| 3.0.0 | formatTime |
| 3.0.0 | uniqueID |
| 3.0.0 | buildResourcesQuantities |
| 3.0.0 | uploadAndTransformFileUris |
| 3.0.0 | uploadFileUri |
| 3.0.0 | buildPodWithServiceAccount |
| 3.0.0 | isLocalAndResolvable |
| 3.1.1 | renameMainAppResource |
| 3.1.1 | addOwnerReference |
| 3.2.0 | loadPodFromTemplate |
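As a rough sketch of how an `ExternalClusterManager` might call one of the now-public helpers (the label keys and values below are made up for illustration):
```scala
import org.apache.spark.SparkConf
import org.apache.spark.deploy.k8s.KubernetesUtils

val conf = new SparkConf()
  .set("spark.kubernetes.driver.label.team", "data-eng")
  .set("spark.kubernetes.driver.label.owner", "alice")

// Collect every configuration key under the given prefix into a plain Map.
val labels = KubernetesUtils.parsePrefixedKeyValuePairs(
  conf, "spark.kubernetes.driver.label.")
// labels == Map("team" -> "data-eng", "owner" -> "alice")
```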
### Does this PR introduce _any_ user-facing change?
Yes, but these are new API additions.
### How was this patch tested?
Pass the CIs.
Closes #32406 from dongjoon-hyun/SPARK-35280.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to support driver-owned on-demand PVC(Persistent Volume Claim)s. It means dynamically-created PVCs will have the `ownerReference` to `driver` pod instead of `executor` pod.
### Why are the changes needed?
This allows the K8s backend scheduler to reuse these PVCs later.
**BEFORE**
```
$ k get pvc tpcds-pvc-exec-1-pvc-0 -oyaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
...
ownerReferences:
- apiVersion: v1
controller: true
kind: Pod
name: tpcds-pvc-exec-1
```
**AFTER**
```
$ k get pvc tpcds-pvc-exec-1-pvc-0 -oyaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
...
ownerReferences:
- apiVersion: v1
controller: true
kind: Pod
name: tpcds-pvc
```
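A minimal sketch of opting in, assuming the flag introduced here is `spark.kubernetes.driver.ownPersistentVolumeClaim` (the exact name is an assumption; it defaults to `false`):
```scala
import org.apache.spark.SparkConf

// Let the driver pod, rather than each executor pod, own on-demand PVCs,
// so the K8s backend scheduler can reuse them later.
// NOTE: the flag name below is an assumption, not confirmed by this PR text.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.ownPersistentVolumeClaim", "true")
```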
### Does this PR introduce _any_ user-facing change?
No. (The default is `false`)
### How was this patch tested?
Manually checked the above and passed the K8s IT.
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 16 minutes, 40 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes #32288 from dongjoon-hyun/SPARK-35182.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to support a new configuration, `spark.kubernetes.driver.service.deleteOnTermination`, to clean up `Driver Service` resource during app termination.
### Why are the changes needed?
The K8s service is one of the important resources, and it is sometimes controlled by quota.
```
$ k describe quota
Name: service
Namespace: default
Resource Used Hard
-------- ---- ----
services 1 3
```
Apache Spark creates a service for the driver whose lifecycle is the same as the driver pod's.
This means a new Spark job submission fails once the number of completed Spark jobs reaches the service quota.
**BEFORE**
```
$ k get pod
NAME READY STATUS RESTARTS AGE
org-apache-spark-examples-sparkpi-a32c9278e7061b4d-driver 0/1 Completed 0 31m
org-apache-spark-examples-sparkpi-a9f1f578e721ef62-driver 0/1 Completed 0 78s
$ k get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 80m
org-apache-spark-examples-sparkpi-a32c9278e7061b4d-driver-svc ClusterIP None <none> 7078/TCP,7079/TCP,4040/TCP 31m
org-apache-spark-examples-sparkpi-a9f1f578e721ef62-driver-svc ClusterIP None <none> 7078/TCP,7079/TCP,4040/TCP 80s
$ k describe quota
Name: service
Namespace: default
Resource Used Hard
-------- ---- ----
services 3 3
$ bin/spark-submit...
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException:
Failure executing: POST at: https://192.168.64.50:8443/api/v1/namespaces/default/services.
Message: Forbidden! User minikube doesn't have permission.
services "org-apache-spark-examples-sparkpi-843f6978e722819c-driver-svc" is forbidden:
exceeded quota: service, requested: services=1, used: services=3, limited: services=3.
```
**AFTER**
```
$ k get pod
NAME READY STATUS RESTARTS AGE
org-apache-spark-examples-sparkpi-23d5f278e77731a7-driver 0/1 Completed 0 26s
org-apache-spark-examples-sparkpi-d1292278e7768ed4-driver 0/1 Completed 0 67s
org-apache-spark-examples-sparkpi-e5bedf78e776ea9d-driver 0/1 Completed 0 44s
$ k get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 172m
$ k describe quota
Name: service
Namespace: default
Resource Used Hard
-------- ---- ----
services 1 3
```
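A minimal sketch of opting out of the new cleanup behavior (the flag name is the one this PR adds; it defaults to `true`):
```scala
import org.apache.spark.SparkConf

// Keep the driver service after application termination, i.e. restore the
// previous behavior where the service shares the driver pod's lifecycle.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.service.deleteOnTermination", "false")
```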
### Does this PR introduce _any_ user-facing change?
Yes, this PR adds a new configuration, `spark.kubernetes.driver.service.deleteOnTermination`, and enables it by default.
The change is documented in the migration guide.
### How was this patch tested?
Pass the CIs.
This is tested with K8s IT manually.
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 19 minutes, 9 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes #32226 from dongjoon-hyun/SPARK-35131.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Although the current master branch already works with K8s 1.20, this PR aims to upgrade the K8s client to 5.3.0 to support K8s 1.20 officially.
- https://github.com/fabric8io/kubernetes-client#compatibility-matrix
The following are the notable breaking API changes.
1. Remove Doneable (5.0+):
- https://github.com/fabric8io/kubernetes-client/pull/2571
2. Change Watcher.onClose signature (5.0+):
- https://github.com/fabric8io/kubernetes-client/pull/2616
3. Change Readiness (5.1+):
- https://github.com/fabric8io/kubernetes-client/pull/2796
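For illustration, a minimal watcher compiled against the 5.x API, where `onClose` now receives a `WatcherException` (the class body and logging are a sketch, not code from this PR):
```scala
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{Watcher, WatcherException}

class LoggingPodWatcher extends Watcher[Pod] {
  override def eventReceived(action: Watcher.Action, pod: Pod): Unit = {
    println(s"$action: ${pod.getMetadata.getName}")
  }

  // 5.0+ signature: the old onClose(KubernetesClientException) is gone.
  override def onClose(cause: WatcherException): Unit = {
    println(s"Watch closed: ${Option(cause).map(_.getMessage).getOrElse("gracefully")}")
  }
}
```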
### Why are the changes needed?
According to the compatibility matrix, this makes Apache Spark and its external cluster manager extension support all K8s 1.20 features officially for Apache Spark 3.2.0.
### Does this PR introduce _any_ user-facing change?
Yes, this is a dev dependency change which affects K8s cluster extension users.
### How was this patch tested?
Pass the CIs.
This is manually tested with K8s IT.
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 17 minutes, 44 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes #32221 from dongjoon-hyun/SPARK-K8S-530.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Introducing a new test construct:
```
withHttpServer() { baseURL =>
...
}
```
This construct starts and stops a Jetty server to serve files via HTTP.
Moreover, this PR uses the new construct in the test `Run SparkRemoteFileTest using a remote data file`.
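A hypothetical shape for the construct, sketched with the plain Jetty API (the real helper's signature and internals may differ):
```scala
import java.net.URL
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.server.handler.ResourceHandler

// Serve `resourceBase` over HTTP on a random free port while `body` runs,
// then stop the server even if the body throws.
def withHttpServer(resourceBase: String)(body: URL => Unit): Unit = {
  val server = new Server(0)
  val handler = new ResourceHandler
  handler.setResourceBase(resourceBase)
  server.setHandler(handler)
  server.start()
  try body(server.getURI.toURL)
  finally server.stop()
}
```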
### Why are the changes needed?
Before this PR, GitHub URLs were used, like "https://raw.githubusercontent.com/apache/spark/master/data/mllib/pagerank_data.txt".
This connects two Spark versions in an unhealthy way: the `master` branch, which is a moving target, and the committed test code, which is fixed (and might even be released).
So a test running for an earlier version of Spark expects something (filename, content, path) from a later release, and, what is worse, when the moving version changes, the earlier test breaks.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing unit test.
Closes #31935 from attilapiros/SPARK-34789.
Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to add `ownerReference` to the executor ConfigMap to fix leakage.
### Why are the changes needed?
SPARK-30985 maintains the executor ConfigMap explicitly inside Spark. However, this ConfigMap can be leaked when Spark drivers die accidentally or are killed by K8s. We need to add an `ownerReference` so that K8s garbage-collects these automatically.
The number of ConfigMaps is one of the resource quotas, so the leaked ConfigMaps currently cause Spark job submission failures.
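Conceptually, the fix amounts to attaching the driver pod as the owner of the executor ConfigMap, for example via the `KubernetesUtils.addOwnerReference` helper (a sketch under that assumption; the actual wiring lives in the executor pod builder):
```scala
import io.fabric8.kubernetes.api.model.{ConfigMap, Pod}
import org.apache.spark.deploy.k8s.KubernetesUtils

// Once the driver pod owns the ConfigMap, K8s garbage-collects the map
// whenever the driver pod is deleted, even if the driver died accidentally.
def ownExecutorConfigMap(driverPod: Pod, configMap: ConfigMap): Unit = {
  KubernetesUtils.addOwnerReference(driverPod, Seq(configMap))
}
```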
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs and check manually.
K8s IT is tested manually.
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 19 minutes, 2 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
**BEFORE**
```
$ k get cm spark-exec-450b417895b3b2c7-conf-map -oyaml | grep ownerReferences
```
**AFTER**
```
$ k get cm spark-exec-bb37a27895b1c26c-conf-map -oyaml | grep ownerReferences
f:ownerReferences:
```
Closes #32042 from dongjoon-hyun/SPARK-34948.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Extending "EXTRA LOGS FOR THE FAILED TEST" section of k8s integration test log with `kubectl describe pods` output for the failed test.
### Why are the changes needed?
PR builds frequently fail, as the K8s integration tests are currently very flaky in the Amplab Jenkins environment.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Locally, by temporarily making one of the tests fail. The output is:
```
21/03/25 16:55:16.722 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite:
===== EXTRA LOGS FOR THE FAILED TEST
21/03/25 16:55:17.167 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite: BEGIN driver DESCRIBE POD
Name: spark-test-app-a2b03971b7c049e8a2629f6a3198842b
Namespace: 35bdb17e308743afaec17538f89a7c3e
Priority: 0
Node: minikube/192.168.64.119
Start Time: Thu, 25 Mar 2021 16:52:10 +0100
Labels: spark-app-locator=75f695685ae44314a99ec13bb39332bc
spark-app-selector=spark-150230742d364a77927a08eed0222065
spark-role=driver
Annotations: <none>
Status: Succeeded
IP: 172.17.0.4
Containers:
spark-kubernetes-driver:
Container ID: docker://d6d27b0551060d9b094f12d1e232dfb5ae78ce38559680c7126c548996da4d95
Image: docker.io/kubespark/spark:3.2.0-SNAPSHOT_9575B805-9CB0-4A16-8A31-AA2F8DDA8EE5
Image ID: docker://sha256:3fc556c73a0d5187b5a14dbdc2f69ef292e60b544b4b4d3715f6749417c20918
Ports: 7078/TCP, 7079/TCP, 4040/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
driver
--properties-file
/opt/spark/conf/spark.properties
--class
org.apache.spark.examples.SparkPi
local:///opt/spark/examples/jars/spark-examples_2.12-3.2.0-SNAPSHOT.jar
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 25 Mar 2021 16:52:11 +0100
Finished: Thu, 25 Mar 2021 16:52:20 +0100
Ready: False
Restart Count: 0
Limits:
memory: 1408Mi
Requests:
cpu: 1
memory: 1408Mi
Environment:
SPARK_USER: attilazsoltpiros
SPARK_APPLICATION_ID: spark-150230742d364a77927a08eed0222065
SPARK_DRIVER_BIND_ADDRESS: (v1:status.podIP)
SPARK_LOCAL_DIRS: /var/data/spark-dab6f1c9-e538-40c8-a7d9-3e88f9b82cfa
SPARK_CONF_DIR: /opt/spark/conf
Mounts:
/opt/spark/conf from spark-conf-volume-driver (rw)
/var/data/spark-dab6f1c9-e538-40c8-a7d9-3e88f9b82cfa from spark-local-dir-1 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-nmfwl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
spark-local-dir-1:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
spark-conf-volume-driver:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: spark-drv-c60832786a15ffbe-conf-map
Optional: false
default-token-nmfwl:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-nmfwl
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m7s default-scheduler Successfully assigned 35bdb17e308743afaec17538f89a7c3e/spark-test-app-a2b03971b7c049e8a2629f6a3198842b to minikube
Normal Pulled 3m7s kubelet, minikube Container image "docker.io/kubespark/spark:3.2.0-SNAPSHOT_9575B805-9CB0-4A16-8A31-AA2F8DDA8EE5" already present on machine
Normal Created 3m7s kubelet, minikube Created container spark-kubernetes-driver
Normal Started 3m6s kubelet, minikube Started container spark-kubernetes-driver
21/03/25 16:55:17.168 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite: END driver DESCRIBE POD
```
Closes #31962 from attilapiros/SPARK-34869.
Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Support submitting to K8s with only a token.
### Why are the changes needed?
Currently, submitting to K8s always needs OAuth files.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Before, when submitting a job from outside the K8s cluster without a correct ca.crt, we may get this exception:
```
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:439)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:306)
at sun.security.validator.Validator.validate(Validator.java:271)
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:312)
```
When `spark.kubernetes.trust.certificates` is set to `true`, we can submit with only a correct token; there is no need to configure ca.crt in the local environment.
Submit as:
```
bin/spark-submit \
--master $master \
--name pi \
--deploy-mode cluster \
--conf spark.kubernetes.container.image=$image \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.authenticate.submission.oauthToken=$clusterToken \
--conf spark.kubernetes.trust.certificates=true \
local:///opt/spark/examples/src/main/python/pi.py 200
```
Closes #30684 from hddong/trust-certs.
Authored-by: hongdongdong <hongdongdong@cmss.chinamobile.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
We added the gnupg installation in https://github.com/apache/spark/pull/30130. We should run `apt update` before the gnupg installation; otherwise, we will get a fetch error when the package index is outdated.
See more in:
[1] http://apache-spark-developers-list.1001551.n3.nabble.com/K8s-Integration-test-is-unable-to-run-because-of-the-unavailable-libs-td30986.html
### Why are the changes needed?
Add an `apt update` command before the gnupg installation to avoid an invalid package cache list.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
K8s Integration test passed
Closes #31923 from Yikun/SPARK-34820.
Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to support remote driver/executor template files.
### Why are the changes needed?
Currently, `KubernetesUtils.loadPodFromTemplate` supports only local files.
With this PR, we can do the following.
```bash
bin/spark-submit \
...
-c spark.kubernetes.driver.podTemplateFile=s3a://dongjoon/driver.yml \
-c spark.kubernetes.executor.podTemplateFile=s3a://dongjoon/executor.yml \
...
```
### Does this PR introduce _any_ user-facing change?
Yes, this is an improvement.
### How was this patch tested?
Manual testing.
Closes #31877 from dongjoon-hyun/SPARK-34783-2.
Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Fixing `IndexOutOfBoundsException` in `logForFailedTest` method when driver is not started.
### Why are the changes needed?
Before this PR, when the driver is not started, an `IndexOutOfBoundsException` is thrown, as the first item is accessed from an empty list:
```
- PVs with local storage *** FAILED ***
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:659)
at java.util.ArrayList.get(ArrayList.java:435)
at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.logForFailedTest(KubernetesSuite.scala:83)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:181)
at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:61)
...
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Running integration tests.
After this change, the above error becomes:
```
- PVs with local storage *** FAILED ***
java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(File.java:2026)
at org.apache.spark.deploy.k8s.integrationtest.Utils$.createTempFile(Utils.scala:103)
at org.apache.spark.deploy.k8s.integrationtest.PVTestsSuite.$anonfun$$init$$1(PVTestsSuite.scala:135)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
...
```
Closes #31824 from attilapiros/SPARK-34732.
Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR modifies the POD allocator to use the scheduler backend to get the known executors and remove those from the pending and newly created list.
This is different from the normal `ExecutorAllocationManager`-requested killing of executors, where `spark.dynamicAllocation.executorIdleTimeout` is used.
In this case the POD allocator should only be responsible for terminating unsatisfied POD allocations (new requests where no POD state has been received yet, and PODs in pending state).
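A rough sketch of the filtering idea (names and types are illustrative, not the actual patch; `CoarseGrainedSchedulerBackend` is Spark-internal):
```scala
import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend

// Only pods whose executors never registered with the scheduler backend
// (new requests without POD state, or PODs still pending) stay candidates
// for deletion as excess.
def excessCandidates(
    backend: CoarseGrainedSchedulerBackend,
    pendingOrNewlyCreated: Set[Long]): Set[Long] = {
  val registered = backend.getExecutorIds().map(_.toLong).toSet
  pendingOrNewlyCreated -- registered
}
```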
### Why are the changes needed?
Because there is a race between the executor POD allocator and the cluster scheduler backend.
Running several experiments during downscaling, we experienced a lot of fresh executors being killed which already had running tasks on them.
The pattern in the log was the following (see executor 312 and TID 2079):
```
21/02/01 15:12:03 INFO ExecutorMonitor: New executor 312 has registered (new total is 138)
...
21/02/01 15:12:03 INFO TaskSetManager: Starting task 247.0 in stage 4.0 (TID 2079, 100.100.18.138, executor 312, partition 247, PROCESS_LOCAL, 8777 bytes)
21/02/01 15:12:03 INFO ExecutorPodsAllocator: Deleting 3 excess pod requests (408,312,307).
...
21/02/01 15:12:04 ERROR TaskSchedulerImpl: Lost executor 312 on 100.100.18.138: The executor with id 312 was deleted by a user or the framework.
21/02/01 15:12:04 INFO TaskSetManager: Task 2079 failed because while it was being computed, its executor exited for a reason unrelated to the task. Not counting this failure towards the maximum number of failures for the task.
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
#### Manually
With this change there were no executors lost with running tasks on them.
#### With unit test
A new test is added and an existing test is modified to check these cases.
Closes #31513 from attilapiros/SPARK-34361.
Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
Since [minikube version v1.1.0](https://github.com/kubernetes/minikube/blob/v1.1.0/CHANGELOG.md), kubectl is available as a command. So the kubeconfig settings can be accessed like:
```
$ minikube kubectl config view
apiVersion: v1
clusters:
- cluster:
certificate-authority: /Users/attilazsoltpiros/.minikube/ca.crt
server: https://127.0.0.1:32788
name: minikube
contexts:
- context:
cluster: minikube
namespace: default
user: minikube
name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: minikube
user:
client-certificate: /Users/attilazsoltpiros/.minikube/profiles/minikube/client.crt
client-key: /Users/attilazsoltpiros/.minikube/profiles/minikube/client.key
```
Here the vm-driver was docker and the server port (https://127.0.0.1:32788) is different from the hardcoded 8443.
So the main part of this PR introduces a Kubernetes client configuration based on the kubeconfig (the output of `minikube kubectl config view`) for minikube versions after v1.1.0, while the legacy way of configuration is also kept, as minikube versions back to v0.34.1 should be supported.
Moreover, as the old config parsing pattern wasn't sufficient in my case (when `minikube kubectl config view` is called, a kubectl download message might be included before the first key), I changed the parsing even for the existing keys, for a consistent pattern in this file.
The old parsing in an example:
```
private val HOST_PREFIX = "host:"
val hostString = statusString.find(_.contains(s"$HOST_PREFIX "))
val status1 = hostString.get.split(HOST_PREFIX)(1)
```
The new parsing:
```
private val HOST_PREFIX = "host: "
val hostString = statusString.find(_.contains(HOST_PREFIX))
hostString.get.split(HOST_PREFIX)(1)
```
So the PREFIX is extended with the extra space at the declaration (this way the two separate string operations are safer and consistent with each other), and the replace is changed to a split, taking the 2nd string from the result (which is guaranteed to contain only the text after the PREFIX when the PREFIX is a contained substring).
Finally, there is a tiny change in `dev-run-integration-tests.sh` to introduce `--skip-building-dependencies`, which switches off building the Maven dependencies of `kubernetes-integration-tests` from the Spark project.
This can be used when only `kubernetes-integration-tests` should be rebuilt, as only the tests were modified.
### Why are the changes needed?
Kubernetes client configuration based on kubeconfig settings is more reliable and provides a solution which is minikube version independent.
### Does this PR introduce _any_ user-facing change?
No. This is only test code.
### How was this patch tested?
Tested manually on two minikube versions.
Minikube v0.34.1:
```
$ minikube version
minikube version: v0.34.1
$ grep "version\|building" resource-managers/kubernetes/integration-tests/target/integration-tests.log
20/12/12 12:52:25.135 ScalaTest-main-running-DiscoverySuite INFO Minikube: minikube version: v0.34.1
20/12/12 12:52:25.761 ScalaTest-main-running-DiscoverySuite INFO Minikube: building kubernetes config with apiVersion: v1, masterUrl: https://192.168.99.103:8443, caCertFile: /Users/attilazsoltpiros/.minikube/ca.crt, clientCertFile: /Users/attilazsoltpiros/.minikube/apiserver.crt, clientKeyFile: /Users/attilazsoltpiros/.minikube/apiserver.key
```
Minikube v1.15.1
```
$ minikube version
minikube version: v1.15.1
commit: 23f40a012abb52eff365ff99a709501a61ac5876
$ grep "version\|building" resource-managers/kubernetes/integration-tests/target/integration-tests.log
20/12/13 06:25:55.086 ScalaTest-main-running-DiscoverySuite INFO Minikube: minikube version: v1.15.1
20/12/13 06:25:55.597 ScalaTest-main-running-DiscoverySuite INFO Minikube: building kubernetes config with apiVersion: v1, masterUrl: https://192.168.64.4:8443, caCertFile: /Users/attilazsoltpiros/.minikube/ca.crt, clientCertFile: /Users/attilazsoltpiros/.minikube/profiles/minikube/client.crt, clientKeyFile: /Users/attilazsoltpiros/.minikube/profiles/minikube/client.key
$ minikube kubectl config view
apiVersion: v1
clusters:
- cluster:
certificate-authority: /Users/attilazsoltpiros/.minikube/ca.crt
server: https://192.168.64.4:8443
name: minikube
contexts:
- context:
cluster: minikube
namespace: default
user: minikube
name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: minikube
user:
client-certificate: /Users/attilazsoltpiros/.minikube/profiles/minikube/client.crt
client-key: /Users/attilazsoltpiros/.minikube/profiles/minikube/client.key
```
Closes #30751 from attilapiros/SPARK-32617.
Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
This PR aims to use the runtime Hadoop version in K8s integration test.
### Why are the changes needed?
SPARK-33212 upgrades Hadoop dependency from 3.2.0 to 3.2.2 and we will upgrade to 3.3.x+.
We had better use the runtime Hadoop version instead of having a static string.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the K8s IT.
This is tested locally like the following.
```
KubernetesSuite:
...
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
...
```
Closes #31604 from dongjoon-hyun/SPARK-34487.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to upgrade `kubernetes-client` library from 4.12.0 to 4.13.2 for Apache Spark 3.2.0.
### Why are the changes needed?
This will bring [K8s 1.19.1](https://github.com/fabric8io/kubernetes-client/pull/2541) models officially and the latest bug fixes.
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.13.0
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.13.1
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.13.2
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass the K8s IT and UT.
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 19 minutes, 25 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes #31602 from dongjoon-hyun/SPARK-34486.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR adds missing docs for ResourceProfile related APIs. Besides, it includes a few minor API changes:
* ResourceProfileBuilder.build -> ResourceProfileBuilder.builder()
* Provides a Java-specific API `allSupportedExecutorResourcesJList`
* Makes `ResourceAllocator` private, since it was mistakenly exposed previously
### Why are the changes needed?
Add missing API docs
### Does this PR introduce _any_ user-facing change?
No, as Apache Spark 3.1 hasn't been officially released.
### How was this patch tested?
Updated unit tests due to the signature change of `build()`.
Closes #31496 from Ngone51/resource-profile-api-cleanup.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to make `KubernetesClusterSchedulerBackend` ignore `RegisterExecutor` message when `SparkContext` is stopped already.
### Why are the changes needed?
If `SparkDriver` is terminated, the executors will be removed by K8s automatically.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the newly added test case.
Closes #31587 from dongjoon-hyun/SPARK-34469.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR adds the following metrics to track executor remove reasons during dynamic allocation:
- `numberExecutorsGracefullyDecommissioned`: number of executors which reached the finished decommissioning state and shut themselves down cleanly
- `numberExecutorsDecommissionUnfinished`: executors which were requested to decommission but stopped without reaching the finished decommissioning state
- `numberExecutorsKilledByDriver`: executors killed by the driver (requested to stop)
- `numberExecutorsExitedUnexpectedly`: executors exited without driver request
### Why are the changes needed?
To better support monitoring of dynamic allocation with these metrics.
### Does this PR introduce _any_ user-facing change?
Yes. The new metrics will be available for monitoring.
### How was this patch tested?
With unit and integration tests.
Finally manually checked the new metrics in jconsole:
<img width="1054" alt="jmx" src="https://user-images.githubusercontent.com/2017933/107458686-de8adf00-6b54-11eb-86f7-41faf2fb638f.png">
Closes #31450 from attilapiros/SPARK-33763-final.
Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
This PR introduces a new protected method in `SparkFunSuite` which is only called when a test fails and can be used to collect logs for the failed test. In this PR it is implemented for the Kubernetes tests in the `KubernetesSuite` class, where it collects all the POD logs and logs them out.
This unfortunately cannot be realized with a simple "after" method, because in the "after" method the test outcome is not available.
Moreover, this PR removes `appLocator` as a method argument, as `appLocator` is available as a member variable.
### Why are the changes needed?
Currently both the driver and executors logs are lost.
In [developer-tools](https://spark.apache.org/developer-tools.html) there is a hint:
"Getting logs from the pods and containers directly is an exercise left to the reader."
But when the test is executed by Jenkins and a failure happens, we really need the POD logs to analyze the problem.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
By integration testing. I have checked what would happen if one test fails; the output would be:
```
21/02/14 11:05:34.261 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite:
===== EXTRA LOGS FOR THE FAILED TEST
21/02/14 11:05:34.278 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite: BEGIN driver POD log
++ id -u
+ myuid=185
++ id -g
+ mygid=0
+ set +e
++ getent passwd 185
+ uidentry=
+ set -e
+ '[' -z '' ']'
+ '[' -w /etc/passwd ']'
+ echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.3 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///opt/spark/tests/decommissioning.py
21/02/14 10:02:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting decom test
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/02/14 10:02:29 INFO SparkContext: Running Spark version 3.2.0-SNAPSHOT
21/02/14 10:02:29 INFO ResourceUtils: ==============================================================
21/02/14 10:02:29 INFO ResourceUtils: No custom resources configured for spark.driver.
21/02/14 10:02:29 INFO ResourceUtils: ==============================================================
...
21/02/14 10:03:17 INFO ShutdownHookManager: Deleting directory /var/data/spark-fa6961ed-a2c1-444c-bfeb-20e63ba0b5cf/spark-ab4b0287-6e24-4b39-837e-9b0b62c1f26f
21/02/14 10:03:17 INFO ShutdownHookManager: Deleting directory /tmp/spark-d6b11e7d-6a03-4a1d-8559-37cb853319bf
21/02/14 11:05:34.279 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite: END driver POD log
```
Closes #31561 from attilapiros/SPARK-34426.
Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
### What changes were proposed in this pull request?
Allow users to have Spark attempt to decommission excluded executors.
Since excluded executors may be flaky, this also adds the ability for users to specify a time limit after which a decommissioning executor will be killed by Spark.
### Why are the changes needed?
This may help prevent fetch failures from excluded executors, and also handle the situation in which executors
### Does this PR introduce _any_ user-facing change?
Yes, two new configuration flags for the behaviour.
### How was this patch tested?
Extended unit and integration tests.
Closes #31539 from holdenk/re=enable-SPARK-34104-SPARK-34105.
Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
Allow users to have Spark attempt to decommission excluded executors.
Since excluded executors may be flaky, this also adds the ability for users to specify a time limit after which a decommissioning executor will be killed by Spark.
### Why are the changes needed?
This may help prevent fetch failures from excluded executors, and also handle the situation in which executors
### Does this PR introduce _any_ user-facing change?
Yes, two new configuration flags for the behaviour.
### How was this patch tested?
Extended unit and integration tests.
Closes #31249 from holdenk/configure-inaccessibleList-kill-to-use-decommissioning.
Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
Fixing the identification of timed-out pending pod requests as excess requests to delete, for the case when the excess is higher than the number of newly created timed-out requests and there are some non-timed-out newly created requests too.
### Why are the changes needed?
After https://github.com/apache/spark/pull/29981, only timed-out newly created requests and timed-out pending requests are taken as excess requests.
But there is a small bug when the excess is higher than the number of newly created timed-out requests and there are some non-timed-out newly created requests as well: all the newly created requests are counted as excess when items are chosen from the timed-out pending pod requests.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
There is new unit test added: `SPARK-34334: correctly identify timed out pending pod requests as excess`.
Closes #31445 from attilapiros/SPARK-34334.
Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
This PR aims to fix `KubernetesClusterSchedulerBackend.stop` to wrap `super.stop` with `Utils.tryLogNonFatalError`.
### Why are the changes needed?
[CoarseGrainedSchedulerBackend.stop](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L559) may throw `SparkException` and this causes K8s resource (pod and configmap) leakage.
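The shape of the fix, as a sketch (`Utils.tryLogNonFatalError` is a Spark-internal helper; the function below is illustrative, not the actual method body):
```scala
import org.apache.spark.util.Utils

// Run a step that may throw, log any non-fatal error, and keep going so the
// K8s-specific cleanup still executes and pods/configmaps are not leaked.
def stopWithCleanup(stopParent: () => Unit, cleanupK8sResources: () => Unit): Unit = {
  Utils.tryLogNonFatalError {
    stopParent()
  }
  cleanupK8sResources() // still reached even if stopParent() threw
}
```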
### Does this PR introduce _any_ user-facing change?
No. This is a bug fix.
### How was this patch tested?
Pass the CI with the newly added test case.
Closes #31533 from dongjoon-hyun/SPARK-34407.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
`MockitoAnnotations.initMocks(Object)` is a deprecated API; we should use `MockitoAnnotations.openMocks(Object).close()` instead.
### Why are the changes needed?
Clean up deprecated API usage.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass the Jenkins or GitHub Action
Closes #31487 from LuciferYang/mockito-api.
Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to add a new configuration `spark.kubernetes.executor.disableConfigMap`.
### Why are the changes needed?
This can be used to disable ConfigMap creation for executor pods, due to https://github.com/apache/spark/pull/27735.
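A minimal sketch of turning the new flag on (it defaults to `false`):
```scala
import org.apache.spark.SparkConf

// Skip creating the per-executor ConfigMap entirely.
val conf = new SparkConf()
  .set("spark.kubernetes.executor.disableConfigMap", "true")
```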
### Does this PR introduce _any_ user-facing change?
No. By default, this doesn't change the existing behavior.
This is a new feature that adds the ability to disable SPARK-30985.
### How was this patch tested?
Pass the newly added UT.
Closes #31428 from dongjoon-hyun/SPARK-34316.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to remove `internal()` from `spark.kubernetes.executor.podNamePrefix` in order to make it the configuration public.
### Why are the changes needed?
In line with K8s GA, this will allow some users to control the full executor pod names officially.
This is useful when we want a custom executor pod name pattern independently from the app name.
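For example (a sketch; the prefix value is made up):
```scala
import org.apache.spark.SparkConf

// Executor pods will be named after this prefix instead of a prefix derived
// from the application name.
val conf = new SparkConf()
  .set("spark.kubernetes.executor.podNamePrefix", "team-batch-etl")
```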
### Does this PR introduce _any_ user-facing change?
No, this has been there since Apache Spark 2.3.0.
### How was this patch tested?
N/A.
Closes #31386 from dongjoon-hyun/SPARK-34281.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
`KubernetesClusterManager` and `ExecutorPodsLifecycleManager` are private Spark classes.
This PR aims to move `Guava Cache` from a constructor parameter to private field of `ExecutorPodsLifecycleManager`.
### Why are the changes needed?
1. Although `KubernetesClusterManager` creates `Guava Cache`, only `ExecutorPodsLifecycleManager` uses it.
2. Although `ExecutorPodsLifecycleManager` is a Spark private class, when some users implement a new cluster manager with `ExternalClusterManager` for K8s, they can reuse `ExecutorPodsLifecycleManager`. In this case, `Guava Cache` is not good as an interface because it's a shaded class.
### Does this PR introduce _any_ user-facing change?
No. This is Spark-private.
### How was this patch tested?
Pass the existing UTs.
Closes #31297 from dongjoon-hyun/SPARK-34206.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This:
1. switches Spark to use shaded Hadoop clients, namely hadoop-client-api and hadoop-client-runtime, for Hadoop 3.x.
2. upgrades the built-in version for Hadoop 3.x to Hadoop 3.2.2
Note that for Hadoop 2.7, we'll still use the same modules such as hadoop-client.
In order to still keep the default Hadoop profile as hadoop-3.2, this defines the following Maven properties:
```
hadoop-client-api.artifact
hadoop-client-runtime.artifact
hadoop-client-minicluster.artifact
```
which default to:
```
hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster
```
but all switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side effect of this is that we'll import the same dependency multiple times. For this I have to disable the Maven enforcer rule `banDuplicatePomDependencyVersions`.
Besides the above, there are the following changes:
- explicitly add a few dependencies which are imported via transitive dependencies from Hadoop jars, but are removed from the shaded client jars.
- removed the use of `ProxyUriUtils.getPath` from `ApplicationMaster` which is a server-side/private API.
- modified `IsolatedClientLoader` to exclude `hadoop-auth` jars when Hadoop version is 3.x. This change should only matter when we're not sharing Hadoop classes with Spark (which is _mostly_ used in tests).
### Why are the changes needed?
Hadoop 3.2.2 is released with new features and bug fixes, so it's good for the Spark community to adopt it. However, the latest Hadoop versions starting from Hadoop 3.2.1 have upgraded to use Guava 27+. In order to resolve the Guava conflicts, this takes the approach of switching to the shaded client jars provided by Hadoop. This also has the benefit of avoiding pulling in other 3rd party dependencies from the Hadoop side, so as to avoid more potential future conflicts.
### Does this PR introduce _any_ user-facing change?
When people use Spark with `hadoop-provided` option, they should make sure class path contains `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before other Hadoop jars in the order. Otherwise, classes may be loaded from the other non-shaded Hadoop jars and cause potential conflicts.
### How was this patch tested?
Relying on existing tests.
Closes #30701 from sunchao/test-hadoop-3.2.2.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Missing POD detection is extended with a timestamp-based (and time-limited) check to avoid wrongful detection of missing PODs.
The two new timestamps:
- `fullSnapshotTs` is introduced for the `ExecutorPodsSnapshot`, which is only updated by the pod polling snapshot source
- `registrationTs` is introduced for the `ExecutorData`, and it is initialized at the executor registration at the scheduler backend
Moreover a new config `spark.kubernetes.executor.missingPodDetectDelta` is used to specify the accepted delta between the two.
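A minimal sketch of tuning the new delta (the value is illustrative):
```scala
import org.apache.spark.SparkConf

// Accept up to 30s between an executor's registration time and the latest
// full snapshot before its POD is declared missing.
val conf = new SparkConf()
  .set("spark.kubernetes.executor.missingPodDetectDelta", "30s")
```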
### Why are the changes needed?
Watching a POD (`ExecutorPodsWatchSnapshotSource`) only informs about single POD changes. This could wrongfully lead to the executor POD lifecycle manager detecting missing PODs (PODs known by the scheduler backend but missing from POD snapshots).
A key indicator of this error is seeing this log message:
> "The executor with ID [some_id] was not found in the cluster but we didn't get a reason why. Marking the executor as failed. The executor may have been deleted but the driver missed the deletion event."
So one of the problems is running the missing POD detection check even when a single POD has changed, without having a full, consistent snapshot of all the PODs (see `ExecutorPodsPollingSnapshotSource`).
The other problem is the race between the executor POD lifecycle manager and the scheduler backend: even with a full snapshot, the registration at the scheduler backend could precede the snapshot polling (and the processing of those polled snapshots).
### Does this PR introduce _any_ user-facing change?
Yes. When a POD is missing, the reason message explaining the executor's exit is extended with both timestamps (the polling time and the executor registration time), and the new config is mentioned as well.
### How was this patch tested?
The existing unit tests are extended.
Closes #30675 from attilapiros/SPARK-33711.
Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
Label both the statuses and ensure the ExecutorPodSnapshot starts with the default config to match.
### Why are the changes needed?
The current test depends on the order rather than testing the desired property.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Labeled the container statuses, observed failures, added the default label as the initialization point, and the tests passed again.
Built Spark, ran on K8s cluster verified no NPE in driver log.
Closes #31071 from holdenk/SPARK-34018-finishedExecutorWithRunningSidecar-doesnt-correctly-constructt-the-test-case.
Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Skip files if they are binary or too large to fit within the ConfigMap's max size.
### Why are the changes needed?
A ConfigMap cannot hold binary files, and there is also a limit on how much data a ConfigMap can hold.
This limit can be configured by the K8s cluster admin. This PR skips such files (with a warning) instead of failing with weird runtime errors.
If such files were not skipped, the result would be mount errors or encoding errors (if binary files are submitted).
### Does this PR introduce _any_ user-facing change?
Yes; in simple words, it avoids possible errors due to negligence (for example, placing a large file or a binary file in SPARK_CONF_DIR) and thus improves the user experience.
### How was this patch tested?
Added relevant tests and improved existing tests.
Closes #30472 from ScrapCodes/SPARK-32221/avoid-conf-propagate-errors.
Lead-authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Co-authored-by: Prashant Sharma <prashant@apache.org>
Signed-off-by: Prashant Sharma <prashsh1@in.ibm.com>
### What changes were proposed in this pull request?
Switch log level from warn to debug when the spark container is not present in the pod's container statuses.
### Why are the changes needed?
There are many non-critical situations where the Spark container may not be present, and the warning log level is too high.
### Does this PR introduce _any_ user-facing change?
Log message change.
### How was this patch tested?
N/A
Closes #31047 from holdenk/SPARK-33874-follow-up.
Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
For the liveness check when `checkAllContainers` is not set, we check the liveness status of the Spark container if we can find it.
### Why are the changes needed?
Some environments may deploy long-lived, log-collecting sidecars which outlive the Spark application. Just because they remain alive does not mean the Spark executor should keep running.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Extended the existing pod status tests.
Closes #30892 from holdenk/SPARK-33874-handle-long-lived-sidecars.
Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes:
- Respect the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, or the `spark.pyspark.python` and `spark.pyspark.driver.python` configurations, in Kubernetes just like other cluster types in Spark.
- Deprecate `spark.kubernetes.pyspark.pythonVersion` and guide users to set the environment variables and configurations for Python executables.
NOTE that `spark.kubernetes.pyspark.pythonVersion` is already a no-op configuration without this PR. Default is `3` and other values are disallowed.
- In order for the Python executable settings to be used consistently, fix the `spark.archives` option to unpack into the current working directory in the driver of Kubernetes' cluster mode. This behaviour is identical to YARN's cluster mode. By doing this, users can leverage Conda or virtualenv in cluster mode as below:
```bash
conda create -y -n pyspark_conda_env -c conda-forge pyarrow pandas conda-pack
conda activate pyspark_conda_env
conda pack -f -o pyspark_conda_env.tar.gz
PYSPARK_PYTHON=./environment/bin/python spark-submit --archives pyspark_conda_env.tar.gz#environment app.py
```
- Removed several unused or useless code paths such as `extractS3Key` and `renameResourcesToLocalFS`
### Why are the changes needed?
- To provide consistent support of PySpark by using the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, or the `spark.pyspark.python` and `spark.pyspark.driver.python` configurations.
- To provide Conda and virtualenv support via `spark.archives` options.
### Does this PR introduce _any_ user-facing change?
Yes:
- `spark.kubernetes.pyspark.pythonVersion` is deprecated.
- `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, and `spark.pyspark.python` and `spark.pyspark.driver.python` configurations are respected.
### How was this patch tested?
Manually tested via:
```bash
minikube delete
minikube start --cpus 12 --memory 16384
kubectl create namespace spark-integration-test
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
name: spark
namespace: spark-integration-test
EOF
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark-integration-test:spark --namespace=spark-integration-test
dev/make-distribution.sh --pip --tgz -Pkubernetes
resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz `pwd`/spark-3.2.0-SNAPSHOT-bin-3.2.0.tgz --service-account spark --namespace spark-integration-test
```
Unittests were also added.
Closes #30735 from HyukjinKwon/SPARK-33748.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Add a developer API for custom driver & executor feature steps.
### Why are the changes needed?
While we allow templates for the basis of pod creation, some deployments need more flexibility in how the pods are configured. This adds a developer API for custom deployments.
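A minimal custom step, assuming the developer API shape added here (a `KubernetesFeatureConfigStep` trait with a `configurePod` hook); the injected environment variable is illustrative:
```scala
import io.fabric8.kubernetes.api.model.ContainerBuilder
import org.apache.spark.deploy.k8s.SparkPod
import org.apache.spark.deploy.k8s.features.KubernetesFeatureConfigStep

class InjectTeamEnvFeatureStep extends KubernetesFeatureConfigStep {
  // Mutate the pod spec beyond what a static template can express.
  override def configurePod(pod: SparkPod): SparkPod = {
    val container = new ContainerBuilder(pod.container)
      .addNewEnv()
        .withName("TEAM")
        .withValue("data-platform")
      .endEnv()
      .build()
    SparkPod(pod.pod, container)
  }
}
```
Such a class would then be registered through the new feature-step configuration so the builder applies it when constructing driver or executor pods.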
### Does this PR introduce _any_ user-facing change?
New developer API.
### How was this patch tested?
Extended tests to verify custom step is applied when configured.
Closes #30206 from holdenk/SPARK-33261-allow-people-to-extend-pod-feature-steps.
Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
Check that the pod state is not pending or running even if there is a deletion timestamp.
### Why are the changes needed?
This can occur when the pod state and the deletion timestamp are not updated in sync by etcd, and we take a pod snapshot during the inconsistent window.
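For illustration, the two fields that can disagree during such a window can be read straight from the API server (the pod name is a placeholder):
```bash
# A pod can carry a deletionTimestamp while its phase still reads
# Pending/Running; the snapshot logic has to reconcile both fields.
kubectl get pod spark-pi-exec-1 \
  -o jsonpath='{.metadata.deletionTimestamp}{"\t"}{.status.phase}{"\n"}'
```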
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manual testing with local version of Minikube on an overloaded computer that caused out of sync updates.
Closes#30693 from holdenk/SPARK-33716-decommissioning-race-condition-during-pod-snapshot.
Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR updates `kubernetes/integration-tests/README.md`.
### Why are the changes needed?
To follow the current Hadoop profile (hadoop-3.2).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I have confirmed that the integration tests pass with the following command for both Hadoop 3.2 and 2.7.
```
build/mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \
-Pkubernetes \
-Pkubernetes-integration-tests \
-Dspark.kubernetes.test.imageTag=${IMAGE_TAG} \
-Dspark.kubernetes.test.imageRepo=docker.io/kubespark \
-Dspark.kubernetes.test.namespace=default \
-Dspark.kubernetes.test.deployMode=minikube \
-Dtest.include.tags=k8s
```
Closes#30726 from sarutak/update-kube-integ-readme.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR changes `Minikube.scala` for Kubernetes integration tests to work with Minikube 1.9+.
`Minikube.scala` assumes that `apiserver.key` and `apiserver.crt` are in `~/.minikube/`.
But as of Minikube 1.9, they are in `~/.minikube/profiles/<profile>`.
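Concretely, assuming the default `minikube` profile:
```bash
# Where Minikube.scala used to look (Minikube < 1.9):
ls ~/.minikube/apiserver.key ~/.minikube/apiserver.crt
# Where the files live as of Minikube 1.9+ (profile name varies):
ls ~/.minikube/profiles/minikube/apiserver.key ~/.minikube/profiles/minikube/apiserver.crt
```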
### Why are the changes needed?
Currently, the Kubernetes integration tests don't work with Minikube 1.9+.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I confirmed the following test passes.
```
$ build/sbt -Pkubernetes -Pkubernetes-integration-tests package 'kubernetes-integration-tests/testOnly -- -z "SparkPi with no"'
```
Closes#30700 from sarutak/minikube-1.9.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Makes the location of the decommission script used in Kubernetes for graceful shutdown configurable.
### Why are the changes needed?
Some environments don't use the Spark image builder and instead mount the decompressed Spark distro. In those environments, the location of the decommissioning script must be configurable.
### Does this PR introduce _any_ user-facing change?
New configuration parameter.
### How was this patch tested?
Existing decommissioning integration test.
Closes#30694 from holdenk/SPARK-33724-allow-decommissioning-script-location-to-be-configured.
Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
While building the R Docker image, if we can't fetch the key from gnupg.net, fall back to openpgp.org.
### Why are the changes needed?
gnupg.net key servers are flaky and sometimes fail to resolve or return keys.
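A sketch of the fallback pattern (`$GPG_KEY` is a placeholder; this is not the exact Dockerfile `RUN` line):
```bash
# Try the flaky server first, then fall back to the openpgp.org keyserver.
gpg --keyserver hkps://keys.gnupg.net --recv-keys "$GPG_KEY" || \
  gpg --keyserver hkps://keys.openpgp.org --recv-keys "$GPG_KEY"
```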
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tried to add the key on my desktop via gnupg.net and it failed, then tried to add the key via openpgp.org and it succeeded.
Closes#30696 from holdenk/SPARK-33727-gnupg-server-is-flaky.
Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Fix flaky test "Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties."
The test has flaked multiple times; the reason for the failure has been similar to:
```
The code passed to eventually never returned normally. Attempted 109 times over 3.0079882413999997 minutes. Last failure message: Failure executing: GET at:
https://192.168.39.167:8443/api/v1/namespaces/b37fc72a991b49baa68a2eaaa1516463/pods/spark-pi-97a9bc76308e7fe3-exec-1/log?pretty=false. Message: pods "spark-pi-97a9bc76308e7fe3-exec-1" not found. Received status: Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, kind=pods, name=spark-pi-97a9bc76308e7fe3-exec-1, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "spark-pi-97a9bc76308e7fe3-exec-1" not found, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=NotFound, status=Failure, additionalProperties={}).. (KubernetesSuite.scala:402)
```
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36854/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36852/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36850/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36848/console
From the above failures, it seems that the executor finishes too quickly and is removed by Spark before the test can complete.
So, in order to mitigate this situation, one way is to set the flag `spark.kubernetes.executor.deleteOnTermination` to `false`, so that finished executor pods are retained long enough for the test to read their logs.
### Why are the changes needed?
Fixes a flaky test.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests.
A few runs of the Jenkins integration tests may reveal whether the problem is resolved.
Closes#30616 from ScrapCodes/SPARK-33668/fix-flaky-k8s-integration-test.
Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to update `master` branch version to 3.2.0-SNAPSHOT.
### Why are the changes needed?
Start to prepare Apache Spark 3.2.0.
### Does this PR introduce _any_ user-facing change?
N/A.
### How was this patch tested?
Pass the CIs.
Closes#30606 from dongjoon-hyun/SPARK-3.2.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR proposes to make the `spark.archives` configuration work in Kubernetes.
It works without a problem in standalone clusters, but there seems to be a bug in Kubernetes.
It fails to fetch the file on the driver side as below:
```
20/12/03 13:33:53 INFO SparkContext: Added JAR file:/tmp/spark-75004286-c83a-4369-b624-14c5d2d2a748/spark-examples_2.12-3.1.0-SNAPSHOT.jar at spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/jars/spark-examples_2.12-3.1.0-SNAPSHOT.jar with timestamp 1607002432558
20/12/03 13:33:53 INFO SparkContext: Added archive file:///tmp/tmp4542734800151332666.txt.tar.gz#test_tar_gz at spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/files/tmp4542734800151332666.txt.tar.gz with timestamp 1607002432558
20/12/03 13:33:53 INFO TransportClientFactory: Successfully created connection to spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc/172.17.0.4:7078 after 83 ms (47 ms spent in bootstraps)
20/12/03 13:33:53 INFO Utils: Fetching spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/files/tmp4542734800151332666.txt.tar.gz to /tmp/spark-66573e24-27a3-427c-99f4-36f06d9e9cd5/fetchFileTemp2665785666227461849.tmp
20/12/03 13:33:53 ERROR SparkContext: Error initializing SparkContext.
java.lang.RuntimeException: Stream '/files/tmp4542734800151332666.txt.tar.gz' was not found.
at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:242)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
```
This is because `spark.archives` was not actually added on the driver side correctly. The changes here fix it by adding and resolving URIs correctly.
### Why are the changes needed?
The `spark.archives` feature can be leveraged for many things such as Conda support. We should make it work in Kubernetes as well.
This is a bug fix too.
### Does this PR introduce _any_ user-facing change?
No, this feature is not out yet.
### How was this patch tested?
I manually tested with Minikube 1.15.1. Due to an environment issue (?), I had to use a custom namespace, service account and roles. The `default` service account does not work for me and complains it doesn't have permissions to get/list pods, etc.
```bash
minikube delete
minikube start --cpus 12 --memory 16384
kubectl create namespace spark-integration-test
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
name: spark
namespace: spark-integration-test
EOF
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark-integration-test:spark --namespace=spark-integration-test
dev/make-distribution.sh --pip --tgz -Pkubernetes
resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz `pwd`/spark-3.1.0-SNAPSHOT-bin-3.2.0.tgz --service-account spark --namespace spark-integration-test
```
Closes#30581 from HyukjinKwon/SPARK-33615.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Allow K8s integration tests to assert on both driver and executor logs for expected log line(s).
### Why are the changes needed?
Some of the tests can provide full coverage of their use case by asserting on both driver and executor logs.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
TBD
Closes#30568 from ScrapCodes/expectedDriverLogChanges.
Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This mostly reverts commit SPARK-33212 (cb3fa6c936), with three exceptions:
1. `SparkSubmitUtils` was updated recently by SPARK-33580
2. `resource-managers/yarn/pom.xml` was updated recently by SPARK-33104 to add `hadoop-yarn-server-resourcemanager` test dependency.
3. Adjust `com.fasterxml.jackson.module:jackson-module-jaxb-annotations` dependency in K8s module which is updated recently by SPARK-33471.
### Why are the changes needed?
According to [HADOOP-16080](https://issues.apache.org/jira/browse/HADOOP-16080), since Apache Hadoop 3.1.1, `hadoop-aws` doesn't work with `hadoop-client-api`. It fails at write operations like the following.
**1. Spark distribution with `-Phadoop-cloud`**
```scala
$ bin/spark-shell --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY
20/11/30 23:01:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context available as 'sc' (master = local[*], app id = local-1606806088715).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT
/_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_272)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.read.parquet("s3a://dongjoon/users.parquet").show
20/11/30 23:01:34 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
+------+--------------+----------------+
| name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa| null| [3, 9, 15, 20]|
| Ben| red| []|
+------+--------------+----------------+
scala> Seq(1).toDF.write.parquet("s3a://dongjoon/out.parquet")
20/11/30 23:02:14 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)/ 1]
java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
```
**2. Spark distribution without `-Phadoop-cloud`**
```scala
$ bin/spark-shell --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY -c spark.eventLog.enabled=true -c spark.eventLog.dir=s3a://dongjoon/spark-events/ --packages org.apache.hadoop:hadoop-aws:3.2.0,org.apache.hadoop:hadoop-common:3.2.0
...
java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:772)
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CI.
Closes#30508 from dongjoon-hyun/SPARK-33212-REVERT.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR intends to fix typos in the sub-modules:
* `R`
* `common`
* `dev`
* `mllib`
* `external`
* `project`
* `streaming`
* `resource-managers`
* `python`
Split per srowen https://github.com/apache/spark/pull/30323#issuecomment-728981618
NOTE: The misspellings have been reported at 706a726f87 (commitcomment-44064356)
### Why are the changes needed?
Misspelled words make it harder to read / understand content.
### Does this PR introduce _any_ user-facing change?
There are various fixes to documentation, etc...
### How was this patch tested?
No testing was performed
Closes#30402 from jsoref/spelling-R_common_dev_mlib_external_project_streaming_resource-managers_python.
Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR adds new Scala compiler args to `pom.xml` to defend against new unused imports:
- `-Ywarn-unused-import` for Scala 2.12
- `-Wconf:cat=unused-imports:e` for Scala 2.13
The other file changes remove all unused imports in the Spark code.
### Why are the changes needed?
Clean up the code and add a guarantee to defend against new unused imports.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass the Jenkins or GitHub Action
Closes#30351 from LuciferYang/remove-imports-core-module.
Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Supports Python client deps from the launcher filesystem.
This is a feature that was added for Java deps. This PR adds support for Python as well.
Yes.
Manually running different scenarios and examining the driver & executor logs. Also, an integration test was added.
I verified that the Python resources are added to the Spark file server and are named properly, so they don't fail the executors. Note that, as before, the following will not work:
a primary resource `A.py` that uses a closure defined in a submitted pyfile `B.py`; `context.py` only adds files with certain extensions (e.g. zip, egg, jar) to the PYTHONPATH.
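A sketch of the now-supported submission, with placeholder paths on the launcher's local filesystem:
```bash
# K8s cluster mode with Python deps shipped from the submission client's
# filesystem, mirroring what was already possible for Java deps.
spark-submit \
  --master k8s://https://127.0.0.1:8443 \
  --deploy-mode cluster \
  --py-files /path/on/launcher/deps.zip \
  /path/on/launcher/app.py
```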
Closes#25870 from skonto/python-deps.
Lead-authored-by: Stavros Kontopoulos <skontopo@redhat.com>
Co-authored-by: Stavros Kontopoulos <st.kontopoulos@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>