Commit graph

94 commits

Author SHA1 Message Date
Andy Grove 35d4edffa2 [SPARK-28921][BUILD][K8S] Upgrade kubernetes client to 4.4.2
### What changes were proposed in this pull request?

Upgrade kubernetes client from 4.1.2 to 4.4.2

### Why are the changes needed?

To fix a compatibility issue with EKS, since Amazon rolled out security patches over the past week: 1.15.3, 1.14.6, 1.13.10, 1.12.10, and 1.11.10.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Pass the Jenkins and manually test on EKS.

Closes #25640 from andygrove/SPARK-28921.

Authored-by: Andy Grove <andygrove73@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-02 16:50:58 -07:00
Dongjoon Hyun f7c9de9035 [SPARK-28765][BUILD] Add explicit exclusions to avoid JDK11 dependency issue
### What changes were proposed in this pull request?

This PR adds explicit exclusions to avoid Maven `JDK11` dependency issues.

### Why are the changes needed?

Maven/Ivy seems to be confused during dependency generation in a `JDK11` environment.
This is not only wrong, but also causes a Jenkins failure during the dependency manifest check in a `JDK11` environment.

**JDK8**
```
$ cd core
$ mvn -X dependency:tree -Dincludes=jakarta.activation:jakarta.activation-api
...
[DEBUG]       org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version managed from 2.22.2)
[DEBUG]          org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]          javax.validation:validation-api:jar:2.0.1.Final:compile
```

**JDK11**
```
[DEBUG]       org.glassfish.jersey.core:jersey-server:jar:2.29:compile (version managed from 2.22.2)
[DEBUG]          org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:compile
[DEBUG]          javax.validation:validation-api:jar:2.0.1.Final:compile
[DEBUG]          jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.2:compile
[DEBUG]             jakarta.activation:jakarta.activation-api:jar:1.2.1:compile
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Do the following in both `JDK8` and `JDK11` environments. The dependency manifest should not change. In the current `master` branch, `JDK11` changes the dependency manifest.
```
$ dev/test-dependencies.sh --replace-manifest
```

Closes #25481 from dongjoon-hyun/SPARK-28765.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-17 10:16:22 -07:00
Marcelo Vanzin 0343854f54 [SPARK-28487][K8S] More responsive dynamic allocation with K8S
This change implements a few changes to the k8s pod allocator so
that it behaves a little better when dynamic allocation is on.

(i) Allow the application to ramp up immediately when there's a
change in the target number of executors. Without this change,
scaling would only trigger when a change happened in the state of
the cluster, e.g. an executor going down, or when the periodical
snapshot was taken (default every 30s).

(ii) Get rid of pending pod requests, both acknowledged (i.e. Spark
knows that a pod is pending resource allocation) and unacknowledged
(i.e. Spark has requested the pod but the API server hasn't created it
yet), when they're not needed anymore. This avoids starting those
executors to just remove them after the idle timeout, wasting resources
in the meantime.

(iii) Re-work some of the code to avoid unnecessary logging. While not
bad without dynamic allocation, the existing logging was very chatty
when dynamic allocation was on. With the changes, all the useful
information is still there, but only when interesting changes happen.

(iv) Gracefully shut down executors when they become idle. Just deleting
the pod causes a lot of ugly logs to show up, so it's better to ask pods
to exit nicely. That also allows Spark to respect the "don't delete
pods" option when dynamic allocation is on.

Tested on a small k8s cluster running different TPC-DS workloads.

Closes #25236 from vanzin/SPARK-28487.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-08-13 17:29:54 -07:00
Junjie Chen 780d176136 [SPARK-28042][K8S] Support using volume mount as local storage
## What changes were proposed in this pull request?

This PR supports using hostPath/PV volume mounts as local storage. In KubernetesExecutorBuilder.scala, the LocalDirsFeatureStep is built before MountVolumesFeatureStep, which means we cannot use any volume mounts configured later. This PR adjusts the order of the feature-building steps, moving localDirsFeature last, so that we can check whether directories in SPARK_LOCAL_DIRS are set to mounted volumes such as hostPath, PV, or others.

## How was this patch tested?
Unit tests

Closes #24879 from chenjunjiedada/SPARK-28042.

Lead-authored-by: Junjie Chen <jimmyjchen@tencent.com>
Co-authored-by: Junjie Chen <cjjnjust@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-07-29 10:44:17 -07:00
Onur Satici e7c97a3d86 [SPARK-28145][K8S] safe runnable in polling executor source
## What changes were proposed in this pull request?

Add error handling to `ExecutorPodsPollingSnapshotSource`

Closes #24952 from onursatici/os/polling-source.

Authored-by: Onur Satici <onursatici@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-06-28 09:38:43 -05:00
Xiangrui Meng 7056e004ee [SPARK-27823][CORE] Refactor resource handling code
## What changes were proposed in this pull request?

Continue the work from https://github.com/apache/spark/pull/24821. Refactor resource handling code to make the code more readable. Major changes:

* Moved resource-related classes to `spark.resource` from `spark`.
* Added ResourceUtils and helper classes so we don't need to directly deal with Spark conf.
 * ResourceID: a resource identifier that provides conf keys (see the sketch after this list)
 * ResourceRequest/Allocation: abstraction for requested and allocated resources
* Added `TestResourceIDs` to reference commonly used resource IDs in tests like `spark.executor.resource.gpu`.
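
For illustration, a hedged sketch of the `ResourceID` idea; the field and method names here are assumptions, not the exact Spark code:

```scala
// Hypothetical sketch: derive the related conf keys from
// (component, resource name) instead of hand-building strings at call sites.
case class ResourceID(componentName: String, resourceName: String) {
  def confPrefix: String = s"$componentName.resource.$resourceName."
  def amountConf: String = confPrefix + "amount"
  def discoveryScriptConf: String = confPrefix + "discoveryScript"
}

// ResourceID("spark.executor", "gpu").amountConf
//   == "spark.executor.resource.gpu.amount"
```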

cc: tgravescs jiangxb1987 Ngone51

## How was this patch tested?

Unit tests for added utils and existing unit tests.

Closes #24856 from mengxr/SPARK-27823.

Lead-authored-by: Xiangrui Meng <meng@databricks.com>
Co-authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
2019-06-18 17:18:17 -07:00
Stavros Kontopoulos 7912ab85a6 [SPARK-27872][K8S] Fix executor service account inconsistency
## What changes were proposed in this pull request?

Fixes the service account inconsistency that breaks pull secrets. It gives users the option to set up a specific service account for the executors if they need to
(via `spark.kubernetes.authenticate.executor.serviceAccountName`). It defaults to the driver's service account.
We are not supporting special authentication credentials for the executors with this PR.
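
As a hedged usage sketch (the service account names here are invented):

```scala
import org.apache.spark.SparkConf

// Hypothetical sketch: a dedicated service account for executors; when the
// executor key is unset, the driver's service account is used instead.
val conf = new SparkConf()
  .set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark-driver-sa")
  .set("spark.kubernetes.authenticate.executor.serviceAccountName", "spark-executor-sa")
```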

## How was this patch tested?

Tested manually by launching a Spark job exercising the introduced settings.
Added a new integration test for this fix.

Closes #24748 from skonto/fix_executor_sa.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-06-09 16:28:37 -05:00
Thomas Graves d30284b5a5 [SPARK-27760][CORE] Spark resources - change user resource config from .count to .amount
## What changes were proposed in this pull request?

Change the resource config spark.{executor/driver}.resource.{resourceName}.count to .amount to allow future values containing both a count and a unit. Right now we only support counts - the number of GPUs, for instance - but in the future we may want to support units for things like memory (25G). Making the user specify a single .amount config is better than making them specify two separate configs, a .count and a .unit. Change it now since it's a user-facing config.

Amount also matches how the spark-on-yarn configs are set up.
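
A minimal sketch of the renamed key in use; the GPU count and discovery-script path are assumptions:

```scala
import org.apache.spark.SparkConf

// Hypothetical sketch: request two GPUs per executor with the renamed key.
// Before this change, the key would have been spark.executor.resource.gpu.count.
val conf = new SparkConf()
  .set("spark.executor.resource.gpu.amount", "2")
  .set("spark.executor.resource.gpu.discoveryScript", "/opt/spark/getGpus.sh")
```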

## How was this patch tested?

Unit tests and manually verified on yarn and local cluster mode

Closes #24810 from tgravescs/SPARK-27760-amount.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2019-06-06 14:16:05 -05:00
Thomas Graves 1277f8fa92 [SPARK-27362][K8S] Resource Scheduling support for k8s
## What changes were proposed in this pull request?

Add the ability to map the spark resource configs spark.{executor/driver}.resource.{resourceName} to the kubernetes Container builder so that we request resources (GPUs/FPGAs/etc.) from kubernetes.
Note that the spark configs will overwrite any resource configs users put into a pod template.
I added a generic vendor config which is only used by kubernetes right now.  I intentionally didn't put it into the kubernetes config namespace just to avoid adding more config prefixes.

I will add more documentation for this under jira SPARK-27492. I think it will be easier to do it all at once to get a cohesive story.
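
For illustration, a hedged sketch of the configs this maps onto pod resource requests; the vendor domain shown is an assumption:

```scala
import org.apache.spark.SparkConf

// Hypothetical sketch: the generic vendor config qualifies the resource name
// so k8s sees a device-plugin request such as nvidia.com/gpu.
val conf = new SparkConf()
  .set("spark.executor.resource.gpu.amount", "1")
  .set("spark.executor.resource.gpu.vendor", "nvidia.com")
```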

## How was this patch tested?

Unit tests and manually testing on k8s cluster.

Closes #24703 from tgravescs/SPARK-27362.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2019-05-31 15:26:14 -05:00
Yuming Wang db3e746b64 [SPARK-27875][CORE][SQL][ML][K8S] Wrap all PrintWriter with Utils.tryWithResource
## What changes were proposed in this pull request?

This PR wraps all `PrintWriter` usages with `Utils.tryWithResource` to prevent resource leaks.
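
A minimal sketch of the pattern, using a local stand-in whose signature mirrors what `Utils.tryWithResource` is described to do (treat the exact signature as an assumption):

```scala
import java.io.{Closeable, PrintWriter}

// Local stand-in for Utils.tryWithResource: the resource is always closed,
// even when the body throws, which is what prevents the leak.
def tryWithResource[R <: Closeable, T](createResource: => R)(f: R => T): T = {
  val resource = createResource
  try f(resource) finally resource.close()
}

tryWithResource(new PrintWriter("out.txt")) { writer =>
  writer.println("closed even on failure")
}
```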

## How was this patch tested?

Existing test

Closes #24739 from wangyum/SPARK-27875.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-05-30 19:54:32 +09:00
Stavros Kontopoulos 5e74570c8f [SPARK-23153][K8S] Support client dependencies with a Hadoop Compatible File System
## What changes were proposed in this pull request?
- Solves the current issue with --packages in cluster mode (there is no ticket for it). Also note some past [issues](https://issues.apache.org/jira/browse/SPARK-22657) with hadoop libs being used on the spark-submit side.
- Supports spark.jars, spark.files, and the app jar.

It works as follows:
spark-submit uploads the deps to the HCFS, then the driver serves the deps via the Spark file server.
No HCFS URIs are propagated.

The related design document is [here](https://docs.google.com/document/d/1peg_qVhLaAl4weo5C51jQicPwLclApBsdR1To2fgc48/edit). The next option to add is the RSS, but it has to be improved given the past discussion about it (Spark 2.3).
## How was this patch tested?

- Run integration test suite.
- Run an example using S3:

```
 ./bin/spark-submit \
...
 --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.6 \
 --deploy-mode cluster \
 --name spark-pi \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.executor.memory=1G \
 --conf spark.kubernetes.namespace=spark \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
 --conf spark.driver.memory=1G \
 --conf spark.executor.instances=2 \
 --conf spark.sql.streaming.metricsEnabled=true \
 --conf "spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp" \
 --conf spark.kubernetes.container.image.pullPolicy=Always \
 --conf spark.kubernetes.container.image=skonto/spark:k8s-3.0.0 \
 --conf spark.kubernetes.file.upload.path=s3a://fdp-stavros-test \
 --conf spark.hadoop.fs.s3a.access.key=... \
 --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
 --conf spark.hadoop.fs.s3a.fast.upload=true \
 --conf spark.kubernetes.executor.deleteOnTermination=false \
 --conf spark.hadoop.fs.s3a.secret.key=... \
 --conf spark.files=client:///...resolv.conf \
file:///my.jar
```
Added integration tests based on [Ceph nano](https://github.com/ceph/cn), which looks very [active](http://www.sebastien-han.fr/blog/2019/02/24/Ceph-nano-is-getting-better-and-better/).
Unfortunately, minio needs hadoop >= 2.8.

Closes #23546 from skonto/support-client-deps.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Erik Erlandson <eerlands@redhat.com>
2019-05-22 16:15:42 -07:00
Arun Mahadevan 1a8c09334d [SPARK-27754][K8S] Introduce additional config (spark.kubernetes.driver.request.cores) for driver request cores for spark on k8s
## What changes were proposed in this pull request?

Spark on k8s supports a config for specifying the executor cpu requests
(spark.kubernetes.executor.request.cores), but a similar config is missing
for the driver. Instead, the `spark.driver.cores` value is currently used, and it only accepts integer values.

Although a `pod spec` can specify `cpu` for fine-grained control like the following, this PR proposes an additional configuration `spark.kubernetes.driver.request.cores` for driver request cores.
```
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
```
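
A hedged usage sketch of the proposed config; the values are illustrative:

```scala
import org.apache.spark.SparkConf

// Hypothetical sketch: request a fractional CPU for the driver pod while
// spark.driver.cores stays an integer.
val conf = new SparkConf()
  .set("spark.driver.cores", "1")
  .set("spark.kubernetes.driver.request.cores", "250m")
```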

## How was this patch tested?

Unit tests

Closes #24630 from arunmahadevan/SPARK-27754.

Authored-by: Arun Mahadevan <arunm@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-18 21:28:46 -07:00
Adi Muraru 8ef4da753d [SPARK-27610][YARN] Shade netty native libraries
## What changes were proposed in this pull request?

Fixed the `spark-<version>-yarn-shuffle.jar` artifact packaging to shade the native netty libraries:
- shade the `META-INF/native/libnetty_*` native libraries when packaging
the yarn shuffle service jar. This is required as the netty library loader
derives the library name based on the shaded package name.
- updated the `org/spark_project` shade package prefix to `org/sparkproject`
(i.e. removed underscore) as the former breaks the netty native lib loading.

This was causing the yarn external shuffle service to fail
when spark.shuffle.io.mode=EPOLL

## How was this patch tested?
Manual tests

Closes #24502 from amuraru/SPARK-27610_master.

Authored-by: Adi Muraru <amuraru@adobe.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-05-07 10:47:36 -07:00
Sean Owen d4420b455a [SPARK-27323][CORE][SQL][STREAMING] Use Single-Abstract-Method support in Scala 2.12 to simplify code
## What changes were proposed in this pull request?

Use Single Abstract Method syntax where possible (and minor related cleanup). Comments below. No logic should change here.
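
For illustration, a small before/after sketch of the Scala 2.12 SAM syntax this PR applies; the `ThreadFactory` example is mine, not from the diff:

```scala
import java.util.concurrent.ThreadFactory

// Before: an anonymous inner class for a single-abstract-method interface.
val verbose: ThreadFactory = new ThreadFactory {
  override def newThread(r: Runnable): Thread = new Thread(r, "worker")
}

// After: in Scala 2.12 a lambda can implement the SAM interface directly.
val concise: ThreadFactory = (r: Runnable) => new Thread(r, "worker")
```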

## How was this patch tested?

Existing tests.

Closes #24241 from srowen/SPARK-27323.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-02 07:37:05 -07:00
Yuming Wang b670f39fc6 [SPARK-24793][FOLLOW-UP][K8S] Remove duplicate declaration of mockito-core
## What changes were proposed in this pull request?

```
[WARNING] Some problems were encountered while building the effective model for org.apache.spark:spark-kubernetes_2.12:jar:3.0.0-SNAPSHOT
[WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.mockito:mockito-core:jar -> duplicate declaration of version (?)  org.apache.spark:spark-kubernetes_2.12:[unknown-version], /Users/yumwang/spark/resource-managers/kubernetes/core/pom.xml, line 98, column 17
```
This PR removes the duplicate declaration of `mockito-core`.

## How was this patch tested?

N/A

Closes #24256 from wangyum/SPARK-24793-FOLLOW-UP.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-03-30 21:29:32 -07:00
Stavros Kontopoulos 05168e725d [SPARK-24793][K8S] Enhance spark-submit for app management
- Supports `--kill` & `--status` flags.
- Supports globs, which is useful in general; see this long-standing [issue](https://github.com/kubernetes/kubernetes/issues/17144#issuecomment-272052461) for kubectl.

Tested manually against running apps. Example output:

Submission Id reported at launch time:

```
2019-01-20 23:47:56 INFO  Client:58 - Waiting for application spark-pi with submissionId spark:spark-pi-1548020873671-driver to finish...
```

Killing the app:

```
./bin/spark-submit --kill spark:spark-pi-1548020873671-driver --master  k8s://https://192.168.2.8:8443
2019-01-20 23:48:07 WARN  Utils:70 - Your hostname, universe resolves to a loopback address: 127.0.0.1; using 192.168.2.8 instead (on interface wlp2s0)
2019-01-20 23:48:07 WARN  Utils:70 - Set SPARK_LOCAL_IP if you need to bind to another address

```

App terminates with 143 (SIGTERM; since we have tini, this should lead to [graceful shutdown](https://cloud.google.com/solutions/best-practices-for-building-containers)):

```
2019-01-20 23:48:08 INFO  LoggingPodStatusWatcherImpl:58 - State changed, new state:
	 pod name: spark-pi-1548020873671-driver
	 namespace: spark
	 labels: spark-app-selector -> spark-e4730c80e1014b72aa77915a2203ae05, spark-role -> driver
	 pod uid: 0ba9a794-1cfd-11e9-8215-a434d9270a65
	 creation time: 2019-01-20T21:47:55Z
	 service account name: spark-sa
	 volumes: spark-local-dir-1, spark-conf-volume, spark-sa-token-b7wcm
	 node name: minikube
	 start time: 2019-01-20T21:47:55Z
	 phase: Running
	 container status:
		 container name: spark-kubernetes-driver
		 container image: skonto/spark:k8s-3.0.0
		 container state: running
		 container started at: 2019-01-20T21:48:00Z
2019-01-20 23:48:09 INFO  LoggingPodStatusWatcherImpl:58 - State changed, new state:
	 pod name: spark-pi-1548020873671-driver
	 namespace: spark
	 labels: spark-app-selector -> spark-e4730c80e1014b72aa77915a2203ae05, spark-role -> driver
	 pod uid: 0ba9a794-1cfd-11e9-8215-a434d9270a65
	 creation time: 2019-01-20T21:47:55Z
	 service account name: spark-sa
	 volumes: spark-local-dir-1, spark-conf-volume, spark-sa-token-b7wcm
	 node name: minikube
	 start time: 2019-01-20T21:47:55Z
	 phase: Failed
	 container status:
		 container name: spark-kubernetes-driver
		 container image: skonto/spark:k8s-3.0.0
		 container state: terminated
		 container started at: 2019-01-20T21:48:00Z
		 container finished at: 2019-01-20T21:48:08Z
		 exit code: 143
		 termination reason: Error
2019-01-20 23:48:09 INFO  LoggingPodStatusWatcherImpl:58 - Container final statuses:
	 container name: spark-kubernetes-driver
	 container image: skonto/spark:k8s-3.0.0
	 container state: terminated
	 container started at: 2019-01-20T21:48:00Z
	 container finished at: 2019-01-20T21:48:08Z
	 exit code: 143
	 termination reason: Error
2019-01-20 23:48:09 INFO  Client:58 - Application spark-pi with submissionId spark:spark-pi-1548020873671-driver finished.
2019-01-20 23:48:09 INFO  ShutdownHookManager:58 - Shutdown hook called
2019-01-20 23:48:09 INFO  ShutdownHookManager:58 - Deleting directory /tmp/spark-f114b2e0-5605-4083-9203-a4b1c1f6059e

```

Glob scenario:

```
./bin/spark-submit --status spark:spark-pi* --master  k8s://https://192.168.2.8:8443
2019-01-20 22:27:44 WARN  Utils:70 - Your hostname, universe resolves to a loopback address: 127.0.0.1; using 192.168.2.8 instead (on interface wlp2s0)
2019-01-20 22:27:44 WARN  Utils:70 - Set SPARK_LOCAL_IP if you need to bind to another address
Application status (driver):
	 pod name: spark-pi-1547948600328-driver
	 namespace: spark
	 labels: spark-app-selector -> spark-f13f01702f0b4503975ce98252d59b94, spark-role -> driver
	 pod uid: c576e1c6-1c54-11e9-8215-a434d9270a65
	 creation time: 2019-01-20T01:43:22Z
	 service account name: spark-sa
	 volumes: spark-local-dir-1, spark-conf-volume, spark-sa-token-b7wcm
	 node name: minikube
	 start time: 2019-01-20T01:43:22Z
	 phase: Running
	 container status:
		 container name: spark-kubernetes-driver
		 container image: skonto/spark:k8s-3.0.0
		 container state: running
		 container started at: 2019-01-20T01:43:27Z
Application status (driver):
	 pod name: spark-pi-1547948792539-driver
	 namespace: spark
	 labels: spark-app-selector -> spark-006d252db9b24f25b5069df357c30264, spark-role -> driver
	 pod uid: 38375b4b-1c55-11e9-8215-a434d9270a65
	 creation time: 2019-01-20T01:46:35Z
	 service account name: spark-sa
	 volumes: spark-local-dir-1, spark-conf-volume, spark-sa-token-b7wcm
	 node name: minikube
	 start time: 2019-01-20T01:46:35Z
	 phase: Succeeded
	 container status:
		 container name: spark-kubernetes-driver
		 container image: skonto/spark:k8s-3.0.0
		 container state: terminated
		 container started at: 2019-01-20T01:46:39Z
		 container finished at: 2019-01-20T01:46:56Z
		 exit code: 0
		 termination reason: Completed

```

Closes #23599 from skonto/submit_ops_extension.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-03-26 11:55:03 -07:00
Jiaxin Shan 2d0b7cfe44 [SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2
## What changes were proposed in this pull request?
https://github.com/apache/spark/pull/23814 was reverted because of a Jenkins integration test failure. After the minikube upgrade, Kubernetes client SDK v4.1.2 works with kubernetes v1.13. We can bring this change back.

Reference:
[Bump Kubernetes Client Version to 4.1.2](https://issues.apache.org/jira/browse/SPARK-26742)
[Original PR against master](https://github.com/apache/spark/pull/23814)
[Kubernetes client upgrade for Spark 2.4](https://github.com/apache/spark/pull/23993)

## How was this patch tested?

Unit Tests:
```
All tests passed.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [  2.343 s]
[INFO] Spark Project Tags ................................. SUCCESS [  2.039 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 12.714 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  2.185 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 38.154 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  7.989 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [  2.297 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  2.813 s]
[INFO] Spark Project Core ................................. SUCCESS [38:03 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [  3.848 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 56.084 s]
[INFO] Spark Project Streaming ............................ SUCCESS [04:58 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [06:39 min]
[INFO] Spark Project SQL .................................. SUCCESS [37:12 min]
[INFO] Spark Project ML Library ........................... SUCCESS [18:59 min]
[INFO] Spark Project Tools ................................ SUCCESS [  0.767 s]
[INFO] Spark Project Hive ................................. SUCCESS [33:45 min]
[INFO] Spark Project REPL ................................. SUCCESS [01:14 min]
[INFO] Spark Project Assembly ............................. SUCCESS [  1.444 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:12 min]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [  6.719 s]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [07:00 min]
[INFO] Spark Project Examples ............................. SUCCESS [ 21.805 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  0.906 s]
[INFO] Spark Avro ......................................... SUCCESS [ 50.486 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  02:32 h
[INFO] Finished at: 2019-03-07T08:39:34Z
[INFO] ------------------------------------------------------------------------

```


Closes #24002 from Jeffwan/update_k8s_sdk_master.

Authored-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-03-13 15:04:27 -07:00
chandulal.kavar d4542a8ba8 [SPARK-27061][K8S] Expose Driver UI port on driver service to access …
## What changes were proposed in this pull request?

Expose Spark UI port on driver service to access logs from service.

## How was this patch tested?

The patch was tested using unit tests being contributed as a part of the PR

Closes #23990 from chandulal/SPARK-27061.

Authored-by: chandulal.kavar <cckavar@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-03-11 10:41:31 -07:00
Onur Satici e9e8bb33ef [SPARK-27023][K8S] Make k8s client timeouts configurable
## What changes were proposed in this pull request?

Make k8s client timeouts configurable. No test suite exists for the client factory class; happy to add one if needed.
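
A hedged sketch of how the new timeouts might be set; the key names are inferred from this PR's scope and should be treated as assumptions:

```scala
import org.apache.spark.SparkConf

// Hypothetical sketch: tune the k8s client timeouts (milliseconds)
// used when submitting the application.
val conf = new SparkConf()
  .set("spark.kubernetes.submission.connectionTimeout", "10000")
  .set("spark.kubernetes.submission.requestTimeout", "10000")
```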

Closes #23928 from onursatici/os/k8s-client-timeouts.

Lead-authored-by: Onur Satici <osatici@palantir.com>
Co-authored-by: Onur Satici <onursatici@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-03-06 11:14:39 -08:00
Marcelo Vanzin 14f714fb30 [SPARK-26420][K8S] Generate more unique IDs when creating k8s resource names.
Using the current time as an ID is more prone to clashes than people generally
realize, so try to make things a bit more unique without necessarily using a
UUID, which would eat too much space in the names otherwise.

The implemented approach uses some bits from the current time, plus some random
bits, which should be more resistant to clashes.
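
An illustrative sketch of the approach, not the actual Spark code; the bit widths and encoding are assumptions:

```scala
import scala.util.Random

// Hypothetical sketch: mix low-order time bits with random bits so two
// submissions at the same instant still get distinct, short resource names.
def uniqueID(clock: Long = System.currentTimeMillis()): String = {
  val randomBits = Random.nextInt(1 << 20)  // 20 random bits
  val timeBits = clock & ((1L << 40) - 1)   // low 40 bits of the time
  java.lang.Long.toString(timeBits, 36) + java.lang.Integer.toString(randomBits, 36)
}
```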

Closes #23805 from vanzin/SPARK-26420.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-02-28 20:39:13 -08:00
Marcelo Vanzin a6ddc9d083 [SPARK-24736][K8S] Let spark-submit handle dependency resolution.
Before this change, there was some code in the k8s backend to deal
with how to resolve dependencies and make them available to the
Spark application. It turns out that none of that code is necessary,
since spark-submit already handles all that for applications started
in client mode - like the k8s driver that is run inside a Spark-created
pod.

For that reason, specifically for pyspark, there's no need for the
k8s backend to deal with PYTHONPATH; or, in general, to change the URIs
provided by the user at all. spark-submit takes care of that.

For testing, I created a pyspark script that depends on another module
that is shipped with --py-files. Then I used:

- --py-files http://.../dep.py http://.../test.py
- --py-files http://.../dep.zip http://.../test.py
- --py-files local:/.../dep.py local:/.../test.py
- --py-files local:/.../dep.zip local:/.../test.py

Without this change, all of the above commands fail. With the change, the
driver is able to see the dependencies in all the above cases; but executors
don't see the dependencies in the last two. That's a bug in shared Spark code
that deals with local: dependencies in pyspark (SPARK-26934).

I also tested a Scala app using the main jar from an http server.

Closes #23793 from vanzin/SPARK-24736.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-02-27 09:49:31 -08:00
liuxian 7912dbb88f [MINOR] Simplify boolean expression
## What changes were proposed in this pull request?

Comparing whether a Boolean expression is equal to `true` is redundant.
For example, if the datatype of `a` is boolean:
Before:
if (a == true)
After:
if (a)

## How was this patch tested?
N/A

Closes #23884 from 10110346/simplifyboolean.

Authored-by: liuxian <liu.xian3@zte.com.cn>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-02-27 08:38:00 -06:00
Marcelo Vanzin afbff6446f Revert "[SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2"
This reverts commit a3192d966a.
2019-02-26 13:42:07 -08:00
Jiaxin Shan a3192d966a [SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2
## What changes were proposed in this pull request?
Changed the `kubernetes-client` version to 4.1.2. The latest version fixes an error with exec credentials (used by AWS EKS) and will be used to talk with the kubernetes API server. Users can now submit spark jobs to an EKS API endpoint with this patch.

## How was this patch tested?
unit tests and manual tests.

Closes #23814 from Jeffwan/update_k8s_sdk.

Authored-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-02-25 04:56:04 -06:00
Marcelo Vanzin 61c3cdc706 [SPARK-24894][K8S] Make sure valid host names are created for executors.
Since the host name is derived from the app name, which can contain arbitrary
characters, it needs to be sanitized so that only valid characters are allowed.

On top of that, take extra care that truncation doesn't leave characters that
are valid except at the start of a host name.
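
An illustrative sketch of the sanitization idea, not the actual patch; the rules shown are the generic DNS-1123 label constraints:

```scala
// Hypothetical sketch: keep only characters valid in a DNS-1123 label,
// truncate to the 63-character limit, then trim edge characters that are
// valid inside a name but not at its start or end.
def sanitizeHostName(appName: String): String = {
  appName.toLowerCase
    .replaceAll("[^a-z0-9-]", "-")  // replace invalid characters
    .take(63)                       // DNS label length limit
    .replaceAll("^-+|-+$", "")      // a label cannot start or end with '-'
}
```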

Closes #23781 from vanzin/SPARK-24894.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-02-19 15:19:59 -08:00
Marcelo Vanzin 2a67dbfbd3 [SPARK-26595][CORE] Allow credential renewal based on kerberos ticket cache.
This change adds a new mode for credential renewal that does not require
a keytab; it uses the local ticket cache instead, so it works while the
user keeps the cache valid.

This can be useful for, e.g., people running long spark-shell sessions where
their kerberos login is kept up-to-date.

The main change to enable this behavior is in HadoopDelegationTokenManager,
with a small change in the HDFS token provider. The other changes are to avoid
creating duplicate tokens when submitting the application to YARN; they allow
the tokens from the scheduler to be sent to the YARN AM, reducing the round trips
to HDFS.

For that, the scheduler initialization code was changed a little bit so that
the tokens are available when the YARN client is initialized. That basically
takes care of a long-standing TODO that was in the code to clean up configuration
propagation to the driver's RPC endpoint (in CoarseGrainedSchedulerBackend).

Tested with an app designed to stress this functionality, with both keytab and
cache-based logins. Some basic kerberos tests on k8s also.

Closes #23525 from vanzin/SPARK-26595.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-01-28 13:32:34 -08:00
Rob Vesse c542c247bb [SPARK-25887][K8S] Configurable K8S context support
This enhancement allows for specifying the desired context to use for the initial K8S client auto-configuration.  This allows users to more easily access alternative K8S contexts without having to first
explicitly change their current context via kubectl.

Explicitly set my K8S context to a context pointing to a non-existent cluster, then launched Spark jobs with explicitly specified contexts via the new `spark.kubernetes.context` configuration property.

Example Output:

```
> kubectl config current-context
minikube
> minikube status
minikube: Stopped
cluster:
kubectl:
> ./spark-submit --master k8s://https://localhost:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=2 --conf spark.kubernetes.context=docker-for-desktop --conf spark.kubernetes.container.image=rvesse/spark:debian local:///opt/spark/examples/jars/spark-examples_2.11-3.0.0-SNAPSHOT.jar 4
18/10/31 11:57:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/31 11:57:51 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using context docker-for-desktop from users K8S config file
18/10/31 11:57:52 INFO LoggingPodStatusWatcherImpl: State changed, new state:
	 pod name: spark-pi-1540987071845-driver
	 namespace: default
	 labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, spark-role -> driver
	 pod uid: 32462cac-dd04-11e8-b6c6-025000000001
	 creation time: 2018-10-31T11:57:52Z
	 service account name: default
	 volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv
	 node name: N/A
	 start time: N/A
	 phase: Pending
	 container status: N/A
18/10/31 11:57:52 INFO LoggingPodStatusWatcherImpl: State changed, new state:
	 pod name: spark-pi-1540987071845-driver
	 namespace: default
	 labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, spark-role -> driver
	 pod uid: 32462cac-dd04-11e8-b6c6-025000000001
	 creation time: 2018-10-31T11:57:52Z
	 service account name: default
	 volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv
	 node name: docker-for-desktop
	 start time: N/A
	 phase: Pending
	 container status: N/A
...
18/10/31 11:58:03 INFO LoggingPodStatusWatcherImpl: State changed, new state:
	 pod name: spark-pi-1540987071845-driver
	 namespace: default
	 labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, spark-role -> driver
	 pod uid: 32462cac-dd04-11e8-b6c6-025000000001
	 creation time: 2018-10-31T11:57:52Z
	 service account name: default
	 volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv
	 node name: docker-for-desktop
	 start time: 2018-10-31T11:57:52Z
	 phase: Succeeded
	 container status:
		 container name: spark-kubernetes-driver
		 container image: rvesse/spark:debian
		 container state: terminated
		 container started at: 2018-10-31T11:57:54Z
		 container finished at: 2018-10-31T11:58:02Z
		 exit code: 0
		 termination reason: Completed
```

Without the `spark.kubernetes.context` setting this will fail because the current context - `minikube` - is pointing to a non-running cluster e.g.

```
> ./spark-submit --master k8s://https://localhost:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=2 --conf spark.kubernetes.container.image=rvesse/spark:debian local:///opt/spark/examples/jars/spark-examples_2.11-3.0.0-SNAPSHOT.jar 4
18/10/31 12:02:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/31 12:02:30 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
18/10/31 12:02:31 WARN WatchConnectionManager: Exec Failure
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
	at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
	at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
	at sun.security.ssl.Handshaker.process_record(Handshaker.java:914)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
	at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281)
	at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:66)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:109)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
	at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
	at sun.security.validator.Validator.validate(Validator.java:260)
	at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
	at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
	at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1491)
	... 39 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
	at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
	at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
	... 45 more
Exception in thread "kubernetes-dispatcher-0" Exception in thread "main" java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@611a9c09 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@404819e4[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
	at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
	at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
	at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:632)
	at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678)
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.scheduleReconnect(WatchConnectionManager.java:300)
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$800(WatchConnectionManager.java:48)
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:213)
	at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
	at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:204)
	at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
	at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
	at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
	at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
	at sun.security.ssl.Handshaker.process_record(Handshaker.java:914)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
	at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281)
	at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:66)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:109)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
	... 4 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
	at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
	at sun.security.validator.Validator.validate(Validator.java:260)
	at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
	at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
	at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1491)
	... 39 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
	at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
	at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
	... 45 more
18/10/31 12:02:31 INFO ShutdownHookManager: Shutdown hook called
18/10/31 12:02:31 INFO ShutdownHookManager: Deleting directory /private/var/folders/6b/y1010qp107j9w2dhhy8csvz0000xq3/T/spark-5e649891-8a0f-4f17-bf3a-33b34082eba8
```

Suggested reviews: mccheah liyinan926 - this is the follow up fix to the bug discovered while working on SPARK-25809 (PR #22805)

Closes #22904 from rvesse/SPARK-25887.

Authored-by: Rob Vesse <rvesse@dotnetrdf.org>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-01-22 10:25:21 -08:00
Kazuaki Ishizaki 7bf0794651 [SPARK-26463][CORE] Use ConfigEntry for hardcoded configs for scheduler categories.
## What changes were proposed in this pull request?

The PR makes hardcoded `spark.dynamicAllocation`, `spark.scheduler`, `spark.rpc`, `spark.task`, `spark.speculation`, and `spark.cleaner` configs to use `ConfigEntry`.
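
As a hedged sketch of the `ConfigEntry` pattern (the builder is Spark-internal, so this only compiles inside Spark's own modules; treat the exact calls as an assumption):

```scala
import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical sketch: a typed, defaulted constant replaces scattered string
// literals. In Spark these entries live in the internal config package object.
val SCHEDULER_MODE = ConfigBuilder("spark.scheduler.mode")
  .stringConf
  .createWithDefault("FIFO")

// Call sites then use conf.get(SCHEDULER_MODE) instead of
// conf.get("spark.scheduler.mode", "FIFO").
```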

## How was this patch tested?

Existing tests

Closes #23416 from kiszk/SPARK-26463.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-01-22 07:44:36 -06:00
Jungtaek Lim (HeartSaVioR) 38f030725c [SPARK-26466][CORE] Use ConfigEntry for hardcoded configs for submit categories.
## What changes were proposed in this pull request?

The PR makes hardcoded configs below to use `ConfigEntry`.

* spark.kryo
* spark.kryoserializer
* spark.serializer
* spark.jars
* spark.files
* spark.submit
* spark.deploy
* spark.worker

This patch doesn't change configs which are not relevant to SparkConf (e.g. system properties).

## How was this patch tested?

Existing tests.

Closes #23532 from HeartSaVioR/SPARK-26466-v2.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-01-16 20:57:21 -06:00
Jungtaek Lim (HeartSaVioR) d9e4cf67c0 [SPARK-26482][CORE] Use ConfigEntry for hardcoded configs for ui categories
## What changes were proposed in this pull request?

The PR makes hardcoded configs below to use `ConfigEntry`.

* spark.ui
* spark.ssl
* spark.authenticate
* spark.master.rest
* spark.master.ui
* spark.metrics
* spark.admin
* spark.modify.acl

This patch doesn't change configs which are not relevant to SparkConf (e.g. system properties).

## How was this patch tested?

Existing tests.

Closes #23423 from HeartSaVioR/SPARK-26466.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-01-11 10:18:07 -08:00
Marcelo Vanzin 669e8a1559 [SPARK-25689][YARN] Make driver, not AM, manage delegation tokens.
This change modifies the behavior of the delegation token code when running
on YARN, so that the driver controls the renewal, in both client and cluster
mode. For that, a few different things were changed:

* The AM code only runs code that needs DTs when DTs are available.

In a way, this restores the AM behavior to what it was pre-SPARK-23361, but
keeping the fix added in that bug. Basically, all the AM code is run in a
"UGI.doAs()" block; but code that needs to talk to HDFS (basically the
distributed cache handling code) was delayed to the point where the driver
is up and running, and thus when valid delegation tokens are available.

* SparkSubmit / ApplicationMaster now handle user login, not the token manager.

The previous AM code was relying on the token manager to keep the user
logged in when keytabs are used. This required some odd APIs in the token
manager and the AM so that the right UGI was exposed and used in the right
places.

After this change, the logged in user is handled separately from the token
manager, so the API was cleaned up, and, as explained above, the whole AM
runs under the logged in user, which also helps with simplifying some more code.

* Distributed cache configs are sent separately to the AM.

Because of the delayed initialization of the cached resources in the AM, it
became easier to write the cache config to a separate properties file instead
of bundling it with the rest of the Spark config. This also avoids having
to modify the SparkConf to hide things from the UI.

* Finally, the AM doesn't manage the token manager anymore.

The above changes allow the token manager to be completely handled by the
driver's scheduler backend code also in YARN mode (whether client or cluster),
making it similar to other RMs. To maintain the fix added in SPARK-23361 also
in client mode, the AM now sends an extra message to the driver on initialization
to fetch delegation tokens; and although it might not really be needed, the
driver also keeps the running AM updated when new tokens are created.

Tested in a kerberized cluster with the same tests used to validate SPARK-23361,
in both client and cluster mode. Also tested with a non-kerberized cluster.

Closes #23338 from vanzin/SPARK-25689.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2019-01-07 14:40:08 -06:00
Dongjoon Hyun e15a319ccd [SPARK-26536][BUILD][TEST] Upgrade Mockito to 2.23.4
## What changes were proposed in this pull request?

This PR upgrades Mockito from 1.10.19 to 2.23.4. The following changes are required.

- Replace `org.mockito.Matchers` with `org.mockito.ArgumentMatchers`
- Replace `anyObject` with `any`
- Replace `getArgumentAt` with `getArgument` and add type annotation.
- Use `isNull` matcher in case of `null` is invoked.
```scala
     saslHandler.channelInactive(null);
-    verify(handler).channelInactive(any(TransportClient.class));
+    verify(handler).channelInactive(isNull());
```

- Make and use `doReturn` wrapper to avoid [SI-4775](https://issues.scala-lang.org/browse/SI-4775)
```scala
private def doReturn(value: Any) = org.mockito.Mockito.doReturn(value, Seq.empty: _*)
```

## How was this patch tested?

Pass the Jenkins with the existing tests.

Closes #23452 from dongjoon-hyun/SPARK-26536.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2019-01-04 19:23:38 -08:00
Takuya UESHIN 4419e1daca [SPARK-26445][CORE] Use ConfigEntry for hardcoded configs for driver/executor categories.
## What changes were proposed in this pull request?

The PR makes hardcoded spark.driver, spark.executor, and spark.cores.max configs to use `ConfigEntry`.

Note that some config keys are from `SparkLauncher` instead of defining in the config package object because the string is already defined in it and it does not depend on core module.

## How was this patch tested?

Existing tests.

Closes #23415 from ueshin/issues/SPARK-26445/hardcoded_driver_executor_configs.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2019-01-04 22:12:35 +08:00
Jungtaek Lim (HeartSaVioR) 05372d188a [SPARK-26489][CORE] Use ConfigEntry for hardcoded configs for python/r categories
## What changes were proposed in this pull request?

The PR makes hardcoded configs below to use ConfigEntry.

* spark.pyspark
* spark.python
* spark.r

This patch doesn't change configs which are not relevant to SparkConf (e.g. system properties, python source code)

## How was this patch tested?

Existing tests.

Closes #23428 from HeartSaVioR/SPARK-26489.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-01-03 14:30:27 -08:00
Marcelo Vanzin 4b3fe3a9cc [SPARK-25815][K8S] Support kerberos in client mode, keytab-based token renewal.
This change hooks up the k8s backed to the updated HadoopDelegationTokenManager,
so that delegation tokens are also available in client mode, and keytab-based token
renewal is enabled.

The change re-works the k8s feature steps related to kerberos so
that the driver does all the credential management and provides all
the needed information to executors - so nothing needs to be added
to executor pods. This also makes cluster mode behave a lot more
similarly to client mode, since no driver-related config steps are run
in the latter case.

The main two things that don't need to happen in executors anymore are:

- adding the Hadoop config to the executor pods: this is not needed
  since the Spark driver will serialize the Hadoop config and send
  it to executors when running tasks.

- mounting the kerberos config file in the executor pods: this is
  not needed once you remove the above. The Hadoop conf sent by
  the driver with the tasks is already resolved (i.e. has all the
  kerberos names properly defined), so executors do not need access
  to the kerberos realm information anymore.

The change also avoids creating delegation tokens unnecessarily.
This means that they'll only be created if a secret with tokens
was not provided, and if a keytab is not provided. In either of
those cases, the driver code will handle delegation tokens: in
cluster mode by creating a secret and stashing them, in client
mode by using existing mechanisms to send DTs to executors.

One last feature: the change also allows defining a keytab with
a "local:" URI. This is supported in client mode (although that's
the same as not saying "local:"), and in k8s cluster mode. This
allows the keytab to be mounted onto the image from a pre-existing
secret, for example.

Finally, the new code always sets SPARK_USER in the driver and
executor pods. This is in line with how other resource managers
behave: the submitting user reflects which user will access
Hadoop services in the app. (With kerberos, that's overridden
by the logged in user.) That user is unrelated to the OS user
the app is running as inside the containers.

Tested:
- client and cluster mode with kinit
- cluster mode with keytab
- cluster mode with local: keytab
- YARN cluster with keytab (to make sure it isn't broken)

Closes #22911 from vanzin/SPARK-25815.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2018-12-18 13:30:09 -08:00
suxingfate 114d0de14c [SPARK-25922][K8S] Spark Driver/Executor "spark-app-selector" label mismatch
## What changes were proposed in this pull request?

In K8S Cluster mode, the algorithm that generates the spark-app-selector/spark.app.id of the spark driver is different from the spark executor's.
This patch makes sure the spark driver and executors use the same spark-app-selector/spark.app.id if spark.app.id is set; otherwise they will use the superclass applicationId.

In K8S Client mode, spark-app-selector/spark.app.id for executors will use superclass applicationId.
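
An illustrative one-liner of the fix's intent, not the actual patch:

```scala
import org.apache.spark.SparkConf

// Hypothetical sketch: driver and executors resolve the app id the same way,
// preferring an explicitly set spark.app.id over the generated fallback.
def resolveAppId(conf: SparkConf, fallback: => String): String =
  conf.getOption("spark.app.id").getOrElse(fallback)
```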

## How was this patch tested?

Manually run."

Closes #23322 from suxingfate/SPARK-25922.

Lead-authored-by: suxingfate <suxingfate@163.com>
Co-authored-by: xinglwang <xinglwang@ebay.com>
Signed-off-by: Yinan Li <ynli@google.com>
2018-12-17 13:36:57 -08:00
Marcelo Vanzin a63e7b2a21 [SPARK-25877][K8S] Move all feature logic to feature classes.
This change makes the driver and executor builders a lot simpler
by encapsulating almost all feature logic into the respective
feature classes. The only logic that remains is the creation of
the initial pod, which needs to happen before anything else, so it
is better left in the builder class.

Most feature classes already behave fine when the config has nothing
they should handle, but a few minor tweaks had to be added. Unit
tests were also updated or added to account for these.

The builder suites were simplified a lot and just test the remaining
pod-related code in the builders themselves.
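
For illustration, the general shape of the feature-step pattern this refactor leans on; the trait name mirrors Spark's KubernetesFeatureConfigStep, but the types below are simplified assumptions:

```scala
// Simplified stand-in for the pod/container pair a step operates on.
final case class SparkPod(podName: String, containerImage: String)

trait KubernetesFeatureConfigStep {
  // Each feature transforms the pod being built; a feature with nothing to
  // handle in the current config simply returns the pod unchanged.
  def configurePod(pod: SparkPod): SparkPod
}

object NoopFeature extends KubernetesFeatureConfigStep {
  override def configurePod(pod: SparkPod): SparkPod = pod
}
```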

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #23220 from vanzin/SPARK-25877.
2018-12-12 12:01:21 -08:00
mcheah 57d6fbfa8c [SPARK-26239] File-based secret key loading for SASL.
This proposes an alternative way to load secret keys into a Spark application that is running on Kubernetes. Instead of automatically generating the secret, the secret key can reside in a file that is shared between both the driver and executor containers.
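
A minimal configuration sketch, assuming the `spark.authenticate.secret.file` key this change introduces; the mount path is illustrative:

```scala
import org.apache.spark.SparkConf

// Both driver and executor containers are expected to see the same file,
// e.g. via a shared Kubernetes secret mount (path invented).
val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret.file", "/mnt/secrets/spark-sasl.key")
```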

Unit tests.

Closes #23252 from mccheah/auth-secret-with-file.

Authored-by: mcheah <mcheah@palantir.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2018-12-11 13:50:16 -08:00
Marcelo Vanzin dbd90e5440 [SPARK-26194][K8S] Auto generate auth secret for k8s apps.
This change modifies the logic in the SecurityManager to do two
things:

- generate unique app secrets also when k8s is being used
- only store the secret in the user's UGI on YARN

The latter is needed so that k8s won't unnecessarily create
k8s secrets for the UGI credentials when only the auth token
is stored there.

On the k8s side, the secret is propagated to executors using
an environment variable instead. This ensures it works in both
client and cluster mode.
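
Illustrative only: how an executor could read a secret delivered through its environment; the variable name below is an assumption, not necessarily the one Spark uses:

```scala
// The secret arrives via the pod spec's env section, so the same code path
// works whether the driver runs inside or outside the cluster.
val authSecret: Option[String] = sys.env.get("SPARK_AUTH_SECRET")
```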

Security doc was updated to mention the feature and clarify that
proper access control in k8s should be enabled for it to be secure.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #23174 from vanzin/SPARK-26194.
2018-12-06 14:17:13 -08:00
Stavros Kontopoulos a24e1a126c [SPARK-26256][K8S] Fix labels for pod deletion
## What changes were proposed in this pull request?
Adds proper labels when deleting executor pods.
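
A hedged sketch of what label-scoped deletion looks like with the fabric8 client; the label keys follow Spark's conventions, and the application id is invented:

```scala
import io.fabric8.kubernetes.client.DefaultKubernetesClient

// Delete only this app's executor pods, not everything in the namespace.
val client = new DefaultKubernetesClient()
client.pods()
  .withLabel("spark-app-selector", "spark-application-1234567890")
  .withLabel("spark-role", "executor")
  .delete()
```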

## How was this patch tested?
Manually, with tests.

Closes #23209 from skonto/fix-deletion-labels.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2018-12-03 14:57:18 -08:00
Stavros Kontopoulos 0c2935b01d [SPARK-25515][K8S] Adds a config option to keep executor pods for debugging
## What changes were proposed in this pull request?
Keeps K8s executor resources present in case of failure or normal termination.
Introduces a new boolean config option: `spark.kubernetes.deleteExecutors`, with default value set to true.
The idea is to update Spark K8s backend structures but leave the resources around.
The assumption is that since entries are not removed from the `removedExecutorsCache`, we are immune to updates that refer to executor resources that were previously removed.
The only delete operation not touched is the one in the `doKillExecutors` method.
The reason is that right now we don't support [blacklisting](https://issues.apache.org/jira/browse/SPARK-23485) or dynamic allocation with Spark on K8s. In the future we might want to handle these scenarios, although that is more complicated.
More tests can be added if the approach is approved; a hedged config sketch follows below.
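
For illustration, a minimal sketch of opting out of executor pod deletion for debugging, using the config name as given in this message:

```scala
import org.apache.spark.SparkConf

// Executor pods are kept around after failure or termination for inspection.
val conf = new SparkConf().set("spark.kubernetes.deleteExecutors", "false")
```
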
## How was this patch tested?
Manually by running a Spark job and verifying pods are not deleted.

Closes #23136 from skonto/keep_pods.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Yinan Li <ynli@google.com>
2018-12-03 09:02:47 -08:00
Marcelo Vanzin 6be272b75b [SPARK-25876][K8S] Simplify kubernetes configuration types.
There are a few issues with the current configuration types used in
the kubernetes backend:

- they use type parameters for role-specific specialization, which makes
  type signatures really noisy throughout the code base.

- they break encapsulation by forcing the code that creates the config
  object to remove the configuration from SparkConf before creating the
  k8s-specific wrapper.

- they don't provide an easy way for tests to have default values for
  fields they do not use.

This change fixes those problems by:

- creating a base config type with role-specific specialization using
  inheritance

- encapsulating the logic of parsing SparkConf into k8s-specific views
  inside the k8s config classes

- providing some helper code for tests to easily override just the part
  of the configs they want.
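
An illustrative sketch of the resulting shape; the names loosely follow Spark's classes, and the members shown are assumptions:

```scala
import org.apache.spark.SparkConf

// A shared base config with role-specific subclasses instead of type
// parameters; parsing of SparkConf is encapsulated behind members like this.
abstract class KubernetesConf(val sparkConf: SparkConf) {
  def namespace: String = sparkConf.get("spark.kubernetes.namespace", "default")
}

class KubernetesDriverConf(sparkConf: SparkConf, val appId: String)
  extends KubernetesConf(sparkConf)

class KubernetesExecutorConf(sparkConf: SparkConf, val executorId: String)
  extends KubernetesConf(sparkConf)
```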

Most of the change relates to the above, especially cleaning up the
tests. While doing that, I also made some smaller changes elsewhere:

- removed unnecessary type parameters in KubernetesVolumeSpec

- simplified the error detection logic in KubernetesVolumeUtils; all
  the call sites would just throw the first exception collected by
  that class, since they all called "get" on the "Try" object. Now
  the unnecessary wrapping is gone and the exception is just thrown
  where it occurs.

- removed a lot of unnecessary mocking from tests.

- changed the kerberos-related code so that less logic needs to live
  in the driver builder. In spirit it should be part of the upcoming
  work in this series of cleanups, but it made parts of this change
  simpler.

Tested with existing unit tests and integration tests.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #22959 from vanzin/SPARK-25876.
2018-11-30 16:23:37 -08:00
Nihar Sheth 3df307aa51 [SPARK-25960][K8S] Support subpath mounting with Kubernetes
## What changes were proposed in this pull request?

This PR adds configurations to use subpaths with Spark on k8s. Subpaths (https://kubernetes.io/docs/concepts/storage/volumes/#using-subpath) allow the user to specify a path within a volume to use instead of the volume's root.
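
A hedged configuration example; the volume name, claim name, and paths are invented, and the key pattern follows Spark's volume config scheme:

```scala
import org.apache.spark.SparkConf

// Mount only the "spark-events" subdirectory of the claim, not the volume root.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.volumes.persistentVolumeClaim.events.mount.path", "/opt/spark/eventlog")
  .set("spark.kubernetes.driver.volumes.persistentVolumeClaim.events.mount.subPath", "spark-events")
  .set("spark.kubernetes.driver.volumes.persistentVolumeClaim.events.options.claimName", "events-pvc")
```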

## How was this patch tested?

Added unit tests. Ran SparkPi on a cluster with event logging pointed at a subpath-mount and verified the driver host created and used the subpath.

Closes #23026 from NiharS/k8s_subpath.

Authored-by: Nihar Sheth <niharrsheth@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2018-11-26 11:06:02 -08:00
Sean Owen 32365f8177 [SPARK-26090][CORE][SQL][ML] Resolve most miscellaneous deprecation and build warnings for Spark 3
## What changes were proposed in this pull request?

The build has a lot of deprecation warnings. Some are new in Scala 2.12 and Java 11. We've fixed some, but I wanted to take a pass at fixing lots of easy miscellaneous ones here.

They're too numerous and small to list here; see the pull request. Some highlights:

- `BeanInfo` is deprecated in 2.12, and BeanInfo classes are pretty ancient in Java. Instead, case classes can explicitly declare getters
- Eta expansion of zero-arg methods; foo() becomes () => foo() in many cases
- Floating-point Range is inexact and deprecated, like 0.0 to 100.0 by 1.0
- finalize() is finally deprecated (just needs to be suppressed)
- StageInfo.attemptId was deprecated and easiest to remove here
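
For illustration, assumed fix shapes for two of the highlights above (not taken from the PR diff):

```scala
// A case class declaring explicit bean-style getters instead of @BeanInfo.
case class LabeledPoint(label: Double, weight: Double) {
  def getLabel: Double = label
  def getWeight: Double = weight
}

// An exact BigDecimal range replacing the deprecated `0.0 to 100.0 by 1.0`.
val ticks = BigDecimal(0.0) to BigDecimal(100.0) by BigDecimal(1.0)
```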

I'm not going to touch some chunks of deprecation warnings for now:

- Parquet deprecations
- Hive deprecations (particularly serde2 classes)
- Deprecations in generated code (mostly Thriftserver CLI)
- ProcessingTime deprecations (we may need to revive this class as internal)
- many MLlib deprecations because they concern methods that may be removed anyway
- a few Kinesis deprecations I couldn't figure out
- Mesos get/setRole, which I don't know well
- Kafka/ZK deprecations (e.g. poll())
- Kinesis
- a few other ones that will probably resolve by deleting a deprecated method

## How was this patch tested?

Existing tests, including manual testing with the 2.11 build and Java 11.

Closes #23065 from srowen/SPARK-26090.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-11-19 09:16:42 -06:00
DB Tsai ad853c5678 [SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0
## What changes were proposed in this pull request?

This PR makes 2.12 Spark's default Scala version, with Scala 2.11 as the alternative version. This implies that Scala 2.12 will be used by our CI builds, including pull request builds.

We'll update Jenkins to include a new compile-only job for Scala 2.11 to ensure the code can still be compiled with Scala 2.11.

## How was this patch tested?

Existing tests.

Closes #22967 from dbtsai/scala2.12.

Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2018-11-14 16:22:23 -08:00
Marcelo Vanzin 3404a73f4c [SPARK-25875][K8S] Merge code to set up driver command into a single step.
Right now there are 3 different classes dealing with building the driver
command to run inside the pod, one for each "binding" supported by Spark.
This has two main shortcomings:

- the code in the 3 classes is very similar; changing things in one place
  would probably mean making a similar change in the others.

- it gives the false impression that the step implementation is the only
  place where binding-specific logic is needed. That is not true; there
  was code in KubernetesConf that was binding-specific, and there's also
  code in the executor-specific config step. So the 3 classes weren't really
  working as a language-specific abstraction.

On top of that, the current code was propagating command line parameters in
a different way depending on the binding. That doesn't seem necessary, and
in fact using environment variables for command line parameters is in general
a really bad idea, since you can't handle special characters (e.g. spaces)
that way.
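
A small illustration of the problem, with an invented env var name; a single environment variable cannot preserve argument boundaries:

```scala
// "path with spaces.txt" arrives here as three separate arguments, which is
// why command line parameters should not travel through the environment.
val argsFromEnv: Array[String] =
  sys.env.getOrElse("HYPOTHETICAL_DRIVER_ARGS", "").split(" ")
```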

This change merges the 3 different code paths for Java, Python and R into
a single step, and also merges the 3 code paths to start the Spark driver
in the k8s entry point script. This increases the amount of shared code,
and also moves more feature logic into the step itself, so it doesn't live
in KubernetesConf.

Note that not all logic related to setting up the driver lives in that
step. For example, the memory overhead calculation still lives separately,
except it now happens in the driver config step instead of outside the
step hierarchy altogether.

Some of the noise in the diff is because of changes to KubernetesConf, which
will be addressed in a separate change.

Tested with new and updated unit tests + integration tests.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #22897 from vanzin/SPARK-25875.
2018-11-02 13:58:08 -07:00
Rob Vesse fc8222298e [SPARK-25809][K8S][TEST] New K8S integration testing backends
## What changes were proposed in this pull request?

Currently K8S integration tests are hardcoded to use a `minikube`-based backend. `minikube` is VM-based, so it can be resource-hungry, and it also doesn't cope well with certain networking setups (for example, with the Cisco AnyConnect software VPN, `minikube` is unusable as it detects its own IP incorrectly).

This PR adds a new K8S integration testing backend that allows for using the Kubernetes support in [Docker for Desktop](https://blog.docker.com/2018/07/kubernetes-is-now-available-in-docker-desktop-stable-channel/). It also generalises the framework to be able to run the integration tests against an arbitrary Kubernetes cluster.

To Do:

- [x] General Kubernetes cluster backend
- [x] Documentation on Kubernetes integration testing
- [x] Testing of general K8S backend
- [x] Check whether change from timestamps being `Time` to `String` in Fabric 8 upgrade needs additional fix up

## How was this patch tested?

Ran integration tests with Docker for Desktop and all passed:

![screen shot 2018-10-23 at 14 19 56](https://user-images.githubusercontent.com/2104864/47363460-c5816a00-d6ce-11e8-9c15-56b34698e797.png)

Suggested Reviewers: ifilonenko srowen

Author: Rob Vesse <rvesse@dotnetrdf.org>

Closes #22805 from rvesse/SPARK-25809.
2018-11-01 09:33:55 -07:00
Marcelo Vanzin 68dde3481e [SPARK-23781][CORE] Merge token renewer functionality into HadoopDelegationTokenManager.
This avoids having two classes to deal with tokens; now the above
class is a one-stop shop for dealing with delegation tokens. The
YARN backend extends that class instead of doing composition like
before, resulting in a bit less code there too.

The renewer functionality is basically the same code that used to
be in YARN's AMCredentialRenewer. That is also the reason why the
public API of HadoopDelegationTokenManager is a little bit odd;
the YARN AM has some odd requirements for how this all should be
initialized, and the weirdness is needed currently to support that.

Tested:
- YARN with stress app for DT renewal
- Mesos and K8S with basic kerberos tests (both tgt and keytab)

Closes #22624 from vanzin/SPARK-23781.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2018-10-31 13:00:10 -05:00
Onur Satici f6cc354d83 [SPARK-24434][K8S] pod template files
## What changes were proposed in this pull request?

New feature to pass podspec files for driver and executor pods.
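
A minimal sketch, assuming the `podTemplateFile` config keys added by this feature; the template paths are invented:

```scala
import org.apache.spark.SparkConf

// Spark uses the templates as the starting point for building the pods,
// then applies its own feature steps on top.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.podTemplateFile", "/templates/driver.yaml")
  .set("spark.kubernetes.executor.podTemplateFile", "/templates/executor.yaml")
```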

## How was this patch tested?
new unit and integration tests

- [x] more overwrites in integration tests
- [ ] invalid template integration test, documentation

Author: Onur Satici <osatici@palantir.com>
Author: Yifei Huang <yifeih@palantir.com>
Author: onursatici <onursatici@gmail.com>

Closes #22146 from onursatici/pod-template.
2018-10-30 13:52:44 -07:00
Ilan Filonenko e9b71c8f01 [SPARK-25828][K8S] Bumping Kubernetes-Client version to 4.1.0
## What changes were proposed in this pull request?

Changed the `kubernetes-client` version and refactored code that broke as a result

## How was this patch tested?

Unit and Integration tests

Closes #22820 from ifilonenko/SPARK-25828.

Authored-by: Ilan Filonenko <ifilondz@gmail.com>
Signed-off-by: Erik Erlandson <eerlands@redhat.com>
2018-10-26 15:59:12 -07:00