03e48c87f5
### What changes were proposed in this pull request?

Set `kubernetes.request.retry.backoffLimit` to 3 by default when the user has not specified a value for it. This way, when the K8s API server returns an HTTP status code >= 500, an exponential backoff is triggered (`kubernetes.request.retry.backoffInterval` is 1000ms by default). For details, see https://github.com/fabric8io/kubernetes-client/issues/3087.

### Why are the changes needed?

We experienced some internal K8s errors. For example, while an `etcdserver` leader election was ongoing, the error was propagated to the API client and caused an issue in Spark:

```
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/dex-app-bl24w4z9/pods/sparkpi-10-fcd3f6781a874212-driver. Message: etcdserver: leader changed. Received status: Status(apiVersion=v1, code=500, details=null, kind=Status, message=etcdserver: leader changed, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, status=Failure, additionalProperties={}).
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By running the integration tests with `log4j.logger.org.apache.spark.deploy.k8s.SparkKubernetesClientFactory=DEBUG` set in the log4j config.
It produced the following log:

```
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: 21/07/08 11:01:14 DEBUG org.apache.spark.deploy.k8s.SparkKubernetesClientFactory: Kubernetes client config: {
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestConfig" : {
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "username" : null,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "password" : null,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "oauthToken" : null,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "oauthTokenProvider" : null,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "impersonateUsername" : null,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "impersonateGroups" : [ null ],
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "impersonateExtras" : { },
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "watchReconnectInterval" : 1000,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "watchReconnectLimit" : -1,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "connectionTimeout" : 10000,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "uploadConnectionTimeout" : 10000,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "uploadRequestTimeout" : 120000,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestRetryBackoffLimit" : 3,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestRetryBackoffInterval" : 1000,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestTimeout" : 10000,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "rollingTimeout" : 900000,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "scaleTimeout" : 600000,
21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "loggingInterval" : 20000,
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "websocketTimeout" : 5000,
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "websocketPingInterval" : 0,
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "maxConcurrentRequests" : 64,
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "maxConcurrentRequestsPerHost" : 5,
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "impersonateGroup" : null
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: },
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "contexts" : [ {
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "context" : {
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "cluster" : "talos-default",
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "namespace" : "default",
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "user" : "admintalos-default"
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: },
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "admintalos-default"
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, {
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "context" : {
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "cluster" : "arn:aws:eks:us-west-2:392479084068:cluster/mow",
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "user" : "arn:aws:eks:us-west-2:392479084068:cluster/mow"
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: },
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "arn:aws:eks:us-west-2:392479084068:cluster/mow"
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, {
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "context" : {
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "cluster" : "minikube",
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "extensions" : [ {
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "context_info"
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: } ],
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "namespace" : "default",
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "user" : "minikube"
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: },
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "minikube"
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, {
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "context" : {
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "cluster" : "",
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "user" : ""
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: },
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "mow"
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: } ],
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "currentContext" : {
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "context" : {
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "cluster" : "minikube",
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "extensions" : [ {
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "context_info"
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: } ],
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "namespace" : "default",
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "user" : "minikube"
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: },
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "minikube"
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: },
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "maxConcurrentRequests" : 64,
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "maxConcurrentRequestsPerHost" : 5,
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "autoConfigure" : false,
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "trustCerts" : false,
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "disableHostnameVerification" : false,
21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "masterUrl" : "https://192.168.64.127:8443/",
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "apiVersion" : "v1",
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "namespace" : "a0993113b8084cd3868b3052e698b17f",
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "caCertFile" : "/Users/attilazsoltpiros/.minikube/ca.crt",
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "clientCertFile" : "/Users/attilazsoltpiros/.minikube/profiles/minikube/client.crt",
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "clientKeyFile" : "/Users/attilazsoltpiros/.minikube/profiles/minikube/client.key",
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "clientKeyAlgo" : "RSA",
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "clientKeyPassphrase" : "changeit",
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "watchReconnectInterval" : 1000,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "watchReconnectLimit" : -1,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "connectionTimeout" : 10000,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "uploadConnectionTimeout" : 10000,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "uploadRequestTimeout" : 120000,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestRetryBackoffLimit" : 3,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestRetryBackoffInterval" : 1000,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestTimeout" : 10000,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "rollingTimeout" : 900000,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "scaleTimeout" : 600000,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "loggingInterval" : 20000,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "websocketTimeout" : 5000,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "websocketPingInterval" : 0,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "impersonateGroups" : [ null ],
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "impersonateExtras" : { },
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "http2Disable" : false,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "noProxy" : [ ],
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "tlsVersions" : [ "TLS_1_2" ],
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "errorMessages" : {
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "401" : "Unauthorized! Token may have expired! Please log-in again.",
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "403" : "Forbidden! User minikube doesn't have permission."
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }
```

Which contains the expected values:

```
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestRetryBackoffLimit" : 3,
21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestRetryBackoffInterval" : 1000,
```

Closes #33261 from attilapiros/SPARK-35334.

Authored-by: attilapiros <piros.attila.zsolt@gmail.com>
Signed-off-by: attilapiros <piros.attila.zsolt@gmail.com>
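
The behaviour described in the proposal (apply a default of 3 only when the property is unset, then retry with exponential backoff on HTTP status >= 500) can be illustrated with a minimal, self-contained sketch. It is not the fabric8 client's or Spark's actual code: the class `RetryBackoffSketch`, its method names, the doubling factor, and the simulated responses are all assumptions made for illustration. Only the two property names and the values 3 and 1000 come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

/** Illustrative sketch only; hypothetical names, not the fabric8 implementation. */
class RetryBackoffSketch {
    // Property names as quoted in the pull request description.
    static final String LIMIT_PROP = "kubernetes.request.retry.backoffLimit";
    static final String INTERVAL_PROP = "kubernetes.request.retry.backoffInterval";

    /** Sets the limit to 3 only if the user has not specified a value, then reads it. */
    static int effectiveBackoffLimit() {
        if (System.getProperty(LIMIT_PROP) == null) {
            System.setProperty(LIMIT_PROP, "3");
        }
        return Integer.parseInt(System.getProperty(LIMIT_PROP));
    }

    /** Delay before the given retry (1-based). Doubling per retry is an assumption. */
    static long delayMillis(long intervalMillis, int retry) {
        return intervalMillis << (retry - 1);
    }

    /**
     * Issues the request, retrying while the status is >= 500 and the retry
     * limit is not exhausted. Delays are recorded instead of slept so the
     * sketch runs instantly.
     */
    static int execute(IntSupplier request, int limit, long intervalMillis, List<Long> delays) {
        int status = request.getAsInt();
        for (int retry = 1; status >= 500 && retry <= limit; retry++) {
            delays.add(delayMillis(intervalMillis, retry));
            status = request.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        int limit = effectiveBackoffLimit();
        long interval = Long.getLong(INTERVAL_PROP, 1000L);

        // Hypothetical server: answers 500 twice (e.g. during a leader change), then 200.
        int[] responses = {500, 500, 200};
        int[] calls = {0};
        List<Long> delays = new ArrayList<>();

        int status = execute(() -> responses[calls[0]++], limit, interval, delays);
        System.out.println("final status " + status + " after delays " + delays);
    }
}
```

With neither property set on the command line, the simulated request succeeds on the third attempt after recorded delays of 1000 ms and 2000 ms. A user-supplied `-Dkubernetes.request.retry.backoffLimit=...` is left untouched by `effectiveBackoffLimit`, which is the "only when unset" part of the change.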
core
docker/src/main/dockerfiles/spark
integration-tests