97605cd126
### What changes were proposed in this pull request?

This PR aims to detect duplicate `mountPath`s and stop the job.

### Why are the changes needed?

If there is a conflict on `mountPath`, the pod is created, repeats the following error message, and keeps running. The Spark job should not keep running and wasting cluster resources; it is better to fail fast on the Spark side.

```
$ k get pod -l 'spark-role in (driver,executor)'
NAME    READY   STATUS    RESTARTS   AGE
tpcds   1/1     Running   0          33m
```

```
20/10/18 05:09:26 WARN ExecutorPodsSnapshotsStoreImpl: Exception when notifying snapshot subscriber.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: ...
Message: Pod "tpcds-exec-1" is invalid: spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/data1": must be unique.
...
```

**AFTER THIS PR**

The job stops with the following error message instead of keeping running.

```
20/10/18 06:58:45 ERROR ExecutorPodsSnapshotsStoreImpl: Going to stop due to IllegalArgumentException
java.lang.IllegalArgumentException: requirement failed: Found duplicated mountPath: `/data1`
```

### Does this PR introduce _any_ user-facing change?

Yes, but this is a bug fix.

### How was this patch tested?

Pass the CI with the newly added test case.

Closes #30084 from dongjoon-hyun/SPARK-33175-2.

Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
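The `requirement failed: Found duplicated mountPath` message above comes from a Scala `require` check. The following is a minimal illustrative sketch of such a validation, not the actual Spark code; the object and method names are hypothetical.

```scala
// Hypothetical sketch: collect the configured mountPaths and fail fast with
// require() when any path appears more than once, mirroring the error message
// shown in the PR description. Not the actual Spark implementation.
object MountPathValidator {
  def checkDuplicateMountPaths(mountPaths: Seq[String]): Unit = {
    // Group identical paths together and keep only those occurring twice or more.
    val duplicates = mountPaths.groupBy(identity).filter(_._2.size > 1).keys.toSeq.sorted
    require(
      duplicates.isEmpty,
      s"Found duplicated mountPath: ${duplicates.map(p => s"`$p`").mkString(", ")}")
  }
}
```

Because `require` throws `java.lang.IllegalArgumentException` on failure, a check like this makes the driver abort during pod-spec construction instead of repeatedly submitting an invalid pod to the Kubernetes API server.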
Changed paths:

- core
- docker/src/main/dockerfiles/spark
- integration-tests