diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index d0c6012e00..e9c292d21f 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -307,7 +307,18 @@ And, the claim name of a `persistentVolumeClaim` with volume name `checkpointpvc
 spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.claimName=check-point-pvc-claim
 ```
 
-The configuration properties for mounting volumes into the executor pods use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. For a complete list of available options for each supported type of volumes, please refer to the [Spark Properties](#spark-properties) section below.
+The configuration properties for mounting volumes into the executor pods use the prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`.
+
+For example, you can mount a dynamically created persistent volume claim per executor by using `OnDemand` as the claim name together with the `storageClass` and `sizeLimit` options, as shown below. This is useful when [Dynamic Allocation](configuration.html#dynamic-allocation) is enabled.
+```
+spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName=OnDemand
+spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.storageClass=gp
+spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.sizeLimit=500Gi
+spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path=/data
+spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.readOnly=false
+```
+
+For a complete list of available options for each supported volume type, please refer to the [Spark Properties](#spark-properties) section below.
 
 ## Local Storage
 
@@ -318,6 +329,15 @@ Spark supports using volumes to spill data during shuffles and other operations.
 --conf spark.kubernetes.driver.volumes.[VolumeType].spark-local-dir-[VolumeName].mount.readOnly=false
 ```
 
+Specifically, you can use persistent volume claims if your jobs require large shuffle and sort operations in executors.
+
+```
+spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
+spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=gp
+spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
+spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
+spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
+```
 If no volume is set as local storage, Spark uses temporary scratch space to spill data to disk during shuffles and other operations. When using Kubernetes as the resource manager the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `spark.local.dir` or the environment variable `SPARK_LOCAL_DIRS` . If no directories are explicitly specified then a default directory is created and configured appropriately.
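
As a usage sketch (not part of the patch above), these properties would ordinarily be passed to `spark-submit` with `--conf`. Everything in angle brackets is a placeholder, and the dynamic-allocation flags are included only because the new text calls out Dynamic Allocation as the main use case; `spark.dynamicAllocation.shuffleTracking.enabled` is needed on Kubernetes since there is no external shuffle service:

```
# Hypothetical invocation; <...> values are placeholders.
# Each executor gets its own on-demand PVC provisioned from the "gp" storage
# class, mounted read-write at /data and used as a Spark local directory.
bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=gp \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false \
  local:///path/to/examples.jar
```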
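
One way to verify the behavior during review, assuming `kubectl` access to the target namespace: each executor pod should reference a dynamically provisioned claim (exact PVC names vary by Spark version).

```
# List the claims Spark created on demand and the executor pods using them.
kubectl get pvc
kubectl get pods -l spark-role=executor
```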