From 6e9c1cf2ac7deb673e2c7ff496a2a2b4cc56727d Mon Sep 17 00:00:00 2001
From: Dongjoon Hyun
Date: Mon, 28 Sep 2020 15:42:27 -0700
Subject: [PATCH] [SPARK-33006][DOCS] Add dynamic PVC usage example into K8s doc

---
 docs/running-on-kubernetes.md | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index d0c6012e00aa..e9c292d21fd4 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -307,7 +307,18 @@ And, the claim name of a `persistentVolumeClaim` with volume name `checkpointpvc
 ```
 spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.claimName=check-point-pvc-claim
 ```
 
-The configuration properties for mounting volumes into the executor pods use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. For a complete list of available options for each supported type of volumes, please refer to the [Spark Properties](#spark-properties) section below.
+The configuration properties for mounting volumes into the executor pods use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`.
+
+For example, you can mount a dynamically-created persistent volume claim per executor by using `OnDemand` as the claim name together with the `storageClass` and `sizeLimit` options, as shown below. This is useful in the case of [Dynamic Allocation](configuration.html#dynamic-allocation).
+```
+spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName=OnDemand
+spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.storageClass=gp
+spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.sizeLimit=500Gi
+spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path=/data
+spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.readOnly=false
+```
+
+For a complete list of available options for each supported type of volumes, please refer to the [Spark Properties](#spark-properties) section below.
 
 ## Local Storage
 
@@ -318,6 +329,15 @@ Spark supports using volumes to spill data during shuffles and other operations
 --conf spark.kubernetes.driver.volumes.[VolumeType].spark-local-dir-[VolumeName].mount.readOnly=false
 ```
 
+Specifically, you can use persistent volume claims if your jobs require large shuffle and sort operations in executors.
+
+```
+spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
+spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=gp
+spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
+spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
+spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
+```
 If no volume is set as local storage, Spark uses temporary scratch space to spill data to disk during shuffles and other operations. When using Kubernetes as the resource manager the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `spark.local.dir` or the environment variable `SPARK_LOCAL_DIRS` . If no directories are explicitly specified then a default directory is created and configured appropriately.
 
 `emptyDir` volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod.
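
As a usage sketch to accompany the patch (not part of the commit itself): the executor PVC properties added above would normally be passed as `--conf` flags to `spark-submit`, together with dynamic allocation. In the sketch below, the master URL, the container image, the application jar, and the `gp` storage class are illustrative placeholders that must match your cluster; the volume properties themselves are taken verbatim from the patch.

```
# Sketch only: <k8s-apiserver-host>, <spark-image>, <application-jar>, and the
# "gp" storage class are placeholders, not values required by the patch.
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName=OnDemand \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.storageClass=gp \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.sizeLimit=500Gi \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path=/data \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.readOnly=false \
    <application-jar>
```

With `OnDemand` as the claim name, a fresh claim is provisioned for each executor pod, which is what makes the pattern workable when executors come and go under dynamic allocation.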