From 3301fa79bd579659b410f52b31e972fe3f3c39eb Mon Sep 17 00:00:00 2001 From: Yikun Jiang Date: Wed, 16 Mar 2022 12:07:56 +0800 Subject: [PATCH 1/5] Add doc for volcano --- docs/running-on-kubernetes.md | 77 +++++++++++++++++++++++++++++++++++ 1 file changed, 77 insertions(+) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index b7adb099bed1..9a13b5e3654a 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -1732,6 +1732,83 @@ Spark allows users to specify a custom Kubernetes schedulers. - Create additional Kubernetes custom resources for driver/executor scheduling. - Set scheduler hints according to configuration or existing Pod info dynamically. +#### Using Volcano as Customized Scheduler for Spark on Kubernetes + +##### Prerequisites +* Volcano supports Spark on Kubernetes since v1.5. Mini version: v1.5.1+. See also [Volcano installation](https://volcano.sh/en/docs/installation). + +##### Usage +Spark on Kubernetes allows using Volcano as a customized scheduler. Users can use Volcano to +support more advanced resource scheduling: queue scheduling, resource reservation, priority scheduling, for example: + +``` +# Specify volcano scheduler +--conf spark.kubernetes.scheduler.name=volcano +# Specify driver/executor VolcanoFeatureStep +--conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep +--conf spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep +# Specify PodGroup template +--conf spark.kubernetes.scheduler.volcano.podGroupTemplateFile=/path/to/podgroup-template.yaml +``` + +##### Volcano Feature Step +Volcano feature steps help users to create Volcano PodGroup and set driver/executor pod annotation to link this PodGroup. + +Note that, currently only supported driver/job level PodGroup in Volcano Feature Step, executor separate PodGroup is not supported yet. 
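+
+For illustration, the link the feature step creates is roughly equivalent to the following (all resource names below are placeholders, and the annotation key is the one described in Volcano's PodGroup docs):
+
+```yaml
+# PodGroup created by the feature step for the job (name is a placeholder)
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: PodGroup
+metadata:
+  name: spark-pi-podgroup
+---
+# Driver pod annotated to join that PodGroup
+apiVersion: v1
+kind: Pod
+metadata:
+  name: spark-pi-driver
+  annotations:
+    scheduling.k8s.io/group-name: spark-pi-podgroup
+spec:
+  schedulerName: volcano
+```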
+ +##### Volcano PodGroup Template +Volcano defines PodGroup spec using [CRD yaml](https://volcano.sh/en/docs/podgroup/#example) + +Similar to [Pod template](#pod-template), Spark users can similarly use Volcano PodGroup Template to define the PodGroup spec configurations. + +To do so, specify the spark properties `spark.kubernetes.scheduler.volcano.podGroupTemplateFile` to point to files accessible to the `spark-submit` process. + +Below is an example of PodGroup template, see also [PodGroup Introduction](https://volcano.sh/en/docs/podgroup/#introduction): + +``` +apiVersion: scheduling.volcano.sh/v1beta1 +kind: PodGroup +spec: + # Specify minMember to 1 to make driver + minMember: 1 + # Specify minResources to support resource reservation + minResources: + cpu: "2" + memory: "3Gi" + # Specify the priority + priorityClassName: high-priority + queue: default +``` + +##### Features + + + + + + + + + + + + + + + + + +
SchedulingDescriptionConfiguration
Queue scheduling + Queue indicates the resource queue, which adopts FIFO. is also used as the basis for resource division. + help users specify which queue the job to submit. + `spec.queue` field in PodGroup template
Resource reservation + Resource reservation, aka `Gang` scheduling (start all or nothing), helps users reserve resources for specific jobs. + It's useful for ensuring resource are meet the minimum requirements of spark job and avoiding all drivers stuck + due to all executor pending, especially, when cluster resources are very limited. + `spec.minResources` field in PodGroup template
Priority scheduling + It is used to help users to specify job priority in the queue during scheduling. + `spec.priorityClassName` field in PodGroup template
+ ### Stage Level Scheduling Overview Stage level scheduling is supported on Kubernetes when dynamic allocation is enabled. This also requires spark.dynamicAllocation.shuffleTracking.enabled to be enabled since Kubernetes doesn't support an external shuffle service at this time. The order in which containers for different profiles is requested from Kubernetes is not guaranteed. Note that since dynamic allocation on Kubernetes requires the shuffle tracking feature, this means that executors from previous stages that used a different ResourceProfile may not idle timeout due to having shuffle data on them. This could result in using more cluster resources and in the worst case if there are no remaining resources on the Kubernetes cluster then Spark could potentially hang. You may consider looking at config spark.dynamicAllocation.shuffleTracking.timeout to set a timeout, but that could result in data having to be recomputed if the shuffle data is really needed. From b4db3e11b9da0230411e1908485674c82892f990 Mon Sep 17 00:00:00 2001 From: Yikun Jiang Date: Mon, 21 Mar 2022 11:17:38 +0800 Subject: [PATCH 2/5] Address comments --- docs/running-on-kubernetes.md | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 9a13b5e3654a..dd001cc42ec0 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -1734,12 +1734,17 @@ Spark allows users to specify a custom Kubernetes schedulers. #### Using Volcano as Customized Scheduler for Spark on Kubernetes +**This feature is currently experimental. In future versions, there may be behavioral changes around configuration, feature step improvement.** + ##### Prerequisites -* Volcano supports Spark on Kubernetes since v1.5. Mini version: v1.5.1+. See also [Volcano installation](https://volcano.sh/en/docs/installation). 
+* Spark on Kubernetes with Volcano as a custom scheduler is supported since Spark v3.3.0 and Volcano v1.5.1. +* See also [Volcano installation](https://volcano.sh/en/docs/installation). ##### Usage -Spark on Kubernetes allows using Volcano as a customized scheduler. Users can use Volcano to -support more advanced resource scheduling: queue scheduling, resource reservation, priority scheduling, for example: +Spark on Kubernetes allows using Volcano as a custom scheduler. Users can use Volcano to +support more advanced resource scheduling: queue scheduling, resource reservation, priority scheduling, and more. + +To use Volcano as a custom scheduler the user needs to specify the following configuration options: ``` # Specify volcano scheduler @@ -1752,16 +1757,16 @@ support more advanced resource scheduling: queue scheduling, resource reservatio ``` ##### Volcano Feature Step -Volcano feature steps help users to create Volcano PodGroup and set driver/executor pod annotation to link this PodGroup. +Volcano feature steps help users to create a Volcano PodGroup and set driver/executor pod annotation to link with this PodGroup. -Note that, currently only supported driver/job level PodGroup in Volcano Feature Step, executor separate PodGroup is not supported yet. +Note that currently only driver/job level PodGroup is supported in Volcano Feature Step. Executor PodGroup is not supported yet. ##### Volcano PodGroup Template Volcano defines PodGroup spec using [CRD yaml](https://volcano.sh/en/docs/podgroup/#example) Similar to [Pod template](#pod-template), Spark users can similarly use Volcano PodGroup Template to define the PodGroup spec configurations. -To do so, specify the spark properties `spark.kubernetes.scheduler.volcano.podGroupTemplateFile` to point to files accessible to the `spark-submit` process. +To do so, specify the Spark properties `spark.kubernetes.scheduler.volcano.podGroupTemplateFile` to point to files accessible to the `spark-submit` process. 
Below is an example of PodGroup template, see also [PodGroup Introduction](https://volcano.sh/en/docs/podgroup/#introduction): @@ -1786,8 +1791,8 @@ spec: Queue scheduling - Queue indicates the resource queue, which adopts FIFO. is also used as the basis for resource division. - help users specify which queue the job to submit. + Queue indicates the resource queue, which adopts FIFO. It is also used as the basis for resource division. + Helps the user to specify to which queue the job should be submitted to. `spec.queue` field in PodGroup template @@ -1795,15 +1800,15 @@ spec: Resource reservation Resource reservation, aka `Gang` scheduling (start all or nothing), helps users reserve resources for specific jobs. - It's useful for ensuring resource are meet the minimum requirements of spark job and avoiding all drivers stuck - due to all executor pending, especially, when cluster resources are very limited. + It is useful for ensource the available resources meet the minimum requirements of the Spark job and avoiding the + situation where drivers are scheduled, and then they are unable to schedule sufficient executors to progress. `spec.minResources` field in PodGroup template Priority scheduling - It is used to help users to specify job priority in the queue during scheduling. + It is used to help users to specify job priority in the queue during scheduling. 
`spec.priorityClassName` field in PodGroup template From d07b9aa5c69731892ed0dc887796cbcfef938a5f Mon Sep 17 00:00:00 2001 From: Yikun Jiang Date: Thu, 24 Mar 2022 17:24:48 +0800 Subject: [PATCH 3/5] Address comments --- docs/running-on-kubernetes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index dd001cc42ec0..43c7315cae0b 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -1759,7 +1759,7 @@ To use Volcano as a custom scheduler the user needs to specify the following con ##### Volcano Feature Step Volcano feature steps help users to create a Volcano PodGroup and set driver/executor pod annotation to link with this PodGroup. -Note that currently only driver/job level PodGroup is supported in Volcano Feature Step. Executor PodGroup is not supported yet. +Note that currently only driver/job level PodGroup is supported in Volcano Feature Step. ##### Volcano PodGroup Template Volcano defines PodGroup spec using [CRD yaml](https://volcano.sh/en/docs/podgroup/#example) From 75589f56caaaceabbf566387fd891e5275f4d287 Mon Sep 17 00:00:00 2001 From: Yikun Jiang Date: Fri, 25 Mar 2022 17:59:39 +0800 Subject: [PATCH 4/5] Add build with volcano guide --- docs/running-on-kubernetes.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 43c7315cae0b..da10594f5864 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -1740,6 +1740,13 @@ Spark allows users to specify a custom Kubernetes schedulers. * Spark on Kubernetes with Volcano as a custom scheduler is supported since Spark v3.3.0 and Volcano v1.5.1. * See also [Volcano installation](https://volcano.sh/en/docs/installation). 
+##### Build
+To create a Spark distribution along with Volcano support like those distributed by the Spark [Downloads page](https://spark.apache.org/downloads.html):
+
+```
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes -Pvolcano
+```
+
 ##### Usage
 Spark on Kubernetes allows using Volcano as a custom scheduler. Users can use Volcano to
 support more advanced resource scheduling: queue scheduling, resource reservation, priority scheduling, and more.

From 75accf88ee8ca4e989d7f7fd72480ccea9053353 Mon Sep 17 00:00:00 2001
From: Yikun Jiang
Date: Tue, 29 Mar 2022 14:50:38 +0800
Subject: [PATCH 5/5] Address comments

---
 docs/running-on-kubernetes.md | 80 +++++++++++++----------------------
 1 file changed, 29 insertions(+), 51 deletions(-)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index da10594f5864..62a984bb61e4 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -1737,14 +1737,21 @@ Spark allows users to specify a custom Kubernetes schedulers.
 **This feature is currently experimental. In future versions, there may be behavioral changes around configuration, feature step improvement.**
 
 ##### Prerequisites
-* Spark on Kubernetes with Volcano as a custom scheduler is supported since Spark v3.3.0 and Volcano v1.5.1.
-* See also [Volcano installation](https://volcano.sh/en/docs/installation).
+* Spark on Kubernetes with [Volcano](https://volcano.sh/en) as a custom scheduler is supported since Spark v3.3.0 and Volcano v1.5.1.
Below is an example to install Volcano 1.5.1:
+
+  ```bash
+  # x86_64
+  kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v1.5.1/installer/volcano-development.yaml
+
+  # arm64:
+  kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v1.5.1/installer/volcano-development-arm64.yaml
+  ```

 ##### Build
-To create a Spark distribution along with Volcano support like those distributed by the Spark [Downloads page](https://spark.apache.org/downloads.html):
+To create a Spark distribution along with Volcano support like those distributed by the Spark [Downloads page](https://spark.apache.org/downloads.html), see also ["Building Spark"](https://spark.apache.org/docs/latest/building-spark.html):

-```
-./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes -Pvolcano
+```bash
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pkubernetes -Pvolcano
 ```

 ##### Usage
 Spark on Kubernetes allows using Volcano as a custom scheduler. Users can use Volcano to
 support more advanced resource scheduling: queue scheduling, resource reservation, priority scheduling, and more.

 To use Volcano as a custom scheduler the user needs to specify the following configuration options:

-```
-# Specify volcano scheduler
+```bash
+# Specify volcano scheduler and PodGroup template
 --conf spark.kubernetes.scheduler.name=volcano
+--conf spark.kubernetes.scheduler.volcano.podGroupTemplateFile=/path/to/podgroup-template.yaml
 # Specify driver/executor VolcanoFeatureStep
 --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
---conf spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
-# Specify PodGroup template
---conf spark.kubernetes.scheduler.volcano.podGroupTemplateFile=/path/to/podgroup-template.yaml
+--conf spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
 ```

 ##### Volcano Feature Step
-Volcano feature steps help users to create a Volcano PodGroup and set driver/executor pod annotation to link with this PodGroup.
+Volcano feature steps help users to create a Volcano PodGroup and set driver/executor pod annotation to link with this [PodGroup](https://volcano.sh/en/docs/podgroup/).

 Note that currently only driver/job level PodGroup is supported in Volcano Feature Step.

 ##### Volcano PodGroup Template
-Volcano defines PodGroup spec using [CRD yaml](https://volcano.sh/en/docs/podgroup/#example)
-
-Similar to [Pod template](#pod-template), Spark users can similarly use Volcano PodGroup Template to define the PodGroup spec configurations.
+Volcano defines PodGroup spec using [CRD yaml](https://volcano.sh/en/docs/podgroup/#example).

-To do so, specify the Spark properties `spark.kubernetes.scheduler.volcano.podGroupTemplateFile` to point to files accessible to the `spark-submit` process.
+Similar to [Pod template](#pod-template), Spark users can use Volcano PodGroup Template to define the PodGroup spec configurations.
+To do so, specify the Spark property `spark.kubernetes.scheduler.volcano.podGroupTemplateFile` to point to files accessible to the `spark-submit` process.
+Below is an example of PodGroup template:

-Below is an example of PodGroup template, see also [PodGroup Introduction](https://volcano.sh/en/docs/podgroup/#introduction):
-
-```
+```yaml
 apiVersion: scheduling.volcano.sh/v1beta1
 kind: PodGroup
 spec:
-  # Specify minMember to 1 to make driver
+  # Specify minMember to 1 to make a driver pod
   minMember: 1
-  # Specify minResources to support resource reservation
+  # Specify minResources to support resource reservation (the driver pod resource and executor pod resources should be considered)
+  # It is useful to ensure the available resources meet the minimum requirements of the Spark job, avoiding the
+  # situation where drivers are scheduled, and then they are unable to schedule sufficient executors to progress.
minResources: cpu: "2" memory: "3Gi" - # Specify the priority - priorityClassName: high-priority + # Specify the priority, help users to specify job priority in the queue during scheduling. + priorityClassName: system-node-critical + # Specify the queue, indicates the resource queue which the job should be submitted to queue: default ``` -##### Features - - - - - - - - - - - - - - - - - -
SchedulingDescriptionConfiguration
Queue scheduling - Queue indicates the resource queue, which adopts FIFO. It is also used as the basis for resource division. - Helps the user to specify to which queue the job should be submitted to. - `spec.queue` field in PodGroup template
Resource reservation - Resource reservation, aka `Gang` scheduling (start all or nothing), helps users reserve resources for specific jobs. - It is useful for ensource the available resources meet the minimum requirements of the Spark job and avoiding the - situation where drivers are scheduled, and then they are unable to schedule sufficient executors to progress. - `spec.minResources` field in PodGroup template
Priority scheduling - It is used to help users to specify job priority in the queue during scheduling. - `spec.priorityClassName` field in PodGroup template
- ### Stage Level Scheduling Overview Stage level scheduling is supported on Kubernetes when dynamic allocation is enabled. This also requires spark.dynamicAllocation.shuffleTracking.enabled to be enabled since Kubernetes doesn't support an external shuffle service at this time. The order in which containers for different profiles is requested from Kubernetes is not guaranteed. Note that since dynamic allocation on Kubernetes requires the shuffle tracking feature, this means that executors from previous stages that used a different ResourceProfile may not idle timeout due to having shuffle data on them. This could result in using more cluster resources and in the worst case if there are no remaining resources on the Kubernetes cluster then Spark could potentially hang. You may consider looking at config spark.dynamicAllocation.shuffleTracking.timeout to set a timeout, but that could result in data having to be recomputed if the shuffle data is really needed.
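
Taking the pieces above together, `spec.minResources` in the PodGroup template should cover the driver plus all executors so that gang scheduling can reserve the whole job's resources at once. A minimal sketch of that sizing, with placeholder resource values (a 1-core/1Gi driver and two 1-core/1Gi executors):

```bash
# Sketch: size spec.minResources to cover driver + all executors (placeholder values)
DRIVER_CORES=1;   DRIVER_MEM_GI=1
EXECUTOR_CORES=1; EXECUTOR_MEM_GI=1
NUM_EXECUTORS=2

MIN_CPU=$((DRIVER_CORES + EXECUTOR_CORES * NUM_EXECUTORS))
MIN_MEM_GI=$((DRIVER_MEM_GI + EXECUTOR_MEM_GI * NUM_EXECUTORS))

# Emit the minResources fragment for the PodGroup template
echo "minResources:"
echo "  cpu: \"${MIN_CPU}\""
echo "  memory: \"${MIN_MEM_GI}Gi\""
```

With these placeholder values the fragment comes out as `cpu: "3"` and `memory: "3Gi"`; the `cpu: "2"`/`memory: "3Gi"` values in the template above are likewise only examples.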