From 2263fcf0fdb09f52ab7577c97c5b42daff866588 Mon Sep 17 00:00:00 2001
From: fscnick
Date: Sat, 25 Oct 2025 11:34:08 +0800
Subject: [PATCH 1/2] [Doc][KubeRay] add minimum version requirement on kai-scheduler

Signed-off-by: fscnick
---
 doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md b/doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
index d37901f87f2f..d8a3b2c4fa96 100644
--- a/doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
+++ b/doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
@@ -47,7 +47,7 @@ You can arrange queues hierarchically for organizations with multiple teams, for
 
 ## Step 1: Install KAI Scheduler
 
-Install KAI Scheduler with gpu-sharing enabled. Choose the desired release version from [KAI Scheduler releases](https://github.com/NVIDIA/KAI-Scheduler/releases) and replace the `<VERSION>` in the following command.
+Install KAI Scheduler with gpu-sharing enabled. Choose the desired release version from [KAI Scheduler releases](https://github.com/NVIDIA/KAI-Scheduler/releases) and replace the `<VERSION>` in the following command. It's recommended to use v0.10.0 or later.
 
 ```bash
 # Install KAI Scheduler

From 2fb94eb8035637b8d9fe066f11c7fce2a55d45f6 Mon Sep 17 00:00:00 2001
From: fscnick
Date: Sun, 26 Oct 2025 16:38:37 +0800
Subject: [PATCH 2/2] [Doc][KubeRay] move install kai-scheduler to Prerequisites

Signed-off-by: fscnick
---
 .../kubernetes/k8s-ecosystem/kai-scheduler.md | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md b/doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
index d8a3b2c4fa96..4327a0b1b1fb 100644
--- a/doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
+++ b/doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
@@ -44,17 +44,14 @@ You can arrange queues hierarchically for organizations with multiple teams, for
 * Kubernetes cluster with GPU nodes
 * NVIDIA GPU Operator
 * kubectl configured to access your cluster
-
-## Step 1: Install KAI Scheduler
-
-Install KAI Scheduler with gpu-sharing enabled. Choose the desired release version from [KAI Scheduler releases](https://github.com/NVIDIA/KAI-Scheduler/releases) and replace the `<VERSION>` in the following command. It's recommended to use v0.10.0 or later.
+* Install KAI Scheduler with gpu-sharing enabled. Choose the desired release version from [KAI Scheduler releases](https://github.com/NVIDIA/KAI-Scheduler/releases) and replace the `<VERSION>` in the following command. It's recommended to use v0.10.0 or later.
 
 ```bash
 # Install KAI Scheduler
 helm upgrade -i kai-scheduler oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler -n kai-scheduler --create-namespace --version <VERSION> --set "global.gpuSharing=true"
 ```
 
-## Step 2: Install the KubeRay operator with KAI Scheduler as the batch scheduler
+## Step 1: Install the KubeRay operator with KAI Scheduler as the batch scheduler
 
 Follow the official KubeRay operator [installation documentation](https://docs.ray.io/en/master/cluster/kubernetes/getting-started/kuberay-operator-installation.html#kuberay-operator-installation) and add the following configuration to enable KAI Scheduler integration:
 
@@ -62,7 +59,7 @@ Follow the official KubeRay operator [installation documentation](https://docs.r
 --set batchScheduler.name=kai-scheduler
 ```
 
-## Step 3: Create KAI Scheduler Queues
+## Step 2: Create KAI Scheduler Queues
 
 Create a basic queue structure for department-1 and its child team-a. For demo reasons, this example doesn't enforce any quota, overQuotaWeight, or limit. You can configure these parameters depending on your needs:
@@ -112,7 +109,7 @@ spec:
 
 Note: To make this demo easier to follow, we combined these queue definitions with the RayCluster example in the next step. You can use the single combined YAML file and apply both queues and workloads at once.
 
-## Step 4: Gang scheduling with KAI Scheduler
+## Step 3: Gang scheduling with KAI Scheduler
 
 The key pattern is to add the queue label to your RayCluster. [Here's a basic example](https://github.com/ray-project/kuberay/tree/master/ray-operator/config/samples/ray-cluster.kai-scheduler.yaml) from the KubeRay repository:
@@ -175,7 +172,7 @@ You can submit the same workload above with a specific priority. Modify the abov
 ```
 See the [documentation](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/priority) for more information.
 
-## Step 5: Submitting Ray workers with GPU sharing
+## Step 4: Submitting Ray workers with GPU sharing
 
 This example creates two workers that share a single GPU (0.5 each, with time-slicing) within a RayCluster. See the [YAML file](https://github.com/ray-project/kuberay/tree/master/ray-operator/config/samples/ray-cluster.kai-gpu-sharing.yaml)):
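For reference, the pattern the renumbered steps describe comes together in one RayCluster manifest: the `kai.scheduler/queue: team-a` label places the whole cluster in a queue for gang scheduling, and a `gpu-fraction` annotation on the worker pod template requests half a GPU per worker. The sketch below is illustrative only; the label and annotation keys, resource names, and image tag are assumptions to double-check against the linked KubeRay samples (ray-cluster.kai-scheduler.yaml and ray-cluster.kai-gpu-sharing.yaml).

```yaml
# Illustrative sketch only -- not the manifest shipped with KubeRay; verify
# field names against ray-cluster.kai-scheduler.yaml and
# ray-cluster.kai-gpu-sharing.yaml in the KubeRay samples directory.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-kai-demo            # placeholder name
  labels:
    kai.scheduler/queue: team-a        # assumed queue label; places the cluster in the team-a queue
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.41.0   # illustrative image tag
  workerGroupSpecs:
    - groupName: gpu-workers
      replicas: 2                      # two workers intended to share one GPU
      rayStartParams: {}
      template:
        metadata:
          annotations:
            gpu-fraction: "0.5"        # assumed KAI Scheduler GPU-sharing annotation (0.5 GPU per worker)
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.41.0
```

With the queues from Step 2 applied, running `kubectl apply -f` on a manifest like this should let KAI Scheduler gang-schedule the head and worker pods under the team-a queue.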