doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
You can arrange queues hierarchically for organizations with multiple teams.

* Kubernetes cluster with GPU nodes
* NVIDIA GPU Operator
* kubectl configured to access your cluster
* Install KAI Scheduler with gpu-sharing enabled. Choose the desired release version from [KAI Scheduler releases](https://github.com/NVIDIA/KAI-Scheduler/releases) and replace `<KAI_SCHEDULER_VERSION>` in the following command. Version v0.10.0 or later is recommended.
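The install command itself is elided from this excerpt; a sketch following the upstream KAI Scheduler README (the OCI chart location and the `global.gpuSharing` flag are taken from that README and should be verified against the release you pick):

```shell
# Install KAI Scheduler from its OCI Helm chart with GPU sharing enabled.
# Chart location and flag names follow the upstream README; verify them
# against the release notes for the version you choose.
helm upgrade -i kai-scheduler oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
  -n kai-scheduler --create-namespace \
  --version <KAI_SCHEDULER_VERSION> \
  --set "global.gpuSharing=true"
```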
## Step 1: Install the KubeRay operator with KAI Scheduler as the batch scheduler
Follow the official KubeRay operator [installation documentation](https://docs.ray.io/en/master/cluster/kubernetes/getting-started/kuberay-operator-installation.html#kuberay-operator-installation) and add the following configuration to enable KAI Scheduler integration:
```bash
--set batchScheduler.name=kai-scheduler
```
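Putting that flag into a standard Helm install gives, as a sketch (the `kuberay/kuberay-operator` chart and repo URL follow the usual KubeRay installation docs; pin a real chart version for production):

```shell
# Add the KubeRay Helm repo and install the operator with KAI Scheduler
# configured as the batch scheduler.
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator \
  --set batchScheduler.name=kai-scheduler
```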
## Step 2: Create KAI Scheduler Queues
Create a basic queue structure for department-1 and its child team-a. To keep the demo simple, this example doesn't enforce any quota, overQuotaWeight, or limit. You can configure these parameters depending on your needs:
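The queue manifests themselves are elided in this excerpt. A sketch following the KAI Scheduler Queue CRD (the `scheduling.run.ai/v2` API group, the `parentQueue` field, and `-1` meaning "unenforced" are assumptions to verify against the upstream queue documentation):

```yaml
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: department-1
spec:
  resources:
    cpu: {quota: -1, limit: -1, overQuotaWeight: 1}
    gpu: {quota: -1, limit: -1, overQuotaWeight: 1}
    memory: {quota: -1, limit: -1, overQuotaWeight: 1}
---
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: team-a
spec:
  parentQueue: department-1   # team-a is a child of department-1
  resources:
    cpu: {quota: -1, limit: -1, overQuotaWeight: 1}
    gpu: {quota: -1, limit: -1, overQuotaWeight: 1}
    memory: {quota: -1, limit: -1, overQuotaWeight: 1}
```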
Note: To make this demo easier to follow, we combined these queue definitions with the RayCluster example in the next step. You can use the single combined YAML file and apply both queues and workloads at once.
## Step 3: Gang scheduling with KAI Scheduler
The key pattern is to add the queue label to your RayCluster. [Here's a basic example](https://github.com/ray-project/kuberay/tree/master/ray-operator/config/samples/ray-cluster.kai-scheduler.yaml) from the KubeRay repository:
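The essential part of that sample is the queue label on the RayCluster metadata. A minimal sketch (the `kai.scheduler/queue` label key and the placeholder cluster name are assumptions; verify them against the linked sample):

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-kai   # hypothetical name for this sketch
  labels:
    kai.scheduler/queue: team-a   # queue created in the previous step
spec:
  # head group and worker group specs as shown in the linked sample
```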
You can submit the same workload above with a specific priority. Modify the above...
See the [documentation](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/priority) for more information.
## Step 4: Submitting Ray workers with GPU sharing
This example creates two workers that share a single GPU (0.5 each, with time-slicing) within a RayCluster. See the [YAML file](https://github.com/ray-project/kuberay/tree/master/ray-operator/config/samples/ray-cluster.kai-gpu-sharing.yaml):
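The mechanism in that sample is an annotation on the worker pod template. A sketch (the `gpu-fraction` annotation key comes from KAI Scheduler's GPU-sharing docs, and the group name and image are placeholders; verify against the linked sample):

```yaml
workerGroupSpecs:
  - groupName: shared-gpu-workers   # hypothetical group name
    replicas: 2
    template:
      metadata:
        annotations:
          gpu-fraction: "0.5"   # each worker pod gets half of one GPU
      spec:
        containers:
          - name: ray-worker
            image: rayproject/ray:latest
```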