Merged
54 commits
ead34c7
docs: Adding docs for Kuberay KAI scheduler integration
EkinKarabulut Jul 23, 2025
7024252
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Jul 28, 2025
156d5df
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Jul 28, 2025
3a87992
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Jul 28, 2025
8627c3b
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Jul 28, 2025
0fdb725
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Jul 28, 2025
970bc33
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Jul 28, 2025
0f21e8d
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Jul 28, 2025
bf27b38
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Jul 28, 2025
91bf42f
Updating the KAI explanation
EkinKarabulut Aug 18, 2025
f01878d
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 2, 2025
9794ed3
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 2, 2025
87ae6ca
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 2, 2025
dace493
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 2, 2025
5497faf
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 2, 2025
55105b4
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 2, 2025
fa6ad0d
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 2, 2025
4a91176
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 2, 2025
add44b1
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 2, 2025
03019ee
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 2, 2025
36e0327
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 2, 2025
ca1524b
fix: added links to install
EkinKarabulut Oct 2, 2025
017a827
fix: wording
EkinKarabulut Oct 2, 2025
77ee7f5
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 9, 2025
62afa32
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 9, 2025
399552c
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
d40dedf
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
59135cd
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
1afba05
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
57faa86
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
75d2a09
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
5cc1127
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
31100eb
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
2892aa1
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
f35b579
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
7300e7d
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
f4dc60c
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
ba27377
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
b08eaca
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
2df426f
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
156c996
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
6f55166
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
3c1cdda
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
48ba71e
Apply suggestion from @angelinalg
EkinKarabulut Oct 9, 2025
3217d90
Edits on suggestions
EkinKarabulut Oct 9, 2025
dedab5c
Making the suggested changes for the docs, adding curl commands for y…
EkinKarabulut Oct 9, 2025
edd90a0
Merge branch 'master' into docs/kai-scheduler-kuberay
jjyao Oct 9, 2025
e3d11e2
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 19, 2025
077c5b1
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 19, 2025
2447938
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 19, 2025
ed97293
Deleting the yaml example of gpu-sharing
EkinKarabulut Oct 19, 2025
28a1ef5
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
EkinKarabulut Oct 20, 2025
28b4d68
Merge branch 'master' into docs/kai-scheduler-kuberay
jjyao Oct 21, 2025
aad0131
Update doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
rueian Oct 23, 2025
2 changes: 2 additions & 0 deletions doc/source/cluster/kubernetes/k8s-ecosystem.md
@@ -9,6 +9,7 @@ k8s-ecosystem/ingress
k8s-ecosystem/metrics-references
k8s-ecosystem/prometheus-grafana
k8s-ecosystem/pyspy
k8s-ecosystem/kai-scheduler
k8s-ecosystem/volcano
k8s-ecosystem/yunikorn
k8s-ecosystem/kueue
@@ -20,6 +21,7 @@ k8s-ecosystem/scheduler-plugins
* {ref}`kuberay-metrics-references`
* {ref}`kuberay-prometheus-grafana`
* {ref}`kuberay-pyspy-integration`
* {ref}`kuberay-kai-scheduler`
* {ref}`kuberay-volcano`
* {ref}`kuberay-yunikorn`
* {ref}`kuberay-kueue`
300 changes: 300 additions & 0 deletions doc/source/cluster/kubernetes/k8s-ecosystem/kai-scheduler.md
@@ -0,0 +1,300 @@
(kuberay-kai-scheduler)=
# Gang Scheduling, Queue Priority, and GPU Sharing for RayClusters using KAI Scheduler

This guide demonstrates how to use KAI Scheduler with RayClusters to set up hierarchical queues with quotas, gang scheduling, and GPU sharing.


## KAI Scheduler

[KAI Scheduler](https://github.com/NVIDIA/KAI-Scheduler) is a high-performance, scalable Kubernetes scheduler built for AI/ML workloads. Designed to orchestrate GPU clusters at massive scale, KAI optimizes GPU allocation and supports the full AI lifecycle, from interactive development to large distributed training and inference. Some of the key features are:
- **Bin-packing & Spread Scheduling**: Optimize node usage either by minimizing fragmentation (bin-packing) or increasing resiliency and load balancing (spread scheduling)
- **GPU sharing**: Allow Ray to pack multiple workloads from across teams on the same GPU, letting your organization fit more work onto your existing hardware and reducing idle GPU time.

- **Workload Autoscaling**: Scale Ray replicas/workers within min/max while respecting gang constraints
- **Cluster Autoscaling**: Compatible with dynamic cloud infrastructures (including auto-scalers like Karpenter)
- **Workload Priorities**: Prioritize Ray workloads effectively within queues
- **Hierarchical Queues & Fairness**: Two-level queues with quotas, over-quota weights, limits and equitable resource distribution between queues using DRF
For these and many more features, see the [KAI Scheduler documentation](https://github.com/NVIDIA/KAI-Scheduler?tab=readme-ov-file#key-features).
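The difference between bin-packing and spread placement can be illustrated with a toy node chooser (illustrative Python, not KAI code; node loads and the tie-breaking rule are made up for the example):

```python
# Toy node chooser: bin-packing prefers the most-loaded node that still fits
# the request (tightest fit, minimizing fragmentation), while spread prefers
# the least-loaded node (maximizing head-room and resiliency).

def choose_node(free_gpus, request, strategy):
    """free_gpus: free GPU count per node. Returns a node index or None."""
    candidates = [i for i, free in enumerate(free_gpus) if free >= request]
    if not candidates:
        return None
    if strategy == "bin-packing":
        return min(candidates, key=lambda i: free_gpus[i])  # tightest fit
    return max(candidates, key=lambda i: free_gpus[i])      # most head-room

nodes = [1, 4, 8]  # free GPUs on three nodes
print(choose_node(nodes, 1, "bin-packing"))  # → 0 (fills the fullest node)
print(choose_node(nodes, 1, "spread"))       # → 2 (keeps load balanced)
```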

### Core Components

1. **PodGroups**: PodGroups are the atomic units of scheduling. Each represents one or more interdependent pods that must run as a single unit, also known as gang scheduling, which is vital for distributed workloads. KAI Scheduler includes a **PodGrouper** that handles gang scheduling automatically.

**How PodGrouper works:**
```
RayCluster "distributed-training":
├── Head Pod: 1 GPU
└── Worker Group: 4 × 0.5 GPU = 2 GPUs
Total Group Requirement: 3 GPUs

PodGrouper ensures all 5 pods (1 head + 4 workers) are scheduled together or none at all.
```
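The all-or-nothing behavior above can be sketched in a few lines of Python (a toy model of the decision, not KAI internals; capacity numbers are illustrative):

```python
# Toy sketch of gang scheduling: either every pod in the group fits in the
# free capacity, or none of them is placed.

def gang_schedulable(pod_gpu_requests, free_gpus):
    """Return True only if the whole pod group fits at once."""
    return sum(pod_gpu_requests) <= free_gpus

# RayCluster "distributed-training": 1 head (1 GPU) + 4 workers (0.5 GPU each)
group = [1.0, 0.5, 0.5, 0.5, 0.5]  # total requirement: 3 GPUs

print(gang_schedulable(group, free_gpus=4))  # True: schedule all 5 pods
print(gang_schedulable(group, free_gpus=2))  # False: schedule none
```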

2. **Queues**: Queues enforce fairness in resource distribution using:

- Quota: The baseline amount of resources guaranteed to the queue. The scheduler allocates quotas first to ensure fairness.

- Queue priority: Determines the order in which queues receive resources beyond their quota. The scheduler serves higher-priority queues first.

- Over-quota weight: Controls how the scheduler shares surplus resources among queues within the same priority level. Queues with higher weights receive a larger share of the extra resources.

- Limit: Defines the maximum resources that the queue can consume.

Queues can be arranged hierarchically for organizations with multiple teams, for example, departments that each contain several teams.
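As a rough illustration of how quota and over-quota weight interact, consider this toy arithmetic (a simplification; the real scheduler uses DRF across multiple resource types, and the queue names and numbers here are made up):

```python
# Toy model: each queue first receives its guaranteed quota, then the surplus
# is split among queues in proportion to their overQuotaWeight.

def distribute(total, queues):
    """queues: dict of name -> (quota, over_quota_weight)."""
    alloc = {name: quota for name, (quota, _) in queues.items()}
    surplus = total - sum(alloc.values())
    total_weight = sum(weight for _, weight in queues.values())
    for name, (_, weight) in queues.items():
        alloc[name] += surplus * weight / total_weight
    return alloc

# 16 GPUs total: both teams are guaranteed 4; the remaining 8 GPUs are
# split 3:1 according to over-quota weight.
print(distribute(16, {"team-a": (4, 3), "team-b": (4, 1)}))
# → {'team-a': 10.0, 'team-b': 6.0}
```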

## Prerequisites

* Kubernetes cluster with GPU nodes
* [NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator) (KAI Scheduler expects it to be installed first, even on clusters that don't run GPU workloads)
* kubectl configured to access your cluster

## Step 1: Install KAI Scheduler

Install KAI Scheduler with `gpuSharing` enabled. You can find available versions on the [KAI Scheduler releases page](https://github.com/NVIDIA/KAI-Scheduler/releases):

```bash
# Install KAI Scheduler
helm upgrade -i kai-scheduler oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler -n kai-scheduler --create-namespace --version <KAI_SCHEDULER_VERSION> --set "global.gpuSharing=true"
```

## Step 2: Install the KubeRay operator with KAI Scheduler as the batch scheduler

Follow the official KubeRay operator [installation documentation](https://docs.ray.io/en/master/cluster/kubernetes/getting-started/kuberay-operator-installation.html#kuberay-operator-installation) and add the following configuration to enable KAI Scheduler integration:

```bash
--set batchScheduler.name=kai-scheduler
```

## Step 3: Create KAI Scheduler Queues

Create a basic queue structure for `department-1` and its child `team-a`. For demonstration purposes, this example doesn't enforce any quota, over-quota weight, or limit; set these parameters according to your needs:

```yaml
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: department-1
spec:
  # priority: 100 (optional)
  resources:
    cpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    gpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    memory:
      quota: -1
      limit: -1
      overQuotaWeight: 1
---
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: team-a
spec:
  # priority: 200 (optional)
  parentQueue: department-1
  resources:
    cpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    gpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    memory:
      quota: -1
      limit: -1
      overQuotaWeight: 1
```

Apply the manifest with `kubectl apply -f`, then verify that the queues exist:

```bash
# Verify queues are created
kubectl get queues
```

## Step 4: Gang-Scheduling with KAI Scheduler

The key pattern adds the queue label to your RayCluster. See the [basic example](https://github.com/ray-project/kuberay/tree/master/ray-operator/config/samples/ray-cluster.kai-scheduler.yaml) from the KubeRay repository:


```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-sample
  labels:
    kai.scheduler/queue: team-a # This is the essential configuration!
spec:
  headGroupSpec:
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.41.0
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
  workerGroupSpecs:
  - groupName: worker
    replicas: 2
    minReplicas: 2
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.41.0
          resources:
            requests:
              cpu: "1"
              memory: "1Gi"
```

Save the manifest above as `ray-cluster.kai-scheduler.yaml`, then apply it:

```bash
kubectl apply -f ray-cluster.kai-scheduler.yaml

# Watch the pods get scheduled
kubectl get pods -w
```

## Setting Priorities for Workloads

In Kubernetes, assigning different priorities to workloads ensures efficient resource management, minimizes service disruption, and supports better scaling. By prioritizing workloads, KAI Scheduler schedules jobs according to their assigned priority. When sufficient resources aren't available for a workload, the scheduler can preempt lower-priority workloads to free up resources for higher-priority ones. This approach ensures that mission-critical services are always prioritized in resource allocation.

The KAI Scheduler deployment comes with several predefined priority classes:

- `train` (50): for preemptible training workloads
- `build-preemptible` (75): for preemptible build/interactive workloads
- `build` (100): for non-preemptible build/interactive workloads
- `inference` (125): for non-preemptible inference workloads
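The effect of these classes can be sketched with a toy preemption rule (illustrative only; the real scheduler also accounts for queues, quotas, and gang constraints, and the workload names below are made up):

```python
# Toy preemption rule: when capacity is short, evict preemptible workloads
# first, starting with the lowest priority. Values mirror KAI's predefined
# priority classes.
PRIORITY = {"train": 50, "build-preemptible": 75, "build": 100, "inference": 125}
PREEMPTIBLE = {"train", "build-preemptible"}

def eviction_order(workloads):
    """Return preemptible workloads, lowest priority first."""
    return sorted(
        (w for w in workloads if w["class"] in PREEMPTIBLE),
        key=lambda w: PRIORITY[w["class"]],
    )

running = [
    {"name": "ray-train", "class": "train"},
    {"name": "ray-serve", "class": "inference"},
    {"name": "ray-dev", "class": "build-preemptible"},
]
print([w["name"] for w in eviction_order(running)])
# → ['ray-train', 'ray-dev']  (inference is never preempted)
```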

You can submit the same workload with a specific priority. Here's how to turn the preceding example into a `build` class workload:

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-sample
  labels:
    kai.scheduler/queue: team-a # This is the essential configuration!
    priorityClassName: build # Specify the priority class (optional)
spec:
  headGroupSpec:
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.41.0
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
  workerGroupSpecs:
  - groupName: worker
    replicas: 2
    minReplicas: 2
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.41.0
          resources:
            requests:
              cpu: "1"
              memory: "1Gi"
```

Note: KAI Scheduler reads the priority class from the workload's `metadata.labels.priorityClassName` label rather than from the `priorityClassName` field in the pod spec. This lets the scheduler assign a single priority to the entire workload.

For more information, see the [KAI Scheduler priority documentation](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/priority).

## Step 5: Submitting Ray workers with GPU sharing

This example creates two workers that share a single GPU (0.5 each, with time-slicing) within a RayCluster. You can find the YAML file in the [KubeRay repository](https://github.com/ray-project/kuberay/tree/master/ray-operator/config/samples/ray-cluster.kai-gpu-sharing.yaml):

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-half-gpu
  labels:
    kai.scheduler/queue: team-a
spec:
  headGroupSpec:
    template:
      spec:
        containers:
        - name: head
          image: rayproject/ray:2.46.0
          resources:
            limits:
              cpu: "1"
              memory: "2Gi"

  # ---- Two workers share one GPU (0.5 each) ----
  workerGroupSpecs:
  - groupName: shared-gpu
    replicas: 2
    minReplicas: 2
    rayStartParams:
      num-gpus: "0.5"
    template:
      metadata:
        annotations:
          gpu-fraction: "0.5"
      spec:
        containers:
        - name: worker
          image: rayproject/ray:2.46.0
          resources:
            limits:
              cpu: "1"
              memory: "2Gi"
```
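To see why the two workers land on one physical device, consider this toy accounting of GPU fractions (illustrative first-fit packing only; KAI tracks the real placement through the `gpu-fraction` annotation and shared GPU groups):

```python
# Toy first-fit packing of fractional GPU requests onto whole devices.
# Two 0.5 requests fit on one GPU; a third request spills to a new device.

def pack(fractions):
    devices = []    # remaining capacity per physical GPU
    placement = []  # device index chosen for each request
    for frac in fractions:
        for i, free in enumerate(devices):
            if frac <= free + 1e-9:   # fits on an existing device
                devices[i] -= frac
                placement.append(i)
                break
        else:                          # open a fresh device
            devices.append(1.0 - frac)
            placement.append(len(devices) - 1)
    return placement

print(pack([0.5, 0.5]))       # → [0, 0]     both workers share GPU 0
print(pack([0.5, 0.5, 0.5]))  # → [0, 0, 1]  a third worker needs a new GPU
```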

Save the manifest above as `ray-cluster.kai-gpu-sharing.yaml`, then apply it:

```bash
kubectl apply -f ray-cluster.kai-gpu-sharing.yaml

# Watch the pods get scheduled
kubectl get pods -w
```

Note: GPU sharing through time-slicing in this example occurs only at the Kubernetes layer, allowing multiple pods to share a single GPU device. Memory isolation isn't enforced, so applications must manage their own usage to prevent interference. For other GPU sharing approaches, such as MPS, see the [KAI Scheduler GPU sharing documentation](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/gpu-sharing).

### Verify GPU Sharing is Working

To confirm that GPU sharing is working correctly, use these commands:

```bash
# 1. Check GPU fraction annotations and shared GPU groups
kubectl get pods -l ray.io/cluster=raycluster-half-gpu -o custom-columns="NAME:.metadata.name,NODE:.spec.nodeName,GPU-FRACTION:.metadata.annotations.gpu-fraction,GPU-GROUP:.metadata.labels.runai-gpu-group"
```

You should see both worker pods on the same node with `GPU-FRACTION: 0.5` and the same `GPU-GROUP` ID:

```
NAME                                          NODE               GPU-FRACTION   GPU-GROUP
raycluster-half-gpu-head                      ip-xxx-xx-xx-xxx   <none>         <none>
raycluster-half-gpu-shared-gpu-worker-67tvw   ip-xxx-xx-xx-xxx   0.5            3e456911-a6ea-4b1a-8f55-e90fba89ad76
raycluster-half-gpu-shared-gpu-worker-v5tpp   ip-xxx-xx-xx-xxx   0.5            3e456911-a6ea-4b1a-8f55-e90fba89ad76
```

The matching `GPU-GROUP` IDs show that both workers landed on the same physical GPU, each with `GPU-FRACTION: 0.5`.
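If you prefer to check programmatically, the same grouping logic can be sketched over pod metadata (the pod dicts below are hypothetical stand-ins for what the Kubernetes API returns; the label and annotation names match those used above):

```python
# Toy check that all workers carrying a gpu-fraction annotation were placed
# into the same shared GPU group.

def shared_gpu_groups(pods):
    """Map each runai-gpu-group ID to the pods sharing that GPU."""
    groups = {}
    for pod in pods:
        if pod["annotations"].get("gpu-fraction"):
            group = pod["labels"]["runai-gpu-group"]
            groups.setdefault(group, []).append(pod["name"])
    return groups

pods = [
    {"name": "head", "annotations": {}, "labels": {}},
    {"name": "worker-67tvw", "annotations": {"gpu-fraction": "0.5"},
     "labels": {"runai-gpu-group": "3e456911"}},
    {"name": "worker-v5tpp", "annotations": {"gpu-fraction": "0.5"},
     "labels": {"runai-gpu-group": "3e456911"}},
]
print(shared_gpu_groups(pods))
# → {'3e456911': ['worker-67tvw', 'worker-v5tpp']}
```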

## Troubleshooting

### Check for missing queue labels

If pods remain in `Pending` state, the most common issue is missing queue labels.

Check the KubeRay operator logs for KAI Scheduler errors, looking for messages like:

```bash
"Queue label missing from RayCluster; pods will remain pending"
```
**Solution**: Ensure that your RayCluster has a queue label referencing a queue that exists in the cluster:

```yaml
metadata:
  labels:
    kai.scheduler/queue: default # Add this label
```