RayJob Volcano Integration #3972
Pull Request Overview
This PR adds Volcano scheduler support for RayJob CRD, enabling gang scheduling to ensure Ray pods and submitter pods are scheduled together as a unit. This prevents partial scheduling issues where only some pods of a RayJob get scheduled.
- Extends the existing Volcano batch scheduler to support RayJob objects in addition to RayCluster
- Implements PodGroup creation for RayJob resources with proper resource calculation including submitter pod resources
- Adds comprehensive test coverage for RayJob Volcano integration with different submission modes
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| ray-operator/controllers/ray/utils/util.go | Exports SumResourceList function for use in Volcano scheduler |
| ray-operator/controllers/ray/batchscheduler/volcano/volcano_scheduler.go | Adds RayJob support to Volcano scheduler with gang scheduling logic |
| ray-operator/controllers/ray/batchscheduler/volcano/volcano_scheduler_test.go | Adds comprehensive test coverage for RayJob Volcano integration |
| ray-operator/config/samples/ray-job.volcano-scheduler-queue.yaml | Provides sample configuration for testing RayJob with Volcano scheduler |
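For orientation, the exported helper in the first row presumably folds a slice of ResourceLists into a single total. A minimal sketch of that idea (the name SumResourceList comes from the table above; the body is illustrative, not the actual util.go code):

package utils

import (
	corev1 "k8s.io/api/core/v1"
)

// SumResourceList (sketch): fold a slice of ResourceLists into a single total.
// The exported name comes from the review table above; this body is illustrative only.
func SumResourceList(lists []corev1.ResourceList) corev1.ResourceList {
	total := corev1.ResourceList{}
	for _, list := range lists {
		for name, quantity := range list {
			if sum, ok := total[name]; ok {
				// resource.Quantity.Add mutates the copy taken out of the map,
				// so write it back afterwards.
				sum.Add(quantity)
				total[name] = sum
			} else {
				total[name] = quantity.DeepCopy()
			}
		}
	}
	return total
}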
// MinMember intentionally excludes the submitter pod to avoid a startup deadlock
// (submitter waits for cluster; gang would wait for submitter). We still add the
// submitter's resource requests into MinResources so capacity is reserved.
Copilot AI (Oct 7, 2025):
The comment explains the design decision well, but could be clearer about what 'gang would wait for submitter' means. Consider expanding to explain that the gang scheduler would wait for all pods including the submitter to be schedulable before scheduling any, creating a circular dependency.
Suggested change:
// MinMember intentionally excludes the submitter pod to avoid a startup deadlock.
// If the submitter pod were included in MinMember, the gang scheduler would wait for
// all pods—including the submitter—to be schedulable before scheduling any of them.
// This creates a circular dependency: the submitter pod waits for the cluster to be ready,
// but the cluster cannot be scheduled until the submitter is also schedulable. To avoid this,
// we exclude the submitter from MinMember, but still add its resource requests into MinResources
// so that capacity is reserved for it.
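To make the MinMember/MinResources split concrete, here is a minimal sketch of how the PodGroup parameters could be derived for a RayJob. Function and helper names (calculatePodGroupParams, CalculatePodResource, GetSubmitterTemplate, SumResourceList) follow the PR discussion and the flowchart below; the exact signatures are assumptions, not the PR's code.

package volcano

import (
	corev1 "k8s.io/api/core/v1"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
	"github.com/ray-project/kuberay/ray-operator/controllers/ray/common"
	"github.com/ray-project/kuberay/ray-operator/controllers/ray/utils"
)

// calculatePodGroupParams (sketch): MinMember counts head + worker pods only, while
// MinResources also reserves the submitter pod's requests, so the gang can be admitted
// without waiting on the submitter. Not the PR's actual implementation.
func calculatePodGroupParams(rayJob *rayv1.RayJob) (int32, corev1.ResourceList) {
	clusterSpec := rayJob.Spec.RayClusterSpec

	// Head pod.
	minMember := int32(1)
	resources := []corev1.ResourceList{utils.CalculatePodResource(clusterSpec.HeadGroupSpec.Template.Spec)}

	// Worker pods.
	for _, workerGroup := range clusterSpec.WorkerGroupSpecs {
		replicas := int32(1)
		if workerGroup.Replicas != nil {
			replicas = *workerGroup.Replicas
		}
		minMember += replicas
		for i := int32(0); i < replicas; i++ {
			resources = append(resources, utils.CalculatePodResource(workerGroup.Template.Spec))
		}
	}

	// Submitter pod: excluded from MinMember (it waits for the cluster), but its
	// requests still count toward MinResources so capacity is reserved for it.
	if rayJob.Spec.SubmissionMode == rayv1.K8sJobMode {
		submitterTemplate := common.GetSubmitterTemplate(&rayJob.Spec, rayJob.Spec.RayClusterSpec)
		resources = append(resources, utils.CalculatePodResource(submitterTemplate.Spec))
	}

	return minMember, utils.SumResourceList(resources)
}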
podGroup := volcanoschedulingv1beta1.PodGroup{}
if err := v.cli.Get(ctx, types.NamespacedName{Namespace: owner.GetNamespace(), Name: podGroupName}, &podGroup); err != nil {
	if !errors.IsNotFound(err) {
		logger.Error(err, "failed to get PodGroup", "podGroupName", podGroupName, "ownerKind", utils.GetCRDType(owner.GetLabels()[utils.RayOriginatedFromCRDLabelKey]), "ownerName", owner.GetName(), "ownerNamespace", owner.GetNamespace())
Copilot AI (Oct 7, 2025):
Potential nil pointer dereference if owner.GetLabels() returns nil. The code should check if labels exist before accessing the map.
	}
-	logger.Error(err, "Pod group CREATE error!", "PodGroup.Error", err)
+	logger.Error(err, "failed to create PodGroup", "name", podGroupName, "ownerKind", utils.GetCRDType(owner.GetLabels()[utils.RayOriginatedFromCRDLabelKey]), "ownerName", owner.GetName(), "ownerNamespace", owner.GetNamespace())
Copilot AI (Oct 7, 2025):
Potential nil pointer dereference if owner.GetLabels() returns nil. The code should check if labels exist before accessing the map.
	podGroup.Spec.MinResources = &totalResource
	if err := v.cli.Update(ctx, &podGroup); err != nil {
-		logger.Error(err, "Pod group UPDATE error!", "podGroup", podGroupName)
+		logger.Error(err, "failed to update PodGroup", "name", podGroupName, "ownerKind", utils.GetCRDType(owner.GetLabels()[utils.RayOriginatedFromCRDLabelKey]), "ownerName", owner.GetName(), "ownerNamespace", owner.GetNamespace())
Copilot AI (Oct 7, 2025):
Potential nil pointer dereference if owner.GetLabels() returns nil. The code should check if labels exist before accessing the map.
// MinMember intentionally excludes the submitter pod to avoid a startup deadlock
// (submitter waits for cluster; gang would wait for submitter). We still add the
// submitter's resource requests into MinResources so capacity is reserved.
if rayJob.Spec.SubmissionMode == rayv1.K8sJobMode {
Do we need to do this for SidecarMode?
I think in SidecarMode, when calculating the head pod's resources, we call the function CalculatePodResource, and that function iterates over all containers in the head pod spec, like this: for _, container := range podSpec.Containers (sketched below).
So I think sidecar mode will work, but if we can get a screenshot of a test for sidecar mode I would appreciate it.
cc @win5923
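For reference, that iteration is roughly the following shape (a sketch of the idea only, reusing the SumResourceList sketch above; not the actual util.go implementation), which is also why a submitter sidecar injected into the head pod's container list is counted automatically:

// Sketch of the idea behind utils.CalculatePodResource: gather every container's
// resource requests from the pod spec and sum them. A submitter sidecar appended to
// podSpec.Containers is therefore included automatically. Not the actual util.go code.
func calculatePodResource(podSpec corev1.PodSpec) corev1.ResourceList {
	requests := make([]corev1.ResourceList, 0, len(podSpec.Containers))
	for _, container := range podSpec.Containers {
		requests = append(requests, container.Resources.Requests)
	}
	return SumResourceList(requests) // summing helper as sketched earlier
}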
Wait, @rueian is right.
When using sidecar mode, the MinResources CPU should be 3.5, but this is not correct here.
Looks nice!
But I think we have to:
- calculate the correct resources in sidecar mode
- test the gang scheduling behavior before this PR gets merged
- add the gang scheduling label in the example, and test K8s job mode, HTTP mode, and sidecar mode
flowchart TD
%% User creates RayJob
A[User creates RayJob] --> B{Check RayJob Labels}
B --> C["ray.io/gang-scheduling-enabled = \"true\""]
B --> D["volcano.sh/queue-name = \"default\""]
B --> E["ray.io/priority-class-name = \"high\""]
%% RayJob Controller processes RayJob
C --> F[RayJob Controller]
D --> F
E --> F
F --> G[constructRayClusterForRayJob]
G --> H[Copy RayJob Labels to RayCluster]
G --> I[Copy RayJob Annotations to RayCluster]
%% Batch Scheduler Manager intervention
H --> J{BatchSchedulerManager Check}
I --> J
J --> K[VolcanoBatchScheduler.DoBatchSchedulingOnSubmission]
%% Volcano scheduler handles RayJob
K --> L[handleRayJob]
L --> M[calculatePodGroupParams]
M --> N[Calculate MinMember and MinResources]
N --> O[MinMember = Head + Workers]
N --> P[MinResources = Head + Workers + Submitter]
%% Create PodGroup
O --> Q[syncPodGroup]
P --> Q
Q --> R[Create PodGroup CRD]
R --> S["PodGroup Name: ray-jobname-pg"]
R --> T["MinMember: Head + Workers"]
R --> U["MinResources: Total Resources"]
R --> V["Queue: volcano.sh/queue-name"]
R --> W["PriorityClassName: ray.io/priority-class-name"]
%% RayCluster creation
S --> X[RayCluster Created]
T --> X
U --> X
V --> X
W --> X
%% RayCluster Controller handles Pods
X --> Y[RayCluster Controller]
Y --> Z[buildHeadPod]
Y --> AA[buildWorkerPod]
%% Add Volcano metadata to each Pod
Z --> BB[VolcanoBatchScheduler.AddMetadataToPod]
AA --> BB
BB --> CC[Set Pod Annotations]
CC --> DD["scheduling.volcano.sh/group-name = ray-jobname-pg"]
CC --> EE["batch.volcano.sh/task-spec = groupName"]
BB --> FF[Set Pod Labels]
FF --> GG["volcano.sh/queue-name = \"default\""]
BB --> HH[Set Pod Spec]
HH --> II["schedulerName = \"volcano\""]
HH --> JJ["priorityClassName = \"high\""]
%% RayJob Controller creates Kubernetes Job
F --> KK[Create Kubernetes Job]
KK --> LL[Kubernetes Job Controller]
LL --> MM[Create Job Pod]
%% Add Volcano metadata to Job Pod
MM --> NN[VolcanoBatchScheduler.AddMetadataToPod]
NN --> OO[Set Job Pod Annotations]
OO --> PP["scheduling.volcano.sh/group-name = ray-jobname-pg"]
OO --> QQ["batch.volcano.sh/task-spec = \"submitter\""]
NN --> RR[Set Job Pod Labels]
RR --> SS["volcano.sh/queue-name = \"default\""]
NN --> TT[Set Job Pod Spec]
TT --> UU["schedulerName = \"volcano\""]
TT --> VV["priorityClassName = \"high\""]
%% Pod scheduling
DD --> WW[All Pods submitted to Volcano]
EE --> WW
GG --> WW
II --> WW
JJ --> WW
PP --> WW
QQ --> WW
SS --> WW
UU --> WW
VV --> WW
WW --> XX[Volcano Gang Scheduler]
XX --> YY[Check PodGroup status]
YY --> ZZ[Wait for PodGroup resources]
ZZ --> AAA[Check MinMember and MinResources]
AAA --> BBB[Schedule all Pods simultaneously]
%% Final state
BBB --> CCC[Ray Cluster Running]
CCC --> DDD[Job Pod Executing]
DDD --> EEE[Submit Ray Job to Ray Cluster]
EEE --> FFF[Execute Ray Job]
%% Style definitions
classDef userAction fill:#e1f5fe
classDef controller fill:#f3e5f5
classDef scheduler fill:#e8f5e8
classDef pod fill:#fff3e0
classDef volcano fill:#ffebee
classDef podgroup fill:#f0f8ff
classDef job fill:#f0f8ff
class A userAction
class F,G,Y,LL controller
class K,L,M,N,Q,XX scheduler
class Z,AA,MM pod
class BB,CC,DD,EE,FF,GG,HH,II,JJ,NN,OO,PP,QQ,RR,SS,TT,UU,VV volcano
class R,S,T,U,V,W podgroup
class KK,DDD,EEE job
labels:
  ray.io/scheduler-name: volcano
  volcano.sh/queue-name: kuberay-test-queue
Hi, @win5923
Can we add ray.io/gang-scheduling-enabled: "true" in the example and test it?
Currently, adding ray.io/gang-scheduling-enabled: "true" does not have any effect. This only works with YuniKorn or the Scheduler plugin.
kuberay/ray-operator/controllers/ray/batchscheduler/yunikorn/yunikorn_scheduler.go
Lines 120 to 123 in c6bafa3
func (y *YuniKornScheduler) isGangSchedulingEnabled(obj metav1.Object) bool {
	_, exist := obj.GetLabels()[utils.RayGangSchedulingEnabled]
	return exist
}
kuberay/ray-operator/controllers/ray/batchscheduler/scheduler-plugins/scheduler_plugins.go
Lines 109 to 112 in c6bafa3
func (k *KubeScheduler) isGangSchedulingEnabled(app *rayv1.RayCluster) bool {
	_, exist := app.Labels[utils.RayGangSchedulingEnabled]
	return exist
}
And I think this is a breaking change if we add this check. We should also update the doc to mention that starting from version 1.5.0, users need to add ray.io/gang-scheduling-enabled: "true" to enable gang scheduling for Volcano. https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html
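For illustration, the equivalent opt-in check on the Volcano side would presumably be a one-liner like the following (a sketch only; whether to add it, and the breaking change it implies, is exactly what is being discussed here):

// Sketch only: mirrors the YuniKorn / scheduler-plugins pattern, reusing the
// existing utils.RayGangSchedulingEnabled label key. Assumed to live alongside
// the other VolcanoBatchScheduler methods in volcano_scheduler.go; not part of this PR.
func (v *VolcanoBatchScheduler) isGangSchedulingEnabled(obj metav1.Object) bool {
	_, exist := obj.GetLabels()[utils.RayGangSchedulingEnabled]
	return exist
}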
I found that Volcano's default scheduler ConfigMap has gang scheduling enabled!
So in the future, if users want to disable it, we may need to tell them to edit the ConfigMap, or figure out some way to control it by adding more information to our CR.
Thank you!!
apiVersion: v1
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
        enablePreemptable: false
      - name: conformance
    - plugins:
      - name: overcommit
      - name: drf
        enablePreemptable: false
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
Future-Outlier left a comment:
will request changes until all comments are resolved
if rayJob.Spec.SubmissionMode == rayv1.K8sJobMode || rayJob.Spec.SubmissionMode == rayv1.SidecarMode {
	submitterTemplate := common.GetSubmitterTemplate(&rayJob.Spec, rayJob.Spec.RayClusterSpec)
	submitterResource := utils.CalculatePodResource(submitterTemplate.Spec)
	totalResourceList = append(totalResourceList, submitterResource)
}
I think the result of this code is correct, but the behavior is not.
For K8s job mode, we should get the submitter's information from submitterTemplate.
For sidecar mode, we should get the submitter's information from GetDefaultSubmitterContainer, since that is the function we currently use.
Update: I've discussed this offline with @win5923, and I am writing a commit to fix this!
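In code, that direction is roughly the following (a sketch; the signature and return type of GetDefaultSubmitterContainer are assumptions, and the follow-up commit is the source of truth):

// Sketch of the mode-dependent handling discussed above; not the final commit.
// Assumes GetDefaultSubmitterContainer returns the corev1.Container injected in SidecarMode.
func submitterResources(rayJob *rayv1.RayJob) corev1.ResourceList {
	switch rayJob.Spec.SubmissionMode {
	case rayv1.K8sJobMode:
		// Submitter runs as a separate Kubernetes Job pod: use its pod template.
		submitterTemplate := common.GetSubmitterTemplate(&rayJob.Spec, rayJob.Spec.RayClusterSpec)
		return utils.CalculatePodResource(submitterTemplate.Spec)
	case rayv1.SidecarMode:
		// Submitter runs as a sidecar container inside the head pod: use the
		// default submitter container's requests directly.
		submitterContainer := common.GetDefaultSubmitterContainer(rayJob.Spec.RayClusterSpec)
		return submitterContainer.Resources.Requests
	default:
		return corev1.ResourceList{}
	}
}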
YuniKorn also uses GetSubmitterTemplate for sidecar mode. Would you like to update the YuniKorn part as well (newTaskGroupsFromRayJobSpec)?
yes I can do it now
Hi, @rueian
I have tested the YuniKorn integration in both Kubernetes job mode and sidecar mode and can confirm that no code changes are required. The existing logic correctly handles both scenarios.
Kubernetes Job Mode
The function newTaskGroupsFromRayJobSpec is ultimately called by AddMetadataToChildResource. Within the RayJob controller, AddMetadataToChildResource is only invoked when the RayJob is configured for Kubernetes job mode, as seen in these two locations:
1st place:
kuberay/ray-operator/controllers/ray/rayjob_controller.go
Lines 582 to 606 in c6bafa3
func (r *RayJobReconciler) createK8sJobIfNeed(ctx context.Context, rayJobInstance *rayv1.RayJob, rayClusterInstance *rayv1.RayCluster) error {
	logger := ctrl.LoggerFrom(ctx)
	job := &batchv1.Job{}
	namespacedName := common.RayJobK8sJobNamespacedName(rayJobInstance)
	if err := r.Client.Get(ctx, namespacedName, job); err != nil {
		if errors.IsNotFound(err) {
			submitterTemplate, err := getSubmitterTemplate(rayJobInstance, rayClusterInstance)
			if err != nil {
				return err
			}
			if r.options.BatchSchedulerManager != nil {
				if scheduler, err := r.options.BatchSchedulerManager.GetScheduler(); err == nil {
					scheduler.AddMetadataToChildResource(ctx, rayJobInstance, &submitterTemplate, utils.RayNodeSubmitterGroupLabelValue)
				} else {
					return err
				}
			}
			return r.createNewK8sJob(ctx, rayJobInstance, submitterTemplate)
		}
		return err
	}
	logger.Info("The submitter Kubernetes Job for RayJob already exists", "Kubernetes Job", job.Name)
	return nil
}
2nd place:
kuberay/ray-operator/controllers/ray/rayjob_controller.go
Lines 949 to 956 in c6bafa3
if r.options.BatchSchedulerManager != nil && rayJobInstance.Spec.SubmissionMode == rayv1.K8sJobMode {
	if scheduler, err := r.options.BatchSchedulerManager.GetScheduler(); err == nil {
		// Group name is only used for individual pods to specify their task group ("headgroup", "worker-group-1", etc.).
		// RayCluster contains multiple groups, so we pass an empty string.
		scheduler.AddMetadataToChildResource(ctx, rayJobInstance, rayClusterInstance, "")
	} else {
		return nil, err
	}
Because of this, the YuniKorn-specific logic is correctly applied only when the RayJob creates a Kubernetes Job, and it behaves as expected.
Sidecar Mode
In sidecar mode, the submitter container is added to the Ray head pod, which is part of the RayCluster specification. When the RayCluster controller reconciles the RayCluster custom resource, it calculates the task groups for the head and worker pods. At that point, the head pod correctly contains both the Ray head container and the submitter sidecar container, ensuring their resources are accounted for in the task group calculation, as handled by the logic here:
kuberay/ray-operator/controllers/ray/rayjob_controller.go
Lines 978 to 1016 in c6bafa3
func (r *RayJobReconciler) constructRayClusterForRayJob(rayJobInstance *rayv1.RayJob, rayClusterName string) (*rayv1.RayCluster, error) {
	labels := make(map[string]string, len(rayJobInstance.Labels))
	for key, value := range rayJobInstance.Labels {
		labels[key] = value
	}
	labels[utils.RayOriginatedFromCRNameLabelKey] = rayJobInstance.Name
	labels[utils.RayOriginatedFromCRDLabelKey] = utils.RayOriginatedFromCRDLabelValue(utils.RayJobCRD)
	rayCluster := &rayv1.RayCluster{
		ObjectMeta: metav1.ObjectMeta{
			Labels:      labels,
			Annotations: rayJobInstance.Annotations,
			Name:        rayClusterName,
			Namespace:   rayJobInstance.Namespace,
		},
		Spec: *rayJobInstance.Spec.RayClusterSpec.DeepCopy(),
	}
	// Set the ownership in order to do the garbage collection by k8s.
	if err := ctrl.SetControllerReference(rayJobInstance, rayCluster, r.Scheme); err != nil {
		return nil, err
	}
	// Inject a submitter container into the head Pod in SidecarMode.
	if rayJobInstance.Spec.SubmissionMode == rayv1.SidecarMode {
		sidecar, err := getSubmitterContainer(rayJobInstance, rayCluster)
		if err != nil {
			return nil, err
		}
		rayCluster.Spec.HeadGroupSpec.Template.Spec.Containers = append(
			rayCluster.Spec.HeadGroupSpec.Template.Spec.Containers, sidecar)
		// In K8sJobMode, the submitter Job relies on the K8s Job backoffLimit API to restart if it fails.
		// This mainly handles WebSocket connection failures caused by transient network issues.
		// In SidecarMode, however, the submitter container shares the same network namespace as the Ray dashboard,
		// so restarts are no longer needed.
		rayCluster.Spec.HeadGroupSpec.Template.Spec.RestartPolicy = corev1.RestartPolicyNever
	}
	return rayCluster, nil
}
corev1.ResourceCPU:    submitterContainer.Resources.Requests[corev1.ResourceCPU],
corev1.ResourceMemory: submitterContainer.Resources.Requests[corev1.ResourceMemory],
Can we take all the resource types into account here? We'd better not assume there are only CPU and memory.
just fix it here, thank you!
cf6b48b
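The fix amounts to copying the submitter container's entire Requests map instead of picking out CPU and memory, roughly as follows (a sketch, not the exact diff in cf6b48b):

// Sketch: carry over every requested resource type (GPUs, ephemeral storage,
// extended resources, and so on) rather than hard-coding CPU and memory.
submitterResource := corev1.ResourceList{}
for name, quantity := range submitterContainer.Resources.Requests {
	submitterResource[name] = quantity.DeepCopy()
}
totalResourceList = append(totalResourceList, submitterResource)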
Why are these changes needed?
- RayJob Volcano support: Adds Volcano scheduler support for the RayJob CRD.
- Gang scheduling: Ensures Ray pods and the submitter pod are scheduled together as a unit, preventing partial scheduling issues.
- E2E testing with Volcano: PodGroup, Queue, and testing RayJob HTTPMode.
Related issue number
Closes #1580
Checks