Conversation

Member

@owenowenisme owenowenisme commented Aug 14, 2025

Why are these changes needed?

This PR integrates RayJob with Apache Yunikorn as a batch scheduler, enabling RayJob to leverage Yunikorn’s features such as gang-scheduling. This allows both the submitter pod and RayCluster to be included in gang-scheduling.

Special thanks to the original contribution from @troychiu! 🙏 (#3379)

In this PR:

  • Added batchSchedulerManager to RayJobReconcileOption in main.go
  • Modified the RayJob controller to add the essential labels and annotations to both the RayCluster and the submitter pod (a minimal sketch of this propagation is shown after this list)
  • Added ray-job.yunikorn-scheduler.yaml as a demonstration script for gang-scheduling with RayJob
  • Added unit tests for new functions in yunikorn_scheduler.go
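
For reviewers skimming the diff, here is a minimal, hypothetical sketch of the label propagation described above. The helper and variable names are illustrative and not the PR's actual code; the label keys are taken from the sample manifests in this PR.

package yunikorn

// Scheduler-related labels this PR propagates from a RayJob to its children
// (the generated RayCluster and the submitter pod). The keys come from the
// sample manifests in this PR; the helper itself is only an illustration.
var gangSchedulingLabelKeys = []string{
	"ray.io/gang-scheduling-enabled",
	"yunikorn.apache.org/app-id",
	"yunikorn.apache.org/queue",
}

// copySchedulerLabels copies the gang-scheduling labels from the parent
// RayJob's labels into the child object's labels, creating the map if needed.
func copySchedulerLabels(parent, child map[string]string) map[string]string {
	if child == nil {
		child = map[string]string{}
	}
	for _, key := range gangSchedulingLabelKeys {
		if value, ok := parent[key]; ok {
			child[key] = value
		}
	}
	return child
}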

Validation

  1. Set the Yunikorn queue capacity to 6Gi of memory and 4 CPU cores, the same as used for RayCluster.
    image

  2. Deploy the KubeRay operator with the batch-scheduler set to Yunikorn:

    ./bin/manager -leader-election-namespace default -use-kubernetes-proxy -batch-scheduler=yunikorn
    
  3. Deploy a sample RayJob:

    kubectl apply -f config/samples/ray-job.yunikorn-scheduler.yaml
    

    Each RayJob consists of:

    • 1 Head pod (CPU: 1, Memory: 2Gi)
    • 1 Worker pod (CPU: 1, Memory: 2Gi)
    • 1 Submitter pod (CPU: 500m, Memory: 200Mi)

    Total: 2.5 CPU and 4.2Gi memory per RayJob. Two RayJobs would need 5 CPU and about 8.4Gi, exceeding the queue's 4 cores and 6Gi, so our queue cannot fit two RayJobs at the same time.

  4. Deploy another RayJob using the same script, but change the RayJob name and app ID from job-0 to job-1.
    We can see that all of job-1's pods (including its submitter pod) are being held by Yunikorn.

Note that even though there are sufficient resources to schedule the submitter pod of job-1, gang scheduling requires that all pods be scheduled together.
image

  5. Delete RayJob-0.
    After deleting RayJob-0, the resources are now sufficient for RayJob-1, so RayJob-1 is up and running.
    image

Testing compatibility with RayJob HTTPMode

To make sure the RayJob Yunikorn integration works well with HTTPMode submission, I performed the following steps:

  1. Use the same sample YAML, but set submissionMode to HTTPMode:
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-yunikorn-0
  labels:
    ray.io/gang-scheduling-enabled: "true"
    yunikorn.apache.org/app-id: rayjob-yunikorn-0
    yunikorn.apache.org/queue: root.test
spec:
  # submissionMode specifies how RayJob submits the Ray job to the RayCluster.
  # The default value is "K8sJobMode", meaning RayJob will submit the Ray job via a submitter Kubernetes Job.
  # The alternative value is "HTTPMode", indicating that KubeRay will submit the Ray job by sending an HTTP request to the RayCluster.
  submissionMode: "HTTPMode"
  entrypoint: python /home/ray/samples/sample_code.py
  2. Check the task groups annotation. The submitter group shouldn't exist, since we use an HTTP request instead of a submitter pod to submit the job (see the sketch after this list).
yunikorn.apache.org/task-groups: '[{"minResource":{"cpu":"1","memory":"2Gi"},"name":"headgroup","minMember":1},{"minResource":{"cpu":"1","memory":"2Gi"},"name":"small-group","minMember":1}]'
image
  3. Check the job result and logs.
    I entered the shell of the head pod and queried the job using ray job list and ray job logs <JobID>; the job completed successfully with the correct logs.
image
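
For context on where the task-groups annotation value comes from, here is a hedged sketch of how such a payload can be built. The struct, the "submitter" group name, and the submissionMode switch are illustrative assumptions, not necessarily the code in yunikorn_scheduler.go; the head/worker/submitter resource values mirror the sample above.

package yunikorn

import "encoding/json"

// taskGroup mirrors the shape of the entries in the
// yunikorn.apache.org/task-groups annotation shown above.
type taskGroup struct {
	Name        string            `json:"name"`
	MinMember   int32             `json:"minMember"`
	MinResource map[string]string `json:"minResource"`
}

// buildTaskGroupsAnnotation marshals the head and worker groups and, only for
// K8sJobMode, the submitter group, into the annotation value.
func buildTaskGroupsAnnotation(submissionMode string) (string, error) {
	groups := []taskGroup{
		{Name: "headgroup", MinMember: 1, MinResource: map[string]string{"cpu": "1", "memory": "2Gi"}},
		{Name: "small-group", MinMember: 1, MinResource: map[string]string{"cpu": "1", "memory": "2Gi"}},
	}
	// In HTTPMode there is no submitter pod, so no submitter task group is added.
	if submissionMode == "K8sJobMode" {
		groups = append(groups, taskGroup{
			Name:        "submitter",
			MinMember:   1,
			MinResource: map[string]string{"cpu": "500m", "memory": "200Mi"},
		})
	}
	payload, err := json.Marshal(groups)
	return string(payload), err
}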

Related issue number

Closes #3284

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@owenowenisme owenowenisme force-pushed the rayjob-yunikorn-integration branch 2 times, most recently from ed2f0b2 to 6c0d860 Compare August 17, 2025 08:23
@owenowenisme owenowenisme force-pushed the rayjob-yunikorn-integration branch 2 times, most recently from a091f5e to ebf0cb3 Compare August 18, 2025 04:03
@owenowenisme owenowenisme marked this pull request as ready for review August 20, 2025 07:05
@owenowenisme owenowenisme requested a review from win5923 August 20, 2025 07:07
@owenowenisme owenowenisme force-pushed the rayjob-yunikorn-integration branch from 926225e to 4eb38e5 Compare August 20, 2025 10:16
owenowenisme and others added 2 commits August 21, 2025 23:49
name: rayjob-yunikorn-scheduler-0
labels:
  ray.io/gang-scheduling-enabled: "true"
  yunikorn.apache.org/app-id: test-yunikorn-job-0
Collaborator

@win5923 win5923 Aug 23, 2025

Need to set yunikorn.apache.org/app-id and the RayJob name to the same value, so that when the RayJob is renamed, the app-id is updated along with it. This makes it easier for users to understand why a newly created RayJob is stuck in the Accepted state and not running yet.

image

Ref: https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/yunikorn.html#step-4-use-apache-yunikorn-for-gang-scheduling

image

Member Author

Thanks, just updated.

return ctrl.Result{RequeueAfter: RayJobDefaultRequeueDuration}, err
}
} else {
return ctrl.Result{RequeueAfter: RayJobDefaultRequeueDuration}, err
Collaborator

@win5923 win5923 Aug 23, 2025

Nit: GetScheduler always returns nil. For consistency, we can change it to return an error like in other places.

Member Author

Nice advice. We could open a follow-up PR after this one, since the RayCluster controller needs to be modified as well.

Collaborator

@win5923 win5923 left a comment

Overall LGTM, thanks for your hard work! I’ll take another look at the test part.

cc @troychiu for review

Member

@Future-Outlier Future-Outlier left a comment

it works on my laptop

image

Member

@Future-Outlier Future-Outlier left a comment

it looks good to me, thank you

Comment on lines 75 to 80
var submitterGroupSpec corev1.PodSpec
if rayJobSpec.SubmitterPodTemplate != nil {
	submitterGroupSpec = rayJobSpec.SubmitterPodTemplate.Spec
} else {
	submitterGroupSpec = common.GetDefaultSubmitterTemplate(rayJobSpec.RayClusterSpec).Spec
}
Collaborator

@win5923 win5923 Sep 16, 2025

We should also check whether RayJobSpec.RayClusterSpec is nil before calling newTaskGroupsFromRayJobSpec; otherwise it can cause a nil pointer panic when a RayJob uses an existing cluster with gang-scheduling enabled.

Alternatively, we could add this check in rayjob_controller.go and simply prevent users from using both the batch scheduler and clusterSelector; a sketch of such a guard follows the example below.

For example:

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-use-existing-raycluster
  labels:
    ray.io/gang-scheduling-enabled: "true"
    yunikorn.apache.org/app-id: test-yunikorn-0
    yunikorn.apache.org/queue: root.test
spec:
  entrypoint: python -c "import ray; ray.init(); print(ray.cluster_resources())"
  clusterSelector:
    ray.io/cluster: raycluster-kuberay
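
A minimal sketch of the kind of guard being suggested; the function shape, the parameter types, and the error text are assumptions, not the eventual implementation.

package validation

import "errors"

// checkGangSchedulingWithClusterSelector rejects the combination discussed above:
// gang scheduling enabled on a RayJob that reuses an existing cluster via clusterSelector.
func checkGangSchedulingWithClusterSelector(labels, clusterSelector map[string]string) error {
	if labels["ray.io/gang-scheduling-enabled"] != "true" {
		return nil // gang scheduling not requested, nothing to validate
	}
	if len(clusterSelector) > 0 {
		return errors.New("gang scheduling cannot be used together with clusterSelector")
	}
	return nil
}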

Member Author

Thanks!
But why would the user enable gang-scheduling with clusterSelector?
The RayJob will only have one submitter pod when using clusterSelector, which would never need gang-scheduling, IIUC?

Collaborator

Normally, this wouldn’t happen, but we want to prevent users from actually setting it this way, because with the current logic it would cause a panic.

Member

I think we should avoid using gang scheduling together with clusterSelector; maybe we should add this as a validation rule for RayJob.

Member

This can be a follow-up PR.

@Future-Outlier
Member

cc @owenowenisme for resolving the merge conflict

@owenowenisme owenowenisme force-pushed the rayjob-yunikorn-integration branch from 845f92a to 0646149 Compare September 17, 2025 02:51
Member

@Future-Outlier Future-Outlier left a comment

thank you for solving the merge conflict

Comment on lines 47 to 48
// TODO: remove the legacy labels, i.e "applicationId" and "queue", directly populate labels
// RayClusterApplicationIDLabelName to RayClusterQueueLabelName to pod labels.
Collaborator

Do we need to keep these comments?

Member Author

Good catch, added them back. Thanks!

// propagateTaskGroupsAnnotation is a helper function that propagates the task groups annotation to the child
// if the parent has the task groups annotation, it will be copied to the child
// if the parent doesn't have the task groups annotation, a new one will be created
// TODO: remove the legacy labels, i.e "applicationId" and "queue", directly populate labels
// RayApplicationIDLabelName and RayApplicationQueueLabelName to pod labels.
// Currently we use this function to translate labels "yunikorn.apache.org/app-id" and "yunikorn.apache.org/queue"
// to legacy labels "applicationId" and "queue", this is for the better compatibilities to support older yunikorn
// versions.
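
A sketch of the translation the comment above describes: the modern yunikorn.apache.org labels are mirrored onto the legacy "applicationId" and "queue" pod labels for compatibility with older Yunikorn versions. The function name is illustrative, not the actual helper in yunikorn_scheduler.go.

package yunikorn

// addLegacySchedulerLabels mirrors the modern YuniKorn labels onto the legacy
// pod labels that older YuniKorn versions expect.
func addLegacySchedulerLabels(labels map[string]string) {
	if appID, ok := labels["yunikorn.apache.org/app-id"]; ok {
		labels["applicationId"] = appID
	}
	if queue, ok := labels["yunikorn.apache.org/queue"]; ok {
		labels["queue"] = queue
	}
}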

if rayJobSpec.SubmitterPodTemplate != nil {
	submitterGroupSpec = rayJobSpec.SubmitterPodTemplate.Spec
} else {
	submitterGroupSpec = common.GetDefaultSubmitterTemplate(rayJobSpec.RayClusterSpec).Spec
Collaborator

This logic looks like it comes from:

submitterTemplate = common.GetDefaultSubmitterTemplate(&rayClusterInstance.Spec)

Is it possible to use the submitter template that is passed in, so that we don't have duplicate logic?

Member

How about creating a new public function?

// old one
func GetDefaultSubmitterTemplate(rayClusterSpec *rayv1.RayClusterSpec) corev1.PodTemplateSpec {
	return corev1.PodTemplateSpec{
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				GetDefaultSubmitterContainer(rayClusterSpec),
			},
			RestartPolicy: corev1.RestartPolicyNever,
		},
	}
}

// new one
func GetDefaultSubmitterTemplate(rayClusterSpec *rayv1.RayClusterSpec) corev1.PodTemplateSpec {
	return corev1.PodTemplateSpec{
		Spec: GetSubmitterPodSpec(rayClusterSpec),
	}
}

func GetSubmitterPodSpec(rayClusterSpec *rayv1.RayClusterSpec) corev1.PodSpec {
	return corev1.PodSpec{
		Containers: []corev1.Container{
			GetDefaultSubmitterContainer(rayClusterSpec),
		},
		RestartPolicy: corev1.RestartPolicyNever,
	}
}

In this case, we can call GetSubmitterPodSpec directly.
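
If that refactor is adopted, the duplicated branch in yunikorn_scheduler.go could collapse to a call site like the following fragment. GetSubmitterPodSpec is the proposed helper sketched above, not an existing function, and rayJobSpec/common are the identifiers from the quoted snippets.

var submitterGroupSpec corev1.PodSpec
if rayJobSpec.SubmitterPodTemplate != nil {
	submitterGroupSpec = rayJobSpec.SubmitterPodTemplate.Spec
} else {
	// Reuse the shared default pod spec instead of re-deriving it from the template.
	submitterGroupSpec = common.GetSubmitterPodSpec(rayJobSpec.RayClusterSpec)
}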

@owenowenisme owenowenisme force-pushed the rayjob-yunikorn-integration branch 2 times, most recently from 3503dca to 99b2d5e Compare September 18, 2025 09:55
@owenowenisme owenowenisme force-pushed the rayjob-yunikorn-integration branch from 99b2d5e to 906ab83 Compare September 18, 2025 10:43
Member

@Future-Outlier Future-Outlier left a comment

cc @troychiu for the final review!

Collaborator

@troychiu troychiu left a comment

LGTM, thank you! cc @rueian

Member

@Future-Outlier Future-Outlier left a comment

I tested this again; this can be merged, IMO.

image

@rueian rueian merged commit c9fa013 into ray-project:master Sep 22, 2025
27 checks passed
edoakes pushed a commit to ray-project/ray that referenced this pull request Nov 3, 2025
Starting with KubeRay 1.5.0, KubeRay supports gang scheduling for RayJob
custom resources.
Just add a mention for Yunikorn scheduler.

Related to ray-project/kuberay#3948.

Signed-off-by: win5923 <[email protected]>