
Conversation

@win5923
Collaborator

@win5923 win5923 commented Nov 10, 2025

Why are these changes needed?

Currently, when users update a RayCluster spec (e.g., to change the image), they must either re-create the cluster or manually set spec.suspend to true, wait for all Pods to be deleted, and then set it back to false. This is inconvenient for users deploying with GitOps systems such as ArgoCD.

Ref:

Design doc: https://docs.google.com/document/d/1xQLm0-WQWD-FkufxBJYklOJGvVn4RLk0_vPjLD5ax7o/edit?usp=sharing

Changes

  • Add a spec.upgradeStrategy field to the RayCluster CRD
  • Supports two values (see the sketch after this list):
    • Recreate: during an upgrade, the Recreate strategy deletes all existing Pods before creating new ones.
    • None: no new Pods are created while the strategy is set to None.
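
A minimal sketch of what the new API types could look like (the type and constant names here are illustrative guesses based on the bullets above, not necessarily the exact names in this PR):

type RayClusterUpgradeStrategy struct {
	// Type controls how Pods are replaced when the pod template changes.
	// Supported values: Recreate, None.
	Type RayClusterUpgradeType `json:"type,omitempty"`
}

type RayClusterUpgradeType string

const (
	// RayClusterUpgradeRecreate deletes all existing Pods before creating new ones.
	RayClusterUpgradeRecreate RayClusterUpgradeType = "Recreate"
	// RayClusterUpgradeNone creates no new Pods while the strategy is set to None.
	RayClusterUpgradeNone RayClusterUpgradeType = "None"
)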

Implementation

  • Store a hash of HeadGroupSpec.Template in the head Pod's annotations and of workerGroup.Template in each worker Pod's annotations at creation time, under the ray.io/pod-template-hash key
  • Compare the stored hash with the current head or worker pod template hash to detect changes and recreate all Pods

I only compare HeadGroupSpec.Template and workerGroup.Template because these define the Pod-related configuration. RayCluster.Spec contains many dynamic and component-specific settings (e.g., autoscaler configs, rayStartParams).
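
As a rough sketch of the hash-and-compare idea (the annotation key follows the description above; the helper names and the JSON+FNV serialization choice are illustrative assumptions, not the PR's actual implementation):

import (
	"encoding/json"
	"fmt"
	"hash/fnv"

	corev1 "k8s.io/api/core/v1"
)

const podTemplateHashAnnotation = "ray.io/pod-template-hash"

// podTemplateHash computes a deterministic hash of a pod template by
// serializing it to JSON and feeding it through a 32-bit FNV-1a hash.
func podTemplateHash(template corev1.PodTemplateSpec) (string, error) {
	raw, err := json.Marshal(template)
	if err != nil {
		return "", err
	}
	hasher := fnv.New32a()
	hasher.Write(raw)
	return fmt.Sprintf("%x", hasher.Sum32()), nil
}

// templateChanged reports whether a Pod was created from a different
// template than the one currently in the spec.
func templateChanged(pod *corev1.Pod, current corev1.PodTemplateSpec) (bool, error) {
	currentHash, err := podTemplateHash(current)
	if err != nil {
		return false, err
	}
	stored := pod.Annotations[podTemplateHashAnnotation]
	return stored != "" && stored != currentHash, nil
}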

Example:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-kuberay
spec:
  upgradeStrategy:
    type: Recreate
  rayVersion: '2.48.0'

Related issue number

Closes #2534 #3905

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@win5923 win5923 marked this pull request as draft November 10, 2025 16:24
@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch 6 times, most recently from 710166a to d261b0b Compare November 10, 2025 17:11
@win5923 win5923 changed the title [draft] Support recreate pods for RayCluster using RayClusterSpec [draft] Support recreate pods for RayCluster using RayClusterSpec.upgradeStrategy Nov 10, 2025
@win5923
Collaborator Author

win5923 commented Nov 10, 2025

Hi @andrewsykim, I followed your previous comments and added a spec.upgradeStrategy API to RayCluster. However, I'm concerned this approach may introduce some issues:

  1. Confusion with the existing API: we already have upgradeStrategy for RayService. Adding another upgradeStrategy to RayCluster could confuse users and creates an unclear separation of concerns.
  2. Breaking RayJob workflows: for RayJob, setting upgradeStrategy=Recreate on the RayCluster would cause Pod recreation during job execution, leading to job interruption and loss of running jobs.

Maybe we can just add a feature gate, instead of a spec.upgradeStrategy.type field in RayCluster, to enable the recreate behavior. WDYT?

@andrewsykim
Member

Maybe we can just add a feature gate, instead of a spec.upgradeStrategy.type field in RayCluster, to enable the recreate behavior. WDYT?

Feature gates are used to gate features that are in early development and not ready for wider adoption; they shouldn't be used to change the behavior of RayCluster, because the gate will eventually be on by default (and forced on).

@andrewsykim
Member

I think both of those concerns are valid, but I don't think this is a problem of separation of concerns, as RayCluster is a building block for both RayService and RayJob. For the cases you mentioned, we should have validation to ensure the RayCluster upgrade strategy cannot be set when the cluster is used with RayJob or RayService.
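
For example, such a validation could reject the field whenever the RayCluster is owned by a RayJob or RayService (a sketch only; the function name and the Spec.UpgradeStrategy field shape are my assumptions):

import (
	"fmt"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

// validateUpgradeStrategy is a hypothetical check that rejects
// spec.upgradeStrategy on RayClusters managed by a RayJob or RayService.
func validateUpgradeStrategy(cluster *rayv1.RayCluster) error {
	if cluster.Spec.UpgradeStrategy == nil {
		return nil
	}
	for _, ref := range cluster.OwnerReferences {
		if ref.Kind == "RayJob" || ref.Kind == "RayService" {
			return fmt.Errorf("spec.upgradeStrategy cannot be set on RayCluster %q managed by a %s", cluster.Name, ref.Kind)
		}
	}
	return nil
}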

@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch 7 times, most recently from 05b8108 to 7109cf1 Compare November 19, 2025 17:27
@win5923 win5923 changed the title [draft] Support recreate pods for RayCluster using RayClusterSpec.upgradeStrategy [Feature] Support recreate pods for RayCluster using RayClusterSpec.upgradeStrategy Nov 19, 2025
@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch 2 times, most recently from 3d448e6 to 8bcce91 Compare November 19, 2025 18:26
@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch from 8bcce91 to bf87764 Compare November 19, 2025 18:28
@win5923 win5923 marked this pull request as ready for review November 19, 2025 18:30
@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch 2 times, most recently from c9d35b2 to 8d4c813 Compare November 20, 2025 17:03
@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch from 08fe6c5 to f3a9b81 Compare December 7, 2025 06:55
@win5923
Collaborator Author

win5923 commented Dec 7, 2025

Something to be mindful of with this strategy is that adding new fields or slightly changing some property of the template (e.g. changing default entrypoint), could trigger an unintended recreate when a user upgrades KubeRay. I think we ran into something similar when dealing with upgrade behavior in RayService, see #2320 for reference.

Thanks for pointing this out. I’ll add the ray.io/kuberay-version annotation to both the head and worker pods as well, so we can detect KubeRay version changes and avoid unintended recreates. I really appreciate your comment.

Updated: I think we don't need to check whether the KubeRay version is the same, because we only compare HeadGroupSpec.Template and WorkerGroupSpecs.Template, which are plain corev1.PodTemplateSpec values. This comparison is independent of the KubeRay operator version.

Check #4185 (comment)

@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch from a59d6f9 to 5f3afb3 Compare December 7, 2025 15:10
@CheyuWu CheyuWu self-requested a review December 7, 2025 16:42
@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch from f2e53cf to 9c06b5f Compare December 8, 2025 13:14
Comment on lines 1141 to 1153
for _, pod := range allPods.Items {
	podVersion := pod.Annotations[utils.KubeRayVersion]
	if podVersion != "" && podVersion != utils.KUBERAY_VERSION {
		logger.Info("Pods have different KubeRay version, updating pod annotations",
			"pod", pod.Name,
			"podVersion", podVersion,
			"currentVersion", utils.KUBERAY_VERSION)
		if err := r.updatePodsAnnotations(ctx, instance, &allPods); err != nil {
			logger.Error(err, "Failed to update pod annotations for KubeRay version change")
		}
		return false
	}
}
Collaborator


I don't quite understand this part. When the KubeRay version is updated, we update the Pods' KubeRay version annotation, but then what happens? It seems we don't have any extra handling for a KubeRay version update. If so, should we just remove it? Then I think we also don't need updatePodsAnnotations and calculatePodTemplateHash.

Collaborator Author

@win5923 win5923 Dec 10, 2025


My initial thought was to follow the approach used in RayService:
when the KubeRay operator is upgraded, the reconciliation logic entering shouldRecreatePodsForUpgrade would first check whether the KubeRayVersion had changed. If it differed, we would compute a new pod template hash and update both ray.io/pod-template-hash and ray.io/kuberay-version, so nothing would happen in the next reconciliation.

But after revisiting it, I realized we only compare HeadGroupSpec.Template and WorkerGroupSpecs.Template, which are plain corev1.PodTemplateSpec values. We also don't compare any fields modified by the KubeRay operator, because the hash is generated from DefaultHeadPodTemplate and DefaultWorkerPodTemplate at the very beginning.

I think this comparison is independent of the KubeRay operator version, though it is related to Kubernetes upgrades. During a Kubernetes upgrade, nodes are typically drained, which means Pods will be recreated anyway. Therefore, we don't need to explicitly handle or compare Kubernetes API-level changes in this logic.

If there’s anything I haven’t considered or have misunderstood, please let me know.
cc @andrewsykim @Future-Outlier

Collaborator


SG to me!

@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch 2 times, most recently from 06274ce to acd7739 Compare December 13, 2025 02:11
@machichima
Collaborator

I'm thinking of a case where a user does the following:

  1. updates the RayCluster to new settings (from state A -> B)
  2. during deletion, changes to other settings (from state B -> C)

How will this situation be handled? Do we 1) delete all Pods of config A and create Pods for config B, then 2) delete all Pods of config B and create Pods for config C?

And what if a user switches from state A -> B, then changes from B -> A again?

Could you please confirm how we will handle those two cases? Thanks!

if err != nil {
	return false
}
return newHeadPod.Name != initialHeadPodName && newHeadPod.Status.Phase == corev1.PodRunning
Collaborator


Can we also check that the hash is different from the original one?
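
For instance, something like this (initialHeadPodHash is a hypothetical variable capturing the head pod's ray.io/pod-template-hash annotation before the update):

// Hypothetical assertion: the recreated head pod should carry a different
// ray.io/pod-template-hash annotation than the pod it replaced.
g.Expect(newHeadPod.Annotations["ray.io/pod-template-hash"]).
	NotTo(Equal(initialHeadPodHash))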


// Wait for cluster to become ready
LogWithTimestamp(test.T(), "Waiting for RayCluster %s/%s to become ready again", rayCluster.Namespace, rayCluster.Name)
g.Eventually(RayCluster(test, namespace.Name, rayCluster.Name), TestTimeoutMedium).
Collaborator


I think we can change the order of the result checks to:

  1. check that the cluster becomes ready
  2. check the fields in the new head pod (your g.Eventually for the head pod above)
  3. add one more check for the fields in the worker pods

WDYT?

Collaborator Author


I removed the RayCluster status check because once the RayCluster status transitions to rayv1.Ready, it is not updated again unless the cluster is suspended.

75dbd2a

if reconcileErr == nil && len(runtimePods.Items) == int(newInstance.Status.DesiredWorkerReplicas)+1 { // workers + 1 head
	if utils.CheckAllPodsRunning(ctx, runtimePods) {
		newInstance.Status.State = rayv1.Ready
		newInstance.Status.Reason = ""
	}
}

@win5923
Collaborator Author

win5923 commented Dec 14, 2025

I'm thinking of a case where a user does the following:

  1. updates the RayCluster to new settings (from state A -> B)
  2. during deletion, changes to other settings (from state B -> C)

How will this situation be handled? Do we 1) delete all Pods of config A and create Pods for config B, then 2) delete all Pods of config B and create Pods for config C?

Yes, that is the current behavior. Whenever the RayCluster pod template hash differs from the value stored in the ray.io/pod-template-hash annotation, we recreate the Pods to ensure they match the updated template.

And what if a user switches from state A -> B, then changes from B -> A again?

Could you please confirm how we will handle those two cases? Thanks!

It will recreate all Pods when going from A -> B, and then recreate all Pods again when going from B -> A.

@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch from 4b6a527 to b6135f4 Compare December 14, 2025 09:08
@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch from b6135f4 to 75dbd2a Compare December 14, 2025 09:38
@Future-Outlier Future-Outlier self-assigned this Dec 15, 2025