[Feature] Support recreate pods for RayCluster using RayClusterSpec.upgradeStrategy #4185
Conversation
Hi @andrewsykim, I followed your previous comments and added a spec.upgradeStrategy API to RayCluster. But for now, I'm concerned this approach may introduce some issues:
Maybe we can just add a feature gate instead of using a spec.upgradeStrategy.type field in RayCluster to enable the recreate behavior. WDYT?
Feature gates are used to gate features that are in early development and not ready for wider adoption; they shouldn't be used to change the behavior of RayCluster, because a feature gate will eventually be on by default (and forced on).
I think both of those concerns are valid, but I don't think this is a problem of separation of concerns, since RayCluster is a building block for both RayService and RayJob. For the cases you mentioned, we should have validation to ensure the RayCluster upgrade strategy cannot be set when the cluster is used with RayJob or RayService.
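As a rough illustration of that validation idea, a check against the RayCluster's owner references could look like the sketch below. This is only a sketch under assumptions: the function name and the upgradeStrategySet parameter are placeholders, not the actual KubeRay webhook code.

```go
package validationsketch

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// validateUpgradeStrategy rejects a RayCluster-level upgrade strategy when the
// cluster is managed by a RayService or RayJob. upgradeStrategySet stands in
// for "spec.upgradeStrategy is non-nil", since the exact field type added by
// this PR may differ.
func validateUpgradeStrategy(name string, ownerRefs []metav1.OwnerReference, upgradeStrategySet bool) error {
	if !upgradeStrategySet {
		return nil
	}
	for _, owner := range ownerRefs {
		if owner.Kind == "RayService" || owner.Kind == "RayJob" {
			return fmt.Errorf("spec.upgradeStrategy cannot be set on RayCluster %s managed by a %s", name, owner.Kind)
		}
	}
	return nil
}
```

Checking owner references would keep the rule enforceable even when the RayCluster is created indirectly by the RayService or RayJob controller.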
Updated: I think we don't need to check whether the KubeRayVersion is the same or not, since we only compare the pod templates. See #4185 (comment).
for _, pod := range allPods.Items {
	podVersion := pod.Annotations[utils.KubeRayVersion]
	if podVersion != "" && podVersion != utils.KUBERAY_VERSION {
		logger.Info("Pods have different KubeRay version, updating pod annotations",
			"pod", pod.Name,
			"podVersion", podVersion,
			"currentVersion", utils.KUBERAY_VERSION)
		if err := r.updatePodsAnnotations(ctx, instance, &allPods); err != nil {
			logger.Error(err, "Failed to update pod annotations for KubeRay version change")
		}
		return false
	}
}
I don't quite understand this part. When the KubeRayVersion is updated, we update the pods' KubeRay version annotation, but then what happens? It seems like we have no extra handling for a KubeRay version update. If so, should we just remove it? Then I think we would not need updatePodsAnnotations and calculatePodTemplateHash.
My initial thought was to follow the approach used in RayService:
when the KubeRay Operator is upgraded, the reconciliation logic entering shouldRecreatePodsForUpgrade would first check whether the KubeRayVersion had changed. If it differed, we would compute a new pod template hash and update both ray.io/pod-template-hash and ray.io/kuberay-version, so nothing would happen in the next reconciliation.
But after revisiting it, I realized we only compare HeadGroupSpec.Template and WorkerGroupSpecs.Template, which are just corev1.PodTemplateSpec. We also do not compare any fields modified by the KubeRay Operator, because the hash is generated from the DefaultHeadPodTemplate and DefaultWorkerPodTemplate at the very beginning.
I think this comparison is independent of the KubeRay Operator version, and only related to a Kubernetes upgrade.
During a Kubernetes upgrade, nodes are typically drained, which means Pods will be recreated anyway. Therefore, we don’t need to explicitly handle or compare Kubernetes API–level changes in this logic.
If there’s anything I haven’t considered or have misunderstood, please let me know.
cc @andrewsykim @Future-Outlier
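For illustration, here is a minimal sketch of the comparison described above, assuming a plain JSON-marshal-plus-SHA-256 hash over only the user-provided corev1.PodTemplateSpec (the actual KubeRay helper may hash differently):

```go
package hashsketch

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"

	corev1 "k8s.io/api/core/v1"
)

// podTemplateHash hashes only the user-provided pod template
// (HeadGroupSpec.Template or a WorkerGroupSpec.Template), so fields injected
// later by the operator do not affect the result.
func podTemplateHash(tmpl corev1.PodTemplateSpec) (string, error) {
	raw, err := json.Marshal(tmpl)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// needsRecreate reports whether a pod's stored annotation no longer matches
// the hash of the current template, i.e. the pod belongs to an old spec.
func needsRecreate(podAnnotations map[string]string, currentTemplate corev1.PodTemplateSpec) (bool, error) {
	current, err := podTemplateHash(currentTemplate)
	if err != nil {
		return false, err
	}
	return podAnnotations["ray.io/pod-template-hash"] != current, nil
}
```

Because only the template from the spec is hashed, fields injected later by the operator (or changed during a Kubernetes upgrade) never alter the stored ray.io/pod-template-hash value.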
SG to me!
I'm thinking of a case where the user updates the spec several times, say from config A to B and then to C:
How will this situation be handled? Are we 1) deleting all pods of config A and creating pods for config B, then 2) deleting all pods of config B and creating pods for config C? And what if the user switches from state A -> B, then changes from B -> A again? Could you please confirm how we will handle those two cases? Thanks!
if err != nil {
	return false
}
return newHeadPod.Name != initialHeadPodName && newHeadPod.Status.Phase == corev1.PodRunning
Can we also check that the hash is different from the original one?
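For example, the check could be extended like this (a sketch; initialHeadPodHash is assumed to be captured from the old head pod's ray.io/pod-template-hash annotation before triggering the upgrade, it is not an existing helper):

```go
package e2esketch

import corev1 "k8s.io/api/core/v1"

// headPodRecreated adds the suggested hash check on top of the existing name
// and phase checks.
func headPodRecreated(newHeadPod *corev1.Pod, initialHeadPodName, initialHeadPodHash string) bool {
	return newHeadPod.Name != initialHeadPodName &&
		newHeadPod.Status.Phase == corev1.PodRunning &&
		newHeadPod.Annotations["ray.io/pod-template-hash"] != initialHeadPodHash
}
```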
// Wait for cluster to become ready
LogWithTimestamp(test.T(), "Waiting for RayCluster %s/%s to become ready again", rayCluster.Namespace, rayCluster.Name)
g.Eventually(RayCluster(test, namespace.Name, rayCluster.Name), TestTimeoutMedium).
I think we can change the order of the result checks to:
- check if the cluster becomes ready
- check fields in the new head pod (your g.Eventually for the head pod above)
- add one for checking fields in the worker pods

WDYT?
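Sketched with plain Gomega, the suggested order might look like this; the three accessor functions are placeholders for whatever helpers the KubeRay e2e suite actually provides, not real utilities from the repo:

```go
package e2esketch

import (
	"testing"
	"time"

	"github.com/onsi/gomega"
	corev1 "k8s.io/api/core/v1"
)

// verifyRecreateUpgrade checks the results in the proposed order:
// cluster ready first, then the new head pod, then the worker pods.
func verifyRecreateUpgrade(
	t *testing.T,
	clusterIsReady func() (bool, error),
	headPod func() (*corev1.Pod, error),
	workerPods func() ([]corev1.Pod, error),
	oldHash string,
) {
	g := gomega.NewWithT(t)
	timeout := 3 * time.Minute

	// 1. Wait for the RayCluster to become ready again.
	g.Eventually(clusterIsReady, timeout).Should(gomega.BeTrue())

	// 2. Check fields on the new head pod: Running and carrying a new
	//    ray.io/pod-template-hash annotation.
	g.Eventually(func() (bool, error) {
		pod, err := headPod()
		if err != nil {
			return false, err
		}
		return pod.Status.Phase == corev1.PodRunning &&
			pod.Annotations["ray.io/pod-template-hash"] != oldHash, nil
	}, timeout).Should(gomega.BeTrue())

	// 3. Check that every worker pod was recreated with the new template hash.
	g.Eventually(func() (bool, error) {
		pods, err := workerPods()
		if err != nil {
			return false, err
		}
		for _, pod := range pods {
			if pod.Annotations["ray.io/pod-template-hash"] == oldHash {
				return false, nil
			}
		}
		return true, nil
	}, timeout).Should(gomega.BeTrue())
}
```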
I removed the RayCluster status check because once the RayCluster status transitions to rayv1.Ready, it will not be updated again, unless the cluster is suspended.
kuberay/ray-operator/controllers/ray/raycluster_controller.go
Lines 1516 to 1522 in bd0d46b
if reconcileErr == nil && len(runtimePods.Items) == int(newInstance.Status.DesiredWorkerReplicas)+1 { // workers + 1 head
	if utils.CheckAllPodsRunning(ctx, runtimePods) {
		newInstance.Status.State = rayv1.Ready
		newInstance.Status.Reason = ""
	}
}
Yes, that is the current behavior. Whenever the RayCluster pod template hash differs from the value initially stored in the ray.io/pod-template-hash annotation on the pods, the pods are recreated.
It will recreate all pods from A -> B, and then recreate all pods from B -> A.
Why are these changes needed?
Currently, when users update a RayCluster spec (e.g., update the image), they must either re-create the cluster or manually set spec.suspend to true, wait until all Pods are deleted, and then set it back to false. This is not particularly convenient for users deploying with GitOps systems like ArgoCD.
Ref:
Design doc: https://docs.google.com/document/d/1xQLm0-WQWD-FkufxBJYklOJGvVn4RLk0_vPjLD5ax7o/edit?usp=sharing
Changes
- Add a spec.upgradeStrategy field to the RayCluster CRD
  - Recreate: During an upgrade, the Recreate strategy will delete all existing pods before creating new ones.
  - None: No new pod will be created while the strategy is set to None.

Implementation
- Add the hash of HeadGroupSpec.Template to the head pod annotations and of workerGroup.Template to the worker pod annotations at creation time, using ray.io/pod-template-hash.

I only compare HeadGroupSpec.Template and workerGroup.Template because these define the pod-related configurations. The RayCluster.Spec contains many dynamic and component-specific settings (e.g., autoscaler configs, rayStartParams).

Example:
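As an illustrative sketch only (the type and constant names below are assumptions; the PR confirms just a spec.upgradeStrategy.type field with the values Recreate and None), the new API surface might look roughly like this in Go:

```go
package raysketch

// RayClusterUpgradeType enumerates the strategies discussed in this PR.
type RayClusterUpgradeType string

const (
	// RayClusterUpgradeRecreate deletes all existing pods before creating new ones.
	RayClusterUpgradeRecreate RayClusterUpgradeType = "Recreate"
	// RayClusterUpgradeNone creates no new pods while the strategy is set to None.
	RayClusterUpgradeNone RayClusterUpgradeType = "None"
)

// RayClusterUpgradeStrategy is a sketch of the spec.upgradeStrategy field.
type RayClusterUpgradeStrategy struct {
	// Type controls how pods are replaced when the pod template changes.
	Type *RayClusterUpgradeType `json:"type,omitempty"`
}
```

A user would then set spec.upgradeStrategy.type to Recreate on the RayCluster to opt in to the recreate behavior.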
Related issue number
Closes #2534 #3905
Checks