Skip to content

[RayCluster] Improved the efficiency when checking rayclusters' expectations#4209

Merged
rueian merged 1 commit intoray-project:masterfrom
antgroup:raycluster-scale-expectation-fail-fast
Dec 21, 2025
Merged

[RayCluster] Improved the efficiency when checking rayclusters' expectations#4209
rueian merged 1 commit intoray-project:masterfrom
antgroup:raycluster-scale-expectation-fail-fast

Conversation

@harryge00
Copy link
Contributor

@harryge00 harryge00 commented Nov 19, 2025

Why are these changes needed?

The current implementation of IsSatisfied in scale_expectation.go , will iterate through all pods in itemsCache. This is not necessary. When one pod fails to meet the expectation, we can return directly. fast-fail means return earlier once a pod's scaling result is not as expected.

Current code:

	for i := range items {
		rp := items[i].(*rayPod)
		pod := &corev1.Pod{}
		isPodSatisfied := false
		// check if pod is scaled correctly
                ...
		if isPodSatisfied {
			if err := r.itemsCache.Delete(items[i]); err != nil {
				panic(err)
			}
		} else {
			// Here we can return directly. No need to continue.
			// It is inefficient especially when
			// we have created lots of pods and few of them are synced from kube-apiserver
			isSatisfied = false
		}
	}

For example, we may create 10 pods, and expect to successfully get the 10 pods. If none of the pods is ready, no need to iterate through the 10 pods every time. Just find one pod is missing and then break.

I also refactored the code, reduced nested if-else conditions.

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@harryge00
Copy link
Contributor Author

harryge00 commented Nov 19, 2025

@Eikykun Since you implemented the first version of RayClusterScaleExpectation, could you view ?

@harryge00 harryge00 force-pushed the raycluster-scale-expectation-fail-fast branch 2 times, most recently from c5cedc5 to cf8f80e Compare November 20, 2025 08:14
@harryge00 harryge00 changed the title [RayCluster] fast-fail in RayClusterScaleExpectation's IsSatisfied [RayCluster] Improved the efficiency when checking rayclusters' expectations Nov 20, 2025
@harryge00 harryge00 force-pushed the raycluster-scale-expectation-fail-fast branch 5 times, most recently from e9d4859 to cc80b2e Compare November 20, 2025 13:51
Copy link
Member

@Future-Outlier Future-Outlier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @fscnick @machichima @seanlaii for review if you are interested

for more detail, you can refer #2566

@seanlaii
Copy link
Contributor

seanlaii commented Dec 6, 2025

Thanks for the contribution! Overall LGTM. Once the above comments are addressed, I will approve.

@fscnick
Copy link
Collaborator

fscnick commented Dec 7, 2025

Thanks for the contribution. LGTM. After the comments are resolved, I will approve.

@harryge00 harryge00 force-pushed the raycluster-scale-expectation-fail-fast branch from cc80b2e to 8a2b8a2 Compare December 11, 2025 13:07
@harryge00
Copy link
Contributor Author

@seanlaii @fscnick Thanks for reviewing. I’ve addressed all the points in the conversation.

Copy link
Contributor

@seanlaii seanlaii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@fscnick fscnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@machichima machichima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you

@machichima
Copy link
Collaborator

cc @rueian for final review and re-trigger the CI as the failed CI does not seem to related to this PR's change

@rueian rueian merged commit c273bf2 into ray-project:master Dec 21, 2025
27 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in @Future-Outlier's kuberay project Dec 22, 2025
@Future-Outlier Future-Outlier moved this from Done to In Progress in @Future-Outlier's kuberay project Dec 22, 2025
JiangJiaWei1103 pushed a commit to JiangJiaWei1103/kuberay that referenced this pull request Dec 22, 2025
@Future-Outlier
Copy link
Member

cursor review

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no bugs!

@Future-Outlier Future-Outlier moved this from In Progress to Done in @Future-Outlier's kuberay project Jan 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants