Skip to content

Conversation

@ryanaoleary
Copy link
Contributor

Why are these changes needed?

This PR updates the GCS state manager to append the labels set on a Node and retrieved from GCS to the dynamic_node_labels on the NodeState passed to the autoscaler. This PR also updates the V2 autoscaler to consider labels when calling _compute_score to schedule a ResourceRequest on a Node, assigning a higher score to a node type with labels that satisfy a LabelSelector.

Related issue number

#51564

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@ryanaoleary ryanaoleary requested a review from a team as a code owner June 5, 2025 10:38
@ryanaoleary
Copy link
Contributor Author

@MengjinYan I ended up implementing the logic to pass labels from GCS to the autoscaler and consider labels in the autoscaler when scheduling in the same PR because it made it easier to test the change.

Copy link
Contributor

@can-anyscale can-anyscale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some tests are failing, could be related @ryanaoleary

Copy link
Contributor

@MengjinYan MengjinYan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we should also add the logic to parse the LabelSelector from the GCS GetClusterStatusReply in autoscaler to populate the label selector information.

Also, is it possible to add end to end test with label selector?

@MengjinYan
Copy link
Contributor

@rueian Can you take a look at the PR as well from autoscaler's perspective to see if we missed anything? Thanks!

@MengjinYan MengjinYan requested a review from rueian July 2, 2025 21:47
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think we should also change the _sort_resource_request logic in _try_schedule to add labels to the sorting mechanism, as there can resource requests with exact same resource requirements but different label selectors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, done in 8cb1816. I had it sort by the length of the constraints of the first selector, similar to how we sort based on the length of the placement constraints.

// raylet knows about.
int64 backlog_size = 4;
// The label selector constraints for this Resource shape on a node.
repeated LabelSelector label_selectors = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a repeated LabelSelectorConstraint field in the LabelSelector. Why do we need repeated here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LabelSelectorConstraints are the conditions associated with one label selector with AND semantics (i.e. we require a node to have labels accelerator-type="TPU" and market-type="SPOT" if those two respective constraints are specified). The repeated label selector is so that we can support fallback with label selectors (i.e. the user specifies two separate label selectors with their own constraint(s), and we fallback to the second one if the first is unsatisfiable).

@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Jul 23, 2025
@ray-gardener ray-gardener bot added community-contribution Contributed by the community core Issues that should be addressed in Ray Core labels Jul 23, 2025
Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Copy link
Contributor

@dayshah dayshah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally just minor nits, not really blockers


namespace std {
template <>
struct hash<ray::LabelSelector> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we using std sets/maps with this somewhere instead of absl ones. Would prefer sticking to absl and just defining the absl hash functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using std::hash in task_spec.h here for the scheduling class descriptor. I think since we're using absl::flat_hash_map in that class I can just go ahead and switch it all to use absl hash functions instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 9b3bbbe.

ryanaoleary and others added 10 commits July 28, 2025 21:44
Co-authored-by: Dhyey Shah <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Co-authored-by: Dhyey Shah <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
@ryanaoleary ryanaoleary requested a review from dayshah July 29, 2025 09:21
@jjyao jjyao changed the title Update V2 Autoscaler to support scheduling using Node labels and LabelSelector API [Core] Update V2 Autoscaler to support scheduling using Node labels and LabelSelector API Jul 30, 2025
@jjyao jjyao merged commit ffffcd4 into ray-project:master Jul 30, 2025
5 checks passed
krishnakalyan3 pushed a commit to krishnakalyan3/ray that referenced this pull request Jul 30, 2025
…nd LabelSelector API (ray-project#53578)

Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Co-authored-by: Rueian <[email protected]>
Co-authored-by: Dhyey Shah <[email protected]>
Signed-off-by: Krishna Kalyan <[email protected]>
avibasnet31 pushed a commit to avibasnet31/ray that referenced this pull request Aug 2, 2025
…nd LabelSelector API (ray-project#53578)

Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Co-authored-by: Rueian <[email protected]>
Co-authored-by: Dhyey Shah <[email protected]>
Signed-off-by: avigyabb <[email protected]>
avibasnet31 pushed a commit to avibasnet31/ray that referenced this pull request Aug 2, 2025
…nd LabelSelector API (ray-project#53578)

Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Co-authored-by: Rueian <[email protected]>
Co-authored-by: Dhyey Shah <[email protected]>
Signed-off-by: avigyabb <[email protected]>
kamil-kaczmarek pushed a commit that referenced this pull request Aug 4, 2025
…nd LabelSelector API (#53578)

Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Co-authored-by: Rueian <[email protected]>
Co-authored-by: Dhyey Shah <[email protected]>
Signed-off-by: Kamil Kaczmarek <[email protected]>
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
…nd LabelSelector API (ray-project#53578)

Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Co-authored-by: Rueian <[email protected]>
Co-authored-by: Dhyey Shah <[email protected]>
Signed-off-by: jugalshah291 <[email protected]>
dstrodtman pushed a commit to dstrodtman/ray that referenced this pull request Oct 6, 2025
…nd LabelSelector API (ray-project#53578)

Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Co-authored-by: Rueian <[email protected]>
Co-authored-by: Dhyey Shah <[email protected]>
Signed-off-by: Douglas Strodtman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants