@harshit-anyscale

Adds an external_scaler_enabled flag to the application config. The flag dictates whether external scalers are allowed to update the number of replicas for an application.

This is being done as part of the custom autoscaling story.
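
A minimal sketch of the intended semantics, not the code in this PR. Only the external_scaler_enabled name comes from the PR; `AppConfig`, `validate`, and `request_scale` are hypothetical, and the rule that the flag cannot be combined with built-in autoscaling is an assumption based on the validation mentioned in the review.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AppConfig:
    """Hypothetical stand-in for an application config; not Ray Serve's schema."""

    name: str
    external_scaler_enabled: bool = False
    # Deployment name -> autoscaling config (None means a fixed replica count).
    autoscaling: Dict[str, Optional[dict]] = field(default_factory=dict)


def validate(config: AppConfig) -> None:
    """Reject configs that mix external scaling with built-in autoscaling (assumed rule)."""
    if not config.external_scaler_enabled:
        return
    conflicting = sorted(d for d, a in config.autoscaling.items() if a is not None)
    if conflicting:
        raise ValueError(
            f"external_scaler_enabled conflicts with autoscaling on: {conflicting}"
        )


def request_scale(
    config: AppConfig, replicas: Dict[str, int], deployment: str, target: int
) -> bool:
    """Apply an external scale request only if the application allows it."""
    if not config.external_scaler_enabled:
        return False  # Flag is off: leave the replica count untouched.
    replicas[deployment] = target
    return True
```

With the flag off, `request_scale` leaves `replicas` unchanged and returns `False`; with the flag on and no autoscaling config, the target is applied.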

@harshit-anyscale harshit-anyscale requested a review from a team as a code owner October 15, 2025 06:30
@harshit-anyscale harshit-anyscale self-assigned this Oct 15, 2025
@harshit-anyscale harshit-anyscale added the go add ONLY when ready to merge, run all tests label Oct 15, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an external_scaler_enabled flag to control whether external scalers can modify the number of replicas for an application. The changes correctly plumb this new flag through the system, from the user-facing APIs down to the controller logic, and add validation to prevent conflicts with Serve's built-in autoscaling. The implementation is solid, but I've identified a couple of areas for improvement in the error handling of the new scaling API endpoint. Specifically, one issue involves returning a full traceback in an API error response, and another is incomplete handling of 'not found' errors for deployments, which could lead to unhelpful 503 server errors. I've provided suggestions to address these points for a more robust and user-friendly API.
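
The two error-handling concerns can be illustrated with a small sketch. This is not the endpoint code from the PR: `DeploymentNotFoundError`, `ExternalScalerDisabledError`, and `handle_scale_request` are hypothetical names, and the status codes are illustrative choices. The idea is to map known failures to specific client errors and to log unexpected tracebacks on the server instead of returning them.

```python
import json
import logging
from typing import Callable, Tuple

logger = logging.getLogger(__name__)


class DeploymentNotFoundError(Exception):
    """Hypothetical: raised when the target deployment does not exist."""


class ExternalScalerDisabledError(Exception):
    """Hypothetical: raised when external_scaler_enabled is false for the app."""


def handle_scale_request(do_scale: Callable[[], None]) -> Tuple[int, str]:
    """Run a scale operation and map failures to (status, JSON body)."""
    try:
        do_scale()
    except DeploymentNotFoundError as e:
        # A missing deployment is a client error, not a server outage.
        return 404, json.dumps({"error": str(e)})
    except ExternalScalerDisabledError as e:
        return 400, json.dumps({"error": str(e)})
    except Exception:
        # Keep the traceback in server logs; return only a generic message.
        logger.exception("Unexpected error while scaling")
        return 500, json.dumps({"error": "Internal server error"})
    return 200, json.dumps({"message": "Scaling request accepted"})
```

A caller would pass a closure that performs the actual scale call, so the mapping stays independent of how the controller is reached.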

@ray-gardener ray-gardener bot added the serve Ray Serve Related Issue label Oct 15, 2025
Signed-off-by: harshit <[email protected]>

@abrarsheikh abrarsheikh left a comment

Add a black-box test that uses a Serve config and then switches enable_external_scaler back and forth: first enable, then disable, then enable again. Ensure that works. Make sure not to restart or reset between switches.

Signed-off-by: harshit <[email protected]>
Signed-off-by: harshit <[email protected]>
Signed-off-by: harshit <[email protected]>
Signed-off-by: harshit <[email protected]>

@abrarsheikh

> Add a black-box test that uses a Serve config and then switches enable_external_scaler back and forth: first enable, then disable, then enable again. Ensure that works. Make sure not to restart or reset between switches.

Not sure if you saw this ^. Additionally, it would be good to add a test for using the external autoscaler in both the imperative and the declarative case.
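
One way to structure the requested switchback test, sketched against a toy in-memory controller rather than a real Serve cluster. `FakeController`, `apply_config`, and `scale` are hypothetical names, and the behavior that re-applying a config keeps the current replica count is an assumption of this toy, not a statement about Serve.

```python
from typing import Dict


class FakeController:
    """Toy stand-in for a long-lived controller; state persists across config applies."""

    def __init__(self) -> None:
        self.external_scaler_enabled: Dict[str, bool] = {}
        self.num_replicas: Dict[str, int] = {}

    def apply_config(
        self, app: str, external_scaler_enabled: bool, num_replicas: int = 1
    ) -> None:
        # Declarative redeploy: update the flag in place and keep any
        # replica count that already exists (assumption of this toy).
        self.external_scaler_enabled[app] = external_scaler_enabled
        self.num_replicas.setdefault(app, num_replicas)

    def scale(self, app: str, target: int) -> bool:
        # External scale request: honored only while the flag is on.
        if not self.external_scaler_enabled.get(app, False):
            return False
        self.num_replicas[app] = target
        return True


def test_switchback() -> None:
    controller = FakeController()  # Created once; never restarted between steps.

    controller.apply_config("app", external_scaler_enabled=True)
    assert controller.scale("app", 3)

    controller.apply_config("app", external_scaler_enabled=False)
    assert not controller.scale("app", 5)
    assert controller.num_replicas["app"] == 3  # Rejected request changed nothing.

    controller.apply_config("app", external_scaler_enabled=True)
    assert controller.scale("app", 2)
    assert controller.num_replicas["app"] == 2
```

A real black-box version would replace `FakeController` with a running cluster and apply each config through the declarative path, keeping the same enable, disable, enable sequence.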

Signed-off-by: harshit <[email protected]>
# From customer's viewpoint, the deployment is deleted instead of being deleted
# as they must have already executed the delete command
{"error": "Deployment is deleted"},
{"error": str(e)},

+1 ?

Signed-off-by: harshit <[email protected]>
Signed-off-by: harshit <[email protected]>

@abrarsheikh abrarsheikh left a comment

Looks good. @zcin, mind taking a look as well?

@abrarsheikh

@harshit-anyscale please resolve the comments that have been addressed; it makes future revisions easier to review.

@harshit-anyscale

> @harshit-anyscale please resolve the comments that have been addressed; it makes future revisions easier to review.

done, resolved them.


@zcin zcin left a comment

mostly lgtm

deployment_args_list.append(deployment_args_proto.SerializeToString())

application_args_proto = ApplicationArgs()
application_args_proto.external_scaler_enabled = app.external_scaler_enabled

@abrarsheikh FYI, we should probably also move route_prefix here. Having route_prefix at the deployment level is legacy.


@harshit-anyscale create a GH issue for this

Signed-off-by: harshit <[email protected]>
Signed-off-by: harshit <[email protected]>
Signed-off-by: harshit <[email protected]>
Signed-off-by: harshit <[email protected]>
@abrarsheikh abrarsheikh merged commit 5435a56 into master Nov 24, 2025
6 checks passed
@abrarsheikh abrarsheikh deleted the add-external-scaler-c3 branch November 24, 2025 17:32
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
adding external scaler enabled flag in the application config, which
will dictate, whether to allow the external scalers to update the num
replicas for an application or not.

this is being done as part of the [custom autoscaling
story](https://docs.google.com/document/d/1KtMUDz1O3koihG6eh-QcUqudZjNAX3NsqqOMYh3BoWA/edit?tab=t.0#heading=h.2vf4s2d7ca46)

---------

Signed-off-by: harshit <[email protected]>
Signed-off-by: YK <[email protected]>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
Signed-off-by: harshit <[email protected]>