
Conversation

Contributor

@ktyxx ktyxx commented May 10, 2025

Why are these changes needed?

This PR improves the downscaling behavior in Ray Serve by modifying the logic in _get_replicas_to_stop() in the default DeploymentScheduler.

Previously, the scheduler selected replicas to stop by traversing the least loaded nodes in ascending order. This often resulted in stopping replicas that had been scheduled earlier and placed optimally using the _best_fit_node() strategy.

This led to several drawbacks:

  • Long-lived replicas, which were scheduled on best-fit nodes, were removed first — leading to inefficient reuse of resources.
  • Recently scaled-up replicas, which were placed on less utilized nodes, were kept longer despite being suboptimal.
  • Cold-start overhead increased, as newer replicas were removed before fully warming up.

This PR reverses the node traversal order during downscaling so that more recently added replicas are prioritized for termination, in cases where other conditions (e.g., running state and number of replicas per node) are equal. These newer replicas are typically less optimal in placement and not yet fully warmed up.

Preserving long-lived replicas improves performance stability and reduces unnecessary resource fragmentation.
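
As a rough sketch of the intended policy (illustrative names only, not the actual scheduler code), downscaling should prefer the newest replicas whenever the other tie-breaking conditions are equal:

```python
from typing import Dict, Set


def pick_replicas_to_stop(
    running_replicas: Dict[str, str],  # replica_id -> node_id, oldest first
    max_num_to_stop: int,
) -> Set[str]:
    """Hypothetical sketch: stop the most recently launched replicas first.

    Assumes `running_replicas` preserves scheduling (insertion) order,
    as Python dicts do since 3.7.
    """
    replicas_to_stop: Set[str] = set()
    for replica_id in reversed(list(running_replicas)):  # newest -> oldest
        if len(replicas_to_stop) == max_num_to_stop:
            break
        replicas_to_stop.add(replica_id)
    return replicas_to_stop
```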

Related issue number

N/A

Checks

  • [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [x] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

@hainesmichaelc hainesmichaelc added the community-contribution label May 12, 2025
@masoudcharkhabi masoudcharkhabi added the serve and stability labels May 12, 2025
@ktyxx ktyxx force-pushed the fix-replica-scale-down-order branch from b081d11 to 02a57df on May 13, 2025 09:15
@akshay-anyscale akshay-anyscale requested a review from a team May 23, 2025 13:20
Contributor

Since node_to_running_replicas_of_target_deployment[node_id] is a set, we don't get any guarantee that it's going to stop replicas in reverse order. This needs a different implementation.

Contributor Author

Apologies for the delayed response — you were absolutely right.

node_to_running_replicas_of_target_deployment[node_id] was a set, so relying on reversed(list(...)) didn’t guarantee replica stop order. That was indeed a problem.

To address this, I've updated the implementation to use:

for node_id, _ in reversed(  # noqa: C413
    sorted(node_to_running_replicas_of_all_deployments.items(), key=key)
):

Contributor

I don't think node_id is chronologically increasing, but I could be wrong; can you look into that? If I'm right, sorting by node_id will not help with the task you set out to achieve.

Contributor Author

Thanks for the review, @abrarsheikh.
You’re right—sorting by node_id doesn’t give chronological order, so this change doesn’t achieve the intended behavior. I’ll close this PR and revisit the down-scale logic in a separate patch. Appreciate your time and feedback!

@github-actions

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale label Jun 10, 2025
@ktyxx ktyxx force-pushed the fix-replica-scale-down-order branch 2 times, most recently from d9e507b to 277db74 on June 11, 2025 05:18
@github-actions github-actions bot removed the stale label Jun 12, 2025
@github-actions

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale label Jun 26, 2025
@ktyxx ktyxx closed this Jun 27, 2025
Contributor Author

ktyxx commented Jul 14, 2025

Reopening this PR after closing it due to the use of set(), which doesn't preserve insertion order.

The goal of this change is to prefer stopping more recently launched replicas (which are often less optimized) before long-lived ones. This helps preserve well-placed warm replicas and improves scaling behavior.

By replacing replicas_to_stop with a list and preserving replica order, we avoid non-deterministic stop behavior. The performance impact is negligible, and the change aligns well with the scheduling goals of Serve.

The benefits of more stable and intelligent downscaling justify reopening this PR.

@ktyxx ktyxx reopened this Jul 14, 2025
@ktyxx ktyxx force-pushed the fix-replica-scale-down-order branch from 3cf778b to f7df605 on July 14, 2025 05:46
@github-actions github-actions bot added the unstale label and removed the stale label Jul 14, 2025
 ) -> Set[ReplicaID]:
-    """Prioritize replicas running on a node with fewest replicas of
-    all deployments.
+    """Prioritize replicas on nodes with fewest replicas of all deployments
Contributor

I prefer the old function implementation; the inline comments and variable names were easier to read.

Contributor Author

Thanks for the feedback! I've reverted the original variable names and inline comments while keeping the list+reversed logic. Let me know if anything else looks off.

     https://github.com/ray-project/ray/issues/20599.
     """
-    replicas_to_stop = set()
+    replicas_to_stop: List[ReplicaID] = []
Contributor

why does replicas_to_stop need to be a list?

Contributor Author

We need a list to preserve insertion order. Each replica is inserted into
self._running_replicas[deployment_id] exactly once in
on_replica_running() when it reaches RUNNING:

self._running_replicas[deployment_id][replica_id] = node_id

Python 3.7+ dicts preserve that insertion order, so keys are oldest → newest.
By iterating with reversed(list(...)) we get newest → oldest, which lets us
stop the most recently launched replica first when multiple replicas live on
the same node. We cast the list back to a set before returning so the public
API stays unchanged.
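
A tiny self-contained demonstration of that property (illustrative IDs only):

```python
# Python 3.7+ dicts preserve insertion order, so iterating the reversed
# key list yields newest -> oldest.
running = {}  # replica_id -> node_id, filled as replicas reach RUNNING
running["r1"] = "node-a"  # oldest
running["r2"] = "node-a"
running["r3"] = "node-b"  # newest

assert list(reversed(list(running))) == ["r3", "r2", "r1"]
```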

Contributor

That makes sense, but why does replicas_to_stop need to be a list?

Contributor Author

Just to confirm — are you asking why we keep replicas_to_stop itself as a list even though newest_first_replicas is already ordered newest → oldest?

We need the list while filling it: we append newest → oldest and exit once len == max_num_to_stop; a set would drop that order and break the LIFO guarantee.
Right before returning we cast to set(...), so the caller still gets a Set[ReplicaID].
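
A minimal sketch of that fill pattern (hypothetical helper, not the exact scheduler code):

```python
from typing import List, Set


def select_newest_first(
    newest_first_replicas: List[str], max_num_to_stop: int
) -> Set[str]:
    replicas_to_stop: List[str] = []  # a list, to keep newest -> oldest order
    for replica_id in newest_first_replicas:
        if len(replicas_to_stop) == max_num_to_stop:
            break  # exit as soon as enough replicas are chosen
        if replica_id not in replicas_to_stop:
            replicas_to_stop.append(replica_id)
    # Cast back to a set so the public return type stays Set[ReplicaID].
    return set(replicas_to_stop)
```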

Contributor
@zcin zcin Aug 6, 2025

Yeah exactly, my question is why do we need to have replicas_to_stop be ordered?
You seem to just be

  1. initializing an empty list replicas_to_stop
  2. running replicas_to_stop.append(pending_launching_recovering_replica)
  3. running if running_replica not in replicas_to_stop: replicas_to_stop.append(running_replica)
  4. returning set(replicas_to_stop)

If replicas_to_stop were a set, you could omit the check in (3)? Perhaps I'm missing something here.

Contributor

> a set would drop that order and break the LIFO guarantee

@ktyxx why does replicas_to_stop also need to be in order if LIFO is already guaranteed at the previous step when iterating over reversed(list(...))?

Contributor Author

You’re right—the container returned by _get_replicas_to_stop itself doesn’t need to be ordered.
My previous patch changed it to a list, but that was unnecessary.

I’ve switched replicas_to_stop back to a set and rewritten the selection logic (sketched below) so the LIFO rule is enforced by
1. taking the per-deployment _running_replicas, reversing it once (newest → oldest)
2. selecting from each node’s bucket in that order.

Thanks for pointing this out!
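
Roughly, the new flow looks like this (a sketch with illustrative names, assuming an insertion-ordered replica -> node mapping):

```python
from collections import defaultdict
from typing import Dict, List


def bucket_newest_first(
    running_replicas: Dict[str, str],  # replica_id -> node_id, oldest first
) -> Dict[str, List[str]]:
    ordered_running_replicas = list(running_replicas)
    ordered_running_replicas.reverse()  # newest -> oldest, reversed once

    # Bucket the (newest-first) replicas by node; each node's bucket is
    # then also newest-first, so taking from the front of a bucket stops
    # the most recently launched replica on that node.
    replicas_grouped_by_node: Dict[str, List[str]] = defaultdict(list)
    for replica_id in ordered_running_replicas:
        replicas_grouped_by_node[running_replicas[replica_id]].append(replica_id)
    return replicas_grouped_by_node
```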

ordered_running_replicas.reverse()

# Bucket the (newest-first) replicas by node for fast lookup.
replicas_grouped_by_node: Dict[str, List[ReplicaID]] = defaultdict(list)
Contributor

Seems like we should replace node_to_running_replicas_of_target_deployment with this new replicas_grouped_by_node? Maybe name it something like ordered_running_replicas_of_target_deployment.

Contributor Author

Thanks, good catch. I removed the redundant node_to_running_replicas_of_target_deployment check and now use a single source of truth: ordered_running_replicas_of_target_deployment.

@ktyxx ktyxx force-pushed the fix-replica-scale-down-order branch from 11c2be9 to f8e7e27 on August 27, 2025 02:44
@ktyxx ktyxx force-pushed the fix-replica-scale-down-order branch from f8e7e27 to 63dbed7 on August 27, 2025 02:50
@zcin zcin self-requested a review August 27, 2025 18:49
@ktyxx ktyxx marked this pull request as draft November 17, 2025 01:42
@ktyxx ktyxx marked this pull request as ready for review November 17, 2025 01:46
@abrarsheikh abrarsheikh added the go label Nov 19, 2025
Contributor Author

ktyxx commented Nov 20, 2025

@zcin Hi! Just a gentle ping on this PR when you get a chance.
No rush, just making sure it's on your radar. Thanks!

@zcin zcin merged commit eaf2af4 into ray-project:master Nov 21, 2025
6 checks passed
400Ping pushed a commit to 400Ping/ray that referenced this pull request Nov 21, 2025
…ownscaling (ray-project#52929)
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025