
Conversation

@abrarsheikh
Contributor

@abrarsheikh abrarsheikh commented Nov 8, 2025

Summary

Modified replica rank assignment to defer rank allocation until the replica is actually allocated, rather than assigning it during the startup call. This is necessary for adding node-local rank in the future: to support node rank and node-local rank we need to know the node_id, which is only known after the replica is allocated.

Changes

  • Changed start() method signature to accept assign_rank_callback instead of a pre-assigned rank parameter
  • Rank is now assigned after _allocated_obj_ref is resolved, ensuring the replica is allocated before rank assignment (see the sketch after this list)
  • Pass rank to initialize_and_get_metadata() method on the replica actor, allowing rank to be set during initialization
  • Updated ReplicaBase.initialize() to accept rank as a parameter and set it along with the internal replica context
  • Added PENDING_INITIALIZATION status check to handle cases where _ready_obj_ref is not yet set
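Below is a minimal, hypothetical sketch of the deferred-assignment flow described above. Names such as start(), check_ready(), assign_rank_callback, _allocated_obj_ref, and the PENDING_INITIALIZATION status follow the PR description, but the toy class, its helpers, and the callback body are illustrative assumptions, not the actual Ray Serve implementation.

    from typing import Callable, Optional


    class ReplicaStartupStatus:
        PENDING_ALLOCATION = "PENDING_ALLOCATION"
        PENDING_INITIALIZATION = "PENDING_INITIALIZATION"
        SUCCEEDED = "SUCCEEDED"


    class ReplicaSketch:
        """Toy stand-in for the controller-side replica wrapper."""

        def __init__(self, replica_id: str):
            self.replica_id = replica_id
            self._allocated = False          # stands in for _allocated_obj_ref resolving
            self._node_id: Optional[str] = None
            self._rank: Optional[int] = None
            self._assign_rank_callback: Optional[Callable[[str], int]] = None

        def start(self, assign_rank_callback: Callable[[str], int]) -> None:
            # No rank is passed in anymore; only the callback is remembered.
            self._assign_rank_callback = assign_rank_callback

        def mark_allocated(self, node_id: str) -> None:
            # Corresponds to _allocated_obj_ref resolving, which is the first
            # point at which the node_id is known.
            self._allocated = True
            self._node_id = node_id

        def check_ready(self) -> str:
            if not self._allocated:
                return ReplicaStartupStatus.PENDING_ALLOCATION
            if self._rank is None:
                # Deferred assignment: node_id is available now, so a rank can
                # be handed out and passed into replica initialization.
                self._rank = self._assign_rank_callback(self._node_id)
                return ReplicaStartupStatus.PENDING_INITIALIZATION
            return ReplicaStartupStatus.SUCCEEDED


    next_rank = iter(range(100))
    replica = ReplicaSketch("replica-0")
    replica.start(assign_rank_callback=lambda node_id: next(next_rank))
    print(replica.check_ready())                  # PENDING_ALLOCATION
    replica.mark_allocated(node_id="node-A")
    print(replica.check_ready())                  # PENDING_INITIALIZATION, rank assigned here
    print(replica.check_ready(), replica._rank)   # SUCCEEDED 0

The key point is that the rank only exists once allocation has resolved, so the controller needs one more reconciliation pass before the replica can begin initializing.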

Next PR #58479

@abrarsheikh abrarsheikh added the go add ONLY when ready to merge, run all tests label Nov 8, 2025
@abrarsheikh abrarsheikh changed the title pass rank into replicas initialize method [3/n] [Serve] Defer rank assignment after replica is allocated Nov 8, 2025
abrarsheikh added a commit that referenced this pull request Nov 14, 2025
…58473)

### Summary
This PR refactors the replica rank system to support multi-dimensional
ranking (global, node-level, and local ranks) in preparation for
node-local rank tracking. The `ReplicaRank` object now contains three
fields instead of being a simple integer, enabling better coordination
of replicas across nodes.

### Motivation
Currently, Ray Serve only tracks a single global rank per replica. For
advanced use cases like tensor parallelism, model sharding across nodes,
and node-aware coordination, we need to track:
- **Global rank**: Replica's rank across all nodes (0 to N-1)
- **Node rank**: Which node the replica is on (0 to M-1) 
- **Local rank**: Replica's rank on its specific node (0 to K-1)

This PR lays the groundwork by introducing the expanded `ReplicaRank`
schema while keeping existing behavior backward compatible.
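
To make the three dimensions concrete, here is a small, hypothetical sketch of an expanded `ReplicaRank` and how `node_rank`/`local_rank` could be derived once node IDs are known. The field names follow this description, but the dataclass and the derivation logic are illustrative assumptions, not the actual `schema.py` implementation.

    from dataclasses import dataclass
    from typing import Dict, List


    @dataclass
    class ReplicaRank:
        # Field names follow the PR description; -1 mirrors the placeholder
        # values used until node-local tracking lands.
        rank: int             # global rank, 0..N-1 across all nodes
        node_rank: int = -1   # which node the replica is on, 0..M-1
        local_rank: int = -1  # rank among replicas on the same node, 0..K-1


    def derive_ranks(replica_nodes: List[str]) -> List[ReplicaRank]:
        """Illustrative only: derive node_rank/local_rank from each replica's node."""
        node_order: Dict[str, int] = {}
        local_counts: Dict[str, int] = {}
        ranks = []
        for global_rank, node_id in enumerate(replica_nodes):
            node_rank = node_order.setdefault(node_id, len(node_order))
            local_rank = local_counts.get(node_id, 0)
            local_counts[node_id] = local_rank + 1
            ranks.append(ReplicaRank(rank=global_rank, node_rank=node_rank, local_rank=local_rank))
        return ranks


    # 8 replicas over 2 nodes -> global ranks 0..7, node_rank 0/1, local_rank 0..3 per node.
    print(derive_ranks(["n0", "n0", "n1", "n0", "n1", "n1", "n0", "n1"]))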

### Changes

#### Core Implementation
- **`schema.py`**: Extended `ReplicaRank` to include `node_rank` and
`local_rank` fields (currently set to -1 as placeholders)
- **`replica.py`**: Updated replica actors to handle `ReplicaRank`
objects
- **`context.py`**: Changed `ReplicaContext.rank` type from
`Optional[int]` to `ReplicaRank`

### Current Behavior
- `node_rank` and `local_rank` are set to `-1` (placeholder values); this
will change in a future PR
- Global rank assignment and management works as before
- All existing functionality is preserved

### Breaking Changes
Rank is changing from `int` to `ReplicaRank`

Next PR #58477

---------

Signed-off-by: abrar <[email protected]>
Base automatically changed from LLM-2497-abrar-rank-p2 to master November 14, 2025 06:05
@abrarsheikh abrarsheikh marked this pull request as ready for review November 14, 2025 07:14
@abrarsheikh abrarsheikh requested a review from a team as a code owner November 14, 2025 07:14

@cursor cursor bot left a comment


Bug: Recovery Breaks Deferred Rank Assignment

During replica recovery after controller restart, initialize_and_get_metadata.remote() is called without passing a rank parameter (line 671), but the method signature now requires it. The _assign_rank_callback is never set during recovery since start() isn't called, and the rank assignment logic in check_ready() (lines 743-746) is skipped because _ready_obj_ref is already set by recover(). This causes recovered replicas to initialize without a rank, breaking the deferred rank assignment feature.

python/ray/serve/_private/deployment_state.py#L627-L673

def recover(self) -> bool:
    """Recover replica version from a live replica actor.

    When controller dies, the deployment state loses the info on the version that's
    running on each individual replica actor, so as part of the recovery process, we
    need to recover the version that is running on the replica actor.

    Also confirm that actor is allocated and initialized before marking as running.

    Returns: False if the replica actor is no longer alive; the
        actor could have been killed in the time between when the
        controller fetching all Serve actors in the cluster and when
        the controller tries to recover it. Otherwise, return True.
    """
    logger.info(f"Recovering {self.replica_id}.")
    try:
        self._actor_handle = ray.get_actor(
            self._actor_name, namespace=SERVE_NAMESPACE
        )
    except ValueError:
        logger.warning(
            f"Failed to get handle to replica {self._actor_name} "
            "during controller recovery. Marking as dead."
        )
        return False

    try:
        self._placement_group = ray.util.get_placement_group(
            self._actor_name,
        )
    except ValueError:
        # ValueError is raised if the placement group does not exist.
        self._placement_group = None

    # Re-fetch initialization proof
    self._allocated_obj_ref = self._actor_handle.is_allocated.remote()

    # Running actor handle already has all info needed, thus successful
    # starting simply means retrieving replica version hash from actor
    if self._is_cross_language:
        self._ready_obj_ref = self._actor_handle.check_health.remote()
    else:
        self._ready_obj_ref = (
            self._actor_handle.initialize_and_get_metadata.remote()
        )

@abrarsheikh
Contributor Author

During replica recovery after controller restart, initialize_and_get_metadata.remote() is called without passing a rank parameter (line 671), but the method signature now requires it

Not true, rank is not required.

This causes recovered replicas to initialize without a rank, breaking the deferred rank assignment feature.

This is expected, because we want to fetch the rank from the already-running replica instead of assigning it a rank during recovery.
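
To illustrate that point, here is a minimal, hypothetical sketch in which the replica-side rank parameter is optional, so the recovery path can call initialize_and_get_metadata() without one and simply report the rank the running replica already holds. This is an assumption about the shape of the code, not the actual Ray Serve implementation.

    from typing import Optional


    class ReplicaBaseSketch:
        """Toy replica actor: rank is optional so recovery can skip passing it."""

        def __init__(self):
            self._rank: Optional[int] = None

        def initialize_and_get_metadata(self, rank: Optional[int] = None) -> dict:
            if rank is not None:
                # Fresh start: the controller assigned a rank after allocation.
                self._rank = rank
            # On recovery the controller omits rank and instead reads back the
            # value the already-running replica holds.
            return {"rank": self._rank}


    fresh = ReplicaBaseSketch()
    print(fresh.initialize_and_get_metadata(rank=3))   # {'rank': 3}

    recovered = ReplicaBaseSketch()
    recovered._rank = 7                                 # pretend it was set before the controller died
    print(recovered.initialize_and_get_metadata())      # {'rank': 7}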

Signed-off-by: abrar <[email protected]>

@cursor cursor bot left a comment


Bug: Broken Rank Assignment in Replica Recovery

During replica recovery after controller restart, initialize_and_get_metadata.remote() is called without passing a rank parameter, but the method signature now requires it. The _assign_rank_callback is never set during recovery since start() isn't called, and the rank assignment logic in check_ready() is skipped because _ready_obj_ref is already set by recover(). This causes recovered replicas to initialize without a rank, breaking the deferred rank assignment system.

python/ray/serve/_private/deployment_state.py#L669-L672

else:
    self._ready_obj_ref = (
        self._actor_handle.initialize_and_get_metadata.remote()
    )

@ray-gardener ray-gardener bot added the serve Ray Serve Related Issue label Nov 14, 2025
ArturNiederfahrenhorst pushed a commit to ArturNiederfahrenhorst/ray that referenced this pull request Nov 16, 2025
Contributor

@zcin zcin left a comment


what is the impact of this change on replica startup time?

@abrarsheikh
Contributor Author

what is the impact of this change on replica startup time?

I think this adds one RAY_SERVE_CONTROL_LOOP_INTERVAL_S of delay to replica startup time, since rank assignment now happens in the next control-loop iteration after allocation resolves.

@zcin
Contributor

zcin commented Nov 17, 2025

Can we test / measure e2e

@abrarsheikh
Contributor Author

Can we test / measure e2e

test application

applications:
- import_path: app:app
  deployments:
  - name: DITest
    num_replicas: 4
    ray_actor_options:
      num_cpus: 0.1
  
  - name: d_2
    num_replicas: 4
    ray_actor_options:
      num_cpus: 0.1
  - name: d_1
    num_replicas: 4
    ray_actor_options:
      num_cpus: 0.1

profile code

diff --git a/python/ray/serve/_private/deployment_state.py b/python/ray/serve/_private/deployment_state.py
index 4b96833aff..829e0a84f0 100644
--- a/python/ray/serve/_private/deployment_state.py
+++ b/python/ray/serve/_private/deployment_state.py
@@ -283,6 +283,7 @@ class ActorReplicaWrapper:

         # Outbound deployments polling state
         self._outbound_deployments: Optional[List[DeploymentID]] = None
+        self._replica_start_ts: float = 0.0

     @property
     def replica_id(self) -> str:
@@ -460,7 +461,7 @@ class ActorReplicaWrapper:
         self._deployment_is_cross_language = (
             deployment_info.deployment_config.is_cross_language
         )
-
+        self._replica_start_ts = time.time()
         logger.info(
             f"Starting {self.replica_id}.",
             extra={"log_to_stderr": False},
@@ -790,6 +791,9 @@ class ActorReplicaWrapper:
                 )
                 return ReplicaStartupStatus.FAILED, repr(e)

+        time_taken = time.time() - self._replica_start_ts
+        logger.info(f"Replica {self._replica_id} started in {time_taken:.2f}s.")
+
         return ReplicaStartupStatus.SUCCEEDED, None

     @property

From this PR

❯ RAY_SERVE_CONTROL_LOOP_INTERVAL_S=2 serve run raw_config.yaml
(ServeController pid=3251490) INFO 2025-11-19 00:02:14,908 controller 3251490 -- Replica Replica(id='ykzh0lp3', deployment='d_1', app='default') started in 4.08s.
(ServeController pid=3251490) INFO 2025-11-19 00:02:14,908 controller 3251490 -- Replica Replica(id='wj1jwbst', deployment='d_1', app='default') started in 4.08s.
(ServeController pid=3251490) INFO 2025-11-19 00:02:14,909 controller 3251490 -- Replica Replica(id='cob2l0ts', deployment='d_1', app='default') started in 4.08s.
(ServeController pid=3251490) INFO 2025-11-19 00:02:14,909 controller 3251490 -- Replica Replica(id='luh8ykjt', deployment='d_1', app='default') started in 4.08s.
(ServeController pid=3251490) INFO 2025-11-19 00:02:14,909 controller 3251490 -- Replica Replica(id='8vsm1wg7', deployment='d_2', app='default') started in 4.08s.
(ServeController pid=3251490) INFO 2025-11-19 00:02:14,909 controller 3251490 -- Replica Replica(id='ax8lcckm', deployment='d_2', app='default') started in 4.08s.
(ServeController pid=3251490) INFO 2025-11-19 00:02:14,910 controller 3251490 -- Replica Replica(id='rdhkqk0q', deployment='d_2', app='default') started in 4.08s.
(ServeController pid=3251490) INFO 2025-11-19 00:02:14,910 controller 3251490 -- Replica Replica(id='888udqd7', deployment='d_2', app='default') started in 4.08s.
(ServeController pid=3251490) INFO 2025-11-19 00:02:14,910 controller 3251490 -- Replica Replica(id='15e088r3', deployment='DITest', app='default') started in 4.08s.
(ServeController pid=3251490) INFO 2025-11-19 00:02:14,911 controller 3251490 -- Replica Replica(id='4r4b7on4', deployment='DITest', app='default') started in 4.08s.
(ServeController pid=3251490) INFO 2025-11-19 00:02:14,911 controller 3251490 -- Replica Replica(id='qq5kzseb', deployment='DITest', app='default') started in 4.08s.
(ServeController pid=3251490) INFO 2025-11-19 00:02:14,911 controller 3251490 -- Replica Replica(id='iffoha2e', deployment='DITest', app='default') started in 4.08s.

From master


❯ RAY_SERVE_CONTROL_LOOP_INTERVAL_S=2 serve run raw_config.yaml
(ServeController pid=3257145) INFO 2025-11-19 00:04:07,959 controller 3257145 -- Replica Replica(id='9nry8lw4', deployment='d_1', app='default') started in 2.06s.
(ServeController pid=3257145) INFO 2025-11-19 00:04:07,960 controller 3257145 -- Replica Replica(id='q2nq0n66', deployment='d_1', app='default') started in 2.06s.
(ServeController pid=3257145) INFO 2025-11-19 00:04:07,960 controller 3257145 -- Replica Replica(id='r9ot49qp', deployment='d_1', app='default') started in 2.06s.
(ServeController pid=3257145) INFO 2025-11-19 00:04:07,960 controller 3257145 -- Replica Replica(id='xbmeloll', deployment='d_1', app='default') started in 2.06s.
(ServeController pid=3257145) INFO 2025-11-19 00:04:07,961 controller 3257145 -- Replica Replica(id='rl0glnyh', deployment='d_2', app='default') started in 2.06s.
(ServeController pid=3257145) INFO 2025-11-19 00:04:07,961 controller 3257145 -- Replica Replica(id='fnmdu1an', deployment='d_2', app='default') started in 2.06s.
(ServeController pid=3257145) INFO 2025-11-19 00:04:07,961 controller 3257145 -- Replica Replica(id='erjxm7ur', deployment='d_2', app='default') started in 2.06s.
(ServeController pid=3257145) INFO 2025-11-19 00:04:07,962 controller 3257145 -- Replica Replica(id='nuh5449r', deployment='d_2', app='default') started in 2.06s.
(ServeController pid=3257145) INFO 2025-11-19 00:04:07,962 controller 3257145 -- Replica Replica(id='tc2zxiwd', deployment='DITest', app='default') started in 2.06s.
(ServeController pid=3257145) INFO 2025-11-19 00:04:07,962 controller 3257145 -- Replica Replica(id='uj60v4l7', deployment='DITest', app='default') started in 2.06s.
(ServeController pid=3257145) INFO 2025-11-19 00:04:07,962 controller 3257145 -- Replica Replica(id='0p05e6wp', deployment='DITest', app='default') started in 2.06s.
(ServeController pid=3257145) INFO 2025-11-19 00:04:07,963 controller 3257145 -- Replica Replica(id='ab7y7a56', deployment='DITest', app='default') started in 2.06s.

@zcin
Contributor

zcin commented Nov 19, 2025

Hmm that's a pretty significant increase. Is there a way to avoid it?

@abrarsheikh
Contributor Author

The +2 additional seconds to start the replica is because I set RAY_SERVE_CONTROL_LOOP_INTERVAL_S=2 in my run; making sure you saw that. So with the default interval the effect is not this prominent.

The other option I can think of to start the replica in the same controller iteration is to use the _on_completed API from core, but @edoakes recommended against it.

@abrarsheikh
Contributor Author

Without overriding RAY_SERVE_CONTROL_LOOP_INTERVAL_S:

on master
(ServeController pid=3469525) INFO 2025-11-19 17:48:20,685 controller 3469525 -- Replica Replica(id='wa1t182b', deployment='d_1', app='default') started in 0.68s.
(ServeController pid=3469525) INFO 2025-11-19 17:48:20,686 controller 3469525 -- Replica Replica(id='1gh8kc4h', deployment='d_1', app='default') started in 0.68s.
(ServeController pid=3469525) INFO 2025-11-19 17:48:20,686 controller 3469525 -- Replica Replica(id='8ht8dpb4', deployment='d_1', app='default') started in 0.68s.
(ServeController pid=3469525) INFO 2025-11-19 17:48:20,687 controller 3469525 -- Replica Replica(id='ha0yyojf', deployment='d_1', app='default') started in 0.68s.
(ServeController pid=3469525) INFO 2025-11-19 17:48:20,687 controller 3469525 -- Replica Replica(id='0bi24lia', deployment='d_2', app='default') started in 0.68s.
(ServeController pid=3469525) INFO 2025-11-19 17:48:20,687 controller 3469525 -- Replica Replica(id='zw0vl107', deployment='d_2', app='default') started in 0.67s.
(ServeController pid=3469525) INFO 2025-11-19 17:48:20,687 controller 3469525 -- Replica Replica(id='75ctfmjf', deployment='d_2', app='default') started in 0.68s.
(ServeController pid=3469525) INFO 2025-11-19 17:48:20,688 controller 3469525 -- Replica Replica(id='f8fkr17n', deployment='d_2', app='default') started in 0.68s.
(ServeController pid=3469525) INFO 2025-11-19 17:48:20,688 controller 3469525 -- Replica Replica(id='qbppjta5', deployment='DITest', app='default') started in 0.68s.
(ServeController pid=3469525) INFO 2025-11-19 17:48:20,688 controller 3469525 -- Replica Replica(id='zdumdewc', deployment='DITest', app='default') started in 0.67s.
(ServeController pid=3469525) INFO 2025-11-19 17:48:20,688 controller 3469525 -- Replica Replica(id='au93ydr2', deployment='DITest', app='default') started in 0.67s.
(ServeController pid=3469525) INFO 2025-11-19 17:48:20,689 controller 3469525 -- Replica Replica(id='nx4mtlil', deployment='DITest', app='default') started in 0.67s.


with changes in PR
(ServeController pid=3474912) INFO 2025-11-19 17:50:20,600 controller 3474912 -- Replica Replica(id='930w30uy', deployment='d_1', app='default') initialization time: 0.68s
(ServeController pid=3474912) INFO 2025-11-19 17:50:20,600 controller 3474912 -- Replica Replica(id='r1vgpqpz', deployment='d_1', app='default') initialization time: 0.68s
(ServeController pid=3474912) INFO 2025-11-19 17:50:20,601 controller 3474912 -- Replica Replica(id='cu4clrvo', deployment='d_1', app='default') initialization time: 0.68s
(ServeController pid=3474912) INFO 2025-11-19 17:50:20,601 controller 3474912 -- Replica Replica(id='7vcgxjv8', deployment='d_1', app='default') initialization time: 0.68s
(ServeController pid=3474912) INFO 2025-11-19 17:50:20,601 controller 3474912 -- Replica Replica(id='l1ei9uz0', deployment='d_2', app='default') initialization time: 0.68s
(ServeController pid=3474912) INFO 2025-11-19 17:50:20,602 controller 3474912 -- Replica Replica(id='76hxiwqk', deployment='d_2', app='default') initialization time: 0.68s
(ServeController pid=3474912) INFO 2025-11-19 17:50:20,602 controller 3474912 -- Replica Replica(id='do47qcqo', deployment='d_2', app='default') initialization time: 0.68s
(ServeController pid=3474912) INFO 2025-11-19 17:50:20,602 controller 3474912 -- Replica Replica(id='uoz56m8r', deployment='d_2', app='default') initialization time: 0.68s
(ServeController pid=3474912) INFO 2025-11-19 17:50:20,603 controller 3474912 -- Replica Replica(id='fa11aprv', deployment='DITest', app='default') initialization time: 0.68s
(ServeController pid=3474912) INFO 2025-11-19 17:50:20,603 controller 3474912 -- Replica Replica(id='v8c9wyyj', deployment='DITest', app='default') initialization time: 0.68s
(ServeController pid=3474912) INFO 2025-11-19 17:50:20,604 controller 3474912 -- Replica Replica(id='cvlh1fbc', deployment='DITest', app='default') initialization time: 0.68s
(ServeController pid=3474912) INFO 2025-11-19 17:50:20,604 controller 3474912 -- Replica Replica(id='j52efolh', deployment='DITest', app='default') initialization time: 0.68s

@abrarsheikh abrarsheikh merged commit fa625a6 into master Nov 19, 2025
6 checks passed
@abrarsheikh abrarsheikh deleted the LLM-2497-abrar-rank-p3 branch November 19, 2025 18:04
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
400Ping pushed a commit to 400Ping/ray that referenced this pull request Nov 21, 2025
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025