[rollout] fix: correct heap-based load balancing in AsyncLLMServerManager#4505
Conversation
There was a problem hiding this comment.
Code Review
This pull request correctly fixes a load balancing issue in AsyncLLMServerManager by replacing the unpredictable hash(server) with a deterministic index as a tie-breaker in the server selection heap. The change ensures that servers are selected in a predictable, shuffled order when their request counts are equal. My review includes a suggestion to refactor the heap update logic in _choose_server to be more idiomatic and readable, improving maintainability.
| _, _, server = self.weighted_serveres[0] | ||
| self.weighted_serveres[0][0] += 1 | ||
| heapq.heapreplace(self.weighted_serveres, self.weighted_serveres[0]) | ||
| self.request_id_to_server[request_id] = server |
There was a problem hiding this comment.
The current implementation for updating the heap is a bit obscure. It modifies the top element of the heap in-place, which violates the heap invariant, and then uses heapq.heapreplace to restore it. While this works, it's not very readable and relies on implementation details of heapq.heapreplace.
A more idiomatic and clearer approach would be to use heapq.heappop followed by heapq.heappush. This makes the intent of the code much more explicit and improves maintainability.
| _, _, server = self.weighted_serveres[0] | |
| self.weighted_serveres[0][0] += 1 | |
| heapq.heapreplace(self.weighted_serveres, self.weighted_serveres[0]) | |
| self.request_id_to_server[request_id] = server | |
| count, idx, server = heapq.heappop(self.weighted_serveres) | |
| heapq.heappush(self.weighted_serveres, [count + 1, idx, server]) | |
| self.request_id_to_server[request_id] = server |
|
@vermouth1992 @wuxibin89 @PeterSH6 @chenhaiq @zhaochenyang20 @SwordFaith please review the request |
…ager (verl-project#4505) ### What does this PR do? This PR fixes the load balancing issue in AsyncLLMServerManager where the heap-based server selection was using hash values instead of indices, causing unpredictable server selection order after shuffling. Problem: The original implementation used hash(server) as the secondary sort key in the heap When all servers had the same request count (0), the heap would select the server with the minimum hash value, not the first server in the shuffled list This resulted in poor load distribution and defeated the purpose of random shuffling Solution: Replace hash(server) with explicit indices in the heap structure Heap now sorts by (request_count, index, server) instead of (request_count, hash, server) Ensures deterministic selection: when request counts are equal, the server with the lowest index (first in the shuffled list) is always chosen Example: # Before (❌ Broken): server_handles = [Server_A, Server_B, Server_C] random.shuffle(server_handles) # → [Server_C, Server_A, Server_B] weighted_servers = [[0, hash(s), s] for s in server_handles] # Heap might select Server_A first (min hash), not Server_C! # After (✅ Fixed): server_handles = [Server_A, Server_B, Server_C] random.shuffle(server_handles) # → [Server_C, Server_A, Server_B] weighted_servers = [[0, idx, s] for idx, s in enumerate(server_handles)] # Heap correctly selects Server_C first (idx=0) Co-authored-by: Aleksandr Semikin <aesemikin@alice-a100.sas.yp-c.yandex.net>
…ager (verl-project#4505) ### What does this PR do? This PR fixes the load balancing issue in AsyncLLMServerManager where the heap-based server selection was using hash values instead of indices, causing unpredictable server selection order after shuffling. Problem: The original implementation used hash(server) as the secondary sort key in the heap When all servers had the same request count (0), the heap would select the server with the minimum hash value, not the first server in the shuffled list This resulted in poor load distribution and defeated the purpose of random shuffling Solution: Replace hash(server) with explicit indices in the heap structure Heap now sorts by (request_count, index, server) instead of (request_count, hash, server) Ensures deterministic selection: when request counts are equal, the server with the lowest index (first in the shuffled list) is always chosen Example: # Before (❌ Broken): server_handles = [Server_A, Server_B, Server_C] random.shuffle(server_handles) # → [Server_C, Server_A, Server_B] weighted_servers = [[0, hash(s), s] for s in server_handles] # Heap might select Server_A first (min hash), not Server_C! # After (✅ Fixed): server_handles = [Server_A, Server_B, Server_C] random.shuffle(server_handles) # → [Server_C, Server_A, Server_B] weighted_servers = [[0, idx, s] for idx, s in enumerate(server_handles)] # Heap correctly selects Server_C first (idx=0) Co-authored-by: Aleksandr Semikin <aesemikin@alice-a100.sas.yp-c.yandex.net>
What does this PR do?
This PR fixes the load balancing issue in AsyncLLMServerManager where the heap-based server selection was using hash values instead of indices, causing unpredictable server selection order after shuffling.
Problem:
The original implementation used hash(server) as the secondary sort key in the heap
When all servers had the same request count (0), the heap would select the server with the minimum hash value, not the first server in the shuffled list
This resulted in poor load distribution and defeated the purpose of random shuffling
Solution:
Replace hash(server) with explicit indices in the heap structure
Heap now sorts by (request_count, index, server) instead of (request_count, hash, server)
Ensures deterministic selection: when request counts are equal, the server with the lowest index (first in the shuffled list) is always chosen
Example:
Before (❌ Broken):
server_handles = [Server_A, Server_B, Server_C]
random.shuffle(server_handles) # → [Server_C, Server_A, Server_B]
weighted_servers = [[0, hash(s), s] for s in server_handles]
Heap might select Server_A first (min hash), not Server_C!
After (✅ Fixed):
server_handles = [Server_A, Server_B, Server_C]
random.shuffle(server_handles) # → [Server_C, Server_A, Server_B]
weighted_servers = [[0, idx, s] for idx, s in enumerate(server_handles)]
Heap correctly selects Server_C first (idx=0)