add remote dp rank for disaggregation. #18230
huitianbai wants to merge 3 commits into sgl-project:main
@ishandhanani What should I do next in this PR? |
@huitianbai can you confirm that this change works with the sgl-router (no need to wire anything up; I just want to make sure that this doesn't break anything) and with the dynamo router?
I tested a Qwen3-8B P/D disaggregation example with sgl-router on an A800 server, and it works fine. @ishandhanani
I also tested the dynamo router. This PR does not break anything, but I found another bug: occasionally sglang crashes with "Overflow when unpacking long long". In dynamo, the bootstrap_room is generated with: (https://github.com/ai-dynamo/dynamo/blob/main/lib/llm/src/kv_router/prefill_router.rs#L322) This value can overflow and trigger the "Overflow when unpacking long long" error. I locally fixed it by: I think we should limit the bootstrap_room range to avoid the overflow.
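To illustrate why limiting the range helps, here is a minimal Python sketch. It assumes dynamo draws bootstrap_room as an unsigned 64-bit random value while sglang converts it to a signed 64-bit ("long long") integer; about half of all unsigned 64-bit values exceed the signed maximum of 2^63 - 1. The function names are hypothetical, `struct` only mimics the signed 64-bit conversion, and this is not dynamo's or sglang's actual code (a real fix in dynamo would be written in Rust).

```python
import random
import struct

INT64_MAX = (1 << 63) - 1  # largest value a signed 64-bit "long long" can hold


def unsafe_bootstrap_room() -> int:
    # Hypothetical: a full unsigned 64-bit random draw, which may exceed INT64_MAX.
    return random.getrandbits(64)


def safe_bootstrap_room() -> int:
    # Hypothetical fix: clear the top bit so the value always fits in int64.
    return unsafe_bootstrap_room() & INT64_MAX


def fits_in_int64(value: int) -> bool:
    # struct raises struct.error when the value is outside the signed 64-bit range.
    try:
        struct.pack("<q", value)
        return True
    except struct.error:
        return False
```

For example, `fits_in_int64(INT64_MAX + 1)` is `False`, while every value returned by `safe_bootstrap_room()` fits. Masking keeps 63 random bits, which should still make collisions between concurrent requests very unlikely.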
/tag-and-rerun-ci |
PR LGTM, let's go ahead and get this in. @huitianbai, do you want to put one up for dynamo and the dynamo bootstrap room as well?
Ok, I will. |
Can you take a look at #19168?
I think my PR is no longer required. @ishandhanani
See #19168 |
This PR adds a remote DP rank parameter to the request, which helps routing for PD disaggregation.
In PD disaggregation mode, the decode node simply uses the request's dp_rank as prefill_dp_rank. That rank number belongs to the decode node itself rather than to the prefill node, so the handshake fails.
It closes issue #17560.
Related dynamo discussion: ai-dynamo/dynamo#5638
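The routing problem described above can be sketched as follows. This is a hypothetical, simplified model, not sglang's code: `DecodeReq`, `remote_dp_rank`, and `handshake_ok` are illustrative names, and the handshake check is an assumption that the decode node must contact the prefill DP rank that actually holds the KV cache. The sketch falls back to the old behaviour when no remote rank is sent, which matches the report that existing router setups keep working.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodeReq:
    """Hypothetical, simplified view of a request arriving at a decode node."""
    dp_rank: int                          # DP rank chosen for the decode node itself
    remote_dp_rank: Optional[int] = None  # hypothetical field: DP rank of the prefill worker


def prefill_dp_rank_old(req: DecodeReq) -> int:
    # Behaviour described in this PR: the decode-side rank is reused as the
    # prefill rank, which is wrong whenever the two ranks differ.
    return req.dp_rank


def prefill_dp_rank_new(req: DecodeReq) -> int:
    # Prefer the rank the router assigned on the prefill side; fall back to
    # the old behaviour when the router does not send one.
    if req.remote_dp_rank is not None:
        return req.remote_dp_rank
    return req.dp_rank


def handshake_ok(prefill_rank: int, prefill_dp_size: int, kv_owner_rank: int) -> bool:
    # Hypothetical check: the rank must exist on the prefill side and must be
    # the rank that holds this request's KV cache.
    return 0 <= prefill_rank < prefill_dp_size and prefill_rank == kv_owner_rank
```

For instance, with two prefill DP ranks and the KV cache on prefill rank 1, a request routed to decode rank 3 fails the handshake under the old logic (rank 3 does not exist on the prefill side) but succeeds once `remote_dp_rank=1` is carried on the request.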