fix: sglang prefill handler should pass dp rank.#7214
fix: sglang prefill handler should pass dp rank.#7214huitianbai wants to merge 1 commit intoai-dynamo:mainfrom
Conversation
Signed-off-by: baihuitian <baihuitian.bht@gmail.com>
|
👋 Hi huitianbai! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
|
@ishandhanani hi, I made this small change to make data_parallel_rank can be used in sglang prefill engine. |
WalkthroughThe changes implement explicit data parallel rank extraction and normalization across the request handling pipeline. The Rust router now returns u32::MAX as a marker value for the data parallel rank, which the Python handler extracts, normalizes to None when appropriate, and passes to the engine via a new parameter. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py (1)
126-137:⚠️ Potential issue | 🟠 MajorUpdate to use
routed_dp_rankinstead of deprecateddata_parallel_rankparameter.SGLang Engine accepts the
data_parallel_rankparameter, but it is deprecated. SGLang recommends usingrouted_dp_rankinstead to avoid deprecation warnings. Update line 135 to passrouted_dp_rank=dp_rankinstead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py` around lines 126 - 137, The call to self.engine.async_generate is passing the deprecated data_parallel_rank parameter; change the invocation in prefill_handler.py to pass routed_dp_rank=dp_rank instead of data_parallel_rank=dp_rank (keep all other kwargs the same and retain the helper _priority_kwargs and existing parameters like sampling_params, stream, bootstrap_host, bootstrap_port, bootstrap_room, external_trace_header, rid, and trace_header).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py`:
- Around line 117-122: decode_handler.py fails to normalize the sentinel
u32::MAX value for dp_rank before calling async_generate; find the places where
routing.get("dp_rank") is read (the dp_rank extractions around the decode
handler branches that later call async_generate) and add the same check used in
prefill_handler: if dp_rank is not None and dp_rank == 2**32 - 1: dp_rank = None
so that dp_rank passed into async_generate is normalized consistently across
handlers.
---
Outside diff comments:
In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py`:
- Around line 126-137: The call to self.engine.async_generate is passing the
deprecated data_parallel_rank parameter; change the invocation in
prefill_handler.py to pass routed_dp_rank=dp_rank instead of
data_parallel_rank=dp_rank (keep all other kwargs the same and retain the helper
_priority_kwargs and existing parameters like sampling_params, stream,
bootstrap_host, bootstrap_port, bootstrap_room, external_trace_header, rid, and
trace_header).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 727419cc-0157-4aa8-8212-4da9f2546137
📒 Files selected for processing (2)
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.pylib/llm/src/kv_router/prefill_router.rs
| routing = inner_request.get("routing") or {} | ||
| priority = routing.get("priority") | ||
| dp_rank = routing.get("dp_rank") | ||
|
|
||
| if dp_rank is not None and dp_rank == 2**32 - 1: | ||
| dp_rank = None |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check decode_handler.py for dp_rank handling
rg -n -C5 "dp_rank" components/src/dynamo/sglang/request_handlers/llm/decode_handler.pyRepository: ai-dynamo/dynamo
Length of output: 2097
🏁 Script executed:
# Check how async_generate uses data_parallel_rank
rg -n -B2 -A5 "data_parallel_rank" components/src/dynamo/sglang/request_handlers/llm/ --type py | head -100Repository: ai-dynamo/dynamo
Length of output: 2735
🏁 Script executed:
# Check vllm/handlers.py to understand the lenient routing pattern mentioned in learnings
rg -n -B3 -A8 "_to_local_dp_rank" components/src/dynamo/vllm/handlers.pyRepository: ai-dynamo/dynamo
Length of output: 2187
🏁 Script executed:
# Check if there are other handlers that normalize dp_rank
fd -e py components/src/dynamo/sglang/request_handlers | xargs rg -l "dp_rank" | head -20Repository: ai-dynamo/dynamo
Length of output: 1396
🏁 Script executed:
# Search for any shared normalization logic or base handler class
rg -n "def.*dp_rank" components/src/dynamo/sglang/request_handlers/ --type pyRepository: ai-dynamo/dynamo
Length of output: 42
🏁 Script executed:
# Check the base handler class if one exists
fd -e py . components/src/dynamo/sglang/request_handlers | xargs rg -l "class.*Handler" | head -5Repository: ai-dynamo/dynamo
Length of output: 454
🏁 Script executed:
# Check when/if the normalization was added to prefill_handler
git log -p --all -S "2**32 - 1" -- components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py | head -150Repository: ai-dynamo/dynamo
Length of output: 42
Apply the same u32::MAX normalization in decode_handler.py.
The prefill_handler normalizes the sentinel value 2**32 - 1 (u32::MAX) to None at lines 121–122 before passing dp_rank to async_generate. However, decode_handler.py extracts dp_rank from routing at lines 138 and 180 without performing this normalization, then passes it directly to async_generate at lines 150 and 190.
Both handlers use the same async_generate interface and should normalize the sentinel value consistently. Add the same normalization check in decode_handler.py:
if dp_rank is not None and dp_rank == 2**32 - 1:
dp_rank = NoneThis ensures the handlers follow the same defensive programming pattern.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py` around
lines 117 - 122, decode_handler.py fails to normalize the sentinel u32::MAX
value for dp_rank before calling async_generate; find the places where
routing.get("dp_rank") is read (the dp_rank extractions around the decode
handler branches that later call async_generate) and add the same check used in
prefill_handler: if dp_rank is not None and dp_rank == 2**32 - 1: dp_rank = None
so that dp_rank passed into async_generate is normalized consistently across
handlers.
|
I think we need to wait for new sglang version right? Otherwise LGTM |
Agree. |
|
@PeaBrane can you take a look at this? |
When router_mode is not KV, the SimpleRouter path was hardcoding dp_rank to 0, causing prefill to always target the first data parallel rank. Use u32::MAX as a sentinel value instead, and treat it as None on the Python side so SGLang picks the correct rank. Cherry-picked from #7214. Co-Authored-By: huitian bai <baihuitian.bht@gmail.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When router_mode is not KV, the SimpleRouter path was hardcoding dp_rank to 0, causing prefill to always target the first data parallel rank. Use u32::MAX as a sentinel value instead, and treat it as None on the Python side so SGLang picks the correct rank. Cherry-picked from #7214. Co-Authored-By: huitian bai <baihuitian.bht@gmail.com>
|
merged with #6736 |
For sglang, prefill kv router select data parallel rank should pass into sglagn engine.
Also to avoid no kv router mode always select dp 0.
Summary by CodeRabbit