fix: sglang prefill handler should pass dp rank. by huitianbai · Pull Request #7214 · ai-dynamo/dynamo

huitianbai · 2026-03-11T07:45:05Z

For sglang, prefill kv router select data parallel rank should pass into sglagn engine.

Also to avoid no kv router mode always select dp 0.

Summary by CodeRabbit

Bug Fixes
- Enhanced routing data extraction and configuration handling for distributed request processing
- Optimized data parallel rank normalization for distributed system deployments to address edge-case scenarios

Signed-off-by: baihuitian <baihuitian.bht@gmail.com>

copy-pr-bot · 2026-03-11T07:45:10Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-03-11T07:45:14Z

👋 Hi huitianbai! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

huitianbai · 2026-03-11T07:46:41Z

@ishandhanani hi, I made this small change to make data_parallel_rank can be used in sglang prefill engine.

coderabbitai · 2026-03-11T07:51:06Z

Walkthrough

The changes implement explicit data parallel rank extraction and normalization across the request handling pipeline. The Rust router now returns u32::MAX as a marker value for the data parallel rank, which the Python handler extracts, normalizes to None when appropriate, and passes to the engine via a new parameter.

Changes

Cohort / File(s)	Summary
Data Parallel Rank Extraction and Normalization `lib/llm/src/kv_router/prefill_router.rs`, `components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py`	Rust router returns u32::MAX for dp_rank in SimpleRouter branch; Python handler extracts routing data, normalizes u32::MAX to None, and passes dp_rank to engine.async_generate via new data_parallel_rank parameter.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A rank flows down the rabbit hole,
From MAX to None, we reach our goal,
With routing data crisp and clear,
The prefill handler holds it dear,
Through engine gates, the data flows—
Where parallel ranks go, nobody knows! 🎯

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	The PR description is incomplete, missing required template sections (Overview, Details, Where to start, Related Issues) and contains grammatical issues.	Expand description with all template sections: add Overview section, elaborate Details about routing changes, specify files to review, and include Related Issues section.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: passing data parallel rank from the prefill router to the sglang engine handler.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py (1)
126-137: ⚠️ Potential issue | 🟠 Major

Update to use routed_dp_rank instead of deprecated data_parallel_rank parameter.

SGLang Engine accepts the data_parallel_rank parameter, but it is deprecated. SGLang recommends using routed_dp_rank instead to avoid deprecation warnings. Update line 135 to pass routed_dp_rank=dp_rank instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py` around
lines 126 - 137, The call to self.engine.async_generate is passing the
deprecated data_parallel_rank parameter; change the invocation in
prefill_handler.py to pass routed_dp_rank=dp_rank instead of
data_parallel_rank=dp_rank (keep all other kwargs the same and retain the helper
_priority_kwargs and existing parameters like sampling_params, stream,
bootstrap_host, bootstrap_port, bootstrap_room, external_trace_header, rid, and
trace_header).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py`:
- Around line 117-122: decode_handler.py fails to normalize the sentinel
u32::MAX value for dp_rank before calling async_generate; find the places where
routing.get("dp_rank") is read (the dp_rank extractions around the decode
handler branches that later call async_generate) and add the same check used in
prefill_handler: if dp_rank is not None and dp_rank == 2**32 - 1: dp_rank = None
so that dp_rank passed into async_generate is normalized consistently across
handlers.

---

Outside diff comments:
In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py`:
- Around line 126-137: The call to self.engine.async_generate is passing the
deprecated data_parallel_rank parameter; change the invocation in
prefill_handler.py to pass routed_dp_rank=dp_rank instead of
data_parallel_rank=dp_rank (keep all other kwargs the same and retain the helper
_priority_kwargs and existing parameters like sampling_params, stream,
bootstrap_host, bootstrap_port, bootstrap_room, external_trace_header, rid, and
trace_header).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 727419cc-0157-4aa8-8212-4da9f2546137

📥 Commits

Reviewing files that changed from the base of the PR and between e930526 and 5627e16.

📒 Files selected for processing (2)

components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py
lib/llm/src/kv_router/prefill_router.rs

coderabbitai · 2026-03-11T07:51:09Z

components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py

+        routing = inner_request.get("routing") or {}
+        priority = routing.get("priority")
+        dp_rank = routing.get("dp_rank")
+
+        if dp_rank is not None and dp_rank == 2**32 - 1:
+            dp_rank = None


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check decode_handler.py for dp_rank handling rg -n -C5 "dp_rank" components/src/dynamo/sglang/request_handlers/llm/decode_handler.py

Repository: ai-dynamo/dynamo

Length of output: 2097

🏁 Script executed:

# Check how async_generate uses data_parallel_rank rg -n -B2 -A5 "data_parallel_rank" components/src/dynamo/sglang/request_handlers/llm/ --type py | head -100

Repository: ai-dynamo/dynamo

Length of output: 2735

🏁 Script executed:

# Check vllm/handlers.py to understand the lenient routing pattern mentioned in learnings rg -n -B3 -A8 "_to_local_dp_rank" components/src/dynamo/vllm/handlers.py

Repository: ai-dynamo/dynamo

Length of output: 2187

🏁 Script executed:

# Check if there are other handlers that normalize dp_rank fd -e py components/src/dynamo/sglang/request_handlers | xargs rg -l "dp_rank" | head -20

Repository: ai-dynamo/dynamo

Length of output: 1396

🏁 Script executed:

# Search for any shared normalization logic or base handler class rg -n "def.*dp_rank" components/src/dynamo/sglang/request_handlers/ --type py

Repository: ai-dynamo/dynamo

Length of output: 42

🏁 Script executed:

# Check the base handler class if one exists fd -e py . components/src/dynamo/sglang/request_handlers | xargs rg -l "class.*Handler" | head -5

Repository: ai-dynamo/dynamo

Length of output: 454

🏁 Script executed:

# Check when/if the normalization was added to prefill_handler git log -p --all -S "2**32 - 1" -- components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py | head -150

Repository: ai-dynamo/dynamo

Length of output: 42

Apply the same u32::MAX normalization in decode_handler.py.

The prefill_handler normalizes the sentinel value 2**32 - 1 (u32::MAX) to None at lines 121–122 before passing dp_rank to async_generate. However, decode_handler.py extracts dp_rank from routing at lines 138 and 180 without performing this normalization, then passes it directly to async_generate at lines 150 and 190.

Both handlers use the same async_generate interface and should normalize the sentinel value consistently. Add the same normalization check in decode_handler.py:

if dp_rank is not None and dp_rank == 2**32 - 1: dp_rank = None

This ensures the handlers follow the same defensive programming pattern.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py` around lines 117 - 122, decode_handler.py fails to normalize the sentinel u32::MAX value for dp_rank before calling async_generate; find the places where routing.get("dp_rank") is read (the dp_rank extractions around the decode handler branches that later call async_generate) and add the same check used in prefill_handler: if dp_rank is not None and dp_rank == 2**32 - 1: dp_rank = None so that dp_rank passed into async_generate is normalized consistently across handlers.

ishandhanani · 2026-03-13T01:02:01Z

I think we need to wait for new sglang version right? Otherwise LGTM

huitianbai · 2026-03-14T02:59:37Z

I think we need to wait for new sglang version right? Otherwise LGTM

Agree.

ishandhanani · 2026-03-24T12:19:08Z

@PeaBrane can you take a look at this?

When router_mode is not KV, the SimpleRouter path was hardcoding dp_rank to 0, causing prefill to always target the first data parallel rank. Use u32::MAX as a sentinel value instead, and treat it as None on the Python side so SGLang picks the correct rank. Cherry-picked from #7214. Co-Authored-By: huitian bai <baihuitian.bht@gmail.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When router_mode is not KV, the SimpleRouter path was hardcoding dp_rank to 0, causing prefill to always target the first data parallel rank. Use u32::MAX as a sentinel value instead, and treat it as None on the Python side so SGLang picks the correct rank. Cherry-picked from #7214. Co-Authored-By: huitian bai <baihuitian.bht@gmail.com>

huitianbai · 2026-03-26T03:17:45Z

merged with #6736

fix: sglang prefill handler should pass dp rank.

5627e16

Signed-off-by: baihuitian <baihuitian.bht@gmail.com>

huitianbai requested review from a team as code owners March 11, 2026 07:45

pull-request-size bot added the size/S label Mar 11, 2026

github-actions bot added fix external-contribution Pull request is from an external contributor backend::sglang Relates to the sglang backend router Relates to routing, KV-aware routing, etc. labels Mar 11, 2026

coderabbitai bot reviewed Mar 11, 2026

View reviewed changes

huitianbai mentioned this pull request Mar 24, 2026

feat(sglang): disagg DP rank routing + backwards-compatible network imports #6736

Merged

huitianbai closed this Mar 26, 2026

huitianbai deleted the fix_sglang_prefill_dp_rank branch March 26, 2026 08:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: sglang prefill handler should pass dp rank.#7214

fix: sglang prefill handler should pass dp rank.#7214
huitianbai wants to merge 1 commit intoai-dynamo:mainfrom
huitianbai:fix_sglang_prefill_dp_rank

huitianbai commented Mar 11, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

copy-pr-bot bot commented Mar 11, 2026

Uh oh!

github-actions bot commented Mar 11, 2026

Uh oh!

huitianbai commented Mar 11, 2026

Uh oh!

coderabbitai bot commented Mar 11, 2026

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 11, 2026

Uh oh!

ishandhanani commented Mar 13, 2026

Uh oh!

huitianbai commented Mar 14, 2026

Uh oh!

ishandhanani commented Mar 24, 2026

Uh oh!

huitianbai commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

huitianbai commented Mar 11, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Mar 11, 2026

Uh oh!

github-actions bot commented Mar 11, 2026

Uh oh!

huitianbai commented Mar 11, 2026

Uh oh!

coderabbitai bot commented Mar 11, 2026

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

ishandhanani commented Mar 13, 2026

Uh oh!

huitianbai commented Mar 14, 2026

Uh oh!

ishandhanani commented Mar 24, 2026

Uh oh!

huitianbai commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

huitianbai commented Mar 11, 2026 •

edited by coderabbitai bot

Loading