Skip to content

fix: sglang prefill handler should pass dp rank.#7214

Closed
huitianbai wants to merge 1 commit intoai-dynamo:mainfrom
huitianbai:fix_sglang_prefill_dp_rank
Closed

fix: sglang prefill handler should pass dp rank.#7214
huitianbai wants to merge 1 commit intoai-dynamo:mainfrom
huitianbai:fix_sglang_prefill_dp_rank

Conversation

@huitianbai
Copy link
Copy Markdown
Contributor

@huitianbai huitianbai commented Mar 11, 2026

For sglang, prefill kv router select data parallel rank should pass into sglagn engine.

Also to avoid no kv router mode always select dp 0.

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced routing data extraction and configuration handling for distributed request processing
    • Optimized data parallel rank normalization for distributed system deployments to address edge-case scenarios

Signed-off-by: baihuitian <baihuitian.bht@gmail.com>
@huitianbai huitianbai requested review from a team as code owners March 11, 2026 07:45
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Mar 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Copy link
Copy Markdown
Contributor

👋 Hi huitianbai! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added fix external-contribution Pull request is from an external contributor backend::sglang Relates to the sglang backend router Relates to routing, KV-aware routing, etc. labels Mar 11, 2026
@huitianbai
Copy link
Copy Markdown
Contributor Author

@ishandhanani hi, I made this small change to make data_parallel_rank can be used in sglang prefill engine.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 11, 2026

Walkthrough

The changes implement explicit data parallel rank extraction and normalization across the request handling pipeline. The Rust router now returns u32::MAX as a marker value for the data parallel rank, which the Python handler extracts, normalizes to None when appropriate, and passes to the engine via a new parameter.

Changes

Cohort / File(s) Summary
Data Parallel Rank Extraction and Normalization
lib/llm/src/kv_router/prefill_router.rs, components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py
Rust router returns u32::MAX for dp_rank in SimpleRouter branch; Python handler extracts routing data, normalizes u32::MAX to None, and passes dp_rank to engine.async_generate via new data_parallel_rank parameter.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A rank flows down the rabbit hole,
From MAX to None, we reach our goal,
With routing data crisp and clear,
The prefill handler holds it dear,
Through engine gates, the data flows—
Where parallel ranks go, nobody knows! 🎯

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The PR description is incomplete, missing required template sections (Overview, Details, Where to start, Related Issues) and contains grammatical issues. Expand description with all template sections: add Overview section, elaborate Details about routing changes, specify files to review, and include Related Issues section.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: passing data parallel rank from the prefill router to the sglang engine handler.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py (1)

126-137: ⚠️ Potential issue | 🟠 Major

Update to use routed_dp_rank instead of deprecated data_parallel_rank parameter.

SGLang Engine accepts the data_parallel_rank parameter, but it is deprecated. SGLang recommends using routed_dp_rank instead to avoid deprecation warnings. Update line 135 to pass routed_dp_rank=dp_rank instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py` around
lines 126 - 137, The call to self.engine.async_generate is passing the
deprecated data_parallel_rank parameter; change the invocation in
prefill_handler.py to pass routed_dp_rank=dp_rank instead of
data_parallel_rank=dp_rank (keep all other kwargs the same and retain the helper
_priority_kwargs and existing parameters like sampling_params, stream,
bootstrap_host, bootstrap_port, bootstrap_room, external_trace_header, rid, and
trace_header).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py`:
- Around line 117-122: decode_handler.py fails to normalize the sentinel
u32::MAX value for dp_rank before calling async_generate; find the places where
routing.get("dp_rank") is read (the dp_rank extractions around the decode
handler branches that later call async_generate) and add the same check used in
prefill_handler: if dp_rank is not None and dp_rank == 2**32 - 1: dp_rank = None
so that dp_rank passed into async_generate is normalized consistently across
handlers.

---

Outside diff comments:
In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py`:
- Around line 126-137: The call to self.engine.async_generate is passing the
deprecated data_parallel_rank parameter; change the invocation in
prefill_handler.py to pass routed_dp_rank=dp_rank instead of
data_parallel_rank=dp_rank (keep all other kwargs the same and retain the helper
_priority_kwargs and existing parameters like sampling_params, stream,
bootstrap_host, bootstrap_port, bootstrap_room, external_trace_header, rid, and
trace_header).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 727419cc-0157-4aa8-8212-4da9f2546137

📥 Commits

Reviewing files that changed from the base of the PR and between e930526 and 5627e16.

📒 Files selected for processing (2)
  • components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py
  • lib/llm/src/kv_router/prefill_router.rs

Comment on lines +117 to +122
routing = inner_request.get("routing") or {}
priority = routing.get("priority")
dp_rank = routing.get("dp_rank")

if dp_rank is not None and dp_rank == 2**32 - 1:
dp_rank = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check decode_handler.py for dp_rank handling
rg -n -C5 "dp_rank" components/src/dynamo/sglang/request_handlers/llm/decode_handler.py

Repository: ai-dynamo/dynamo

Length of output: 2097


🏁 Script executed:

# Check how async_generate uses data_parallel_rank
rg -n -B2 -A5 "data_parallel_rank" components/src/dynamo/sglang/request_handlers/llm/ --type py | head -100

Repository: ai-dynamo/dynamo

Length of output: 2735


🏁 Script executed:

# Check vllm/handlers.py to understand the lenient routing pattern mentioned in learnings
rg -n -B3 -A8 "_to_local_dp_rank" components/src/dynamo/vllm/handlers.py

Repository: ai-dynamo/dynamo

Length of output: 2187


🏁 Script executed:

# Check if there are other handlers that normalize dp_rank
fd -e py components/src/dynamo/sglang/request_handlers | xargs rg -l "dp_rank" | head -20

Repository: ai-dynamo/dynamo

Length of output: 1396


🏁 Script executed:

# Search for any shared normalization logic or base handler class
rg -n "def.*dp_rank" components/src/dynamo/sglang/request_handlers/ --type py

Repository: ai-dynamo/dynamo

Length of output: 42


🏁 Script executed:

# Check the base handler class if one exists
fd -e py . components/src/dynamo/sglang/request_handlers | xargs rg -l "class.*Handler" | head -5

Repository: ai-dynamo/dynamo

Length of output: 454


🏁 Script executed:

# Check when/if the normalization was added to prefill_handler
git log -p --all -S "2**32 - 1" -- components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py | head -150

Repository: ai-dynamo/dynamo

Length of output: 42


Apply the same u32::MAX normalization in decode_handler.py.

The prefill_handler normalizes the sentinel value 2**32 - 1 (u32::MAX) to None at lines 121–122 before passing dp_rank to async_generate. However, decode_handler.py extracts dp_rank from routing at lines 138 and 180 without performing this normalization, then passes it directly to async_generate at lines 150 and 190.

Both handlers use the same async_generate interface and should normalize the sentinel value consistently. Add the same normalization check in decode_handler.py:

if dp_rank is not None and dp_rank == 2**32 - 1:
    dp_rank = None

This ensures the handlers follow the same defensive programming pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py` around
lines 117 - 122, decode_handler.py fails to normalize the sentinel u32::MAX
value for dp_rank before calling async_generate; find the places where
routing.get("dp_rank") is read (the dp_rank extractions around the decode
handler branches that later call async_generate) and add the same check used in
prefill_handler: if dp_rank is not None and dp_rank == 2**32 - 1: dp_rank = None
so that dp_rank passed into async_generate is normalized consistently across
handlers.

@ishandhanani
Copy link
Copy Markdown
Contributor

I think we need to wait for new sglang version right? Otherwise LGTM

@huitianbai
Copy link
Copy Markdown
Contributor Author

I think we need to wait for new sglang version right? Otherwise LGTM

Agree.

@ishandhanani
Copy link
Copy Markdown
Contributor

@PeaBrane can you take a look at this?

ishandhanani added a commit that referenced this pull request Mar 25, 2026
When router_mode is not KV, the SimpleRouter path was hardcoding dp_rank
to 0, causing prefill to always target the first data parallel rank. Use
u32::MAX as a sentinel value instead, and treat it as None on the Python
side so SGLang picks the correct rank.

Cherry-picked from #7214.

Co-Authored-By: huitian bai <baihuitian.bht@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ishandhanani added a commit that referenced this pull request Mar 25, 2026
When router_mode is not KV, the SimpleRouter path was hardcoding dp_rank
to 0, causing prefill to always target the first data parallel rank. Use
u32::MAX as a sentinel value instead, and treat it as None on the Python
side so SGLang picks the correct rank.

Cherry-picked from #7214.

Co-Authored-By: huitian bai <baihuitian.bht@gmail.com>
@huitianbai
Copy link
Copy Markdown
Contributor Author

merged with #6736

@huitianbai huitianbai closed this Mar 26, 2026
@huitianbai huitianbai deleted the fix_sglang_prefill_dp_rank branch March 26, 2026 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend::sglang Relates to the sglang backend external-contribution Pull request is from an external contributor fix router Relates to routing, KV-aware routing, etc. size/S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants