[PD-Disagg] Fully support external DP dispatch w/ PD-disaggregation mode. #19268

Merged
hnyls2002 merged 11 commits into main from lsyin/external-dp-rank
Feb 25, 2026
@hnyls2002 (Collaborator) commented Feb 24, 2026

Summary

API: rename data_parallel_rank → routed_dp_rank, add disagg_prefill_dp_rank

  • Rename data_parallel_rank → routed_dp_rank across the request pipeline to clarify it is a routing directive from external routers, not an infrastructure property
  • Add disagg_prefill_dp_rank field for decode servers — external router can specify which prefill DP worker holds the KV cache, skipping bootstrap server queries
  • Keep data_parallel_rank as a deprecated alias in all public API surfaces with DeprecationWarning

Decode-side fix

  • Rename _resolve_dp_rank → _resolve_prefill_dp_rank and remove the incorrect data_parallel_rank check. The old code conflated the decode-side DP routing rank with the prefill DP rank needed for KV transfer (the bug never triggered because the field was always None)
  • _resolve_prefill_dp_rank now checks disagg_prefill_dp_rank first, then falls back to existing bootstrap server resolution ([PD-Disagg] Support query dp rank from bootstrap server. #19168)
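
The resolution order in the two bullets above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `DecodeRequest`, `lookup_key`, `resolve_prefill_dp_rank`, and the `query_bootstrap` callable are hypothetical stand-ins for the decode-side request object and the bootstrap server lookup added in #19168.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DecodeRequest:
    # Hypothetical stand-in for the decode-side request object.
    lookup_key: str  # hypothetical key used to ask the bootstrap server
    disagg_prefill_dp_rank: Optional[int] = None


def resolve_prefill_dp_rank(
    req: DecodeRequest,
    query_bootstrap: Callable[[str], Optional[int]],
) -> Optional[int]:
    """Return the prefill DP rank that holds the KV cache for `req`."""
    # 1. A value supplied by the external router wins; no bootstrap query is made.
    if req.disagg_prefill_dp_rank is not None:
        return req.disagg_prefill_dp_rank
    # 2. Otherwise fall back to the bootstrap server lookup (hypothetical callable).
    return query_bootstrap(req.lookup_key)
```

Note that the routing rank never appears in this function: only the prefill rank matters for KV transfer, which is the conflation the fix removes.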

Motivation: split an overloaded field into two

On main, data_parallel_rank is consumed in two places with different semantics:

  1. DataParallelController.maybe_external_dp_rank_routing — treats it as "which DP worker should handle this request" (routing)
  2. DecodePreallocQueue._resolve_dp_rank — treats it as "which prefill DP worker has the KV cache" (KV transfer)

Meanwhile, prefill_dp_rank only existed as an internal variable name inside the KV transfer layer (_create_receiver_and_enqueue), never as a request-level field.

This PR splits the single overloaded field into two with clear semantics:

  • routed_dp_rank — consumed only by DataParallelController for DP worker routing
  • disagg_prefill_dp_rank — consumed only by _resolve_prefill_dp_rank for KV transfer, now exposed as a public API field so external routers can specify it directly
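
As a rough illustration of how an external router might use the two fields, the sketch below builds one request body for the prefill server and one for the decode server. The helper name `build_pd_payloads` is hypothetical, and the bodies are deliberately incomplete: any other fields a PD-disaggregated deployment requires (for example bootstrap connection details) are omitted here.

```python
from typing import Dict, Tuple


def build_pd_payloads(
    prompt: str, prefill_rank: int, decode_rank: int
) -> Tuple[Dict, Dict]:
    """Build (prefill_payload, decode_payload) for one request (hypothetical helper)."""
    # Prefill side: routed_dp_rank selects the prefill DP worker.
    prefill_payload = {"text": prompt, "routed_dp_rank": prefill_rank}
    # Decode side: routed_dp_rank selects the decode DP worker, while
    # disagg_prefill_dp_rank names the prefill worker that holds the KV cache.
    decode_payload = {
        "text": prompt,
        "routed_dp_rank": decode_rank,
        "disagg_prefill_dp_rank": prefill_rank,
    }
    return prefill_payload, decode_payload
```

The same prefill rank appears under two different names depending on the receiver, which is why a single overloaded field could not express both intents on the decode side.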

Propagation

  • Thread routed_dp_rank + disagg_prefill_dp_rank through TokenizedGenerateReqInput, Req, tokenizer_manager, scheduler, encode_receiver
  • DataParallelController.maybe_external_dp_rank_routing uses req.routed_dp_rank

Backward compatibility

data_parallel_rank is preserved as a deprecated alias at every public API layer. Callers using the old field name (including sgl-model-gateway Rust/gRPC) continue to work without changes.

| API surface | File | Compat mechanism |
|---|---|---|
| CompletionRequest | protocol.py | model_validator(mode="before") merges into routed_dp_rank + warns |
| ChatCompletionRequest | protocol.py | same |
| GenerateReqInput (/generate) | io_struct.py | normalize_batch_and_arguments() merges + warns |
| Engine.generate() | engine.py | function param kept, merged before use + warns |
| Engine.async_generate() | engine.py | same |
| EngineBase.generate() | EngineBase.py | abstract signature includes both old and new params |

Internal structs (TokenizedGenerateReqInput, TokenizedEmbeddingReqInput, Req) are renamed directly — no alias needed since they are not public API.
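
The "merge + warn" behavior used at the public API layers could look roughly like the sketch below. The function name `merge_deprecated_dp_rank` is hypothetical, and the choice to prefer the new field when both are set is an assumption of this sketch; the PR description does not state the conflict policy.

```python
import warnings
from typing import Optional


def merge_deprecated_dp_rank(
    routed_dp_rank: Optional[int],
    data_parallel_rank: Optional[int],
) -> Optional[int]:
    """Fold the deprecated alias into routed_dp_rank (hypothetical helper)."""
    if data_parallel_rank is None:
        # Old name not used: nothing to merge, no warning.
        return routed_dp_rank
    warnings.warn(
        "'data_parallel_rank' is deprecated; use 'routed_dp_rank' instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller
    )
    if routed_dp_rank is None:
        return data_parallel_rank
    # Assumption: when both are given, the new field takes precedence.
    return routed_dp_rank
```

Callers that still pass only the old name get the same routing as before plus a DeprecationWarning, so existing clients keep working while being nudged toward the new field.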

Testing

  • Propagate scheduler's dp_rank into response meta_info so external routers can verify routing correctness
  • Add --test-external-dp-routing to mini-lb: randomly assigns routed_dp_rank / disagg_prefill_dp_rank, asserts decode response dp_rank matches (prefill correctness verified implicitly via KV transfer)
  • Add TestDisaggregationDPAttentionExternalRouting test class (currently skipped pending docker image update)
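
The routing check described above can be approximated with the sketch below. It is not the mini-lb implementation: `check_external_dp_routing`, the `send` callable, and `fake_decode_server` are hypothetical, and the fake server simply echoes the requested rank back in `meta_info["dp_rank"]` so the example runs without a deployment.

```python
import random
from typing import Callable, Dict


def check_external_dp_routing(
    send: Callable[[Dict], Dict], dp_size: int, trials: int = 8, seed: int = 0
) -> None:
    """Send requests with random ranks and assert the decode dp_rank matches."""
    rng = random.Random(seed)
    for _ in range(trials):
        routed = rng.randrange(dp_size)
        prefill = rng.randrange(dp_size)
        response = send(
            {
                "text": "ping",
                "routed_dp_rank": routed,
                "disagg_prefill_dp_rank": prefill,
            }
        )
        got = response["meta_info"]["dp_rank"]
        assert got == routed, f"expected dp_rank {routed}, got {got}"


def fake_decode_server(payload: Dict) -> Dict:
    # Hypothetical server that honors routed_dp_rank and reports it back.
    return {"text": "pong", "meta_info": {"dp_rank": payload["routed_dp_rank"]}}


check_external_dp_routing(fake_decode_server, dp_size=4)
```

Only the decode rank is asserted directly; as noted in the list above, a wrong prefill rank would surface as a failed KV transfer rather than a mismatched field.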

@hnyls2002
Collaborator Author

/rerun-stage stage-c-test-8-gpu-h20

@github-actions
Contributor

✅ Triggered stage-c-test-8-gpu-h20 to run independently (skipping dependencies).

@github-actions
Contributor

🔗 View workflow run

@hnyls2002
Collaborator Author

/tag-and-rerun-ci

hnyls2002 and others added 2 commits February 24, 2026 16:55
Co-authored-by: Ratish P <114130421+ratish1@users.noreply.github.com>
@hnyls2002 force-pushed the lsyin/external-dp-rank branch from 6f2567e to 9c4d6ad February 25, 2026 03:51
@hnyls2002 merged commit 539f772 into main Feb 25, 2026
35 of 76 checks passed
@hnyls2002 deleted the lsyin/external-dp-rank branch February 25, 2026 03:58
@doujiang24
Contributor

doujiang24 commented Mar 3, 2026

@hnyls2002 How about adding an optional HTTP header, e.g. X-data-parallel-rank, that takes priority over the request-body payload when specifying the DP rank?
It could be friendlier for external routers. Thanks.

@hnyls2002
Collaborator Author

@doujiang24 Please submit a PR, thanks.

magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
…ode. (sgl-project#19268)

Co-authored-by: Ratish P <114130421+ratish1@users.noreply.github.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
…ode. (sgl-project#19268)

Co-authored-by: Ratish P <114130421+ratish1@users.noreply.github.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
…ode. (sgl-project#19268)

Co-authored-by: Ratish P <114130421+ratish1@users.noreply.github.com>