[PD-Disagg] Fully support external DP dispatch w/ PD-disaggregation mode. #19268
Merged
Co-authored-by: Ratish P <114130421+ratish1@users.noreply.github.com>
6f2567e to 9c4d6ad
Collaborator (Author): All related CIs (except for B200s) passed: https://github.com/sgl-project/sglang/actions/runs/22377079442/job/64781128983
Contributor: @hnyls2002 How about adding an optional HTTP header, i.e.
Collaborator (Author): @doujiang24 Please submit a PR, thanks.
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request on Mar 9, 2026: …ode. (sgl-project#19268) Co-authored-by: Ratish P <114130421+ratish1@users.noreply.github.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request on Mar 21, 2026: …ode. (sgl-project#19268) Co-authored-by: Ratish P <114130421+ratish1@users.noreply.github.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request on Apr 7, 2026: …ode. (sgl-project#19268) Co-authored-by: Ratish P <114130421+ratish1@users.noreply.github.com>
Summary
API: rename `data_parallel_rank` → `routed_dp_rank`, add `disagg_prefill_dp_rank`

- Rename `data_parallel_rank` → `routed_dp_rank` across the request pipeline to clarify that it is a routing directive from external routers, not an infrastructure property
- Add a `disagg_prefill_dp_rank` field for decode servers: an external router can specify which prefill DP worker holds the KV cache, skipping bootstrap server queries
- Keep `data_parallel_rank` as a deprecated alias in all public API surfaces with `DeprecationWarning`

Decode-side fix

- Rename `_resolve_dp_rank` → `_resolve_prefill_dp_rank` and remove the incorrect `data_parallel_rank` check: the old code conflated the decode-side DP routing rank with the prefill DP rank needed for KV transfer (never triggered because the field was always `None`)
- `_resolve_prefill_dp_rank` now checks `disagg_prefill_dp_rank` first, then falls back to the existing bootstrap server resolution ([PD-Disagg] Support query dp rank from bootstrap server. #19168)

Motivation: split an overloaded field into two

On main, `data_parallel_rank` is consumed in two places with different semantics:

- `DataParallelController.maybe_external_dp_rank_routing`: treats it as "which DP worker should handle this request" (routing)
- `DecodePreallocQueue._resolve_dp_rank`: treats it as "which prefill DP worker has the KV cache" (KV transfer)

Meanwhile, `prefill_dp_rank` only existed as an internal variable name inside the KV transfer layer (`_create_receiver_and_enqueue`), never as a request-level field.

This PR splits the single overloaded field into two with clear semantics:

- `routed_dp_rank`: consumed only by `DataParallelController` for DP worker routing
- `disagg_prefill_dp_rank`: consumed only by `_resolve_prefill_dp_rank` for KV transfer, now exposed as a public API field so external routers can specify it directly

Propagation

- Propagate `routed_dp_rank` + `disagg_prefill_dp_rank` through `TokenizedGenerateReqInput`, `Req`, `tokenizer_manager`, `scheduler`, `encode_receiver`
- `DataParallelController.maybe_external_dp_rank_routing` uses `req.routed_dp_rank`

Backward compatibility

`data_parallel_rank` is preserved as a deprecated alias at every public API layer. Callers using the old field name (including sgl-model-gateway Rust/gRPC) continue to work without changes.

| API surface | File | Mechanism |
| --- | --- | --- |
| `CompletionRequest` | `protocol.py` | `model_validator(mode="before")` merges into `routed_dp_rank` + warns |
| `ChatCompletionRequest` | `protocol.py` | |
| `GenerateReqInput` (`/generate`) | `io_struct.py` | `normalize_batch_and_arguments()` merges + warns |
| `Engine.generate()` | `engine.py` | |
| `Engine.async_generate()` | `engine.py` | |
| `EngineBase.generate()` | `EngineBase.py` | |

Internal structs (`TokenizedGenerateReqInput`, `TokenizedEmbeddingReqInput`, `Req`) are renamed directly; no alias is needed since they are not public API.

Testing

- Add `dp_rank` into response `meta_info` so external routers can verify routing correctness
- Add `--test-external-dp-routing` to mini-lb: randomly assigns `routed_dp_rank` / `disagg_prefill_dp_rank`, asserts decode response `dp_rank` matches (prefill correctness verified implicitly via KV transfer)
- Add `TestDisaggregationDPAttentionExternalRouting` test class (currently skipped pending docker image update)
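
The routing check described under Testing could look roughly like the sketch below. It is not the mini-lb code: `check_external_dp_routing`, `fake_decode_server`, and the `"text"` payload key are hypothetical names chosen for illustration, and the server is a local stub rather than a real decode endpoint. Only the field names `routed_dp_rank`, `disagg_prefill_dp_rank`, and `meta_info["dp_rank"]` come from the description above.

```python
import random
from typing import Any, Callable, Dict


def check_external_dp_routing(
    send_to_decode: Callable[[Dict[str, Any]], Dict[str, Any]],
    dp_size: int,
    rng: random.Random,
) -> Dict[str, Any]:
    """Send one request with random ranks and verify the reported dp_rank."""
    payload = {
        "text": "hello",  # assumption: placeholder prompt field
        "routed_dp_rank": rng.randrange(dp_size),
        "disagg_prefill_dp_rank": rng.randrange(dp_size),
    }
    response = send_to_decode(payload)
    observed = response["meta_info"]["dp_rank"]
    # The decode worker that served the request must be the one we asked for.
    assert observed == payload["routed_dp_rank"], (
        f"expected dp_rank {payload['routed_dp_rank']}, got {observed}"
    )
    return payload


def fake_decode_server(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stub: a correct server reports the rank it was routed to.
    return {"text": "...", "meta_info": {"dp_rank": payload["routed_dp_rank"]}}


sent = check_external_dp_routing(fake_decode_server, dp_size=4, rng=random.Random(0))
```

In a real deployment `send_to_decode` would issue the HTTP request to the decode server; the stub only demonstrates the assertion pattern.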
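
The deprecated-alias handling described under Backward compatibility can be illustrated with a minimal standard-library sketch. The PR description says the real code uses a pydantic `model_validator(mode="before")` and `normalize_batch_and_arguments()`; the dataclass below is a hypothetical stand-in, not the SGLang implementation. The precedence rule (an explicitly set `routed_dp_rank` wins over the alias) is an assumption.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerateRequestSketch:
    """Hypothetical stand-in for a public request type; not an SGLang class."""

    routed_dp_rank: Optional[int] = None
    disagg_prefill_dp_rank: Optional[int] = None
    data_parallel_rank: Optional[int] = None  # deprecated alias

    def normalize(self) -> "GenerateRequestSketch":
        """Fold the deprecated alias into routed_dp_rank and warn the caller."""
        if self.data_parallel_rank is not None:
            warnings.warn(
                "data_parallel_rank is deprecated; use routed_dp_rank instead",
                DeprecationWarning,
                stacklevel=2,
            )
            # Assumption: an explicitly set new field takes precedence.
            if self.routed_dp_rank is None:
                self.routed_dp_rank = self.data_parallel_rank
            self.data_parallel_rank = None
        return self


# Usage: a caller still sending the old field name keeps working but is warned.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = GenerateRequestSketch(data_parallel_rank=3).normalize()
```

Clearing the alias after merging keeps downstream code from reading two fields with the same meaning.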
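
The resolution order described under Decode-side fix (use `disagg_prefill_dp_rank` when the router supplies it, otherwise fall back to the bootstrap server) can be sketched as follows. The function signature and the `fake_bootstrap_query` stub are hypothetical; the actual `_resolve_prefill_dp_rank` is a method in the decode path and may take different arguments.

```python
from typing import Callable, List, Optional


def resolve_prefill_dp_rank(
    disagg_prefill_dp_rank: Optional[int],
    query_bootstrap: Callable[[], Optional[int]],
) -> Optional[int]:
    """Prefer the router-supplied rank; otherwise ask the bootstrap server."""
    # Compare against None so that rank 0 counts as an explicit value.
    if disagg_prefill_dp_rank is not None:
        return disagg_prefill_dp_rank
    return query_bootstrap()


calls: List[str] = []


def fake_bootstrap_query() -> Optional[int]:
    # Hypothetical stub standing in for a bootstrap server lookup.
    calls.append("query")
    return 1


explicit = resolve_prefill_dp_rank(2, fake_bootstrap_query)     # no query issued
zero_rank = resolve_prefill_dp_rank(0, fake_bootstrap_query)    # no query issued
fallback = resolve_prefill_dp_rank(None, fake_bootstrap_query)  # one query issued
```

The `calls` list records how often the fallback runs, which makes the "skipping bootstrap server queries" behaviour directly observable.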