[PD-Disagg] Support query dp rank from bootstrap server. #19168
Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>
Summary of Changes (Gemini Code Assist): This pull request enhances the PD-Disaggregation feature by introducing dynamic data parallel (DP) rank resolution. It allows decode instances to communicate with the bootstrap server to register and query DP ranks, moving beyond static assignments. The bootstrap server itself gains new API endpoints and a cleanup mechanism to maintain the integrity of registered entries. These changes improve the robustness and flexibility of distributed KV cache management, particularly in scenarios involving complex load-balancing strategies.
Code Review
This pull request adds support for querying the data parallelism (DP) rank from the bootstrap server, which is a key feature for enabling more flexible load balancing strategies like round_robin in a prefill/decode disaggregated setup. The changes include adding new API endpoints to the bootstrap server for registering and querying DP ranks, along with a cleanup mechanism for stale entries. The decode scheduler is updated to handle requests that need to query the DP rank by queueing them and processing them in batches. The changes are well-structured, but there are a couple of issues regarding a new dp_size parameter that is introduced but not correctly handled in the class hierarchy and is unused in the constructor.
/tag-and-rerun-ci
…#19168) Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com> Co-authored-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>
The prefill handler was missing the data_parallel_rank parameter in its async_generate call, preventing DP rank-aware routing from working in disaggregated mode. The decode handler already passes this correctly. Extract dp_rank from the routing info (set by the KV router in prefill_router.rs) and forward it to SGLang's engine so the prefill scheduler directs work to the correct DP rank. This works in conjunction with sgl-project/sglang#19168, which adds per-request DP rank resolution on the SGLang side -- the decode worker can now resolve the prefill DP rank via the bootstrap server rather than relying on bootstrap_room % dp_size.
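The fix described above can be sketched as follows. Only `async_generate` and its `data_parallel_rank` parameter are named in the text; the handler name, request shape, and `routing_info` dict are illustrative assumptions.

```python
# Hypothetical prefill handler: forward the DP rank chosen by the
# KV router into the engine call so the prefill scheduler routes
# the request to the correct DP rank.
async def handle_prefill(engine, request, routing_info):
    dp_rank = routing_info.get("dp_rank")  # set upstream by the KV router
    async for out in engine.async_generate(
        prompt=request["prompt"],
        sampling_params=request.get("sampling_params", {}),
        # Previously omitted here, which broke DP rank-aware routing
        # in disaggregated mode; the decode handler already passed it.
        data_parallel_rank=dp_rank,
    ):
        yield out
```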

Motivation
Based on PR #14726 by @changhuaixin. Allows prefill servers to use load-balance methods other than `follow_bootstrap_room` (e.g., `round_robin`) in PD-disaggregation mode, which significantly reduces TTFT under load (see benchmarks in #14726). When prefill uses non-deterministic load balancing, the decode server cannot infer which prefill DP rank processed a request. This PR adds per-request dp_rank synchronization through the bootstrap server.
Modifications
- Bootstrap server: added `POST /register_dp_rank` and `POST /query_dp_ranks` endpoints with TTL-based cleanup. Parallel info now includes a `follow_bootstrap_room` flag so decode knows the prefill's LB strategy.
- Decode scheduler: requests whose dp_rank cannot be derived via `follow_bootstrap_room` are held in a `pending_reqs` queue and batch-resolved each scheduler loop.
- Added `ensure_parallel_info()` on `CommonKVManager` to decouple parallel-info fetching from receiver creation, shared by both `DecodePreallocQueue` and `CommonKVReceiver`.
- Dropped the `target_dp_group` alias; unified to `prefill_dp_rank`.

Design difference from #14726

dp_rank is resolved before receiver creation, not inside `init()`. This means:

- No `should_notify_dp_rank` flag.
- `init()` signature unchanged — zero changes to mooncake/nixl/fake backends.
- The `pop_preallocated` loop has no dp_rank awareness.
- When prefill uses `follow_bootstrap_room`, dp_rank is computed locally (`bootstrap_room % dp_size`) with no HTTP overhead — the original PR always queries the bootstrap server.
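The local-versus-remote resolution above can be sketched in a few lines. The modulo expression comes from the text; the function signature and the injected query callable are illustrative assumptions.

```python
def resolve_prefill_dp_rank(
    bootstrap_room: int,
    dp_size: int,
    follow_bootstrap_room: bool,
    query_bootstrap_server,  # callable: (bootstrap_room) -> int
) -> int:
    """Sketch of decode-side dp_rank resolution.

    With follow_bootstrap_room, placement is deterministic and the
    rank is a pure local computation with no network round trip.
    Otherwise (e.g. round_robin), only the bootstrap server knows
    which prefill DP rank took the request, so we fall back to a
    query (represented here by an injected callable).
    """
    if follow_bootstrap_room:
        return bootstrap_room % dp_size
    return query_bootstrap_server(bootstrap_room)
```

Keeping the deterministic path local is what avoids the per-request HTTP overhead that the original #14726 design always paid.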