Conversation
Code Review
This pull request unifies DeepSeek-V4 (dsv4) state handling with Sliding Window Attention (swa) by removing specialized dsv4 logic and types across the disaggregation modules. Feedback suggests clarifying a comment in mooncake/conn.py to specify that the restriction on different Tensor Parallel (TP) sizes applies only to non-MLA models, as the current wording is misleading following the unification.
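For readers skimming the change, below is a minimal, hypothetical sketch of the dispatch unification this PR performs. The helper function and the stub senders are illustrative assumptions; only the path names (`_send_state_pages_flat`, `_send_kvcache_generic`) and the `state_type` values come from the PR description itself.

```python
# Hypothetical sketch only: the helper and the stub senders are illustrative,
# not the actual disaggregation dispatch code.

def _send_state_pages_flat(*args, **kwargs):   # stub: old dedicated dsv4 path
    raise NotImplementedError

def _send_kvcache_generic(*args, **kwargs):    # stub: generic path used by SWA/NSA
    raise NotImplementedError

def pick_sender(state_type: str):
    # Before this PR (sketch): dsv4 carried its own discriminator and was routed
    # to a stricter, dedicated sender:
    #   if state_type == "dsv4":
    #       return _send_state_pages_flat
    # After this PR (sketch): V4 no longer has a separate state_type, so its
    # heterogeneous state list flows through the same branch as SWA/NSA.
    if state_type in ("swa", "nsa"):
        return _send_kvcache_generic
    raise ValueError(f"unknown state_type: {state_type}")
```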
/tag-and-rerun-ci

/rerun-test test/registered/disaggregation/test_disaggregation_basic.py::TestDisaggregationAccuracy test/registered/disaggregation/test_disaggregation_basic.py::TestDisaggregationMooncakeSpec test/registered/disaggregation/test_disaggregation_xpu.py::TestDisaggregationNixlBasic test/registered/distributed/test_disaggregation_different_tp.py test/registered/distributed/test_disaggregation_pp.py

🚀 🚀 🚀
* main: (87 commits)
  [Fix] Disable FlashInfer allreduce fusion under deterministic inference (sgl-project#24629)
  fix: STANDALONE spec-decode hidden-size mismatch crash (sgl-project#24217)
  Followup fix for Custom AR V2 in non NVL scenarios (sgl-project#24742)
  Fix reduce_scatterv producer contract for SUM_LEN (sgl-project#24785)
  [NPU]Documentation update for communications quantization feature (sgl-project#24668)
  [Session R3] Add routed_experts_start_len for absolute routing slice control (sgl-project#24851)
  [Model] Add MiniCPM-V 4.6 support (sgl-project#24855)
  Support Intern-S2-Preview (sgl-project#24875)
  [PD] Unify dsv4 dispatch with swa (sgl-project#24888)
  Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (sgl-project#24775)
  Fix PD bootstrap failure handling (sgl-project#24772)
  [Spec] Cleanup idle stub and shape-check patterns (sgl-project#24881)
  [Bug] Add dsv4 state_type branch to mooncake disaggregation (sgl-project#24878)
  [Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` (sgl-project#24859)
  [Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (sgl-project#24696)
  [spec decoding] support kimi-k2.5-eagle3-mla (sgl-project#24826)
  [SPEC V2] fix: skip stale state updates in spec-v2 overlap (sgl-project#23456)
  [RL] Call torch.cuda.empty_cache() for `in-place` pause mode to avoid OOM (sgl-project#24854)
  [diffusion] CI: add cache-dit CI tests (sgl-project#19213)
  [Utils] Make request dump robust to unpicklable server_args and large meta_info (sgl-project#24767)
  ...

# Conflicts:
#   python/sglang/srt/utils/common.py
Motivation
PR #23882 introduced an independent `state_type="dsv4"` discriminator and a dedicated NIXL transport path (`_send_state_pages_flat`) for V4's heterogeneous state pool. PR #24878 then routed V4 mooncake through the existing `["swa", "nsa"]` branch's `_send_kvcache_generic`, proving empirically that V4's heterogeneous state list (SWA + compress + indexer ring buffers) works correctly with the same generic transfer path used by SWA.

The independent `state_type="dsv4"` is therefore redundant. Its sole non-trivial consumer, NIXL's `_send_state_pages_flat`, also hard-asserts `src_state_item_lens[i] == dst_state_item_lens[i]` per entry, which does not hold under MTP (the decode-side indexer pool carries an extra EAGLE draft layer). Removing the discriminator routes V4 + NIXL through the more permissive generic path on both backends.

Empirically, this also fixes a silent V4 + NIXL + MTP regression (gsm8k: 0.890 → 0.970).
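To make the MTP failure mode concrete, here is a small illustration of the per-entry length check described above. All byte counts are made up; only the variable names (`src_state_item_lens`, `dst_state_item_lens`) and the equality condition come from the description, and the real pools obviously hold different values.

```python
# Illustrative only: made-up lengths showing why a per-entry equality assert breaks
# under MTP, where the decode-side indexer pool carries an extra EAGLE draft layer.

src_state_item_lens = [4096, 1024, 512]   # prefill side: SWA, compress, indexer (bytes, made up)
dst_state_item_lens = [4096, 1024, 640]   # decode side: indexer entry is larger under MTP

# The removed _send_state_pages_flat path hard-asserted per-entry equality:
for i, (src_len, dst_len) in enumerate(zip(src_state_item_lens, dst_state_item_lens)):
    try:
        assert src_len == dst_len
    except AssertionError:
        print(f"entry {i}: src {src_len} != dst {dst_len} -> dedicated dsv4 path fails")

# The generic _send_kvcache_generic path does not impose this per-entry constraint,
# which is why routing V4 through it tolerates the MTP layout.
```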
Accuracy
1P+1D V4-Flash, TP=4, gsm8k 200 examples.
cc: @ShangmingCai @ch-wan @hnyls2002