
[PD] Unify dsv4 dispatch with swa #24888

Merged
ispobock merged 2 commits into main from cleanup-dsv4-state-type on May 10, 2026
Conversation

@ispobock (Collaborator) commented on May 10, 2026

Motivation

PR #23882 introduced an independent state_type="dsv4" discriminator and a dedicated NIXL transport path (_send_state_pages_flat) for V4's heterogeneous state pool. PR #24878 then routed V4 mooncake through the existing ["swa", "nsa"] branch's _send_kvcache_generic, proving empirically that V4's heterogeneous state list (SWA + compress + indexer ring buffers) works correctly with the same generic transfer path used by SWA.

The independent state_type="dsv4" is therefore redundant. Its sole non-trivial consumer, NIXL's _send_state_pages_flat, also hard-asserts src_state_item_lens[i] == dst_state_item_lens[i] per entry, which doesn't hold under MTP (the decode-side indexer pool carries an extra EAGLE draft layer). Removing the discriminator routes V4 + NIXL through the more permissive generic path on both backends.
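A minimal sketch of the dispatch change described above. The names here (dispatch, send_state_pages_flat, send_kvcache_generic, and the min-of-lengths transfer behavior) are simplified stand-ins for illustration, not the actual sglang code:

```python
def send_state_pages_flat(src_lens, dst_lens):
    # Stand-in for the removed dsv4-only NIXL path: hard-asserts identical
    # per-entry state item lengths, which fails under MTP because the
    # decode-side indexer pool carries an extra draft layer.
    for s, d in zip(src_lens, dst_lens):
        assert s == d, "flat path requires src/dst state item lengths to match"
    return ["flat:%d" % s for s in src_lens]


def send_kvcache_generic(src_lens, dst_lens):
    # Stand-in for the generic path used by swa/nsa: assumed here to
    # transfer the overlapping portion per entry, so heterogeneous pools
    # with extra decode-side layers still work.
    return ["generic:%d" % min(s, d) for s, d in zip(src_lens, dst_lens)]


def dispatch(state_type, src_lens, dst_lens):
    # Post-cleanup dispatch: dsv4 no longer has a dedicated branch and
    # falls through to the same generic path as swa/nsa.
    if state_type in ("swa", "nsa", "dsv4"):
        return send_kvcache_generic(src_lens, dst_lens)
    raise ValueError("unknown state_type: %r" % state_type)
```

Under this sketch, a dsv4 transfer with mismatched per-entry lengths (the MTP case) succeeds through the generic path, while the old flat path would raise on the same inputs.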

Empirically this also fixes a silent V4 + NIXL + MTP regression (gsm8k: 0.890 → 0.970).

Accuracy

1P+1D V4-Flash, TP=4, gsm8k 200 examples.

| backend  | MTP | pre-cleanup | post-cleanup |
|----------|-----|-------------|--------------|
| mooncake | no  | 0.975       | 0.975        |
| mooncake | yes | (parity)    | 0.985        |
| nixl     | no  | 0.985       | 0.980        |
| nixl     | yes | 0.890       | 0.970        |

cc: @ShangmingCai @ch-wan @hnyls2002

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request unifies DeepSeek-V4 (dsv4) state handling with Sliding Window Attention (swa) by removing specialized dsv4 logic and types across the disaggregation modules. Feedback suggests clarifying a comment in mooncake/conn.py to specify that the restriction on different Tensor Parallel (TP) sizes applies only to non-MLA models, as the current wording is misleading following the unification.

Comment thread on python/sglang/srt/disaggregation/mooncake/conn.py (Outdated)
@ispobock (Collaborator, Author)

/tag-and-rerun-ci

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ShangmingCai (Collaborator) left a comment

LGTM

@ispobock (Collaborator, Author)

/rerun-test test/registered/disaggregation/test_disaggregation_basic.py::TestDisaggregationAccuracy test/registered/disaggregation/test_disaggregation_basic.py::TestDisaggregationMooncakeSpec test/registered/disaggregation/test_disaggregation_xpu.py::TestDisaggregationNixlBasic test/registered/distributed/test_disaggregation_different_tp.py test/registered/distributed/test_disaggregation_pp.py

@github-actions (Bot) commented May 10, 2026

🚀 2-gpu-h100 (2 tests): ✅ View workflow run

cd test/ && python3 registered/disaggregation/test_disaggregation_basic.py TestDisaggregationAccuracy
cd test/ && python3 registered/disaggregation/test_disaggregation_basic.py TestDisaggregationMooncakeSpec

🚀 1-gpu-5090 (1 test): ✅ View workflow run

cd test/ && python3 registered/disaggregation/test_disaggregation_xpu.py TestDisaggregationNixlBasic

🚀 8-gpu-h20 (2 tests): ❌ View workflow run

cd test/ && python3 registered/distributed/test_disaggregation_different_tp.py
cd test/ && python3 registered/distributed/test_disaggregation_pp.py

@ispobock ispobock merged commit 59faf98 into main May 10, 2026
111 of 147 checks passed
@ispobock ispobock deleted the cleanup-dsv4-state-type branch May 10, 2026 14:01
ltcs11 added a commit to ltcs11/sglang that referenced this pull request May 11, 2026
* main: (87 commits)
  [Fix] Disable FlashInfer allreduce fusion under deterministic inference (sgl-project#24629)
  fix: STANDALONE spec-decode hidden-size mismatch crash (sgl-project#24217)
  Followup fix for Custom AR V2 in non NVL scenarios (sgl-project#24742)
  Fix reduce_scatterv producer contract for SUM_LEN (sgl-project#24785)
  [NPU]Documentation update for communications quantization feature (sgl-project#24668)
  [Session R3] Add routed_experts_start_len for absolute routing slice control (sgl-project#24851)
  [Model] Add MiniCPM-V 4.6 support (sgl-project#24855)
  Support Intern-S2-Preview (sgl-project#24875)
  [PD] Unify dsv4 dispatch with swa (sgl-project#24888)
  Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (sgl-project#24775)
  Fix PD bootstrap failure handling (sgl-project#24772)
  [Spec] Cleanup idle stub and shape-check patterns (sgl-project#24881)
  [Bug] Add dsv4 state_type branch to mooncake disaggregation (sgl-project#24878)
  [Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` (sgl-project#24859)
  [Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (sgl-project#24696)
  [spec decoding] support kimi-k2.5-eagle3-mla (sgl-project#24826)
  [SPEC V2] fix: skip stale state updates in spec-v2 overlap (sgl-project#23456)
  [RL] Call torch.cuda.empty_cache() for `in-place` pause mode to avoid OOM (sgl-project#24854)
  [diffusion] CI: add cache-dit CI tests (sgl-project#19213)
  [Utils] Make request dump robust to unpicklable server_args and large meta_info (sgl-project#24767)
  ...

# Conflicts:
#	python/sglang/srt/utils/common.py