[PD][MoRI] Align hybrid state transfer with per-component schema by maning00 · Pull Request #26539 · sgl-project/sglang

maning00 · 2026-05-28T05:43:50Z

Motivation

PR #24932 ([PD] Refactor hybrid state transfer) migrated KVArgs from a flat state layout (state_type: str, state_item_lens: List[int], state_dim_per_tensor: List[int]) to a per-component one (state_types: List[StateType], with both *_item_lens and *_dim_per_tensor becoming List[List[int]]). Mooncake and NIXL were migrated to the new schema in the same PR, but MoRI was only partially migrated: the inner loop in _register_local_buffers was updated, while _register_kv_args, send_state, send_metadata, TransferInfo, and KVArgsRegisterInfo were left on the old flat assumption.
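To make the two layouts concrete, here is a minimal sketch using dataclasses. The field names come from the description above. The StateType members, the helper method, and the numeric values are assumptions for illustration only; sglang's real KVArgs and StateType may look different.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Tuple


class StateType(Enum):
    # Hypothetical stand-in for sglang's StateType; member names are assumptions.
    MAMBA = "mamba"
    SWA = "swa"
    DSA = "dsa"


@dataclass
class FlatStateArgs:
    # Old flat layout: a single state type and one list of per-buffer lengths.
    state_type: str = "none"
    state_item_lens: List[int] = field(default_factory=list)
    state_dim_per_tensor: List[int] = field(default_factory=list)


@dataclass
class PerComponentStateArgs:
    # New layout: component i has type state_types[i] and its own inner lists.
    state_types: List[StateType] = field(default_factory=list)
    state_item_lens: List[List[int]] = field(default_factory=list)
    state_dim_per_tensor: List[List[int]] = field(default_factory=list)

    def components(self) -> Iterator[Tuple[StateType, List[int], List[int]]]:
        # Yield (type, item_lens, dim_per_tensor) for each component.
        return zip(self.state_types, self.state_item_lens, self.state_dim_per_tensor)


# Illustrative values: one Mamba component with two buffers, one SWA component.
hybrid = PerComponentStateArgs(
    state_types=[StateType.MAMBA, StateType.SWA],
    state_item_lens=[[4096, 1024], [512]],
    state_dim_per_tensor=[[64, 16], [8]],
)
```

A pure-transformer model would simply carry empty lists in every per-component field.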

For any model with a non-empty state pool (DeepSeek V4, GLM-5, Qwen3.5), this manifests as struct.error: required argument is not an integer at PD bootstrap (#26525), because _register_kv_args calls struct.pack("I", item_len) on what is now a list. A flatten-on-send hack would silence that crash, but it would still route Mamba state buffers through the SWA/DSA contiguous-page logic on multi-component hybrids. This change therefore aligns MoRI with the per-component dispatch model that Mooncake and NIXL already use.
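The crash and the reason a flatten-on-send workaround is insufficient can both be reproduced with the standard library alone. In this sketch, `pack_item_lens_like_old_code` is a hypothetical helper that mimics the per-entry `struct.pack("I", item_len)` pattern described above. It is not the actual MoRI code, and the lengths are illustrative.

```python
import struct
from typing import List, Optional


def pack_item_lens_like_old_code(item_lens) -> bytes:
    # Hypothetical: one struct.pack("I", item_len) per entry, as in the old path.
    return b"".join(struct.pack("I", item_len) for item_len in item_lens)


# Old flat schema: plain ints pack without error.
flat_ok = pack_item_lens_like_old_code([4096, 1024])

# New per-component schema: each entry is a list, so struct rejects it.
nested_item_lens: List[List[int]] = [[4096, 1024], [512]]
error_message: Optional[str] = None
try:
    pack_item_lens_like_old_code(nested_item_lens)
except struct.error as exc:
    error_message = str(exc)

# Flatten-on-send packs cleanly, but the component boundaries are lost:
# the receiver can no longer tell which lengths belong to the Mamba
# component and which belong to the SWA/DSA component.
flattened = [n for component in nested_item_lens for n in component]
flat_packed = pack_item_lens_like_old_code(flattened)
```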

Modifications

Wire format: switch state_item_lens / state_dim_per_tensor to pack_int_lists("I") / unpack_int_lists("I"), switch state_indices to pack_int_lists("i") / unpack_int_lists("i"), and add nested-msgpack helpers for List[List[MemoryDesc]].
Types: KVArgsRegisterInfo.dst_state_{mem_descs,item_lens,dim_per_tensor} and TransferInfo.dst_state_indices become List[List[...]] / List[np.ndarray].
MoriKVManager.state_mem_descs becomes List[List[MemoryDesc]]; _register_local_buffers builds it per-component.
send_state iterates state_types[i] and dispatches each component to _send_mamba_state or _send_swa_dsa_state independently (mirrors MooncakeKVManager.maybe_send_extra and NixlKVManager.maybe_send_extra).
_send_mamba_state / _send_swa_dsa_state accept a single component's slice instead of indexing into self.kv_args.* directly.
_normalize_state_indices_per_component ravels each component's payload to 1-D once at the API boundary, removing the 2-D single-component DSA edge case at the source.
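A nested integer-list wire format and the one-time normalization step could look roughly like the sketch below. These are hypothetical reimplementations written for illustration: the length-prefixed little-endian encoding is an assumption, and the real `pack_int_lists`, `unpack_int_lists`, and `_normalize_state_indices_per_component` in sglang may use different signatures and a different byte layout. The split between "I" (unsigned item lengths and dims) and "i" (signed state indices) follows the list above.

```python
import struct
from typing import List, Sequence

import numpy as np


def pack_int_lists(values: Sequence[Sequence[int]], fmt: str = "I") -> bytes:
    """Hypothetical encoder: [num_lists][len_0][v...][len_1][v...]...

    List counts and lengths are unsigned 32-bit; elements use `fmt`
    ("I" for unsigned item lens/dims, "i" for signed state indices).
    """
    parts = [struct.pack("<I", len(values))]
    for row in values:
        row = [int(v) for v in row]
        parts.append(struct.pack("<I", len(row)))
        parts.append(struct.pack(f"<{len(row)}{fmt}", *row))
    return b"".join(parts)


def unpack_int_lists(data: bytes, fmt: str = "I") -> List[List[int]]:
    """Inverse of the hypothetical pack_int_lists above."""
    offset = 0
    (num_lists,) = struct.unpack_from("<I", data, offset)
    offset += 4
    item_size = struct.calcsize(f"<{fmt}")
    out: List[List[int]] = []
    for _ in range(num_lists):
        (n,) = struct.unpack_from("<I", data, offset)
        offset += 4
        out.append(list(struct.unpack_from(f"<{n}{fmt}", data, offset)))
        offset += n * item_size
    return out


def normalize_state_indices_per_component(state_indices) -> List[np.ndarray]:
    """Hypothetical analogue of _normalize_state_indices_per_component:
    ravel each component's payload to 1-D once, at the API boundary."""
    return [np.asarray(c, dtype=np.int32).ravel() for c in state_indices]


# Round-trip examples with illustrative values.
item_lens = [[4096, 1024], [512]]
decoded_lens = unpack_int_lists(pack_int_lists(item_lens, "I"), "I")

state_indices = [np.array([[3, 4], [5, 6]]), [-1, 7]]
per_component = normalize_state_indices_per_component(state_indices)
decoded_indices = unpack_int_lists(pack_int_lists(per_component, "i"), "i")
```

Normalizing before packing means the encoder only ever sees 1-D rows, so a 2-D single-component payload never needs its own special case downstream.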

Accuracy Tests

Cross-machine PD on AMD MI300X with --disaggregation-transfer-backend mori.

Qwen3-8B (pure transformer, validates non-hybrid path / empty state lists):

Setup	GSM8K (200q)	Errors
Single-machine, TP=2 + TP=2	94.50%	0
1P (TP=4) + 1D (TP=4) cross-machine	94.00%	0

Qwen3.5-122B-A10B (hybrid linear attention), which exercises the per-component Mamba state transfer path. Decode logs show "Mamba Cache is allocated with ssm_state 18.02GB / TP rank" and "Using hybrid linear attention backend for hybrid GDN models", and per-request Mamba usage is non-zero:

Sample	GSM8K	Errors
30q	96.67%	0
100q	98.00%	0
300q @ concurrency 32	96.67%	0

No state-transfer-related errors in prefill or decode logs across all runs.

Speed Tests and Profiling

Benchmarked with sglang.bench_serving --backend sglang-oai-chat against a PD router fronting 1P + 1D over RDMA:

Qwen3-8B (1P TP=4 + 1D TP=4):

in / out / concurrency	reqs	total throughput
1024 / 256 / 32	128 / 128	21.9k tok/s

Qwen3.5-122B-A10B (1P TP=8 + 1D TP=8):

in / out / concurrency	reqs	total throughput	mean E2E
1024 / 256 / 64	256 / 256	12.5k tok/s	6.02 s
2048 / 512 / 32	128 / 128	11.0k tok/s	6.96 s

Checklist

Format your code according to the "Format code with pre-commit" guide.
Add unit tests according to the "Run and add unit tests" guide.
Provide accuracy and speed benchmark results according to the "Test the accuracy" and "Benchmark the speed" guides.
Follow the SGLang code style guidance.

cc @Duyi-Wang

CI States

Latest PR Test (Base): ⏳ Run #26624086997
Latest PR Test (Extra): ❌ Run #26624086820

ShangmingCai · 2026-05-28T12:11:20Z

-        state_type = getattr(self.kv_args, "state_type", "none")
-
-        if state_type == "none":
-            raise RuntimeError(
-                "PD state transfer failed: state_type is 'none' but state_indices were provided"
-            )
-
-        if not peer_info.dst_state_mem_descs:
+        state_types = getattr(self.kv_args, "state_types", None) or []


I think state_types = self.kv_args.state_types is enough. We have made sure this value will be set.

Thanks, updated as suggested
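Because kv_args.state_types is always set (an empty list for non-hybrid models), the dispatch can read it directly instead of using a getattr fallback. The sketch below shows that per-component routing in isolation. `KVArgsSketch`, `send_state_sketch`, and the plain-string state types are hypothetical simplifications and not the real MoriKVManager API. The two callbacks stand in for `_send_mamba_state` and `_send_swa_dsa_state`.

```python
from typing import Callable, List, Sequence


class KVArgsSketch:
    """Hypothetical stand-in for KVArgs; state_types is assumed always set."""

    def __init__(self, state_types: List[str]):
        self.state_types = state_types


def send_state_sketch(
    kv_args: KVArgsSketch,
    state_indices: Sequence[Sequence[int]],
    send_mamba: Callable[[int, Sequence[int]], None],
    send_swa_dsa: Callable[[int, Sequence[int]], None],
) -> None:
    """Hypothetical per-component dispatch: component i is routed by
    state_types[i], independently of every other component."""
    state_types = kv_args.state_types  # direct read, no getattr fallback
    if len(state_indices) != len(state_types):
        # Also covers "no state components but state_indices were provided".
        raise RuntimeError(
            f"PD state transfer failed: {len(state_indices)} index payloads "
            f"for {len(state_types)} state components"
        )
    for i, (state_type, indices) in enumerate(zip(state_types, state_indices)):
        if state_type == "mamba":
            send_mamba(i, indices)
        elif state_type in ("swa", "dsa"):
            # SWA and DSA share the contiguous-page path.
            send_swa_dsa(i, indices)
        else:
            raise RuntimeError(f"unsupported state type {state_type!r} for component {i}")


# Usage: record which path each component takes.
calls = []
send_state_sketch(
    KVArgsSketch(["mamba", "swa"]),
    [[0, 1], [7]],
    send_mamba=lambda i, idx: calls.append(("mamba", i, list(idx))),
    send_swa_dsa=lambda i, idx: calls.append(("swa_dsa", i, list(idx))),
)
```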

ShangmingCai

Others LGTM

ShangmingCai · 2026-05-28T12:12:52Z

CC: @HaiShaw

HaiShaw · 2026-05-28T15:53:12Z

/tag-and-rerun-ci

HaiShaw · 2026-05-29T07:19:24Z

@amd-bot ci-status

amd-bot · 2026-05-29T07:21:34Z

@HaiShaw

CI Status for PR #26539

PR: [PD][MoRI] Align hybrid state transfer with per-component schema
Changed files: python/sglang/srt/disaggregation/mori/conn.py (+157/-79), test/registered/amd/disaggregation/test_mori_transfer_engine_e2e.py (+25/-7)

AMD: 1 failure (0 likely related) | Others: 12 failures (0 related)

AMD CI Failures

Job	Test File	Test Function	Error	Related?	Explanation	Log
stage-a-test-1-gpu-small-amd (linux-mi325-1gpu-sglang)	`test/registered/attention/test_wave_attention_kernels.py`	(collection-time import)	`ModuleNotFoundError: No module named 'unittest.mock'` — caused by stray `test/registered/attention/unittest/__init__.py` shadowing stdlib `unittest`; also preceded by HF cache miss + private-registry timeout	🟢 Unlikely	PR only touches `disaggregation/mori/`; failing file is an attention kernel test that fails at `from unittest import SkipTest` in `sglang/srt/utils/common.py:74` because Python resolves `unittest` to a CI-side directory	Log

Other CI Failures

Job	Test File	Test Function	Error	Related?	Explanation	Log
base-b-test-1-gpu-large (3)	`test/registered/attention/test_chunk_gated_delta_rule.py`	import-time	`ImportError: cannot import name 'mock' from 'unittest' (...test/registered/attention/unittest/__init__.py)`	🟢 Unlikely	Same stdlib-`unittest` shadowing issue; PR unrelated	Log
base-b-test-1-gpu-large (6)	`test/registered/attention/test_triton_attention_backend.py`	import-time	`ImportError: cannot import name 'SkipTest' from 'unittest'`	🟢 Unlikely	Same issue	Log
base-b-test-1-gpu-large (8)	`test/registered/attention/test_deterministic.py`	import-time	`ImportError: cannot import name 'SkipTest' from 'unittest'`	🟢 Unlikely	Same issue	Log
base-b-test-1-gpu-small (5)	`test/registered/attention/test_create_kvindices.py`	import-time	`ImportError: cannot import name 'mock' from 'unittest'`	🟢 Unlikely	Same issue	Log
base-b-test-2-gpu-large (2)	`test/registered/attention/test_gemma4_swa_triton_oob_regression.py`	import-time	`ImportError: cannot import name 'SkipTest' from 'unittest'`	🟢 Unlikely	Same issue	Log
base-b-test-4-gpu-b200 (1)	`test/registered/attention/test_flash_attention_4.py`	import-time	`ImportError: cannot import name 'SkipTest' from 'unittest'`	🟢 Unlikely	Same issue	Log
stage-a-test-1-gpu-xpu	N/A	N/A	Docker build failed: `pip install torch==2.11.0+xpu ...` exit code 1 (XPU index installation failure)	🟢 Unlikely	XPU image build infra failure; PR doesn't touch XPU/Docker	Log
stage-b-test-1-npu-a2 (0)	`test/registered/ascend/basic_function/quant/test_npu_w8a8_quantization.py`	`test_gsm8k`	runtime error in `test_utils.py:2194` lambda	🟢 Unlikely	NPU-only quant test; PR touches no NPU code path	Log
multimodal-gen-test-1-npu-a3	`multimodal_gen/test/server/ascend/test_server_1_npu.py`	`test_diffusion_generation[wan2_1_t2v_1.3b_1_npu]`	Performance validation failed	🟢 Unlikely	NPU diffusion server perf test; unrelated to mori disaggregation	Log
multimodal-gen-test-2-npu-a3	`multimodal_gen/test/server/ascend/test_server_*`	diffusion perf	Performance validation failed	🟢 Unlikely	Same NPU diffusion perf cluster	Log
multimodal-gen-test-8-npu-a3	`multimodal_gen/test/server/ascend/...`	diffusion perf	Performance validation failed	🟢 Unlikely	Same NPU diffusion perf cluster	Log
finish	N/A	N/A	gate fail (downstream of XPU build)	🟢 Unlikely	Downstream finish gate; not a real test failure	Log

Details

None of the failures are related to this PR's changes (mori/conn.py + its e2e test).

Dominant cluster (7 of the 13 failures): a stray test/registered/attention/unittest/ directory on the CI runners is being placed on sys.path and shadows Python's stdlib unittest package, so sglang/srt/utils/common.py:74 (from unittest import SkipTest) fails to import. The correct local directory is test/registered/attention/unittests/ (with an s); the runner's checkout has somehow created or persisted a unittest/ variant. This is a CI infrastructure issue affecting every PR that hits these attention tests, not a regression caused by this PR.
XPU build (stage-a-test-1-gpu-xpu): failed during pip install torch==2.11.0+xpu from download.pytorch.org/whl/xpu, an upstream index issue.
NPU jobs: diffusion server performance validation failures and a w8a8 quant gsm8k failure, all on NPU-only code paths that this PR does not touch.
AMD stage-a: the same unittest shadowing issue, preceded by a private docker-registry timeout and an HF cache miss; none of these touch MoRI code.
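The shadowing mechanism described above can be reproduced locally with a short sketch. It creates a throwaway directory containing an empty unittest/ package, puts that directory at the front of PYTHONPATH, and runs `from unittest import SkipTest` in a fresh interpreter. The temporary layout is an assumption that only mimics the runner's stray directory. A subprocess is used so the current process's already-imported unittest does not mask the effect.

```python
import os
import subprocess
import sys
import tempfile


def import_skiptest_with_shadow() -> subprocess.CompletedProcess:
    """Run `from unittest import SkipTest` with a stray unittest/ package
    placed ahead of the stdlib on the module search path."""
    with tempfile.TemporaryDirectory() as root:
        shadow = os.path.join(root, "unittest")
        os.makedirs(shadow)
        # An empty __init__.py makes "unittest" a regular package that wins lookup.
        open(os.path.join(shadow, "__init__.py"), "w").close()

        env = dict(os.environ)
        # PYTHONPATH entries are searched before the standard library.
        env["PYTHONPATH"] = os.pathsep.join(
            p for p in (root, env.get("PYTHONPATH", "")) if p
        )
        return subprocess.run(
            [sys.executable, "-c", "from unittest import SkipTest"],
            env=env,
            capture_output=True,
            text=True,
        )


result = import_skiptest_with_shadow()
last_line = (result.stderr.strip().splitlines() or [""])[-1]
print(result.returncode, last_line)
```

The fix is therefore on the runner side: removing the stray directory restores normal stdlib resolution, and nothing in mori/conn.py is involved.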

Verdict: the failures are unrelated to this PR. Safe to ignore from a correctness standpoint; the `unittest/` shadowing cluster is a CI-infra problem that needs separate cleanup of the runner workspace.

Generated by amd-bot using Claude Code CLI

HaiShaw

LGTM

github-actions Bot and others added 8 commits May 8, 2026 12:23

docs: sync LMSYS SGLang blog cards (2a7f8be)
Merge branch 'sgl-project:main' into main (c76cb47)
Merge branch 'sgl-project:main' into main (c2f5869)
Merge branch 'sgl-project:main' into main (6604f2e)
Merge branch 'sgl-project:main' into main (c029c20)
Merge branch 'sgl-project:main' into main (6431ccc)
Merge branch 'sgl-project:main' into main (819aa7f)
[PD][MoRI] Migrate hybrid state transfer to per-component schema (4c622f4)

maning00 changed the title from "[PD][MoRI] Migrate hybrid state transfer to per-component schema" to "[PD][MoRI] Align hybrid state transfer with per-component schema" May 28, 2026

maning00 added 2 commits May 28, 2026 09:34

[CI][MoRI] Add hybrid mamba state transfer regression test (393ea66)
Merge branch 'main' into fix/mori-hybrid-state-component-aware (962b9ab)

maning00 marked this pull request as ready for review May 28, 2026 09:38

maning00 requested review from ByronHsu, ShangmingCai and hnyls2002 as code owners May 28, 2026 09:38

maning00 mentioned this pull request May 28, 2026 in [Bug] AMD glm5 & dsv4 MoRI disagg connector error: struct.error: required argument is not an integer #26525 (Open)

ShangmingCai reviewed May 28, 2026

Comment thread python/sglang/srt/disaggregation/mori/conn.py

ShangmingCai approved these changes May 28, 2026

ShangmingCai assigned HaiShaw and ShangmingCai May 28, 2026

github-actions Bot added the run-ci label May 28, 2026

Address review feedback (3e9b24f)
Merge branch 'main' into fix/mori-hybrid-state-component-aware (73d8741)

HaiShaw approved these changes May 29, 2026

HaiShaw merged commit 4d1163e into sgl-project:main May 29, 2026
66 of 104 checks passed

xjpang pushed a commit to xjpang/sglang that referenced this pull request Jun 2, 2026: [PD][MoRI] Align hybrid state transfer with per-component schema (sgl-project#26539) (3168bd9)
mqhc2020 pushed a commit to mqhc2020/sglang that referenced this pull request Jun 2, 2026: [PD][MoRI] Align hybrid state transfer with per-component schema (sgl-project#26539) (80792e7)
alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request Jun 4, 2026: [PD][MoRI] Align hybrid state transfer with per-component schema (sgl-project#26539) (043fd6e)
jeynmann pushed a commit to jeynmann/sglang that referenced this pull request Jun 4, 2026: [PD][MoRI] Align hybrid state transfer with per-component schema (sgl-project#26539) (97179a2)

HaiShaw merged 12 commits into sgl-project:main from maning00:fix/mori-hybrid-state-component-aware
