[Bug][PD][NIXL] always send aux on is_last; only expects_state when truthy #25699

Merged
ShangmingCai merged 2 commits into sgl-project:main from ishandhanani:idhanani/nixl-disagg-dense-aux-fix
May 19, 2026

@ishandhanani
Collaborator

@ishandhanani ishandhanani commented May 18, 2026

Motivation

Fixes #25698.

Two asymmetries introduced in #24932 ([PD] Refactor hybrid state transfer) hang every NIXL P/D disagg request for dense models (no Mamba / SWA / NSA — i.e. LLaMA, Qwen3, Gemma, GPT-OSS, …) on v0.5.12. Bisects cleanly to commit d7f4761a4.

  1. NixlKVManager.transfer_worker gates the aux RDMA write inside `if kv_chunk.is_last and kv_chunk.state_indices:`. For dense models, state_indices is [] (falsy, because the prefill scheduler's state_types is empty), so the whole branch short-circuits and send_aux is never called. Decode never receives the {room}_aux notification → hangs.
  2. NixlKVReceiver.send_metadata flips expects_state=True whenever state_indices is not None. Decode receives state_indices=[] (non-None, empty) for dense models, so expects_state flips on. The corresponding prefill check at line ~692 is a truthiness check (`if kv_chunk.state_indices:`) and so (correctly) never sends a state notif for an empty list, which leaves is_done() waiting forever for a notif that never comes.

Either bug alone hangs the request; both fixes are required for end-to-end disagg on dense models.
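
The two conditions above can be reproduced in isolation. The following is a minimal sketch, not the actual conn.py code: `old_sender_sends_aux` and `old_receiver_expects_state` are hypothetical helpers that only mirror the two boolean expressions quoted in the description, evaluated with the empty list that dense models pass.

```python
from typing import List, Optional


def old_sender_sends_aux(is_last: bool, state_indices: Optional[List[int]]) -> bool:
    """Pre-fix prefill gate (hypothetical helper): aux is nested under the state check."""
    return bool(is_last and state_indices)


def old_receiver_expects_state(state_indices: Optional[List[int]]) -> bool:
    """Pre-fix decode flag (hypothetical helper): any non-None value counts as state."""
    return state_indices is not None


# Dense models carry no Mamba/SWA/NSA state, so the list is empty but not None.
dense_state_indices: List[int] = []

aux_sent = old_sender_sends_aux(True, dense_state_indices)
state_expected = old_receiver_expects_state(dense_state_indices)

# The sender skips aux while the receiver waits for state: both sides disagree.
print(f"aux_sent={aux_sent} state_expected={state_expected}")
```

The mismatch comes purely from Python semantics: `[]` is falsy, yet `[] is not None` is true.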

Mooncake is unaffected: mooncake/conn.py:1344-1357 already gates state and aux on independent if blocks, and Mooncake has no expects_state field (it uses per-chunk return-code polling instead). Mooncake users on v0.5.12 can keep running disagg; NIXL users currently can't.

Modifications

python/sglang/srt/disaggregation/nixl/conn.py:

  1. Split the is_last and state_indices gates in transfer_worker so the aux RDMA write always runs on is_last, and state only runs when state_indices is non-empty. Matches the v0.5.11 shape and matches mooncake/conn.py:1344-1357.
  2. Switch NixlKVReceiver.send_metadata from `if state_indices is not None:` to `if state_indices:` so expects_state only flips on when there's actually state to expect.
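
The shape of the two modifications can be sketched as follows. This is an illustration under assumptions, not the merged diff: `Chunk`, `planned_writes`, and `expects_state` are hypothetical names, the transfers are represented by plain strings rather than RDMA writes, and the state write is assumed to happen only on the last chunk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for the real kv_chunk object."""
    is_last: bool
    state_indices: List[int] = field(default_factory=list)


def planned_writes(chunk: Chunk) -> List[str]:
    """Return which transfers would be issued for a chunk after the fix."""
    ops = ["kv"]
    # State is gated on a non-empty state_indices (assumed: last chunk only).
    if chunk.is_last and chunk.state_indices:
        ops.append("state")
    # Aux is gated on is_last alone, independent of state.
    if chunk.is_last:
        ops.append("aux")
    return ops


def expects_state(state_indices: Optional[List[int]]) -> bool:
    """Decode-side flag after the fix: truthy check, so [] and None both mean no state."""
    return bool(state_indices)


print(planned_writes(Chunk(is_last=True)))                     # dense model
print(planned_writes(Chunk(is_last=True, state_indices=[7])))  # stateful model
print(expects_state([]), expects_state([7]))
```

With independent gates, the sender and receiver agree for both dense and stateful models: a state transfer is planned exactly when `expects_state` is true.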

Accuracy Tests

End-to-end manual verification on Qwen/Qwen3-0.6B (dense Qwen3) with NIXL UCX backend on 2× L40S, single host, tp1+tp1:

13:53:59.769  prefill bootstrap_thread recv KVArgsRegister (17 frames)
13:53:59.774  decode  send_metadata
13:54:00.464  prefill bootstrap_thread recv TransferInfo (10 frames)
13:54:00.555  prefill add_transfer_request (nkv=1, aux_index=1)
13:54:00.640  prefill transfer_worker DONE n_handles=2 elapsed=66ms   ← KV + aux both sent
13:54:00.640  decode  notif tag=kv  → is_done=False (aux still needed)
13:54:00.641  decode  notif tag=aux → is_done=True                    ← unblocks

Before either fix, n_handles=1 (aux skipped) and decode is_done stays False forever. After both fixes, the request returns tokens cleanly in <2s.
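
The notification sequence in the trace above can be modeled with a toy tracker. This is only a sketch: `ToyTransferStatus` and `replay` are hypothetical, and the real is_done() logic is not shown in this PR, so the rule "done once kv, aux, and (if expected) state have all arrived" is an assumption drawn from the log lines.

```python
from typing import List, Set


class ToyTransferStatus:
    """Toy model of decode-side completion tracking (not the real NixlKVReceiver)."""

    def __init__(self, expects_state: bool) -> None:
        self.expects_state = expects_state
        self.received: Set[str] = set()

    def on_notif(self, tag: str) -> None:
        self.received.add(tag)

    def is_done(self) -> bool:
        needed = {"kv", "aux"}
        if self.expects_state:
            needed.add("state")
        return needed <= self.received


def replay(expects_state: bool, tags: List[str]) -> List[bool]:
    """Feed notifications in order and record is_done() after each one."""
    status = ToyTransferStatus(expects_state)
    results = []
    for tag in tags:
        status.on_notif(tag)
        results.append(status.is_done())
    return results


fixed = replay(False, ["kv", "aux"])        # both fixes applied
sender_bug = replay(False, ["kv"])          # aux never sent
receiver_bug = replay(True, ["kv", "aux"])  # state expected but never sent
print(fixed, sender_bug, receiver_bug)
```

In this model either bug on its own leaves the final is_done() value False, which matches the claim that both fixes are required.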

I do not have access to a multi-node setup to exercise heterogeneous-TP staging here, but neither hunk changes any code path that runs only with enable_staging=True, state_indices truthy, MLA, or decode_tp_size != attn_tp_size — they only restore the dense-LLM path to its v0.5.11 behavior.

Benchmarking and Profiling

No perf change expected — same RDMA ops on the wire; no allocation or branch-count changes for the stateful paths (Mamba/SWA/NSA), and the dense-LLM path now actually completes instead of looping in pop_transferred until SGLANG_DISAGGREGATION_WAITING_TIMEOUT.


CI States

Latest PR Test (Base): ❌ Run #26071445981
Latest PR Test (Extra): ❌ Blocked -- run-ci is required first.

…ruthy

Fixes sgl-project#25698. Two asymmetries introduced in sgl-project#24932 (PD hybrid state
refactor) hang NIXL disagg for every dense model (LLaMA, Qwen3, etc.):

1. NixlKVManager.transfer_worker gates the aux RDMA write inside
   `if kv_chunk.is_last and kv_chunk.state_indices:`. For dense models
   state_indices is [] (falsy) so the whole block short-circuits and
   send_aux is never called. Decode never gets the {room}_aux notif.
   Split into two: gate state on state_indices, gate aux on is_last only
   (matches v0.5.11 behavior and matches mooncake/conn.py:1344-1357).

2. NixlKVReceiver.send_metadata sets expects_state=True whenever
   state_indices is not None, but decode receives state_indices=[]
   (non-None, empty) for dense models, so expects_state flips on and
   is_done() waits forever for a state notif prefill never sends.
   Switch to a truthy check to match the prefill side at line ~692.

Verified end-to-end on Qwen/Qwen3-0.6B (dense), 2x L40S, tp1+tp1,
single host. Mooncake unaffected (independent if blocks; no
expects_state field).
@ispobock
Collaborator

/tag-and-rerun-ci

@ispobock ispobock added run-ci-extra format Auto Format Code and removed run-ci labels May 19, 2026
Collaborator

@ShangmingCai ShangmingCai left a comment

LGTM

@ispobock
Collaborator

@ishandhanani could you fix the lint?

@ishandhanani
Collaborator Author

> @ishandhanani could you fix the lint?

Yep. Will do in an hour when I get online again

- # Mark that we expect state data if state_indices was provided
- if state_indices is not None:
+ # Mark that we expect state data if state_indices was provided.
+ # Match the prefill-side truthy check (line ~697): an empty list
Collaborator

It would be better not to include the code line in the comment. Let me fix this and fix the lint.

Collaborator Author

Stupid AI :)

@ShangmingCai
Collaborator

> > @ishandhanani could you fix the lint?
>
> Yep. Will do in an hour when I get online again

@ishandhanani Let me do it for you. Since nixl is not in the CI, we could just merge. And feel free to add a nixl test in test_disaggregation_basic.py (no HCA, requires TCP transport) or add a different TP test in H20 suite when you get online again.

@ShangmingCai ShangmingCai merged commit 87c3c96 into sgl-project:main May 19, 2026
73 of 83 checks passed
Kangyan-Zhou pushed a commit that referenced this pull request May 19, 2026
Co-authored-by: Shangming Cai <csmthu@gmail.com>
Shunkangz pushed a commit to Shunkangz/sglang that referenced this pull request May 27, 2026
…ruthy (sgl-project#25699)

Co-authored-by: Shangming Cai <csmthu@gmail.com>
alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request Jun 4, 2026
Labels

format Auto Format Code run-ci-extra

Development

Successfully merging this pull request may close these issues.

[Bug][PD][NIXL] disagg hangs on dense models on v0.5.12 — state-gated aux + decode expects_state mismatch (#24932 regression)

3 participants