
[Bugfix]: SP attention not enabling when _sp_plan hooks are not applied#1704

Merged
wtomin merged 11 commits into vllm-project:main from wtomin:sp-test-re
Mar 11, 2026

Conversation

@wtomin
Collaborator

@wtomin wtomin commented Mar 6, 2026

Purpose

This PR fixes one bug: SP attention is not enabled when _sp_plan hooks are not applied. The bug occurs in two cases:

  1. Some models' SP implementations do not use _sp_plan, e.g., LongCatImage ([Bug]: LongCat Image Sequence Parallelism is Broken #1556);
  2. The standalone SP unit test script does not use _sp_plan.

Although the first case already has a quick fix merged (#1631), fwd_context._sp_shard_depth is not intended to be exposed to developers, who can easily forget to set it manually.

Therefore, this PR proposes checking _sp_shard_depth only when _sp_plan hooks are applied. When they are not applied, sp_active is determined solely by sp_size in the configuration. This benefits both manual SP implementations and the standalone SP unit test.
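The proposed check can be sketched as follows. The names sp_active, _sp_shard_depth, sp_plan_hooks_applied, and sequence_parallel_size come from this PR's discussion; the surrounding class is an illustrative simplification, not the actual vllm-omni forward context:

```python
from dataclasses import dataclass


@dataclass
class ForwardContext:
    """Simplified stand-in for the diffusion forward context (illustrative only)."""

    sequence_parallel_size: int = 1      # sp_size from the configuration
    sp_plan_hooks_applied: bool = False  # set when _sp_plan hooks are attached
    _sp_shard_depth: int = 0             # incremented/decremented by _sp_plan hooks

    @property
    def sp_active(self) -> bool:
        if self.sp_plan_hooks_applied:
            # Original behavior: SP is only active inside a sharded region.
            return self._sp_shard_depth > 0
        # Manual SP or standalone tests: decide from the configured SP size.
        return self.sequence_parallel_size > 1
```

With this check, a manual SP implementation that never touches _sp_shard_depth still reports sp_active as long as the configured SP size is greater than 1, while the original hook-driven behavior is preserved.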

This PR also includes minor edits to the SP unit test.

Test Plan

  • Standalone SP UT

pytest -s -v tests/diffusion/attention/test_attention_sp.py

  • LongCatImage SP
cd examples/offline_inference/text_to_image
python text_to_image.py --model meituan-longcat/LongCat-Image --ulysses-degree 2

python text_to_image.py --model meituan-longcat/LongCat-Image --ulysses-degree 2 --ring-degree 2

Test Result

  • Standalone SP UT

[baseline (no SP)] ✓ Saved output with shape torch.Size([2, 16, 64]):
  - batch_size=2, seq_len=16
  - num_heads=8, head_size=8
  - dtype=torch.bfloat16, causal=False, use_sync=False

[SP (ulysses=2, ring=2)] ✓ Saved output with shape torch.Size([2, 16, 64]):
  - batch_size=2, seq_len=16
  - num_heads=8, head_size=8
  - dtype=torch.bfloat16, causal=False, use_sync=False

================================================================================
Comparing outputs between baseline and SP...
  Baseline output shape: torch.Size([2, 16, 64])
  SP output shape: torch.Size([2, 16, 64])

================================================================================
Output Difference Analysis:
  - Max absolute difference: 1.562500e-02
  - Mean absolute difference: 4.872084e-04
  - Max relative difference: 9.999897e-01
  - Mean relative difference: 2.934728e-03
  - Baseline output range: [-3.140625e+00, 3.359375e+00]
  - SP output range: [-3.140625e+00, 3.359375e+00]
================================================================================

✓ Test passed: SP output matches baseline within tolerance
======================================================================= 1 passed, 20 warnings in 64.19s (0:01:04) ========================================================================
  • LongCatImage SP
ulysses-degree  ring-degree  generation time  image
2               1            2.99s            qwen_image_output
2               2            3.68s            qwen_image_output

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts and test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


@wtomin
Collaborator Author

wtomin commented Mar 6, 2026

I raised this PR to fix a bug present in the SP unit test (#1705).

#1692 tackles the LongCat Image SP problem from a different angle, by using _sp_plan instead of a manual SP implementation.

Thus the two solutions are not mutually exclusive. @alex-jw-brooks I still suggest supporting LongCat Image SP with _sp_plan.

@wtomin
Collaborator Author

wtomin commented Mar 6, 2026

@ZJY0516 @gcanlin @hsliuustc0106 @SamitHuang Please give your comments. Thanks.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ecde3342b


@hsliuustc0106
Collaborator

any perf comparison with sgl-d?

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


Review

Rating: 8.5/10 | Verdict: ✅ Approved

Summary

Correct bugfix enabling SP attention when _sp_plan hooks are not applied (manual SP, standalone tests). Root cause identified and fix is minimal and targeted.

CI Gate Checks (Step 0)

  • ✅ DCO: SUCCESS
  • ✅ Pre-commit: SUCCESS
  • ✅ Mergeable: MERGEABLE

Root Cause Analysis

Problem: sp_active property only checked _sp_shard_depth > 0, which is only meaningful within the _sp_plan hook mechanism. When hooks are not applied (manual SP, standalone tests), _sp_shard_depth stays at 0, causing SP attention to be incorrectly disabled.

Fix: Add sp_plan_hooks_applied flag to distinguish:

  • Hooks applied: use _sp_shard_depth (original behavior)
  • Hooks NOT applied: default to True when sequence_parallel_size > 1

Correctness Analysis

Scenario                 Before                   After                     Status
_sp_plan hooks applied   ✅ Use _sp_shard_depth   ✅ Same                   Preserved
Manual SP (no hooks)     ❌ Always disabled       ✅ Enabled when SP > 1    Fixed
Standalone tests         ❌ Always disabled       ✅ Enabled when SP > 1    Fixed

Highlights

  • ✅ Minimal change (3 files, focused on root cause)
  • ✅ Clear flag (sp_plan_hooks_applied) for state tracking
  • ✅ Backward compatible (preserves hook behavior)
  • ✅ Existing test modified to verify fix
  • ✅ Error handling for missing omni_diffusion_config

Test Changes

Removed: attn_backend parameter (simplification)
Added: seed_everything() helper function
Modified: Test now works without _sp_plan hooks
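The seed_everything() helper named above is not shown on this page; the following is a minimal sketch of what such a helper typically does. The torch and NumPy seeding calls are standard APIs, but the actual implementation in the PR is an assumption:

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed Python, NumPy, and torch RNGs for reproducible tests (illustrative sketch)."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; skip
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # torch not installed; skip
```

Calling it at the start of each test process makes the baseline and SP runs start from identical random weights and inputs, which is what the output comparison above relies on.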

Minor Suggestions (non-blocking)

  1. Test coverage: The existing test is modified but no new test explicitly validates the "no hooks" scenario. Consider adding a comment in the test explaining it now exercises the new code path (hooks NOT applied).

  2. Error message: Lines 60-61 raise a ValueError when omni_diffusion_config is None. Consider adding context: "omni_diffusion_config is not set when checking sp_active! Please call ..."

  3. Flag initialization: sp_plan_hooks_applied defaults to False. Consider adding a class-level comment explaining the flag's purpose and when it's set.
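The error-message suggestion (item 2 above) could look like the following. The helper name and exact wording are hypothetical; only the omni_diffusion_config / sp_active names come from the review:

```python
def require_omni_diffusion_config(omni_diffusion_config):
    """Hypothetical guard illustrating the reviewer's suggested error message."""
    if omni_diffusion_config is None:
        raise ValueError(
            "omni_diffusion_config is not set when checking sp_active! "
            "Please set the forward context before querying sp_active."
        )
    return omni_diffusion_config
```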

Pitfalls Check

Directory                     Pitfall           Status
diffusion/forward_context.py  State management  ✅ New flag
diffusion/registry.py         Flag setting      ✅ Correct
tests/                        Regression test   ✅ Modified

Recommendation

Ready to merge. Clean bugfix with good test coverage.


Reviewed by OpenClaw with vllm-omni-skills 🦐

Skill: vllm-omni-review (Bugfix)

@pytest.mark.parametrize("head_size", [8])
@pytest.mark.parametrize("causal", [False])
@pytest.mark.parametrize("dtype", [torch.float16, torch.bfloat16])
@pytest.mark.parametrize("dtype", [torch.bfloat16])
Contributor


Is there a reason for removing fp16 here?

Collaborator Author


Due to #906, the default attention backend FA does not support fp16.

@wtomin wtomin added the ready label to trigger buildkite CI label Mar 9, 2026
Contributor

@alex-jw-brooks alex-jw-brooks left a comment


looks good to me, thanks!

wtomin and others added 11 commits March 10, 2026 13:09
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Canlin Guo <961750412@qq.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
@wtomin wtomin merged commit 7543f2f into vllm-project:main Mar 11, 2026
7 checks passed

Labels

ready label to trigger buildkite CI


Development

Successfully merging this pull request may close these issues.

[Bug]: Test test_attention_sp.py failed: TypeError: 'NoneType' object is not callable when calling current_omni_platform.seed_everything
