[Bugfix]: SP attention not enabling when _sp_plan hooks are not applied #1704

wtomin merged 11 commits into vllm-project:main
Conversation
In order to solve a bug present in the SP unit test (#1705), I raised this PR. #1692 tackles the LongCat Image SP problem from a different perspective, so the two solutions are not mutually exclusive. @alex-jw-brooks I still suggest you support LongCat Image SP.
@ZJY0516 @gcanlin @hsliuustc0106 @SamitHuang Please give your comments. Thanks.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4ecde3342b
any perf comparison with sgl-d?
hsliuustc0106 left a comment
Review
Rating: 8.5/10 | Verdict: ✅ Approved
Summary
Correct bugfix enabling SP attention when _sp_plan hooks are not applied (manual SP, standalone tests). Root cause identified and fix is minimal and targeted.
CI Gate Checks (Step 0)
- ✅ DCO: SUCCESS
- ✅ Pre-commit: SUCCESS
- ✅ Mergeable: MERGEABLE
Root Cause Analysis
Problem: sp_active property only checked _sp_shard_depth > 0, which is only meaningful within the _sp_plan hook mechanism. When hooks are not applied (manual SP, standalone tests), _sp_shard_depth stays at 0, causing SP attention to be incorrectly disabled.
Fix: Add an `sp_plan_hooks_applied` flag to distinguish the two cases:
- Hooks applied: use `_sp_shard_depth` (original behavior)
- Hooks NOT applied: default to `True` when `sequence_parallel_size > 1`
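The flag-based behavior described above can be sketched as follows. This is a minimal illustration built from the attribute names quoted in the review (`sp_plan_hooks_applied`, `_sp_shard_depth`, `sequence_parallel_size`); the actual class layout in the repo may differ:

```python
from dataclasses import dataclass


@dataclass
class ForwardContext:
    """Hypothetical sketch of the forward-context state discussed above."""
    sequence_parallel_size: int = 1
    _sp_shard_depth: int = 0
    # New flag: set to True only when the _sp_plan hooks are installed.
    sp_plan_hooks_applied: bool = False

    @property
    def sp_active(self) -> bool:
        if self.sp_plan_hooks_applied:
            # Original behavior: SP is active only inside a sharded scope.
            return self._sp_shard_depth > 0
        # Hooks not applied (manual SP, standalone tests):
        # fall back to the configured SP size.
        return self.sequence_parallel_size > 1
```

With hooks absent, `sp_active` now tracks the configured SP size directly instead of a shard depth that nothing ever increments.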
Correctness Analysis
| Scenario | Before | After | Status |
|---|---|---|---|
| _sp_plan hooks applied | ✅ Use _sp_shard_depth | ✅ Same | Preserved |
| Manual SP (no hooks) | ❌ Always disabled | ✅ Enabled when SP > 1 | Fixed |
| Standalone tests | ❌ Always disabled | ✅ Enabled when SP > 1 | Fixed |
Highlights
- ✅ Minimal change (3 files, focused on root cause)
- ✅ Clear flag (`sp_plan_hooks_applied`) for state tracking
- ✅ Backward compatible (preserves hook behavior)
- ✅ Existing test modified to verify fix
- ✅ Error handling for missing `omni_diffusion_config`
Test Changes
Removed: `attn_backend` parameter (simplification)
Added: `seed_everything()` helper function
Modified: test now works without `_sp_plan` hooks
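A `seed_everything` helper along these lines is a common pattern; the version below is a generic sketch, not necessarily the repo's exact helper (the torch guard is an addition so the sketch runs without a GPU stack):

```python
import random

import numpy as np

try:  # torch may be unavailable; guard so the sketch stays runnable
    import torch
except ImportError:
    torch = None


def seed_everything(seed: int = 42) -> None:
    """Seed all common RNG sources so test runs are reproducible."""
    random.seed(seed)
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
```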
Minor Suggestions (non-blocking)
- Test coverage: the existing test is modified, but no new test explicitly validates the "no hooks" scenario. Consider adding a comment in the test explaining that it now exercises the new code path (hooks NOT applied).
- Error message: lines 60-61 raise `ValueError` when `omni_diffusion_config` is None. Consider adding context: `"omni_diffusion_config is not set when checking sp_active! Please call ..."`
- Flag initialization: `sp_plan_hooks_applied` defaults to `False`. Consider adding a class-level comment explaining the flag's purpose and when it is set.
Pitfalls Check
| Directory | Pitfall | Status |
|---|---|---|
| diffusion/forward_context.py | State management | ✅ New flag |
| diffusion/registry.py | Flag setting | ✅ Correct |
| tests/ | Regression test | ✅ Modified |
Recommendation
Ready to merge. Clean bugfix with good test coverage.
Reviewed by OpenClaw with vllm-omni-skills 🦐
Skill: vllm-omni-review (Bugfix)
 @pytest.mark.parametrize("head_size", [8])
 @pytest.mark.parametrize("causal", [False])
-@pytest.mark.parametrize("dtype", [torch.float16, torch.bfloat16])  # [torch.float16, torch.bfloat16]
+@pytest.mark.parametrize("dtype", [torch.bfloat16])
Is there a reason for removing fp16 here?
Due to #906, the default attention backend (FA) does not support fp16.
alex-jw-brooks left a comment
looks good to me, thanks!
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Canlin Guo <961750412@qq.com> Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Purpose
This PR fixes one bug: SP attention is not enabled when `_sp_plan` hooks are not applied. This bug occurs in two cases:

1. Manual SP implementations that do not use `_sp_plan`, e.g., LongCatImage ([Bug]: LongCat Image Sequence Parallelism is Broken #1556);
2. Standalone SP unit tests that do not apply `_sp_plan`.

Although the first case has a quick fix merged (#1631), exposing `fwd_context._sp_shard_depth` to developers is not intended: developers can easily forget to set it manually.

Therefore, this PR proposes checking `_sp_shard_depth` only when `_sp_plan` hooks are applied. If they are not applied, `sp_active` is determined solely by the `sp_size` in the configuration. This benefits both manual SP implementations and standalone SP unit tests.

Minor edits for the SP UT:

- The `seed_everything` function is corrected ([Bug]: Test test_attention_sp.py failed: TypeError: 'NoneType' object is not callable when calling current_omni_platform.seed_everything #1705);
- The `attn_backend` parameter is removed because the backend is set via an environment variable.

Test Plan

pytest -s -v tests/diffusion/attention/test_attention_sp.py

Test Result