[Bugfix] Fix Qwen-Image SP and TeaCache incompatibility #2101
gcanlin merged 2 commits into vllm-project:main
Conversation
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 23bda666b7
```python
# For zero_cond_t=True: timestep is doubled, modulate_index is created and
# sharded by the SequenceParallelSplitHook on modulate_index_prepare so that
# its sequence dimension matches the already-sharded hidden_states.
timestep, modulate_index = module.modulate_index_prepare(timestep, img_shapes)
```
**Keep TeaCache extractor aligned with `zero_cond_t` batching**
For Qwen checkpoints with `zero_cond_t=True` (the image-edit variants), `ModulateIndexPrepare.forward()` doubles `timestep` here (qwen_image_transformer.py:141-154), so `temb` becomes `2 * batch`. The rest of `extract_qwen_context()` still consumes that embedding as if it were `batch`: `block.img_norm1(hidden_states, img_mod1)` is still called without `modulate_index`, and `postprocess()` never chunks `temb` back down before `module.norm_out`, unlike `QwenImageTransformer2DModel.forward()` (qwen_image_transformer.py:1062-1065). With TeaCache enabled, Qwen edit models will therefore fail on the first forward with a batch-dimension mismatch instead of running the transformer.
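The mismatch the review describes can be illustrated with a shape-level sketch. Everything below is illustrative: plain lists stand in for tensors, and the helper name `postprocess_temb` is hypothetical, not the real vllm-omni API. The point is that a `2 * batch` time embedding must have its zero-conditioned half dropped before it reaches layers that expect `batch` rows.

```python
# Shape-level sketch only: lists stand in for tensors, and the helper name
# postprocess_temb is hypothetical -- it is not the real vllm-omni API.

def postprocess_temb(temb, batch, zero_cond_t):
    """Chunk a doubled time embedding back down to batch size.

    With zero_cond_t=True the transformer embeds the timestep twice
    (real copy + zeroed conditional copy), so temb has 2 * batch rows.
    Layers such as norm_out expect batch rows, so the conditional half
    must be dropped, as the reference forward() pass does.
    """
    if zero_cond_t:
        assert len(temb) == 2 * batch, "temb was not doubled as expected"
        temb = temb[:batch]  # keep the real half, drop the zeroed half
    return temb

# batch=2 with zero_cond_t=True: four rows in, two rows out.
print(len(postprocess_temb([[1.0], [2.0], [3.0], [4.0]], batch=2, zero_cond_t=True)))  # 2
```

Without a step like this in the extractor's `postprocess()`, the doubled embedding collides with the `batch`-sized `hidden_states` on the first forward.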
Purpose
Solves #2092.
Root Cause:
In the `extract_qwen_context` function, the TeaCache extractor directly invokes `module.img_in` and `module.pos_embed`, bypassing the `image_rope_prepare` module. This prevents the Sequence Parallel (SP) `SequenceParallelSplitHook` from being triggered, resulting in `hidden_states` not being properly sharded.

When SP is enabled, the `_sp_plan` registers a `SequenceParallelSplitHook` on `image_rope_prepare` to shard `hidden_states` after the forward pass.

Solution
This ensures TeaCache is properly compatible with Sequence Parallelism (both Ulysses-SP and Ring Attention).
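A minimal sketch of why routing through the wrapped submodule matters. The hook mechanics below are a simplification of PyTorch-style forward hooks, and the names (`ImageRopePrepare`, `sp_split_hook`) are illustrative rather than the real vllm-omni API: calling the submodule fires its registered hooks, which shard the output, whereas calling the inner layers directly skips them entirely.

```python
# Sketch: forward hooks only fire when the wrapped module itself is called.
# Names (ImageRopePrepare, sp_split_hook) are illustrative, not the real API.

class Module:
    def __init__(self):
        self._hooks = []

    def register_forward_hook(self, fn):
        self._hooks.append(fn)

    def __call__(self, x):
        out = self.forward(x)
        for hook in self._hooks:  # hooks run only via __call__
            out = hook(self, x, out) or out
        return out

class ImageRopePrepare(Module):
    def forward(self, hidden_states):
        return hidden_states  # real code would embed + compute RoPE here

def sp_split_hook(module, inputs, output, rank=0, world_size=2):
    # Shard the sequence dimension across SP ranks (this rank's slice).
    n = len(output) // world_size
    return output[rank * n:(rank + 1) * n]

prep = ImageRopePrepare()
prep.register_forward_hook(sp_split_hook)

seq = list(range(8))
print(len(prep(seq)))          # 4 -> hook fired, sequence sharded
print(len(prep.forward(seq)))  # 8 -> bypassing the module skips the hook
```

This is the failure mode the PR fixes: the extractor previously took the second path (calling the inner layers directly), so the SP split hook never ran and `hidden_states` stayed unsharded.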
Test Plan
Test Result