
[Bugfix] enforce max_sequence_length for Qwen-Image and Wan2.2 series before encoding#2847

Merged
SamitHuang merged 7 commits into vllm-project:main from david6666666:codex/issue-2794-qwen-image-max-seq-1024 on Apr 17, 2026

Conversation

@david6666666 (Collaborator) commented Apr 16, 2026

Summary

  • enforce max_sequence_length before the text encoder runs instead of relying on silent truncation or post-encoder slicing (a minimal sketch of the check follows this list)
  • make the default max_sequence_length=1024 effective across Qwen-Image, Qwen-Image-Layered, Qwen-Image-Edit, and Qwen-Image-Edit-Plus
  • validate Qwen edit/edit-plus on text prompt length before image token expansion so short edit prompts still work
  • make the Wan2.2 family (T2V / I2V / TI2V / VACE) reject overlong prompts before UMT5 encoding, while preserving the existing default limit of 512
  • add targeted tests for both Qwen-Image and Wan2.2 prompt-length enforcement and default propagation
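
A minimal sketch of the kind of pre-encoding check described above. The shared helper in this PR is validate_prompt_sequence_lengths in prompt_utils.py; the function name, signature, and error wording below are illustrative, not the merged implementation, and a Hugging Face tokenizer is assumed:

# Illustrative sketch only; mirrors the idea of validate_prompt_sequence_lengths,
# not its exact signature.
def check_prompt_length(tokenizer, prompts: list[str], max_sequence_length: int) -> None:
    for prompt in prompts:
        # Tokenize without truncation so the real prompt length is measured.
        num_tokens = len(tokenizer(prompt, truncation=False).input_ids)
        if num_tokens > max_sequence_length:
            raise ValueError(
                f"got {num_tokens} tokens, but max_sequence_length is {max_sequence_length}"
            )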

Validation

  • python -m pytest -q tests/diffusion/models/qwen_image/test_qwen_image_max_sequence_length.py tests/diffusion/models/wan2_2/test_wan22_max_sequence_length.py
  • python -m ruff check vllm_omni/diffusion/models/qwen_image/prompt_utils.py vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit.py vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit_plus.py vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_layered.py tests/diffusion/models/qwen_image/test_qwen_image_max_sequence_length.py vllm_omni/diffusion/models/wan2_2/prompt_utils.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_i2v.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_ti2v.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_vace.py tests/diffusion/models/wan2_2/test_wan22_max_sequence_length.py
  • E2E serve + curl with local snapshots (a hedged request sketch follows this list):
    • Qwen-Image: a short prompt succeeded at /v1/images/generations; a 5000-word prompt failed with `got 5006 tokens, but max_sequence_length is 1024`
    • Qwen-Image-Edit: a short prompt succeeded at /v1/images/edits; a 5000-word prompt failed with `got 5009 tokens, but max_sequence_length is 1024`
    • Wan2.2-T2V-A14B-Diffusers: a num_inference_steps=1, num_frames=5 request completed successfully at /v1/videos; a 5000-word prompt failed with `got 5001 tokens, but max_sequence_length is 512`
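
For reference, the E2E checks above were driven by requests of roughly the following shape. This is a hedged sketch: the endpoint path matches the bullets above, but the host, port, model name, and payload fields are assumptions about the local setup, not taken from this PR:

import requests

# Hypothetical local request against the serve endpoint used in the E2E checks.
# Host/port, model name, and payload fields are assumptions about the deployment.
resp = requests.post(
    "http://localhost:8000/v1/images/generations",
    json={"model": "Qwen-Image", "prompt": "a cat sitting on a windowsill"},
    timeout=300,
)
print(resp.status_code, resp.json())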

Closes #2794.

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository-wide code reviews.

@hsliuustc0106 (Collaborator)

Blocking Issues

  1. [Gate Failure] pre-commit - The pre-commit check is failing. Please fix any formatting/linting issues.

VERDICT: REQUEST_CHANGES (gate must pass before code review)

Please fix the failing pre-commit check before proceeding with the review.

@david6666666 changed the title from "[Fix] enforce Qwen-Image max_sequence_length before encoding" to "[Fix] enforce max_sequence_length for Qwen-Image and Wan2.2 before encoding" on Apr 16, 2026
@david6666666 (Collaborator, Author)

Update: this PR now also covers the Wan2.2 family.

Added in the second patch:

  • Wan2.2-T2V / I2V / TI2V / VACE now validate the real prompt token length before UMT5 encoding instead of silently truncating
  • request paths now consistently fall back to the Wan default max_sequence_length=512 (sketched after this list)
  • targeted Wan2.2 tests were added alongside the earlier Qwen-Image tests
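
A small sketch of the fallback-plus-check behaviour described above. The constant and function names are illustrative assumptions; only the 512 default and the check running before UMT5 encoding come from this PR:

WAN_DEFAULT_MAX_SEQUENCE_LENGTH = 512  # existing Wan2.2 default preserved by this PR

def resolve_and_check(tokenizer, prompt: str, max_sequence_length: int | None = None) -> int:
    # Fall back to the Wan2.2 default when the request does not set a limit,
    # then measure the real (untruncated) token length before UMT5 ever runs.
    limit = max_sequence_length or WAN_DEFAULT_MAX_SEQUENCE_LENGTH
    num_tokens = len(tokenizer(prompt, truncation=False).input_ids)
    if num_tokens > limit:
        raise ValueError(f"got {num_tokens} tokens, but max_sequence_length is {limit}")
    return limit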

Additional validation:

  • python -m pytest -q tests/diffusion/models/qwen_image/test_qwen_image_max_sequence_length.py tests/diffusion/models/wan2_2/test_wan22_max_sequence_length.py
  • live serve + curl on local Wan2.2-T2V-A14B-Diffusers snapshot:
    • short prompt completed successfully at /v1/videos with num_inference_steps=1, num_frames=5
    • 5000-word prompt failed with: got 5001 tokens, but max_sequence_length is 512

@david6666666 changed the title from "[Fix] enforce max_sequence_length for Qwen-Image and Wan2.2 before encoding" to "[Fix] enforce max_sequence_length for Qwen-Image and Wan2.2 series before encoding" on Apr 16, 2026
@david6666666 changed the title from "[Fix] enforce max_sequence_length for Qwen-Image and Wan2.2 series before encoding" to "[Bugfix] enforce max_sequence_length for Qwen-Image and Wan2.2 series before encoding" on Apr 16, 2026
@david6666666 added the ready label (label to trigger buildkite CI) on Apr 16, 2026
@lishunyang12 (Collaborator) left a comment

Looks good overall. The approach of validating before encoding (rather than relying on silent truncation) is the right call. The shared validate_prompt_sequence_lengths utility is clean and well-documented. Tests cover all pipeline variants and both default/explicit limit paths.

A few minor observations:

  1. length_offset parameter is unused. validate_prompt_sequence_lengths accepts length_offset: int = 0 but no caller ever passes it. If there is no planned use, consider removing it to keep the API surface minimal. Not blocking.

  2. Double tokenization in Qwen pipelines. The PR tokenizes the prompt once with truncation=False for validation, then the text encoder runs the same text through the tokenizer again internally (or the processor does for edit pipelines). This is a minor perf cost (extra tokenizer forward pass) but acceptable for correctness. If this ever becomes a bottleneck, the validated tokens could be reused directly.

  3. Qwen _get_qwen_prompt_embeds still slices [:max_sequence_length] after encoding (in encode_prompt for pipeline_qwen_image.py and pipeline_qwen_image_layered.py). Since validation now rejects overlong prompts, this slice is effectively a no-op for user text. However, the template suffix tokens can push the total sequence beyond max_sequence_length, so the post-encoding slice still serves a purpose for trimming template overhead. This is correct but worth a brief inline comment explaining why the slice is still needed (a small illustrative sketch follows this list).

  4. Test coverage is solid. The _RejectingTextEncoder pattern is a nice way to assert the encoder is never reached for rejected prompts. The boundary tests for template suffix exclusion and image placeholder exclusion are well thought out.
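
A hypothetical illustration of point 3. The helper name and tensor layout are assumed; the merged pipelines do this slicing inline in encode_prompt rather than via a separate function:

import torch

def trim_template_overhead(prompt_embeds: torch.Tensor, max_sequence_length: int) -> torch.Tensor:
    # Validation already bounds the user text, but the chat-template tokens the
    # encoder wraps around it can still push the encoded sequence past the limit,
    # so this post-encoding slice is kept to trim that template overhead.
    return prompt_embeds[:, :max_sequence_length]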

LGTM. Approving.

QwenImageLayeredPipeline,
)


Collaborator


This UT looks like it is missing the cpu and core_model pytest marks.


def __call__(self, *args, **kwargs):
    raise AssertionError("text encoder should not run for prompts that exceed max_sequence_length")

Collaborator


Ditto.

@david6666666 force-pushed the codex/issue-2794-qwen-image-max-seq-1024 branch from 3991ecb to 21851d6 on April 17, 2026 at 01:46
@gcanlin (Collaborator) left a comment

LGTM

@SamitHuang merged commit 3079e94 into vllm-project:main on Apr 17, 2026
8 checks passed

Labels

ready (label to trigger buildkite CI)


Development

Successfully merging this pull request may close these issues.

[Bug]: Qwen-Image-Edit OOM with 100,000-token prompt
[Bug]: Qwen-Image-Edit-2511 RoPE position encoding shape mismatch with 10000-token prompt

5 participants