[Bugfix] enforce max_sequence_length for Qwen-Image and Wan2.2 series before encoding #2847
Conversation
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Blocking Issues
VERDICT: REQUEST_CHANGES (gate must pass before code review). Please fix the failing pre-commit check before proceeding with the review.
Update: this PR now also covers the Wan2.2 series. Added in the second patch: additional validation for the Wan2.2 pipelines.
lishunyang12 left a comment:
Looks good overall. The approach of validating before encoding (rather than relying on silent truncation) is the right call. The shared `validate_prompt_sequence_lengths` utility is clean and well documented. Tests cover all pipeline variants and both default and explicit limit paths.
A few minor observations:
- `length_offset` parameter is unused. `validate_prompt_sequence_lengths` accepts `length_offset: int = 0` but no caller ever passes it. If there is no planned use, consider removing it to keep the API surface minimal. Not blocking. (See the sketches after this list.)
- Double tokenization in Qwen pipelines. The PR tokenizes the prompt once with `truncation=False` for validation, then the text encoder runs the same text through the tokenizer again internally (or the processor does for edit pipelines). This is a minor perf cost (an extra tokenizer pass) but acceptable for correctness. If this ever becomes a bottleneck, the validated tokens could be reused directly.
- Qwen `_get_qwen_prompt_embeds` still slices `[:max_sequence_length]` after encoding (in `encode_prompt` for `pipeline_qwen_image.py` and `pipeline_qwen_image_layered.py`). Since validation now rejects overlong prompts, this slice is effectively a no-op for user text. However, the template suffix tokens can push the total sequence beyond `max_sequence_length`, so the post-encoding slice still serves a purpose for trimming template overhead. This is correct but worth a brief inline comment explaining why the slice is still needed (see the slice sketch below).
- Test coverage is solid. The `_RejectingTextEncoder` pattern is a nice way to assert the encoder is never reached for rejected prompts. The boundary tests for template suffix exclusion and image placeholder exclusion are well thought out.
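For reference, a minimal sketch of roughly what a validator of this shape could look like. The function name, module path, and error-message wording come from this PR's description; the body itself is an assumption, not the PR's actual code:

```python
# Hedged sketch of a pre-encoding length check. Mirrors the utility name
# from vllm_omni/diffusion/models/qwen_image/prompt_utils.py and the error
# text seen in the validation notes, but the exact body is an assumption.
def validate_prompt_sequence_lengths(
    tokenizer,
    prompts: list[str],
    max_sequence_length: int,
    length_offset: int = 0,  # the currently unused knob discussed above
) -> None:
    for prompt in prompts:
        # Tokenize without truncation so the true length is visible,
        # rather than letting the tokenizer silently cut the prompt.
        token_ids = tokenizer(prompt, truncation=False).input_ids
        effective_len = len(token_ids) - length_offset
        if effective_len > max_sequence_length:
            raise ValueError(
                f"got {effective_len} tokens, but `max_sequence_length` "
                f"is {max_sequence_length}"
            )
```

And on the post-encoding slice point, a hypothetical version of the requested inline comment (the variable name is assumed, not copied from the PR):

```python
# Hypothetical placement inside _get_qwen_prompt_embeds; `hidden_states`
# is an assumed name for the per-prompt encoder output.
# NOTE: pre-encoding validation already rejects user prompts longer than
# max_sequence_length, so this slice never drops user text. It only trims
# template-suffix tokens that can push the total sequence past the limit.
hidden_states = hidden_states[:max_sequence_length]
```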
LGTM. Approving.
```python
    QwenImageLayeredPipeline,
)
```
This UT looks to be missing the `cpu` and `core_model` pytest marks.
```python
def __call__(self, *args, **kwargs):
    raise AssertionError("text encoder should not run for prompts that exceed max_sequence_length")
```
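As a usage illustration of that pattern, a hypothetical test sketch; `make_pipeline` is an assumed helper standing in for the PR's real fixtures and is not from the PR:

```python
import pytest

class _RejectingTextEncoder:
    # Mirrors the pattern from the diff above: any call is a test failure,
    # proving the encoder is never reached for rejected prompts.
    def __call__(self, *args, **kwargs):
        raise AssertionError(
            "text encoder should not run for prompts that exceed max_sequence_length"
        )

def test_overlong_prompt_is_rejected_before_encoding():
    # make_pipeline is an assumed stand-in for the real pipeline fixtures.
    pipe = make_pipeline(text_encoder=_RejectingTextEncoder())
    overlong_prompt = "word " * 5000  # far beyond the default 1024-token limit
    with pytest.raises(ValueError, match="max_sequence_length"):
        pipe(prompt=overlong_prompt, max_sequence_length=1024)
```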
Signed-off-by: david6666666 <530634352@qq.com>
Force-pushed from 3991ecb to 21851d6
… before encoding (vllm-project#2847) Signed-off-by: david6666666 <530634352@qq.com>
…vllm-project#2877) Signed-off-by: david6666666 <530634352@qq.com> Signed-off-by: nainiu258 <cperfect02@163.com>
…vllm-project#2877) Signed-off-by: david6666666 <530634352@qq.com>
Summary

- Enforce `max_sequence_length` before the text encoder runs instead of relying on silent truncation or post-encoder slicing
- Make `max_sequence_length=1024` effective across Qwen-Image, Qwen-Image-Layered, Qwen-Image-Edit, and Qwen-Image-Edit-Plus
- Wan2.2 pipelines (T2V/I2V/TI2V/VACE) reject overlong prompts before UMT5 encoding, while preserving the existing default limit of `512`

Validation

- `python -m pytest -q tests/diffusion/models/qwen_image/test_qwen_image_max_sequence_length.py tests/diffusion/models/wan2_2/test_wan22_max_sequence_length.py`
- `python -m ruff check vllm_omni/diffusion/models/qwen_image/prompt_utils.py vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit.py vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit_plus.py vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_layered.py tests/diffusion/models/qwen_image/test_qwen_image_max_sequence_length.py vllm_omni/diffusion/models/wan2_2/prompt_utils.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_i2v.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_ti2v.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_vace.py tests/diffusion/models/wan2_2/test_wan22_max_sequence_length.py`
- Qwen-Image: a short prompt succeeded at `/v1/images/generations`; a 5000-word prompt failed with ``got 5006 tokens, but `max_sequence_length` is 1024``
- Qwen-Image-Edit: a short prompt succeeded at `/v1/images/edits`; a 5000-word prompt failed with ``got 5009 tokens, but `max_sequence_length` is 1024``
- Wan2.2-T2V-A14B-Diffusers: a `num_inference_steps=1`, `num_frames=5` request completed successfully at `/v1/videos`; a 5000-word prompt failed with ``got 5001 tokens, but `max_sequence_length` is 512``

Closes #2794.
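For anyone reproducing the endpoint checks, a rough client-side sketch; the host/port and exact payload shape are assumptions, and only the endpoint path and expected error text come from the validation notes above:

```python
# Hedged reproduction sketch for the /v1/images/generations check.
# Assumes a locally running server on port 8000 with the default
# Qwen-Image limit of max_sequence_length=1024.
import requests

resp = requests.post(
    "http://localhost:8000/v1/images/generations",
    json={"prompt": "word " * 5000},  # far more than 1024 tokens
)
# Expect a client error whose message matches:
#   got ... tokens, but `max_sequence_length` is 1024
print(resp.status_code, resp.text)
```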