[Bugfix] enforce max_sequence_length for Qwen-Image and Wan2.2 series before encoding #2847
Conversation
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Blocking Issues
VERDICT: REQUEST_CHANGES (gate must pass before code review). Please fix the failing pre-commit check before proceeding with the review.
Update: this PR now also covers the Wan2.2 series. Added in the second patch: additional validation for the Wan2.2 pipelines.
lishunyang12 left a comment:
Looks good overall. The approach of validating before encoding (rather than relying on silent truncation) is the right call. The shared `validate_prompt_sequence_lengths` utility is clean and well documented. Tests cover all pipeline variants and both default and explicit limit paths.
A few minor observations:
- `length_offset` parameter is unused. `validate_prompt_sequence_lengths` accepts `length_offset: int = 0` but no caller ever passes it. If there is no planned use, consider removing it to keep the API surface minimal. Not blocking. (See the sketches after this list.)
- Double tokenization in Qwen pipelines. The PR tokenizes the prompt once with `truncation=False` for validation, then the text encoder runs the same text through the tokenizer again internally (or the processor does for edit pipelines). This is a minor perf cost (an extra tokenizer pass) but acceptable for correctness. If this ever becomes a bottleneck, the validated tokens could be reused directly.
- Qwen `_get_qwen_prompt_embeds` still slices `[:max_sequence_length]` after encoding (in `encode_prompt` for `pipeline_qwen_image.py` and `pipeline_qwen_image_layered.py`). Since validation now rejects overlong prompts, this slice is effectively a no-op for user text. However, the template suffix tokens can push the total sequence beyond `max_sequence_length`, so the post-encoding slice still serves a purpose for trimming template overhead. This is correct but worth a brief inline comment explaining why the slice is still needed (see the slice sketch below).
- Test coverage is solid. The `_RejectingTextEncoder` pattern is a nice way to assert the encoder is never reached for rejected prompts. The boundary tests for template suffix exclusion and image placeholder exclusion are well thought out.
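For reference, a minimal sketch of roughly what a validator of this shape could look like. The function name, module path, and error-message wording come from this PR's description; the body itself is an assumption, not the PR's actual code:

```python
# Hedged sketch of a pre-encoding length check. Mirrors the utility name
# from vllm_omni/diffusion/models/qwen_image/prompt_utils.py and the error
# text seen in the validation notes, but the exact body is an assumption.
def validate_prompt_sequence_lengths(
    tokenizer,
    prompts: list[str],
    max_sequence_length: int,
    length_offset: int = 0,  # the currently unused knob discussed above
) -> None:
    for prompt in prompts:
        # Tokenize without truncation so the true length is visible,
        # rather than letting the tokenizer silently cut the prompt.
        token_ids = tokenizer(prompt, truncation=False).input_ids
        effective_len = len(token_ids) - length_offset
        if effective_len > max_sequence_length:
            raise ValueError(
                f"got {effective_len} tokens, but `max_sequence_length` "
                f"is {max_sequence_length}"
            )
```

And on the post-encoding slice point, a hypothetical version of the requested inline comment (the variable name is assumed, not copied from the PR):

```python
# Hypothetical placement inside _get_qwen_prompt_embeds; `hidden_states`
# is an assumed name for the per-prompt encoder output.
# NOTE: pre-encoding validation already rejects user prompts longer than
# max_sequence_length, so this slice never drops user text. It only trims
# template-suffix tokens that can push the total sequence past the limit.
hidden_states = hidden_states[:max_sequence_length]
```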
LGTM. Approving.
```python
    QwenImageLayeredPipeline,
)
```
This UT looks to be missing the `cpu` and `core_model` pytest marks.
```python
def __call__(self, *args, **kwargs):
    raise AssertionError("text encoder should not run for prompts that exceed max_sequence_length")
```
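As a usage illustration of that pattern, a hypothetical test sketch; `make_pipeline` is an assumed helper standing in for the PR's real fixtures and is not from the PR:

```python
import pytest

class _RejectingTextEncoder:
    # Mirrors the pattern from the diff above: any call is a test failure,
    # proving the encoder is never reached for rejected prompts.
    def __call__(self, *args, **kwargs):
        raise AssertionError(
            "text encoder should not run for prompts that exceed max_sequence_length"
        )

def test_overlong_prompt_is_rejected_before_encoding():
    # make_pipeline is an assumed stand-in for the real pipeline fixtures.
    pipe = make_pipeline(text_encoder=_RejectingTextEncoder())
    overlong_prompt = "word " * 5000  # far beyond the default 1024-token limit
    with pytest.raises(ValueError, match="max_sequence_length"):
        pipe(prompt=overlong_prompt, max_sequence_length=1024)
```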
Signed-off-by: david6666666 <530634352@qq.com>
Force-pushed from 3991ecb to 21851d6
… before encoding (vllm-project#2847) Signed-off-by: david6666666 <530634352@qq.com>
…vllm-project#2877) Signed-off-by: david6666666 <530634352@qq.com> Signed-off-by: nainiu258 <cperfect02@163.com>
…vllm-project#2877) Signed-off-by: david6666666 <530634352@qq.com>
Summary

- Enforce `max_sequence_length` before the text encoder runs instead of relying on silent truncation or post-encoder slicing
- Make `max_sequence_length=1024` effective across Qwen-Image, Qwen-Image-Layered, Qwen-Image-Edit, and Qwen-Image-Edit-Plus
- Wan2.2 pipelines (T2V/I2V/TI2V/VACE) reject overlong prompts before UMT5 encoding, while preserving the existing default limit of `512`

Validation

- `python -m pytest -q tests/diffusion/models/qwen_image/test_qwen_image_max_sequence_length.py tests/diffusion/models/wan2_2/test_wan22_max_sequence_length.py`
- `python -m ruff check vllm_omni/diffusion/models/qwen_image/prompt_utils.py vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit.py vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit_plus.py vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_layered.py tests/diffusion/models/qwen_image/test_qwen_image_max_sequence_length.py vllm_omni/diffusion/models/wan2_2/prompt_utils.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_i2v.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_ti2v.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_vace.py tests/diffusion/models/wan2_2/test_wan22_max_sequence_length.py`
- Qwen-Image: a short prompt succeeded at `/v1/images/generations`; a 5000-word prompt failed with ``got 5006 tokens, but `max_sequence_length` is 1024``
- Qwen-Image-Edit: a short prompt succeeded at `/v1/images/edits`; a 5000-word prompt failed with ``got 5009 tokens, but `max_sequence_length` is 1024``
- Wan2.2-T2V-A14B-Diffusers: a `num_inference_steps=1`, `num_frames=5` request completed successfully at `/v1/videos`; a 5000-word prompt failed with ``got 5001 tokens, but `max_sequence_length` is 512``

Closes #2794.
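For anyone reproducing the endpoint checks, a rough client-side sketch; the host/port and exact payload shape are assumptions, and only the endpoint path and expected error text come from the validation notes above:

```python
# Hedged reproduction sketch for the /v1/images/generations check.
# Assumes a locally running server on port 8000 with the default
# Qwen-Image limit of max_sequence_length=1024.
import requests

resp = requests.post(
    "http://localhost:8000/v1/images/generations",
    json={"prompt": "word " * 5000},  # far more than 1024 tokens
)
# Expect a client error whose message matches:
#   got ... tokens, but `max_sequence_length` is 1024
print(resp.status_code, resp.text)
```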