[Revert] drop Wan2.2 prompt-length enforcement from #2847 #2877
Conversation
BLOCKING ISSUE: This PR changes the default `max_sequence_length` from 512 to 2048, which is a breaking change that affects all Wan2.2 users. Issue #2874 is a CI test failure; changing the default value increases memory usage for ALL users just to fix a CI test, which is not the right approach. Please reconsider the fix for #2874 without breaking existing user behavior.
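For scale, a rough back-of-the-envelope calculation (mine, not from the PR) of how the padded sequence length translates into text-encoder activation size; `d_model=4096` assumes a UMT5-XXL-class encoder in bf16:

```python
# Rough arithmetic for the memory concern: activation size in the text encoder
# scales linearly with sequence length, so padding every prompt to 2048 is a
# 4x increase over 512. d_model=4096 is an assumption (UMT5-XXL-class encoder);
# bf16 = 2 bytes per element.
d_model, bytes_per_elem = 4096, 2
for seq_len in (512, 2048):
    mib = seq_len * d_model * bytes_per_elem / 2**20
    print(f"hidden states per layer, batch=1, seq={seq_len}: {mib:.0f} MiB")
# -> 4 MiB at 512 vs 16 MiB at 2048, before attention's seq^2 term.
```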
Raising `max_sequence_length` to 2048 is a 4x increase. This fixes the immediate CI failure (#2874), but consider the padding and memory overhead it adds for every request; a quick sanity check of the token counts is sketched below.
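A quick way to see the numbers in question (a sketch I added, not part of the PR; `google/umt5-xxl` is an assumed tokenizer checkpoint matching Wan2.2's text encoder):

```python
# Count prompt tokens to check them against the old and new defaults.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/umt5-xxl")  # assumed checkpoint

prompt = "..."  # stand-in for the CI I2V prompt, ~654 tokens once tokenized
n_tokens = len(tokenizer(prompt).input_ids)

print(n_tokens > 512)    # True for the real CI prompt -> fails the old default
print(n_tokens <= 2048)  # True -> fits under the raised (4x) limit
```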
Plan to temporarily skip it with #2883 after this one is updated.
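A minimal sketch of what such a temporary skip could look like (assuming plain pytest; the marker placement and reason text are mine, not taken from #2883):

```python
# In tests/diffusion/models/wan2_2/test_wan22_max_sequence_length.py:
# a module-level pytestmark skips every test in the file until the revert lands.
import pytest

pytestmark = pytest.mark.skip(
    reason="Temporarily skipped pending the Wan2.2 prompt-length revert (#2877); see #2883."
)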
Follow-up update for the short-prompt regression introduced after raising the Wan2.2 `max_sequence_length` default. Validation was re-run on this branch (the exact commands are listed in the Summary below). This keeps the larger prompt-length support for long prompts while removing the padding overhead for short ones.
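A hedged sketch of the behavior described above, keeping the 2048 cap for truncation while no longer padding every prompt out to it (the tokenizer call mirrors a typical HF `encode_prompt()` path; it is not copied from the diff, and the checkpoint name is an assumption):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/umt5-xxl")  # assumed checkpoint

inputs = tokenizer(
    "a cat on a skateboard",
    padding="longest",   # pad only within the batch, not out to the global cap
    truncation=True,
    max_length=2048,     # long prompts are still supported up to 2048 tokens
    return_tensors="pt",
)
# inputs.input_ids.shape[1] stays close to the real prompt length instead of
# 2048, so the text encoder no longer burns time on ~2000 pad tokens.
```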
Force-pushed from 6006c9b to 9abff4f.
Signed-off-by: david6666666 <530634352@qq.com>
Force-pushed from b0972ef to 9528def.
Branch update: the branch now contains only the targeted Wan2.2 revert on top of the current base branch. The current branch head is 9528def (see the force-push above), and validation was re-run on this branch (commands and results are summarized below).
gcanlin left a comment:
LGTM. Please also revert this change in v0.18.0.post1: 5be6ff5#diff-f53575bf88041d823a9739163c467042dace2b532fc93a825f9f3e89fc169315.
…vllm-project#2877) Signed-off-by: david6666666 <530634352@qq.com> Signed-off-by: nainiu258 <cperfect02@163.com>
…vllm-project#2877) Signed-off-by: david6666666 <530634352@qq.com>
Summary
- `max_sequence_length`: raised from `512` to `2048`

Root cause
Issue #2874 is triggered because Wan2.2 pipelines defaulted `max_sequence_length` to `512`, while the CI I2V prompt is 654 tokens long. After raising the limit, `encode_prompt()` still used `padding="max_length"` with `max_length=max_sequence_length`, which padded short prompts to `2048` and introduced an unnecessary text-encoding slowdown for short-prompt requests.
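To make the root cause concrete, a small demonstration (mine, not from the PR; the tokenizer checkpoint is an assumption) of what `padding="max_length"` does to a short prompt:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/umt5-xxl")  # assumed checkpoint

inputs = tokenizer(
    "a cat on a skateboard",  # only a handful of tokens of real content
    padding="max_length",     # the problematic setting
    truncation=True,
    max_length=2048,          # max_sequence_length after the raise
    return_tensors="pt",
)
print(inputs.input_ids.shape)  # torch.Size([1, 2048]): mostly pad tokens,
                               # all of which the text encoder still processes
```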
Validation
- `python -m py_compile vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_i2v.py vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_ti2v.py tests/diffusion/models/wan2_2/test_wan22_max_sequence_length.py`
- `python -m pytest --noconftest -q tests/diffusion/models/wan2_2/test_wan22_max_sequence_length.py`
- `pre-commit run --all-files`
- End-to-end I2V run against `/mnt/data1/huggingface/hub/models--Wan-AI--Wan2.2-I2V-A14B-Diffusers/snapshots/596658fd9ca6b7b71d5057529bbf319ecbc61d74`: `inference_time_s=114.31679305434227`
- `Wan22I2VPipeline.text_encoder.forward` back at roughly `0.015s` per short-prompt call
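The per-call figure above could be reproduced with something like the following harness (hypothetical; the text-encoder object and its calling convention are assumptions, and the PR does not ship this script):

```python
import time
import torch

def time_text_encoder(text_encoder, input_ids, n_runs=10):
    """Average wall-clock seconds for one text-encoder forward pass."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure prior GPU work doesn't skew timing
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(n_runs):
            text_encoder(input_ids)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for the encoder calls to finish
    return (time.perf_counter() - start) / n_runs

# With the padding fix, a short prompt should come in around the quoted
# ~0.015s per call instead of paying for a full 2048-token sequence.
```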