[Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. #27760
ywang96 merged 4 commits into vllm-project:main
Conversation
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Code Review
This pull request provides a temporary fix for a torch.compile issue with the Qwen2.5 VL vision model. The change comments out the `@support_torch_compile` decorator on `Qwen2_5_VisionBlock`, which disables compilation for this block and avoids the `Unsupported: Dynamic slicing with Tensor arguments` error. This is a reasonable and effective short-term solution to unblock users while a permanent fix for the underlying issue is investigated. The change is correct and I approve it.
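The shape of the fix can be sketched in plain Python. This is an illustrative stand-in only: `maybe_compile` and the `VisionBlock` below are hypothetical names, not vLLM's actual `support_torch_compile` decorator or `Qwen2_5_VisionBlock` implementation.

```python
def maybe_compile(cls):
    """Pretend decorator that would mark a class for torch.compile."""
    cls.uses_compile = True
    return cls

# @maybe_compile  # temporarily commented out: torch.compile currently
#                 # fails on dynamic slicing with Tensor arguments here
class VisionBlock:
    uses_compile = False  # with the decorator disabled, the block stays eager

    def forward(self, x):
        return x

print(VisionBlock.uses_compile)  # False -> the block runs uncompiled
```

With the decorator line commented out, nothing else in the model needs to change; the block simply runs in eager mode.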
Signed-off-by: Roger Wang <hey@rogerw.io>
cc @huachenheli I tried testing this pretty extensively, but it is my first major feature work in vLLM, so I am not surprised I missed something.
That said, I never observed this error in my testing, so I think more context is needed on this PR.
Specifically:
- What versions (torch/vllm primarily) are you using?
- What command are you running to get this error?
Once those are provided on the PR, please re-ping me so I can get to work on fixing :) Thanks!
Updated my PR description with more details. PTAL.
You should also be able to do this without modifying the code by passing
I should have a fix ready pretty soon (within the hour). The issue here is that compile doesn't support slices with tensor arguments yet (but @laithsakka has a PR supporting this on nightly - see pytorch/pytorch#165074). So for now, we can move this to a custom op, and once we upgrade the torch version to include Laith's fix we can move it back out of the custom op :)
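The custom-op idea can be illustrated conceptually. This is a plain-Python sketch with lists standing in for tensors; `dynamic_slice` is a hypothetical name, not a real vLLM or PyTorch op.

```python
def dynamic_slice(x, start, length):
    # In the real workaround this body would live inside a registered
    # custom op: the compiler treats the op as opaque, so slicing with
    # runtime (tensor-valued) bounds never reaches the tracer.
    return x[start:start + length]

# Outside of compilation, the behavior is just ordinary slicing:
data = list(range(10))
print(dynamic_slice(data, 2, 3))  # [2, 3, 4]
```

The design trade-off is that an opaque op also hides the slice from any compiler optimizations, which is why it is framed as a stopgap until the nightly fix ships.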
Please see #27764 @huachenheli @ywang96
That said, we should land and cherry-pick this PR into the release - the compile integration on the VisionBlock specifically needs more hardening before we push it to general release.
@Lucaskabela @ywang96 I think we also need a way to safeguard this, as it requires a very new version of PyTorch with that specific PR to be able to handle the dynamic slicing.
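One way such a safeguard could look is a torch version gate. This is a minimal sketch under assumptions: the `(2, 10)` cutoff is a placeholder, since pytorch/pytorch#165074 is only on nightly and the first release containing it is not known; the function name is invented for illustration.

```python
# Placeholder cutoff -- the actual first release containing the dynamic
# slicing fix is not known at the time of writing.
MIN_TORCH_FOR_DYNAMIC_SLICING = (2, 10)

def torch_supports_dynamic_slicing(torch_version: str) -> bool:
    """Return True if `torch_version` (e.g. "2.9.0") meets the cutoff."""
    major, minor = (int(p) for p in torch_version.split(".")[:2])
    return (major, minor) >= MIN_TORCH_FOR_DYNAMIC_SLICING

print(torch_supports_dynamic_slicing("2.9.0"))   # False
print(torch_supports_dynamic_slicing("2.10.0"))  # True
```

A gate like this would let the decorator be re-enabled automatically for users on a new enough PyTorch while keeping older installs in eager mode.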
[Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. (vllm-project#27760) Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>
Purpose
After #23207, Qwen2.5 VL's vision model hits a dynamic slicing issue on CUDA with torch.compile. Temporarily disabling compilation for it for now.
cc. @Lucaskabela
Repro:
Command:
with forced SDPA backend in layer.py:
vllm & torch versions:
Test Plan
local vllm
Test Result