[Model][Qwen3VL] Add torch.compile support for Qwen3VL #27741

lgeiger wants to merge 1 commit into vllm-project:main
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Force-pushed from 7576d4f to 29fb61c
I ran some benchmarks on an L40S and it looks like this change increases memory usage. Previously I was able to run …; with this PR it seems the maximum model length decreases to …. @Lucaskabela, have you seen similar behaviour for Qwen2.5-VL?

Performance-wise, throughput also looks worse:

main: …

torch compiled: …
Hm, I didn't observe the model length issues in my previous PR, as memory usage shouldn't increase during runtime (just at compile time, unless we are doing some tricks here). The throughput decrease also seems odd to me, since time per output token and ITL are both improving; it seems TTFT is regressing a bit here. I wonder if there is some dimension we need to mark dynamic? If we are recompiling, that could explain the higher TTFT and the memory increase.
One way we can check is running tlparse and looking at the logs; can you try prefixing your command with …?

I will try to look at this tomorrow, but I'm also trying to get some vLLM changes into the 2.9.1 PyTorch release, so I may not be able to get to it; I'll update after investigating on my end.
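Besides tlparse (which, as far as I know, consumes the structured logs written when the `TORCH_TRACE` environment variable points at a log directory), a quick way to spot recompiles from Python is a counting backend. This uses the documented custom-backend hook of `torch.compile` and is just an illustrative sketch, nothing vLLM-specific:

```python
import torch

compiles = 0

def counting_backend(gm, example_inputs):
    # Custom Dynamo backend: count how often a graph is (re)compiled,
    # then just run the captured graph eagerly.
    global compiles
    compiles += 1
    return gm.forward

@torch.compile(backend=counting_backend, dynamic=False)
def f(x):
    return x + 1

f(torch.randn(2, 3))
f(torch.randn(2, 3))  # same shape: cache hit, no recompile
f(torch.randn(4, 3))  # new shape with dynamic=False: triggers a recompile
print(compiles)  # 2
```

If the counter keeps climbing across steady-state requests, something in the input shapes isn't being treated as dynamic.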
All good. I'm just documenting it here. I'll also have a look when I have time later this week or next.
I also wonder if the FP8 extension could be contributing to this overhead? I haven't looked much into how this quantization interacts with compile.
Running a warmed-up model (run the benchmark twice, take the second one), I got: … vs …

I think this supports my idea that the current integration may have a recompile happening first. I didn't observe the same size issues, but I couldn't run the command you provided on main, so I had to reduce my seq_len to fit on my machine. I will investigate recompiles with tlparse.
So I tried running …
This pull request has merge conflicts that must be resolved before it can be merged.
Sometimes the tensor shape may be related to the image size, and this will cause a recompile.
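Right; with a patch-based vision encoder the number of vision tokens depends directly on the image resolution, so each new image size changes a tensor dimension unless that dimension is marked dynamic. A back-of-the-envelope sketch (patch size 14 and a 2x2 spatial merge are illustrative numbers, not read from the Qwen3VL config):

```python
def num_vision_tokens(height, width, patch=14, merge=2):
    # Tokens after patchifying an image and merging merge x merge patches.
    grid_h, grid_w = height // patch, width // patch
    return (grid_h * grid_w) // (merge * merge)

# Two different input resolutions give two different sequence lengths,
# i.e. two different static shapes for a compiled graph:
print(num_vision_tokens(448, 448))  # 256
print(num_vision_tokens(672, 448))  # 384
```

So unless the vision token dimension is dynamic, every unseen resolution would mean another compile.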
Purpose
This is a follow-up to #23207 and adds torch.compile support to Qwen3VL. I'm keeping it as a draft PR until I've had time to run some benchmarks and correctness tests later this week.

/cc @Lucaskabela
Test Plan
Test Result