
[Bugfix]: resolve torch.compile cache conflict between mm_encoder_tp_modes #32842

Merged
DarkLight1337 merged 1 commit into vllm-project:main from HirokenOvo:fix/mm_encoder_torch_compile_hash on Jan 24, 2026

Conversation


@HirokenOvo HirokenOvo commented Jan 22, 2026

Purpose

PR #23207 introduced torch.compile support for the ViT component of Qwen2.5-VL. This PR fixes an issue where enabling torch.compile for the vision encoder (--compilation-config '{"compile_mm_encoder": true}') caused crashes when switching between --mm-encoder-tp-mode "weights" and --mm-encoder-tp-mode "data".

The Problem:

vLLM uses VllmConfig.compute_hash() to identify unique configurations for caching compiled graphs. However, mm_encoder_tp_mode was missing from this hash calculation. As a result, running the model with weights mode generated a cache that data mode would try to reuse (or vice versa). Since these modes result in different tensor shapes/strides for the ViT, this caused an AssertionError in the generated Inductor kernels.
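The size mismatch reported in the error log follows directly from the TP mode: with a tensor-parallel size of 2, "data" mode shards the vision patches across ranks while "weights" mode gives every rank the full patch sequence, so Inductor's baked-in size guards from one mode's cache fail under the other. A hedged, illustrative sketch (not vLLM code; `vit_input_rows` is a hypothetical helper, and the numbers are taken from the error log below):

```python
def vit_input_rows(num_patches: int, tp_size: int, mode: str) -> int:
    """Per-rank row count of the ViT input under the two mm-encoder TP modes."""
    if mode == "data":
        # Data-parallel encoder: patches are sharded across TP ranks.
        return num_patches // tp_size
    # Weight-parallel encoder: every rank sees the full patch sequence.
    return num_patches

print(vit_input_rows(3456, 2, "weights"))  # 3456 rows baked into one cache
print(vit_input_rows(3456, 2, "data"))     # 1728 rows, tripping that cache's guard
```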

The Solution:

  1. Updated vllm/config/multimodal.py: Added mm_encoder_tp_mode to the factors used in MultiModalConfig.compute_hash().
  2. Updated vllm/config/vllm.py: Modified VllmConfig.compute_hash() to explicitly include the multimodal_config hash if and only if compile_mm_encoder is enabled. This ensures correct cache isolation without affecting the hash for non-compiled runs.
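The two changes above can be sketched as follows. This is a simplified illustration with hypothetical, stripped-down classes; vLLM's real MultiModalConfig and VllmConfig hash many more factors:

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MultiModalConfig:
    mm_encoder_tp_mode: str = "weights"  # "weights" or "data"

    def compute_hash(self) -> str:
        # After the fix: mm_encoder_tp_mode is one of the hashed factors,
        # so "weights" and "data" runs get distinct compile caches.
        factors = [self.mm_encoder_tp_mode]
        return hashlib.md5(str(factors).encode()).hexdigest()


@dataclass
class CompilationConfig:
    compile_mm_encoder: bool = False


@dataclass
class VllmConfig:
    compilation_config: CompilationConfig = field(default_factory=CompilationConfig)
    multimodal_config: Optional[MultiModalConfig] = None

    def compute_hash(self) -> str:
        factors = []
        # Include the multimodal hash only when the encoder is compiled,
        # so non-compiled runs keep their previous cache keys.
        if (self.multimodal_config is not None
                and self.compilation_config.compile_mm_encoder):
            factors.append(self.multimodal_config.compute_hash())
        return hashlib.md5(str(factors).encode()).hexdigest()
```

With compile_mm_encoder enabled, the two TP modes now hash differently and each gets its own compiled-graph cache; with it disabled, the hash is unchanged either way.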

Related Error Log

Running these two commands in sequence (so the second run reuses the first run's compiled-graph cache) reproduces the crash:

vllm serve /data/models/qwen2_5vl-3B/ --compilation-config '{"compile_mm_encoder": true}' --tensor-parallel-size 2 --mm-encoder-tp-mode "data" --max-model-len 8192 --gpu-memory-utilization 0.5
vllm serve /data/models/qwen2_5vl-3B/ --compilation-config '{"compile_mm_encoder": true}' --tensor-parallel-size 2 --mm-encoder-tp-mode "weights" --max-model-len 8192 --gpu-memory-utilization 0.5

(Worker_TP0 pid=2909827) ERROR 01-22 16:56:04 [multiproc_executor.py:839]   File "/tmp/torchinductor_root/he/chehoqaxuhke4t2nqjka646fx2in2rwogdaqfv5djiz374ppmcpq.py", line 550, in call
(Worker_TP0 pid=2909827) ERROR 01-22 16:56:04 [multiproc_executor.py:839]     assert_size_stride(arg4_1, (3456, 1152), (1152, 1))
(Worker_TP0 pid=2909827) ERROR 01-22 16:56:04 [multiproc_executor.py:839] AssertionError: expected size 1728==3456, stride 1152==1152 at dim=0

Test Plan

Test Result


@DarkLight1337 DarkLight1337 left a comment

Thanks for fixing

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) January 24, 2026 12:51
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 24, 2026

DarkLight1337 commented Jan 24, 2026

cc @ywang96 @Isotr0py

Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
auto-merge was automatically disabled January 24, 2026 12:52

Head branch was pushed to by a user without write access

@HirokenOvo HirokenOvo force-pushed the fix/mm_encoder_torch_compile_hash branch from f059fee to 4ff69ba on January 24, 2026 12:52
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) January 24, 2026 12:57
@DarkLight1337 DarkLight1337 merged commit 1209b78 into vllm-project:main Jan 24, 2026
48 checks passed
@HirokenOvo HirokenOvo deleted the fix/mm_encoder_torch_compile_hash branch January 24, 2026 15:02
ms1design pushed a commit to ms1design/vllm that referenced this pull request Jan 24, 2026
…modes (vllm-project#32842)

Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
cwazai pushed a commit to cwazai/vllm that referenced this pull request Jan 25, 2026
…modes (vllm-project#32842)

Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: 陈建华 <1647430658@qq.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…modes (vllm-project#32842)

Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug: Something isn't working
ready: ONLY add when PR is ready to merge/full CI is needed
