
[BugFix] fix 3vl dense model load quant weight#6100

Merged
wangxiyuan merged 1 commit into vllm-project:main from shaopeng-666:fix_quant_3vl
Jan 22, 2026

Conversation

@shaopeng-666
Collaborator

@shaopeng-666 shaopeng-666 commented Jan 21, 2026

What this PR does / why we need it?

Fix a weight-loading error for quantized Qwen3VL dense models.

Does this PR introduce any user-facing change?

no

How was this patch tested?

The Qwen3VL quantized model service initialized successfully. Inference requests are processed correctly, and valid responses are returned.

Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a prefix mapping for the qwen3_vl_text model to fix a quantization weight loading issue. The change is a good step, but appears to be incomplete. To ensure correct quantization for packed modules, a corresponding entry for qwen3_vl_text is likely needed in packed_modules_model_mapping. I've added a comment with details on this potential issue.

Comment on lines +213 to +217:

    "qwen3_vl_text": {
        "visual.": "model.visual.",
        "language_model.lm_head.": "lm_head.",
        "language_model.model.": "model.language_model.",
    },
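To illustrate what a prefix mapping like the one in this diff accomplishes, the sketch below shows how checkpoint parameter names can be rewritten before weight loading. The dict contents come from the diff above; the helper function and module names are illustrative, not the actual vllm-ascend implementation.

```python
# Hypothetical sketch of applying a per-model-type prefix mapping when
# loading quantized weights. The mapping entries mirror the diff in this
# PR; the surrounding code is an assumption for illustration only.

QUANT_MODEL_PREFIX_MAPPING = {
    "qwen3_vl_text": {
        "visual.": "model.visual.",
        "language_model.lm_head.": "lm_head.",
        "language_model.model.": "model.language_model.",
    },
}

def remap_weight_name(model_type: str, name: str) -> str:
    """Rewrite a checkpoint parameter name using the per-model prefix map."""
    mapping = QUANT_MODEL_PREFIX_MAPPING.get(model_type, {})
    for old_prefix, new_prefix in mapping.items():
        if name.startswith(old_prefix):
            return new_prefix + name[len(old_prefix):]
    return name  # names with no matching prefix pass through unchanged

print(remap_weight_name(
    "qwen3_vl_text",
    "language_model.model.layers.0.mlp.gate_proj.weight",
))
```

Without such an entry for `qwen3_vl_text`, checkpoint names keep their original prefixes and fail to match the runtime module hierarchy, which is the class of load error this PR fixes.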
Severity: high

While adding the prefix mapping for qwen3_vl_text is correct, a corresponding entry for this model type appears to be missing from packed_modules_model_mapping in this file (around line 312). Other dense Qwen VL models, such as qwen2_5_vl, have entries for packed modules like qkv_proj and gate_up_proj. If qwen3_vl_text also uses such packed modules, omitting it from packed_modules_model_mapping will likely cause a KeyError during quantization setup. Please consider adding an entry for qwen3_vl_text to packed_modules_model_mapping, similar to the one for qwen2_5_vl, to ensure correct handling of packed modules.
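The reviewer's concern can be sketched as follows. In vLLM-style loaders, fused ("packed") layers such as qkv_proj are assembled from several checkpoint shards, and a per-model-type table records that grouping; a missing model-type entry surfaces as a KeyError during quantization setup. The dict shape and shard names below follow common vLLM conventions for Qwen VL models but are hypothetical for `qwen3_vl_text`; this is not the repository's actual table.

```python
# Illustrative sketch of the packed-module bookkeeping the review refers
# to. All names here are assumptions modeled on entries for other Qwen
# VL models, not verified contents of vllm-ascend.

PACKED_MODULES_MODEL_MAPPING = {
    "qwen3_vl_text": {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    },
}

def shards_for(model_type: str, fused_name: str) -> list:
    """Look up which checkpoint shards feed a fused (packed) module."""
    try:
        return PACKED_MODULES_MODEL_MAPPING[model_type][fused_name]
    except KeyError:
        # This is the failure mode the review warns about: a model type
        # missing from the table surfaces as a KeyError at quant setup.
        raise KeyError(f"no packed-module entry for {model_type!r}/{fused_name!r}")

print(shards_for("qwen3_vl_text", "qkv_proj"))
```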

@weijinqian0 weijinqian0 added the ready (read for review) and ready-for-test (start test by label for PR) labels Jan 21, 2026
@wangxiyuan wangxiyuan merged commit 176bfc3 into vllm-project:main Jan 22, 2026
43 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 22, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (51 commits)
  [Bugfix] Remove `use_aclgraph` in mtp_proposer and use `use_cuda_graph` (vllm-project#6032)
  [BugFix] fix 3vl dense model load quant weight (vllm-project#6100)
  [CP&SP] Integrate FIA operator in mla_cp._forward_decode (vllm-project#5641)
  [CI][Doc] Upgrade wheel building's CANN to 8.5.0 and update the Docs (vllm-project#6145)
  [CI]Install clang in dokerfile for triton ascend (vllm-project#4409)
  [Main] Upgrade PTA to 2.9.0 (vllm-project#6112)
  [Graph][Fusion] Add QKVNormRope and QKVNormRopeWithBias (vllm-project#5721)
  [P/D][PCP]bugfix pcp force free twice caused logger error (vllm-project#6124)
  [BugFix]converting pa get_workspace back to capturing (vllm-project#5833)
  [CI] optimize lint term (vllm-project#5986)
  [Bugfix] Fix Triton operator usage for multimodal models based on `the mrope_interleaved` parameter (vllm-project#6042)
  [bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph (vllm-project#6015)
  [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#6097)
  [Misc] Bump mooncake version to v0.3.8.post1 (vllm-project#6110)
  [Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (vllm-project#5758)
  [bugfix] adapt_remote_request_id (vllm-project#6051)
  [Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (vllm-project#5143)
  [Feature] Support DSA-CP for Hybrid scenario (vllm-project#5702)
  [CI] Upgrade CANN to 8.5.0 (vllm-project#6070)
  Default enable MLAPO (vllm-project#5952)
  ...
yiz-liu pushed a commit that referenced this pull request Jan 22, 2026
…6103)

Cherry-pick from PR #6100.

What this PR does / why we need it?
Fix Qwen3VL dense quant model load weights Error.

Does this PR introduce any user-facing change?
no

How was this patch tested?
The Qwen3VL quantized model service initialized successfully. Inference
requests are processed correctly, and valid responses are returned.

vLLM version: v0.13.0
vLLM main:
vllm-project/vllm@d682094

Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
tangtiangu pushed a commit to tangtiangu/jiusi-vllm-ascend that referenced this pull request Feb 24, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026

Labels

module:quantization, ready (read for review), ready-for-test (start test by label for PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants