[BugFix] fix 3vl dense model load quant weight #6100
wangxiyuan merged 1 commit into vllm-project:main
Conversation
Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds a prefix mapping for the qwen3_vl_text model to fix a quantization weight loading issue. The change is a good step, but appears to be incomplete. To ensure correct quantization for packed modules, a corresponding entry for qwen3_vl_text is likely needed in packed_modules_model_mapping. I've added a comment with details on this potential issue.
| "qwen3_vl_text": { | ||
| "visual.": "model.visual.", | ||
| "language_model.lm_head.": "lm_head.", | ||
| "language_model.model.": "model.language_model.", | ||
| }, |
While adding the prefix mapping for qwen3_vl_text is correct, a corresponding entry for this model type appears to be missing from packed_modules_model_mapping in this file (around line 312). Other dense Qwen VL models, such as qwen2_5_vl, have entries for packed modules like qkv_proj and gate_up_proj. If qwen3_vl_text also uses such packed modules, omitting it from packed_modules_model_mapping will likely cause a KeyError during quantization setup. Please consider adding an entry for qwen3_vl_text to packed_modules_model_mapping, similar to the one for qwen2_5_vl, to ensure correct handling of packed modules.
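The reviewer's concern can be made concrete with a small, hypothetical sketch. The names `PACKED_MODULES_MODEL_MAPPING` and `packed_shards`, and the exact shard layouts, are illustrative assumptions, not the actual vLLM Ascend structures:

```python
# Illustrative sketch (not the actual vLLM Ascend code): packed modules
# such as qkv_proj fuse several checkpoint tensors into one in-model
# parameter, so quantization setup needs the shard layout per model type.
# A missing model key makes the lookup fail with KeyError.

PACKED_MODULES_MODEL_MAPPING = {
    "qwen2_5_vl": {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    },
    # The review suggests an analogous entry is likely needed:
    "qwen3_vl_text": {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    },
}

def packed_shards(model_type: str, module: str) -> list[str]:
    """Look up the checkpoint shards fused into a packed module.

    Without the "qwen3_vl_text" entry above, this would raise KeyError
    for that model type, which is the failure mode the review warns about.
    """
    return PACKED_MODULES_MODEL_MAPPING[model_type][module]
```

If `qwen3_vl_text` uses the same fused linear layers as `qwen2_5_vl`, mirroring that entry avoids the KeyError during quantization setup.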
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (51 commits)
  - [Bugfix] Remove `use_aclgraph` in mtp_proposer and use `use_cuda_graph` (vllm-project#6032)
  - [BugFix] fix 3vl dense model load quant weight (vllm-project#6100)
  - [CP&SP] Integrate FIA operator in mla_cp._forward_decode (vllm-project#5641)
  - [CI][Doc] Upgrade wheel building's CANN to 8.5.0 and update the Docs (vllm-project#6145)
  - [CI]Install clang in dokerfile for triton ascend (vllm-project#4409)
  - [Main] Upgrade PTA to 2.9.0 (vllm-project#6112)
  - [Graph][Fusion] Add QKVNormRope and QKVNormRopeWithBias (vllm-project#5721)
  - [P/D][PCP]bugfix pcp force free twice caused logger error (vllm-project#6124)
  - [BugFix]converting pa get_workspace back to capturing (vllm-project#5833)
  - [CI] optimize lint term (vllm-project#5986)
  - [Bugfix] Fix Triton operator usage for multimodal models based on `the mrope_interleaved` parameter (vllm-project#6042)
  - [bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph (vllm-project#6015)
  - [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#6097)
  - [Misc] Bump mooncake version to v0.3.8.post1 (vllm-project#6110)
  - [Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (vllm-project#5758)
  - [bugfix] adapt_remote_request_id (vllm-project#6051)
  - [Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (vllm-project#5143)
  - [Feature] Support DSA-CP for Hybrid scenario (vllm-project#5702)
  - [CI] Upgrade CANN to 8.5.0 (vllm-project#6070)
  - Default enable MLAPO (vllm-project#5952)
  - ...
…6103) cherry pick from pr #6100

What this PR does / why we need it?
Fix the weight-loading error for quantized Qwen3VL dense models.

Does this PR introduce any user-facing change?
No.

How was this patch tested?
The Qwen3VL quantized model service initialized successfully. Inference requests are processed correctly, and valid responses are returned.

vLLM version: v0.13.0
vLLM main: vllm-project/vllm@d682094

Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
### What this PR does / why we need it?
Fix the weight-loading error for quantized Qwen3VL dense models.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
The Qwen3VL quantized model service initialized successfully. Inference requests are processed correctly, and valid responses are returned.

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
What this PR does / why we need it?
Fix the weight-loading error for quantized Qwen3VL dense models.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
The Qwen3VL quantized model service initialized successfully. Inference requests are processed correctly, and valid responses are returned.