[BugFix] fix 3vl dense model load quant weight #6100
wangxiyuan merged 1 commit into vllm-project:main
Conversation
Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds a prefix mapping for the qwen3_vl_text model to fix a quantization weight loading issue. The change is a good step, but appears to be incomplete. To ensure correct quantization for packed modules, a corresponding entry for qwen3_vl_text is likely needed in packed_modules_model_mapping. I've added a comment with details on this potential issue.
| "qwen3_vl_text": { | ||
| "visual.": "model.visual.", | ||
| "language_model.lm_head.": "lm_head.", | ||
| "language_model.model.": "model.language_model.", | ||
| }, |
While adding the prefix mapping for qwen3_vl_text is correct, a corresponding entry for this model type appears to be missing from packed_modules_model_mapping in this file (around line 312). Other dense Qwen VL models, such as qwen2_5_vl, have entries for packed modules like qkv_proj and gate_up_proj. If qwen3_vl_text also uses such packed modules, omitting it from packed_modules_model_mapping will likely cause a KeyError during quantization setup. Please consider adding an entry for qwen3_vl_text to packed_modules_model_mapping, similar to the one for qwen2_5_vl, to ensure correct handling of packed modules.
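The reviewer's concern can be made concrete with a small, hypothetical sketch. The names `PACKED_MODULES_MODEL_MAPPING` and `packed_shards`, and the exact shard layouts, are illustrative assumptions, not the actual vLLM Ascend structures:

```python
# Illustrative sketch (not the actual vLLM Ascend code): packed modules
# such as qkv_proj fuse several checkpoint tensors into one in-model
# parameter, so quantization setup needs the shard layout per model type.
# A missing model key makes the lookup fail with KeyError.

PACKED_MODULES_MODEL_MAPPING = {
    "qwen2_5_vl": {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    },
    # The review suggests an analogous entry is likely needed:
    "qwen3_vl_text": {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    },
}

def packed_shards(model_type: str, module: str) -> list[str]:
    """Look up the checkpoint shards fused into a packed module.

    Without the "qwen3_vl_text" entry above, this would raise KeyError
    for that model type, which is the failure mode the review warns about.
    """
    return PACKED_MODULES_MODEL_MAPPING[model_type][module]
```

If `qwen3_vl_text` uses the same fused linear layers as `qwen2_5_vl`, mirroring that entry avoids the KeyError during quantization setup.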
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (51 commits)
  - [Bugfix] Remove `use_aclgraph` in mtp_proposer and use `use_cuda_graph` (vllm-project#6032)
  - [BugFix] fix 3vl dense model load quant weight (vllm-project#6100)
  - [CP&SP] Integrate FIA operator in mla_cp._forward_decode (vllm-project#5641)
  - [CI][Doc] Upgrade wheel building's CANN to 8.5.0 and update the Docs (vllm-project#6145)
  - [CI]Install clang in dokerfile for triton ascend (vllm-project#4409)
  - [Main] Upgrade PTA to 2.9.0 (vllm-project#6112)
  - [Graph][Fusion] Add QKVNormRope and QKVNormRopeWithBias (vllm-project#5721)
  - [P/D][PCP]bugfix pcp force free twice caused logger error (vllm-project#6124)
  - [BugFix]converting pa get_workspace back to capturing (vllm-project#5833)
  - [CI] optimize lint term (vllm-project#5986)
  - [Bugfix] Fix Triton operator usage for multimodal models based on `the mrope_interleaved` parameter (vllm-project#6042)
  - [bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph (vllm-project#6015)
  - [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#6097)
  - [Misc] Bump mooncake version to v0.3.8.post1 (vllm-project#6110)
  - [Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (vllm-project#5758)
  - [bugfix] adapt_remote_request_id (vllm-project#6051)
  - [Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (vllm-project#5143)
  - [Feature] Support DSA-CP for Hybrid scenario (vllm-project#5702)
  - [CI] Upgrade CANN to 8.5.0 (vllm-project#6070)
  - Default enable MLAPO (vllm-project#5952)
  - ...
…6103) cherry pick from pr #6100

What this PR does / why we need it?
Fix the weight-loading error for quantized Qwen3VL dense models.

Does this PR introduce any user-facing change?
No.

How was this patch tested?
The Qwen3VL quantized model service initialized successfully. Inference requests are processed correctly, and valid responses are returned.

vLLM version: v0.13.0
vLLM main: vllm-project/vllm@d682094

Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
### What this PR does / why we need it?
Fix the weight-loading error for quantized Qwen3VL dense models.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
The Qwen3VL quantized model service initialized successfully. Inference requests are processed correctly, and valid responses are returned.

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
What this PR does / why we need it?
Fix the weight-loading error for quantized Qwen3VL dense models.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
The Qwen3VL quantized model service initialized successfully. Inference requests are processed correctly, and valid responses are returned.