Fix AttributeError in Qwen3.5 GDN layers with quantized models #37448
mgoin merged 4 commits into vllm-project:main
Conversation
Code Review
The changes correctly resolve the `AttributeError` encountered in Qwen3.5 GDN layers when using quantized models. By replacing the access to `.weight.shape[0]` with `sum(.output_sizes)`, the pull request ensures compatibility with `MergedColumnParallelLinear` layers that do not expose a `.weight` attribute under quantization. This is a direct and effective fix for the identified issue.
Replace `self.in_proj_qkvz.weight.shape[0]` and `self.in_proj_ba.weight.shape[0]` with `sum(self.in_proj_qkvz.output_sizes)` and `sum(self.in_proj_ba.output_sizes)`. `MergedColumnParallelLinear` does not expose a `.weight` attribute when using quantization methods like compressed-tensors/AWQ, causing an `AttributeError` during the forward pass. The `output_sizes` attribute is always available on `MergedColumnParallelLinear` and provides the same information.

Fixes vllm-project#37444

Signed-off-by: Jim Smith <jim@joshua8.ai>
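To illustrate why this substitution is safe, here is a minimal sketch (stand-in classes, not vLLM's actual implementation): for a merged column-parallel layer, the row count of the plain weight tensor equals the sum of the per-shard `output_sizes`, and `output_sizes` remains populated even when a quantization method removes the plain `.weight` parameter.

```python
# Hypothetical stand-ins for MergedColumnParallelLinear; only the
# output_sizes / weight-shape relationship mirrors the real layer.

class MergedLinearUnquantized:
    """Unquantized case: a plain weight whose row count is the merged dim."""
    def __init__(self, output_sizes, input_size):
        self.output_sizes = list(output_sizes)
        # weight rows = total merged output dimension
        self.weight_shape = (sum(output_sizes), input_size)

class MergedLinearQuantized:
    """Quantized case: no .weight attribute, but output_sizes is still
    set by the layer constructor."""
    def __init__(self, output_sizes):
        self.output_sizes = list(output_sizes)

def merged_output_dim(layer):
    # The fix: derive the total output dimension from metadata that
    # exists in both the quantized and unquantized cases.
    return sum(layer.output_sizes)

plain = MergedLinearUnquantized([1024, 512, 512], input_size=2048)
quant = MergedLinearQuantized([1024, 512, 512])

assert merged_output_dim(plain) == plain.weight_shape[0] == 2048
assert merged_output_dim(quant) == 2048  # works even without .weight
```

The sizes above are illustrative, not the actual Qwen3.5 projection dimensions.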
Thanks for the fix! Sorry for breaking AWQ models.
I got the following error when using fp8 models.
@JaheimLee, were those errors in the existing code or after the PR I submitted?
It's also caused by #36795 in the existing code.
`.github/workflows/bc-lint.yml` (Outdated)

```diff
@@ -0,0 +1,29 @@
+name: BC Lint
```
Is this workflow needed for this fix?
@JaheimLee I don't see an error when running an fp8 model. Which model did you run? Thanks!
I used the official 27B fp8 model.
Signed-off-by: mgoin <mgoin64@gmail.com>
@JaheimLee Thanks for reporting this! This is caused by a shape mismatch. It needs to change to:
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>
…project#37448) Signed-off-by: Jim Smith <jim@joshua8.ai> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Summary
- Replace `self.in_proj_qkvz.weight.shape[0]` and `self.in_proj_ba.weight.shape[0]` with `sum(self.in_proj_qkvz.output_sizes)` and `sum(self.in_proj_ba.output_sizes)` in both `qwen3_5.py` and `qwen3_next.py`
- `MergedColumnParallelLinear` does not expose a `.weight` attribute when using quantization methods like compressed-tensors/AWQ, causing an `AttributeError` during the forward pass
- The `output_sizes` attribute is always available on `MergedColumnParallelLinear` and provides the same total output dimension needed by the `gdn_in_proj` custom op for shape tracing

Motivation
This fixes a regression introduced in #36795 where the new `gdn_in_proj` custom op accesses `self.in_proj_qkvz.weight.shape[0]` and `self.in_proj_ba.weight.shape[0]`. With quantized models (e.g., `cyankiwi/Qwen3.5-9B-AWQ-4bit` using compressed-tensors), the `MergedColumnParallelLinear` layer does not have a `.weight` attribute; the weight is managed by the quantization kernel. This causes an `AttributeError` during the forward pass.

Fixes #37444
Test plan
- `cyankiwi/Qwen3.5-9B-AWQ-4bit` loads and runs inference without error

🤖 Generated with Claude Code