[Bugfix][LoRA] Fix Qwen3.5 LoRA #36976
Conversation
Code Review
This pull request aims to fix LoRA support for Qwen3.5 models. The main change is to split the fused in_proj_qkvz layer into separate in_proj_qkv and in_proj_z layers when LoRA is enabled. This required modifications to layer initialization, the forward pass, weight loading, and the packed_modules_mapping for LoRA. While the overall approach is sound, I've identified a critical issue in Qwen3_5ForConditionalGeneration where the packed_modules_mapping is not correctly initialized, which would likely prevent LoRA from working correctly for that model. I have provided a specific code suggestion to address this.
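To make the described change concrete, here is a minimal sketch of how a per-model `packed_modules_mapping` might be switched when LoRA is enabled. It follows vLLM's general convention of mapping a packed layer name to the list of sub-modules it fuses; the Qwen3.5-specific entry names (`in_proj_qkvz`, `in_proj_qkv`, `in_proj_z`) are taken from the PR description, but the exact dictionary shape and the helper function below are assumptions for illustration, not the upstream code.

```python
def build_packed_modules_mapping(enable_lora: bool) -> dict[str, list[str]]:
    """Hypothetical sketch of a LoRA-aware packed_modules_mapping.

    vLLM's convention maps a packed (fused) layer name to the list of
    logical sub-modules it contains, e.g. qkv_proj -> [q, k, v].
    """
    mapping = {
        # Standard fused projections used across vLLM models.
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
    if enable_lora:
        # With LoRA enabled, the fused in_proj_qkvz layer is split so
        # each part can carry its own adapter weights (assumed names).
        mapping["in_proj_qkv"] = ["in_proj_qkv"]
        mapping["in_proj_z"] = ["in_proj_z"]
    else:
        # Without LoRA the projection stays fused in a single layer.
        mapping["in_proj_qkvz"] = ["in_proj_qkvz"]
    return mapping
```

The point of the conditional is that weight loading and the forward pass must agree with whichever layout the mapping declares, which is why the PR also touches layer initialization and the weight loader.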
musab-mk left a comment
I tested this PR and I can confirm that this fixes the IndexError in _capture_cudagraphs for LoRA adapter on Qwen/Qwen3.5-397B-A17B-FP8 as a base.
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@jeejeelee I think we can proceed with merging this fix in, thanks!

@dcmaddix I think we are waiting for @sighingnow to approve.

This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Force-pushed from 0dcdf60 to 1a1c491
This pull request has merge conflicts that must be resolved before it can be merged.

Will this PR be included in v0.18? Installing vLLM from source is somewhat cumbersome.

No, we cut the branch for v0.18 a few days ago.

Got it. Thanks anyway.
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
I tried building from source, but the build does not complete successfully: `ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279629; error: Unexpected instruction types specified for 'mma'`
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.