[Bugfix] Fuse Qwen3.5 in_qkvz_proj forwarding with LoRA enabled#37912
Isotr0py wants to merge 5 commits into vllm-project:main
Code Review
This pull request refactors LoRA handling for Qwen3.5 and Qwen3-Next models. Key changes include introducing an expand_packed_lora method to flexibly handle LoRA adapter groups that don't match the number of slices, and unifying the input projection logic in Qwen3.5 attention by removing LoRA-specific conditional paths. The create_dummy_lora function in model_manager.py contains a 'HACK' comment, which should either be replaced with a detailed explanation of the necessary logic or improved with a more robust solution.
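To make the "adapter groups that don't match the number of slices" case concrete, here is a minimal sketch in plain Python. It is not the PR's `expand_packed_lora` implementation, which is not shown in this conversation. `ToyLoRA`, `expand_groups_to_slices`, and the two-groups-into-four-slices layout (a `qkv` group plus a `z` group) are hypothetical and assumed for illustration only. The idea is that each group's `lora_b` is split row-wise by output slice while `lora_a` is shared by every slice cut from that group.

```python
from dataclasses import dataclass


@dataclass
class ToyLoRA:
    """Hypothetical stand-in for one LoRA adapter (not a vLLM class)."""
    lora_a: list  # rank x in_features, as a list of rows
    lora_b: list  # out_features x rank, as a list of rows


def expand_groups_to_slices(groups, slice_sizes_per_group):
    """Split each group's lora_b into one adapter per output slice.

    groups: one ToyLoRA per adapter group (e.g. a "qkv" group and a "z" group).
    slice_sizes_per_group: for each group, the output size of every slice it covers.
    """
    if len(groups) != len(slice_sizes_per_group):
        raise ValueError("need one list of slice sizes per group")

    expanded = []
    for lora, sizes in zip(groups, slice_sizes_per_group):
        if sum(sizes) != len(lora.lora_b):
            raise ValueError("slice sizes must add up to the group's output rows")
        start = 0
        for size in sizes:
            # lora_a is shared: every slice of a group sees the same input projection.
            expanded.append(ToyLoRA(lora.lora_a, lora.lora_b[start:start + size]))
            start += size
    return expanded


# Assumed layout: 2 adapter groups (qkv, z) feeding 4 output slices (q, k, v, z).
qkv = ToyLoRA(lora_a=[[1.0, 0.0]], lora_b=[[1.0], [2.0], [3.0]])
z = ToyLoRA(lora_a=[[0.0, 1.0]], lora_b=[[4.0]])
per_slice = expand_groups_to_slices([qkv, z], [[1, 1, 1], [1]])
```

Splitting only `lora_b` keeps the product `lora_b @ lora_a` for each slice equal to the corresponding rows of the group's original update, which is why `lora_a` can be reused unchanged.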
    # HACK: overrides replacements for qkvz = qkv + z case.
    # Any better methods to handle this case?
    if n_slices != len(replacements):
        replacements = [f"slice_{i}" for i in range(n_slices)]
The use of a 'HACK' comment here is concerning as it suggests the solution is not robust and could lead to future maintenance issues. Code with 'HACK' comments is often difficult to understand and easy to break.
If this logic is indeed the correct and necessary approach for handling dummy LoRA creation for packed modules like in_proj_qkvz, please replace the 'HACK' comment with a more detailed explanation. The explanation should clarify:
- Why there's a mismatch between `n_slices` and `len(replacements)`.
- Why generating generic `slice_i` names is the appropriate solution for creating dummy LoRAs in this scenario.
- How this interacts with the loading of real LoRA weights.
A clear explanation will improve code maintainability and prevent future confusion.
Alternatively, if a more robust, less 'hacky' solution is possible (perhaps by making the relationship between packed modules and slices more explicit in the model configuration), that would be preferable.
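As an illustration of the kind of explanation being requested, the fallback could be isolated in a small, documented helper. This is a sketch only: `dummy_lora_slice_names` is a hypothetical name, and the sub-module names `in_proj_qkv` and `in_proj_z` are assumed from the "qkvz = qkv + z" comment rather than taken from the PR.

```python
def dummy_lora_slice_names(replacements, n_slices):
    """Return one name per output slice for a dummy LoRA (hypothetical helper).

    When the packed-module mapping lists as many sub-modules as the layer has
    output slices, the sub-module names are used directly. When the counts
    differ (assumed here: a fused qkvz projection listed as two sub-modules
    but sliced four ways), generic per-slice names are generated instead,
    since dummy weights only need one uniquely named entry per slice.
    """
    if n_slices == len(replacements):
        return list(replacements)
    return [f"slice_{i}" for i in range(n_slices)]


# Assumed example: two listed sub-modules, four output slices.
names = dummy_lora_slice_names(["in_proj_qkv", "in_proj_z"], 4)
```

Keeping the rule in one named function gives the docstring a natural place to state why the mismatch occurs. How real LoRA weights are mapped to slices would still need to be documented separately.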
This pull request has merge conflicts that must be resolved before it can be merged.
Purpose
Test Plan
Test Result
Tests pass at both TP=2 and TP=4