[BugFix] LoRA: Support loading base_layer of experts #31104
jeejeelee merged 2 commits into vllm-project:main
Conversation
Code Review
This pull request addresses a bug in weight loading for FusedMoE layers when LoRA is enabled. The changes correctly handle the base_layer component in weight names. The core logic is adjusted in make_expert_params_mapping, and this fix is propagated by adding an is_lora_enabled flag to this function, which is then passed from various model definitions. The overall approach is sound and the widespread changes are necessary boilerplate to support the fix. I have one suggestion to improve the robustness of the string formatting to prevent potential issues with certain model configurations.
hmellor left a comment
We should not be duplicating this code in every model. It should be abstracted to a util.
Also, please make sure that the fix is also applied to
@hmellor Thanks for reviewing, now this is changed as requested! cc: @jeejeelee
Current CI failures don't seem to be caused by this PR.
I appreciate that it works. What I don't like is that it means every MoE model has to have this line added to it just for VeRL+LoRA. I'll have a think to see what can be done.
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Code Review
This pull request fixes an issue with loading LoRA weights for experts by correctly handling the base_layer path component. The core logic change is in vllm/model_executor/layers/fused_moe/layer.py, where make_expert_params_mapping is updated to detect if LoRA is active and adjust weight paths accordingly. The other changes are mechanical updates to pass the model instance to this method across various model files.
My main feedback is on the method used to detect if LoRA is enabled. The current implementation iterates over all model parameters to check for the presence of .base_layer. in their names. This is not only inefficient but also brittle, as it could be triggered incorrectly by models that happen to use this string in parameter names for other reasons. I've suggested a more robust and performant approach that directly checks the LoRA configuration, which is a more reliable indicator of LoRA being active.
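The two detection strategies contrasted in this review can be sketched as follows. This is an illustrative sketch only: the function names and the `lora_config` attribute follow common PyTorch/vLLM conventions but are assumptions, not the PR's exact code.

```python
# Hypothetical sketch of the two LoRA-detection strategies discussed above.

def lora_enabled_by_param_scan(model) -> bool:
    # Original approach: scan every parameter name for ".base_layer.".
    # O(n) in parameter count, and can misfire on models that happen to
    # use this string in parameter names for unrelated reasons.
    return any(".base_layer." in name for name, _ in model.named_parameters())


def lora_enabled_by_config(vllm_config) -> bool:
    # Suggested approach: the LoRA config is only populated when LoRA is
    # active, so a direct check is both cheap and unambiguous.
    return getattr(vllm_config, "lora_config", None) is not None
```

Checking configuration rather than scanning parameter names also avoids re-walking the module tree every time the mapping is built.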
Pull request overview
Copilot reviewed 35 out of 35 changed files in this pull request and generated no new comments.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0b4bd65a4b
@hmellor I think I just managed to remove this requirement by modifying cc: @jeejeelee

@HollowMan6 I was away when you found this solution, I like it! Thanks for figuring this out
This PR fixes Qwen3.5 LoRA loading for `in_proj_qkvz` in vLLM and enables `base_layer` for experts (same as vllm-project#31104).

For Qwen3.5, the underlying merged projection has 4 physical output slices:

- `q`
- `k`
- `v`
- `z`

but `packed_modules_mapping` only exposes 2 logical LoRA modules:

- `in_proj_qkv`
- `in_proj_z`

vLLM currently misaligns these two representations during LoRA initialization and dummy adapter setup, which causes startup failures in the dummy LoRA path. There are two mismatches in the current implementation:

1. In `column_parallel_linear.py`, this layer is incorrectly routed to `MergedColumnParallelLinearWithLoRA`, which assumes the LoRA tensors are already aligned with `self.n_slices=4` and reads `lora_b` accordingly.
2. In `model_manager.py`, the dummy LoRA path only constructs `lora_b` for the 2 logical packed modules, and the shapes are derived from only the first two physical slices.

As a result, during startup dummy runs, the `lora_b` list length and slice shapes do not match the underlying 4-slice layer layout, and the flow eventually fails in `slice_lora_b` with `IndexError`.

The fix:

- Route `MergedColumnParallelLinear` layers with 3+ physical output slices to the variable-slice LoRA implementation.
- Build dummy LoRA weights using grouped logical output dimensions, and expand the 2 logical LoRA groups into the 4 physical slices during `set_lora`.
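The grouped-to-physical expansion described above can be sketched roughly as follows. This is an illustration under assumed names and shapes, not the PR's actual `set_lora` code: each logical group's weight carries its physical slices concatenated along the output dimension, and the helper splits them back out.

```python
# Hypothetical sketch: expand 2 logical LoRA groups (e.g. in_proj_qkv,
# in_proj_z) into the 4 physical slices (q, k, v, z) that the underlying
# merged projection expects.

def expand_groups_to_slices(lora_b_groups, slice_sizes, group_sizes):
    """Split each logical group's concatenated output dim into physical slices.

    lora_b_groups: one weight per logical module, physical slices stacked
        along the output (first) dimension.
    slice_sizes: output size of each physical slice, e.g. [q, k, v, z].
    group_sizes: physical slices covered per logical group, e.g. [3, 1].
    """
    physical = []
    sizes = iter(slice_sizes)
    for group, n_slices in zip(lora_b_groups, group_sizes):
        offset = 0
        for _ in range(n_slices):
            size = next(sizes)
            physical.append(group[offset:offset + size])
            offset += size
    return physical
```

The slicing is shape-generic, so the same logic works on lists in a quick test or on tensors in the real layer.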
End-to-end tests

Now LoRA support for Qwen3.5 can be enabled without errors like:

```log
File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/column_parallel_linear.py", line 660, in set_lora
    super().set_lora(index, lora_a, lora_b)
File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/column_parallel_linear.py", line 265, in set_lora
    lora_b = self.slice_lora_b(lora_b)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/column_parallel_linear.py", line 249, in slice_lora_b
    if (lora_b_i := lora_b[i]) is not None:
        ~~~~~~^^^
IndexError: list index out of range
```

Signed-off-by: Hollow Man <hollowman@opensuse.org>
This PR extends vllm-project#31104 to the remaining model-specific MoE loaders that still hardcode expert parameter names without `.base_layer` during weight loading.

vllm-project#31104 fixed the shared LoRA expert-loading path, but these loaders still build their own expert remapping tables:

- `Qwen3.5`
- `Qwen3.5 MTP`
- `Qwen3-VL MoE`
- `Step3 Text`
- `Step3.5`
- `Step3.5 MTP`

The fix:

- Detect whether the local parameter set contains `.base_layer.` expert parameters.
- Conditionally insert `base_layer.` into the expert remapping entries for the affected loaders.
- Keep the non-LoRA path unchanged when `base_layer` is absent.

This preserves existing checkpoint-loading behavior for regular models while allowing LoRA-wrapped expert weights to resolve correctly.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
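The conditional remapping idea can be sketched like this. It is a simplified sketch only: the real `make_expert_params_mapping` in vLLM takes more arguments and covers more projection variants; this just illustrates where `base_layer.` is spliced into the fused parameter name when LoRA is enabled.

```python
# Hypothetical sketch of conditionally inserting "base_layer." into the
# expert remapping entries, as described above.

def make_expert_params_mapping(num_experts: int, lora_enabled: bool):
    # When LoRA wraps the experts, every fused parameter lives under an
    # extra "base_layer." path component on the model side.
    prefix = "base_layer." if lora_enabled else ""
    mapping = []
    for expert_id in range(num_experts):
        for ckpt_proj, fused_param, shard_id in (
            ("gate_proj", "w13_weight", "w1"),
            ("up_proj", "w13_weight", "w3"),
            ("down_proj", "w2_weight", "w2"),
        ):
            mapping.append((
                f"experts.{prefix}{fused_param}",     # fused name on the model
                f"experts.{expert_id}.{ckpt_proj}.",  # per-expert checkpoint name
                expert_id,
                shard_id,
            ))
    return mapping
```

The non-LoRA path is unchanged: with `lora_enabled=False` the prefix is empty and the mapping matches today's behavior.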
Purpose
This PR fixes weight loading when LoRA is enabled, i.e., when `base_layer` is added to the parameter path: `model.layers.0.mlp.experts.0.up_proj.weight` -> `model.layers.0.mlp.experts.0.up_proj.base_layer.weight`.

Before this fix, the patched code handled this as `model.layers.0.mlp.experts.w13_base_layer.weight`, which is wrong; it should actually be `model.layers.0.mlp.experts.base_layer.w13_weight`.

Test Plan
Test on Qwen3 30B A3B
Test Result
Looks good.