
[BugFix] LoRA: Support loading base_layer of experts#31104

Merged
jeejeelee merged 2 commits into vllm-project:main from HollowMan6:lora_base_layer on Jan 7, 2026

Conversation

@HollowMan6
Contributor

@HollowMan6 HollowMan6 commented Dec 22, 2025

Purpose

This PR fixes expert weight loading when LoRA is enabled, i.e., when base_layer is inserted into the parameter names:

model.layers.0.mlp.experts.0.up_proj.weight -> model.layers.0.mlp.experts.0.up_proj.base_layer.weight

Before this fix, the patched code mapped this to
model.layers.0.mlp.experts.w13_base_layer.weight, which is wrong;
it should actually be model.layers.0.mlp.experts.base_layer.w13_weight
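The broken substitution can be sketched as follows. This is an illustrative reconstruction, not vLLM's actual make_expert_params_mapping code (only expert 0 and two projections are shown for brevity): the loader rewrites each checkpoint name by substituting a per-expert prefix with a fused-parameter prefix, so with LoRA enabled the substitution target must carry the `base_layer.` segment too.

```python
# Hypothetical sketch of the mapping behavior this PR fixes (names follow
# vLLM's FusedMoE convention, but the code here is illustrative only).

def remap(name: str, lora_enabled: bool) -> str:
    """Rewrite a per-expert checkpoint name to its fused parameter name."""
    # With LoRA, the fused expert weights live under `experts.base_layer.`,
    # so `base_layer.` must be part of the substituted prefix. Without this,
    # "...experts.0.up_proj.base_layer.weight" becomes the wrong
    # "...experts.w13_base_layer.weight" instead of
    # "...experts.base_layer.w13_weight".
    base = "base_layer." if lora_enabled else ""
    mapping = [
        (f"experts.{base}w13_", f"experts.0.up_proj.{base}"),
        (f"experts.{base}w2_", f"experts.0.down_proj.{base}"),
    ]
    for param_prefix, ckpt_prefix in mapping:
        if ckpt_prefix in name:
            return name.replace(ckpt_prefix, param_prefix)
    return name
```

Running the sketch on the LoRA-wrapped name with `lora_enabled=False` reproduces the exact wrong name described above, which is why the mapping itself had to become LoRA-aware.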

Test Plan

Tested on Qwen3 30B A3B.

Test Result

Looks good.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


@mergify mergify bot added the deepseek (Related to DeepSeek models), llama (Related to Llama models), qwen (Related to Qwen models), gpt-oss (Related to GPT-OSS models), and speculative-decoding labels on Dec 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in weight loading for FusedMoE layers when LoRA is enabled. The changes correctly handle the base_layer component in weight names. The core logic is adjusted in make_expert_params_mapping, and this fix is propagated by adding an is_lora_enabled flag to this function, which is then passed from various model definitions. The overall approach is sound and the widespread changes are necessary boilerplate to support the fix. I have one suggestion to improve the robustness of the string formatting to prevent potential issues with certain model configurations.

@jeejeelee jeejeelee self-assigned this Dec 22, 2025
@HollowMan6 HollowMan6 force-pushed the lora_base_layer branch 2 times, most recently from 00c09c7 to f9008c9 on December 22, 2025 12:36
@HollowMan6 HollowMan6 changed the title [BugFix] LoRA: FusedMoE make_expert_params_mapping supports base_layer [BugFix] LoRA: Support loading base_layer of experts Dec 22, 2025
Member

@hmellor hmellor left a comment


We should not be duplicating this code in every model. It should be abstracted to a util.

Also, please make sure that the fix is also applied to

@HollowMan6
Contributor Author

@hmellor Thanks for reviewing, now this is changed as requested!

cc: @jeejeelee

@HollowMan6 HollowMan6 requested a review from hmellor January 2, 2026 19:59
@HollowMan6
Contributor Author

Current CI failures don't seem to be caused by this PR.

@hmellor
Member

hmellor commented Jan 2, 2026

I appreciate that it works.

What I don't like is that it means that every MoE model has to have this line added to it just for VeRL+LoRA.

I'll have a think to see what can be done.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
auto-merge was automatically disabled January 3, 2026 22:01

Head branch was pushed to by a user without write access

@HollowMan6 HollowMan6 requested a review from Copilot January 3, 2026 22:01

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes an issue with loading LoRA weights for experts by correctly handling the base_layer path component. The core logic change is in vllm/model_executor/layers/fused_moe/layer.py, where make_expert_params_mapping is updated to detect if LoRA is active and adjust weight paths accordingly. The other changes are mechanical updates to pass the model instance to this method across various model files.

My main feedback is on the method used to detect if LoRA is enabled. The current implementation iterates over all model parameters to check for the presence of .base_layer. in their names. This is not only inefficient but also brittle, as it could be triggered incorrectly by models that happen to use this string in parameter names for other reasons. I've suggested a more robust and performant approach that directly checks the LoRA configuration, which is a more reliable indicator of LoRA being active.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 35 out of 35 changed files in this pull request and generated no new comments.




@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b4bd65a4b


@HollowMan6
Contributor Author

HollowMan6 commented Jan 3, 2026

What I don't like is that it means that every MoE model has to have this line added to it just for VeRL+LoRA.

@hmellor I think I just managed to remove this requirement by modifying make_expert_params_mapping only: we now detect that LoRA is enabled by checking whether any parameter name contains .base_layer.. I have tested this with LoRA enabled, with both weight-initialization loading and weight refit in verl-project/verl#4632, and both work fine. Please let me know your comments.
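The detection heuristic described here can be sketched as follows (the function name and shape are illustrative, not vLLM's exact code): LoRA is inferred from the parameter names themselves, so no per-model is_lora_enabled flag has to be threaded through every MoE model definition.

```python
# Minimal sketch of the name-based LoRA detection described above.

def detect_lora(param_names) -> bool:
    """True if any parameter sits under a LoRA `base_layer` wrapper."""
    return any(".base_layer." in name for name in param_names)
```

This is the heuristic the Gemini review flagged as potentially brittle, since a model could in principle use `.base_layer.` in a parameter name for other reasons.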

cc: @jeejeelee

@HollowMan6 HollowMan6 requested a review from jeejeelee January 3, 2026 22:15
@jeejeelee jeejeelee merged commit 4829148 into vllm-project:main Jan 7, 2026
58 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Done in gpt-oss Issues & Enhancements Jan 7, 2026
@HollowMan6 HollowMan6 deleted the lora_base_layer branch January 7, 2026 07:00
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026

Signed-off-by: Hollow Man <hollowman@opensuse.org>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026

Signed-off-by: Hollow Man <hollowman@opensuse.org>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026

Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
@hmellor
Member

hmellor commented Jan 27, 2026

@HollowMan6 I was away when you found this solution, I like it! Thanks for figuring this out

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Signed-off-by: Hollow Man <hollowman@opensuse.org>
HollowMan6 added a commit to HollowMan6/vllm that referenced this pull request Mar 15, 2026
This PR fixes Qwen3.5 LoRA loading for `in_proj_qkvz` in vLLM and enables `base_layer` for experts (same as vllm-project#31104).

For Qwen3.5, the underlying merged projection has 4 physical output slices:

- `q`
- `k`
- `v`
- `z`

but `packed_modules_mapping` only exposes 2 logical LoRA modules:

- `in_proj_qkv`
- `in_proj_z`

vLLM currently misaligns these two representations during LoRA initialization and dummy adapter setup, which causes startup failures in the dummy LoRA path.

There are two mismatches in the current implementation:

1. In `column_parallel_linear.py`, this layer is incorrectly routed to `MergedColumnParallelLinearWithLoRA`, which assumes the LoRA tensors are already aligned with `self.n_slices=4` and reads `lora_b` accordingly.

2. In `model_manager.py`, the dummy LoRA path only constructs `lora_b` for the 2 logical packed modules, and the shapes are derived from only the first two physical slices. As a result, during startup dummy runs, the `lora_b` list length and slice shapes do not match the underlying 4-slice layer layout, and the flow eventually fails in `slice_lora_b` with `IndexError`.

- Route `MergedColumnParallelLinear` layers with 3+ physical output slices to the variable-slice LoRA implementation.
- Build dummy LoRA weights using grouped logical output dimensions, and expand the 2 logical LoRA groups into the 4 physical slices during `set_lora`.
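The second fix above can be sketched as follows. This is a hypothetical illustration (plain lists instead of tensors, and equal q/k/v widths assumed; real GQA models would use per-slice sizes): the two logical packed modules (in_proj_qkv, in_proj_z) are expanded into the four physical slices (q, k, v, z), so that slice_lora_b receives a list whose length matches the layer's n_slices=4.

```python
# Hypothetical sketch of expanding 2 logical LoRA-B groups into the
# 4 physical output slices during set_lora.

def expand_logical_to_physical(lora_b_qkv, lora_b_z):
    """Split the concatenated qkv LoRA-B block into q/k/v and append z."""
    n = len(lora_b_qkv) // 3  # assumes equal q/k/v widths (illustrative)
    return [lora_b_qkv[:n], lora_b_qkv[n:2 * n], lora_b_qkv[2 * n:], lora_b_z]
```

Without this expansion, the 2-element list reaches a layer expecting 4 slices, producing the IndexError in slice_lora_b shown below.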

End-to-end tests

Now LoRA support for Qwen3.5 can be enabled without errors like:

```log
  File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/column_parallel_linear.py", line 660, in set_lora
    super().set_lora(index, lora_a, lora_b)
  File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/column_parallel_linear.py", line 265, in set_lora
    lora_b = self.slice_lora_b(lora_b)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/column_parallel_linear.py", line 249, in slice_lora_b
    if (lora_b_i := lora_b[i]) is not None:
                    ~~~~~~^^^
IndexError: list index out of range
```

Signed-off-by: Hollow Man <hollowman@opensuse.org>
HollowMan6 added a commit to HollowMan6/vllm that referenced this pull request Mar 16, 2026
This PR extends vllm-project#31104 to the remaining
model-specific MoE loaders that still hardcode expert
parameter names without `.base_layer` during weight loading.

`vllm-project#31104` fixed the shared LoRA expert-loading path, but these loaders
still build their own expert remapping tables:

- `Qwen3.5`
- `Qwen3.5 MTP`
- `Qwen3-VL MoE`
- `Step3 Text`
- `Step3.5`
- `Step3.5 MTP`

- Detect whether the local parameter set contains `.base_layer.` expert parameters.
- Conditionally insert `base_layer.` into the expert remapping entries for the affected loaders.
- Keep the non-LoRA path unchanged when `base_layer` is absent.

This preserves existing checkpoint-loading behavior for regular models while allowing LoRA-wrapped expert weights to resolve correctly.
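The conditional remapping can be sketched as follows (names and shapes are assumed for illustration, not the loaders' exact code): `base_layer.` is spliced into each expert remapping entry only when the local parameter set shows LoRA wrapping, leaving the non-LoRA path unchanged.

```python
# Illustrative sketch of the conditional expert-remapping patch described
# above, applied to a (param_name, ckpt_name) remapping table.

def patch_expert_mapping(mapping, param_names):
    """Insert `base_layer.` into expert entries only when LoRA is active."""
    if not any(".base_layer." in n for n in param_names):
        return mapping  # regular checkpoints load exactly as before
    return [
        (param.replace("experts.", "experts.base_layer.", 1), ckpt)
        for param, ckpt in mapping
    ]
```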

Signed-off-by: Hollow Man <hollowman@opensuse.org>

Labels

deepseek (Related to DeepSeek models), gpt-oss (Related to GPT-OSS models), llama (Related to Llama models), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed), speculative-decoding

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants