[BugFix] LoRA: Support loading base_layer of experts #31104
jeejeelee merged 2 commits into vllm-project:main
Conversation
Code Review
This pull request addresses a bug in weight loading for FusedMoE layers when LoRA is enabled. The changes correctly handle the base_layer component in weight names. The core logic is adjusted in make_expert_params_mapping, and this fix is propagated by adding an is_lora_enabled flag to this function, which is then passed from various model definitions. The overall approach is sound and the widespread changes are necessary boilerplate to support the fix. I have one suggestion to improve the robustness of the string formatting to prevent potential issues with certain model configurations.
hmellor left a comment
We should not be duplicating this code in every model. It should be abstracted to a util.
Also, please make sure that the fix is also applied to
@hmellor Thanks for reviewing, now this is changed as requested! cc: @jeejeelee
Current CI failures don't seem to be caused by this PR.
I appreciate that it works. What I don't like is that it means every MoE model has to have this line added to it just for VeRL+LoRA. I'll have a think to see what can be done.
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Code Review
This pull request fixes an issue with loading LoRA weights for experts by correctly handling the base_layer path component. The core logic change is in vllm/model_executor/layers/fused_moe/layer.py, where make_expert_params_mapping is updated to detect if LoRA is active and adjust weight paths accordingly. The other changes are mechanical updates to pass the model instance to this method across various model files.
My main feedback is on the method used to detect if LoRA is enabled. The current implementation iterates over all model parameters to check for the presence of .base_layer. in their names. This is not only inefficient but also brittle, as it could be triggered incorrectly by models that happen to use this string in parameter names for other reasons. I've suggested a more robust and performant approach that directly checks the LoRA configuration, which is a more reliable indicator of LoRA being active.
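The two detection strategies contrasted in this review can be sketched as follows. This is an illustrative sketch only: the function names and the `lora_config` attribute follow common PyTorch/vLLM conventions but are assumptions, not the PR's exact code.

```python
# Hypothetical sketch of the two LoRA-detection strategies discussed above.

def lora_enabled_by_param_scan(model) -> bool:
    # Original approach: scan every parameter name for ".base_layer.".
    # O(n) in parameter count, and can misfire on models that happen to
    # use this string in parameter names for unrelated reasons.
    return any(".base_layer." in name for name, _ in model.named_parameters())


def lora_enabled_by_config(vllm_config) -> bool:
    # Suggested approach: the LoRA config is only populated when LoRA is
    # active, so a direct check is both cheap and unambiguous.
    return getattr(vllm_config, "lora_config", None) is not None
```

Checking configuration rather than scanning parameter names also avoids re-walking the module tree every time the mapping is built.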
Pull request overview
Copilot reviewed 35 out of 35 changed files in this pull request and generated no new comments.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0b4bd65a4b
@hmellor I think I just managed to remove this requirement by modifying cc: @jeejeelee

@HollowMan6 I was away when you found this solution, I like it! Thanks for figuring this out
This PR fixes Qwen3.5 LoRA loading for `in_proj_qkvz` in vLLM and enables `base_layer` for experts (same as vllm-project#31104).

For Qwen3.5, the underlying merged projection has 4 physical output slices:

- `q`
- `k`
- `v`
- `z`

but `packed_modules_mapping` only exposes 2 logical LoRA modules:

- `in_proj_qkv`
- `in_proj_z`

vLLM currently misaligns these two representations during LoRA initialization and dummy adapter setup, which causes startup failures in the dummy LoRA path. There are two mismatches in the current implementation:

1. In `column_parallel_linear.py`, this layer is incorrectly routed to `MergedColumnParallelLinearWithLoRA`, which assumes the LoRA tensors are already aligned with `self.n_slices=4` and reads `lora_b` accordingly.
2. In `model_manager.py`, the dummy LoRA path only constructs `lora_b` for the 2 logical packed modules, and the shapes are derived from only the first two physical slices.

As a result, during startup dummy runs, the `lora_b` list length and slice shapes do not match the underlying 4-slice layer layout, and the flow eventually fails in `slice_lora_b` with `IndexError`.

The fix:

- Route `MergedColumnParallelLinear` layers with 3+ physical output slices to the variable-slice LoRA implementation.
- Build dummy LoRA weights using grouped logical output dimensions, and expand the 2 logical LoRA groups into the 4 physical slices during `set_lora`.
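The grouped-to-physical expansion described above can be sketched roughly as follows. This is an illustration under assumed names and shapes, not the PR's actual `set_lora` code: each logical group's weight carries its physical slices concatenated along the output dimension, and the helper splits them back out.

```python
# Hypothetical sketch: expand 2 logical LoRA groups (e.g. in_proj_qkv,
# in_proj_z) into the 4 physical slices (q, k, v, z) that the underlying
# merged projection expects.

def expand_groups_to_slices(lora_b_groups, slice_sizes, group_sizes):
    """Split each logical group's concatenated output dim into physical slices.

    lora_b_groups: one weight per logical module, physical slices stacked
        along the output (first) dimension.
    slice_sizes: output size of each physical slice, e.g. [q, k, v, z].
    group_sizes: physical slices covered per logical group, e.g. [3, 1].
    """
    physical = []
    sizes = iter(slice_sizes)
    for group, n_slices in zip(lora_b_groups, group_sizes):
        offset = 0
        for _ in range(n_slices):
            size = next(sizes)
            physical.append(group[offset:offset + size])
            offset += size
    return physical
```

The slicing is shape-generic, so the same logic works on lists in a quick test or on tensors in the real layer.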
End-to-end tests

Now LoRA support for Qwen3.5 can be enabled without errors like:

```log
File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/column_parallel_linear.py", line 660, in set_lora
    super().set_lora(index, lora_a, lora_b)
File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/column_parallel_linear.py", line 265, in set_lora
    lora_b = self.slice_lora_b(lora_b)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/column_parallel_linear.py", line 249, in slice_lora_b
    if (lora_b_i := lora_b[i]) is not None:
        ~~~~~~^^^
IndexError: list index out of range
```

Signed-off-by: Hollow Man <hollowman@opensuse.org>
This PR extends vllm-project#31104 to the remaining model-specific MoE loaders that still hardcode expert parameter names without `.base_layer` during weight loading.

vllm-project#31104 fixed the shared LoRA expert-loading path, but these loaders still build their own expert remapping tables:

- `Qwen3.5`
- `Qwen3.5 MTP`
- `Qwen3-VL MoE`
- `Step3 Text`
- `Step3.5`
- `Step3.5 MTP`

The fix:

- Detect whether the local parameter set contains `.base_layer.` expert parameters.
- Conditionally insert `base_layer.` into the expert remapping entries for the affected loaders.
- Keep the non-LoRA path unchanged when `base_layer` is absent.

This preserves existing checkpoint-loading behavior for regular models while allowing LoRA-wrapped expert weights to resolve correctly.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
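The conditional remapping idea can be sketched like this. It is a simplified sketch only: the real `make_expert_params_mapping` in vLLM takes more arguments and covers more projection variants; this just illustrates where `base_layer.` is spliced into the fused parameter name when LoRA is enabled.

```python
# Hypothetical sketch of conditionally inserting "base_layer." into the
# expert remapping entries, as described above.

def make_expert_params_mapping(num_experts: int, lora_enabled: bool):
    # When LoRA wraps the experts, every fused parameter lives under an
    # extra "base_layer." path component on the model side.
    prefix = "base_layer." if lora_enabled else ""
    mapping = []
    for expert_id in range(num_experts):
        for ckpt_proj, fused_param, shard_id in (
            ("gate_proj", "w13_weight", "w1"),
            ("up_proj", "w13_weight", "w3"),
            ("down_proj", "w2_weight", "w2"),
        ):
            mapping.append((
                f"experts.{prefix}{fused_param}",     # fused name on the model
                f"experts.{expert_id}.{ckpt_proj}.",  # per-expert checkpoint name
                expert_id,
                shard_id,
            ))
    return mapping
```

The non-LoRA path is unchanged: with `lora_enabled=False` the prefix is empty and the mapping matches today's behavior.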
Purpose
This PR fixes weight loading when LoRA is enabled, i.e., when `base_layer` is added to the parameter path: `model.layers.0.mlp.experts.0.up_proj.weight` -> `model.layers.0.mlp.experts.0.up_proj.base_layer.weight`.

Before this fix, the patched code handled this as `model.layers.0.mlp.experts.w13_base_layer.weight`, which is wrong; it should actually be `model.layers.0.mlp.experts.base_layer.w13_weight`.

Test Plan
Test on Qwen3 30B A3B
Test Result
Looks good.