
[Bugfix] LoRA: extend expert base_layer loading to Qwen3.5 and Step3.x #37114

Open
HollowMan6 wants to merge 1 commit into vllm-project:main from HollowMan6:lora_qwen3.5

Conversation

Contributor

@HollowMan6 HollowMan6 commented Mar 15, 2026

Purpose

This PR extends #31104 to the remaining
model-specific MoE loaders that still hardcode expert
parameter names without .base_layer during weight loading.

#31104 fixed the shared LoRA expert-loading path, but these loaders
still build their own expert remapping tables:

  • Qwen3.5

  • Qwen3.5 MTP

  • Qwen3-VL MoE

  • Step3 Text

  • Step3.5

  • Step3.5 MTP

This PR makes the following changes:

  • Detect whether the local parameter set contains .base_layer. expert parameters.

  • Conditionally insert base_layer. into the expert remapping entries for the affected loaders.

  • Keep the non-LoRA path unchanged when base_layer is absent.

This preserves existing checkpoint-loading behavior for regular models while allowing LoRA-wrapped expert weights to resolve correctly.
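The detect-and-prefix logic described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function name `build_expert_mapping` and the `params_dict` argument are assumptions modeled on common vLLM weight-loader conventions, and only the mapping entries shown in the review diff below are reproduced here.

```python
# Hypothetical sketch of the conditional base_layer remapping; names are
# illustrative, not the exact identifiers used in the vLLM loaders.

def build_expert_mapping(params_dict: dict) -> list[tuple[str, str, str]]:
    # When LoRA wraps the fused-MoE experts, their parameters appear under
    # an extra ".base_layer." namespace; detect that from the local params.
    has_base_layer = any(".base_layer." in name for name in params_dict)
    base_layer = "base_layer." if has_base_layer else ""
    # Conditionally insert the prefix into each expert remapping entry,
    # keeping the non-LoRA path byte-identical when the prefix is empty.
    return [
        (f".moe.experts.{base_layer}w13_weight", ".moe.gate_proj.weight", "w1"),
        (f".moe.experts.{base_layer}w13_weight", ".moe.up_proj.weight", "w3"),
        (f".moe.experts.{base_layer}w2_weight", ".moe.down_proj.weight", "w2"),
    ]
```

With a LoRA-wrapped checkpoint the entries resolve to `.moe.experts.base_layer.w13_weight` etc.; with a plain checkpoint they stay `.moe.experts.w13_weight`, which is why the non-LoRA path is unchanged.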

Test Plan

End-to-end tests together with verl-project/verl#5599.

Test Result

Weights sync successfully.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Copilot AI review requested due to automatic review settings March 15, 2026 17:45
@mergify mergify bot added qwen Related to Qwen models bug Something isn't working labels Mar 15, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a fix for LoRA support in Qwen3.5 models, specifically addressing issues with GDN layers and enabling base_layer for experts. The changes involve routing layers with multiple output slices to a more flexible LoRA implementation and adjusting how dummy LoRA weights are created to handle mismatches between logical and physical layer structures. The approach is sound and effectively resolves the described issues. I have one suggestion to improve the maintainability of the new logic for creating dummy LoRA weights.

@mergify

mergify bot commented Mar 15, 2026

Hi @HollowMan6, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes LoRA initialization for Qwen3.5 GDN merged projections where the underlying layer has 3+ physical output slices (e.g., q/k/v/z) but checkpoints expose fewer logical packed LoRA modules (e.g., in_proj_qkv, in_proj_z). It also updates multiple MoE model weight loaders to correctly handle expert weights when LoRA wraps experts behind a base_layer.

Changes:

  • Route MergedColumnParallelLinear layers with 3+ internal slices to the variable-slice LoRA implementation and expand grouped packed LoRA lists to per-slice tensors during set_lora.
  • Fix dummy LoRA creation for packed modules by deriving output dims from grouped output_sizes (so dummy adapters match the merged layer’s physical slice layout).
  • Update Step3/Step3.5/Qwen3(.5)/Qwen3-VL MoE weight loading mappings to support expert base_layer parameter names.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
vllm/model_executor/models/step3p5_mtp.py Adjust expert weight mapping to optionally include .base_layer. in parameter names.
vllm/model_executor/models/step3p5.py Same expert base_layer-aware mapping for Step3.5 main model weight loading.
vllm/model_executor/models/step3_text.py Same expert base_layer-aware mapping for Step3 text model weight loading.
vllm/model_executor/models/qwen3_vl_moe.py Make fused expert mapping support experts.base_layer.* parameter naming.
vllm/model_executor/models/qwen3_5_mtp.py Make fused expert mapping support experts.base_layer.* parameter naming.
vllm/model_executor/models/qwen3_5.py Make fused expert mapping support experts.base_layer.* parameter naming.
vllm/lora/model_manager.py Build dummy packed LoRA weights using grouped output dims derived from base_layer.output_sizes.
vllm/lora/layers/column_parallel_linear.py Prefer variable-slice LoRA for 3+ slice merged layers and expand grouped packed LoRA lists into per-slice tensors in set_lora.


@jeejeelee
Collaborator

jeejeelee commented Mar 16, 2026

Please see: #36976 and #37019

@jeejeelee
Collaborator

We are working on fixing Qwen35+LoRA. In order not to waste your valuable time, please stop further development on this PR. Thank you.

(".moe.experts.w2_weight", ".moe.down_proj.weight", "w2"),
(f".moe.experts.{base_layer}w13_weight", ".moe.gate_proj.weight", "w1"),
(f".moe.experts.{base_layer}w13_weight", ".moe.up_proj.weight", "w3"),
(f".moe.experts.{base_layer}w2_weight", ".moe.down_proj.weight", "w2"),
Collaborator


It looks like these changes are unrelated. If needed, please submit a separate PR. Thank you.

Contributor Author


Thank you @jeejeelee for letting me know about #36976 and #37019! I just removed all the Qwen3.5 LoRA fixes from this PR and focused on the changes you mentioned here; those are extensions to #31104. Please let me know what you think, thanks!

This PR extends vllm-project#31104 to the remaining
model-specific MoE loaders that still hardcode expert
parameter names without `.base_layer` during weight loading.

vllm-project#31104 fixed the shared LoRA expert-loading path, but these loaders
still build their own expert remapping tables:

- `Qwen3.5`
- `Qwen3.5 MTP`
- `Qwen3-VL MoE`
- `Step3 Text`
- `Step3.5`
- `Step3.5 MTP`

- Detect whether the local parameter set contains `.base_layer.` expert parameters.
- Conditionally insert `base_layer.` into the expert remapping entries for the affected loaders.
- Keep the non-LoRA path unchanged when `base_layer` is absent.

This preserves existing checkpoint-loading behavior for regular models while allowing LoRA-wrapped expert weights to resolve correctly.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
@HollowMan6 HollowMan6 changed the title [Bugfix] LoRA support for Qwen3.5 (GDN layers) [Bugfix] LoRA: extend expert base_layer loading to Qwen3.5 and Step3.x Mar 16, 2026
@HollowMan6 HollowMan6 requested review from Copilot and jeejeelee March 16, 2026 13:03
@HollowMan6
Contributor Author

@codex review

Contributor

Copilot AI left a comment


Pull request overview

Extends the model-specific MoE weight-loading remapping logic to correctly resolve LoRA-wrapped expert weights that include an intermediate .base_layer. namespace (following the earlier shared-path fix in #31104), while preserving existing behavior for non-LoRA checkpoints.

Changes:

  • Detect presence of .base_layer. parameters at load time and conditionally prefix expert parameter names with base_layer. during remapping.
  • Apply the conditional remapping to the remaining loaders that hardcoded expert parameter names for Qwen3.5 / Qwen3-VL MoE and Step3.x variants.
  • Keep the non-LoRA loading path unchanged when .base_layer. is not present.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
vllm/model_executor/models/step3p5_mtp.py Adds conditional base_layer. prefix to Step3.5 MTP expert remapping entries.
vllm/model_executor/models/step3p5.py Adds conditional base_layer. prefix to Step3.5 expert remapping entries (packed 3D expert weights).
vllm/model_executor/models/step3_text.py Adds conditional base_layer. prefix to Step3 Text expert remapping entries.
vllm/model_executor/models/qwen3_vl_moe.py Adds conditional base_layer. prefix for Qwen3-VL MoE fused expert remapping entries.
vllm/model_executor/models/qwen3_5_mtp.py Adds conditional base_layer. prefix for Qwen3.5 MTP fused expert remapping entries.
vllm/model_executor/models/qwen3_5.py Adds conditional base_layer. prefix for Qwen3.5 fused expert remapping entries.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request extends LoRA expert weight loading to several Qwen and Step models, which is a necessary bugfix. The implementation correctly detects the presence of .base_layer. in parameter names and dynamically adjusts the expert weight mappings. The changes are consistent and well-targeted across all affected model files. The approach is clean and effectively resolves the issue for LoRA-wrapped expert weights.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Hooray!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

