[Bugfix] Fuse Qwen3.5 in_qkvz_proj forwarding with LoRA enabled by Isotr0py · Pull Request #37912 · vllm-project/vllm

Isotr0py · 2026-03-23T18:12:33Z

Purpose

There are two forwarding code paths for Qwen3.5 after "[Bugfix][LoRA] Fix Qwen35 LoRA" (#36976).
This PR unifies them by adapting the LoRA layer implementation.
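
For readers unfamiliar with fused projections, here is a minimal sketch of why a single forwarding path can stand in for two. It assumes the two paths are a fused `qkvz` projection and separate `qkv` and `z` projections; the weights, shapes, and the `matvec` helper are made up for illustration and are not vLLM code.

```python
# Illustrative only: tiny pure-Python matrices, not vLLM code.

def matvec(weight, x):
    """Multiply a row-major matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

# Hypothetical weights: qkv projects to 3 outputs, z projects to 2 outputs.
w_qkv = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_z = [[2.0, 0.0], [0.0, 2.0]]
x = [3.0, 4.0]

# Path 1: two separate projections.
qkv_separate = matvec(w_qkv, x)
z_separate = matvec(w_z, x)

# Path 2: one fused projection over the row-concatenated weight, then a split.
w_qkvz = w_qkv + w_z
fused = matvec(w_qkvz, x)
qkv_fused, z_fused = fused[:len(w_qkv)], fused[len(w_qkv):]

# Both paths produce the same values, so one code path is sufficient.
assert qkv_fused == qkv_separate
assert z_fused == z_separate
print(fused)  # [3.0, 4.0, 7.0, 6.0, 8.0]
```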

Test Plan

pytest -s -v tests/lora/test_qwen35_densemodel_lora.py

Test Result

Tests pass at both TP=2 and TP=4

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Signed-off-by: Isotr0py <Isotr0py@outlook.com>


gemini-code-assist

Code Review

This pull request refactors LoRA handling for Qwen3.5 and Qwen3-Next models. Key changes include introducing an expand_packed_lora method to flexibly handle LoRA adapter groups that don't match the number of slices, and unifying the input projection logic in Qwen3.5 attention by removing LoRA-specific conditional paths. The create_dummy_lora function in model_manager.py contains a 'HACK' comment, which should either be replaced with a detailed explanation of the necessary logic or improved with a more robust solution.
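
To make the "adapter groups that don't match the number of slices" case concrete, the sketch below shows one way per-group LoRA weights could be expanded to per-slice weights. It is not the PR's `expand_packed_lora` implementation: the helper name `expand_groups_to_slices`, its signature, and the toy shapes are all assumptions chosen for illustration. The underlying idea is that a group covering several output slices can share its A matrix across them while its B matrix is split row-wise by slice size.

```python
def expand_groups_to_slices(lora_a, lora_b, slice_sizes):
    """Hypothetical helper: split per-group LoRA weights into per-slice weights.

    lora_a[g]:      A matrix of group g (rank x in_features), shared by its slices.
    lora_b[g]:      B matrix of group g (group_out x rank), rows ordered by slice.
    slice_sizes[g]: output sizes of the slices that group g covers.
    """
    a_out, b_out = [], []
    for a, b, sizes in zip(lora_a, lora_b, slice_sizes):
        if sum(sizes) != len(b):
            raise ValueError("slice sizes must add up to the group's output size")
        start = 0
        for size in sizes:
            a_out.append(a)                      # A is reused for every slice
            b_out.append(b[start:start + size])  # rows of B owned by this slice
            start += size
    return a_out, b_out


# Two adapter groups (qkv and z) expanded to four slices (q, k, v, z).
lora_a = [[[1.0, 2.0]], [[3.0, 4.0]]]                    # rank 1, in_features 2
lora_b = [[[1.0], [2.0], [3.0], [4.0]], [[5.0], [6.0]]]  # qkv has 4 rows, z has 2
slice_sizes = [[2, 1, 1], [2]]                           # q=2, k=1, v=1, z=2

a_slices, b_slices = expand_groups_to_slices(lora_a, lora_b, slice_sizes)
print(len(a_slices))  # 4
print(b_slices)       # [[[1.0], [2.0]], [[3.0]], [[4.0]], [[5.0], [6.0]]]
```

Sharing A rather than copying it keeps the per-slice view cheap; whether the real layer does the same is not stated in this thread.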

gemini-code-assist · 2026-03-23T18:17:59Z

vllm/lora/model_manager.py

+                # HACK: overrides replacements for qkvz = qkv + z case.
+                # Any better methods to handle this case?
+                if n_slices != len(replacements):
+                    replacements = [f"slice_{i}" for i in range(n_slices)]


The use of a 'HACK' comment here is concerning as it suggests the solution is not robust and could lead to future maintenance issues. Code with 'HACK' comments is often difficult to understand and easy to break.

If this logic is indeed the correct and necessary approach for handling dummy LoRA creation for packed modules like in_proj_qkvz, please replace the 'HACK' comment with a more detailed explanation. The explanation should clarify:

Why there's a mismatch between n_slices and len(replacements).

Why generating generic slice_i names is the appropriate solution for creating dummy LoRAs in this scenario.

How this interacts with the loading of real LoRA weights.

A clear explanation will improve code maintainability and prevent future confusion.

Alternatively, if a more robust, less 'hacky' solution is possible (perhaps by making the relationship between packed modules and slices more explicit in the model configuration), that would be preferable.
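
The mismatch raised in this comment can be reproduced in isolation. The sketch below lifts the quoted fallback into a standalone function so its behavior is easy to see; the function name `dummy_lora_slice_names` and the module names in the usage lines are illustrative assumptions, not identifiers confirmed by the PR.

```python
def dummy_lora_slice_names(replacements, n_slices):
    """Return one name per slice for dummy LoRA creation.

    Mirrors the quoted fallback: when the packed-module mapping lists fewer
    (or more) sub-modules than the layer has slices, generic names are used.
    """
    if n_slices != len(replacements):
        return [f"slice_{i}" for i in range(n_slices)]
    return list(replacements)


# Assumed qkvz case: two packed sub-modules, but four slices (q, k, v, z).
print(dummy_lora_slice_names(["in_proj_qkv", "in_proj_z"], 4))
# ['slice_0', 'slice_1', 'slice_2', 'slice_3']

# Matching case: names pass through unchanged.
print(dummy_lora_slice_names(["q_proj", "k_proj", "v_proj"], 3))
# ['q_proj', 'k_proj', 'v_proj']
```

Because the generic names carry no link back to the original sub-modules, this fallback only suits dummy adapters; how real LoRA weights are mapped to slices is a separate question that the comment above asks the author to document.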

mergify · 2026-03-26T04:02:07Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Isotr0py.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Isotr0py and others added 5 commits March 23, 2026 02:11

draft

689d563

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Merge remote-tracking branch 'upstream/main' into fuse-qwen3_5-lora

074f2bf

clean

26d5566

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

clean

9c68c62

Signed-off-by: Isotr0py <Isotr0py@outlook.com>

clean

42f1d1b

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py requested review from jeejeelee and sighingnow as code owners March 23, 2026 18:12

mergify bot added the qwen (Related to Qwen models) and bug (Something isn't working) labels Mar 23, 2026

gemini-code-assist bot reviewed Mar 23, 2026

mergify bot added the needs-rebase label Mar 26, 2026

Isotr0py wants to merge 5 commits into vllm-project:main from Isotr0py:fuse-qwen3_5-lora
