Skip to content

[Bugfix] Fuse Qwen3.5 in_qkvz_proj forwarding with LoRA enabled#37912

Open
Isotr0py wants to merge 5 commits intovllm-project:mainfrom
Isotr0py:fuse-qwen3_5-lora
Open

[Bugfix] Fuse Qwen3.5 in_qkvz_proj forwarding with LoRA enabled#37912
Isotr0py wants to merge 5 commits intovllm-project:mainfrom
Isotr0py:fuse-qwen3_5-lora

Conversation

@Isotr0py
Copy link
Member

@Isotr0py Isotr0py commented Mar 23, 2026

Purpose

Test Plan

pytest -s -v tests/lora/test_qwen35_densemodel_lora.py

Test Result

Tests pass at both TP=2 and TP=4


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Isotr0py and others added 5 commits March 23, 2026 02:11
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
@mergify mergify bot added qwen Related to Qwen models bug Something isn't working labels Mar 23, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors LoRA handling for Qwen3.5 and Qwen3-Next models. Key changes include introducing an expand_packed_lora method to flexibly handle LoRA adapter groups that don't match the number of slices, and unifying the input projection logic in Qwen3.5 attention by removing LoRA-specific conditional paths. The create_dummy_lora function in model_manager.py contains a 'HACK' comment, which should either be replaced with a detailed explanation of the necessary logic or improved with a more robust solution.

Comment on lines +559 to +562
# HACK: overrides replacements for qkvz = qkv + z case.
# Any better methods to handle this case?
if n_slices != len(replacements):
replacements = [f"slice_{i}" for i in range(n_slices)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The use of a 'HACK' comment here is concerning as it suggests the solution is not robust and could lead to future maintenance issues. Code with 'HACK' comments is often difficult to understand and easy to break.

If this logic is indeed the correct and necessary approach for handling dummy LoRA creation for packed modules like in_proj_qkvz, please replace the 'HACK' comment with a more detailed explanation. The explanation should clarify:

  1. Why there's a mismatch between n_slices and len(replacements).
  2. Why generating generic slice_i names is the appropriate solution for creating dummy LoRAs in this scenario.
  3. How this interacts with the loading of real LoRA weights.

A clear explanation will improve code maintainability and prevent future confusion.

Alternatively, if a more robust, less 'hacky' solution is possible (perhaps by making the relationship between packed modules and slices more explicit in the model configuration), that would be preferable.

@mergify
Copy link

mergify bot commented Mar 26, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Isotr0py.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working needs-rebase qwen Related to Qwen models

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant