[Bugfix] Fix Qwen3.5 LoRA IndexError in packed_modules_mapping#36825
hallerite wants to merge 5 commits into vllm-project:main
Conversation
Code Review
This pull request addresses an IndexError that occurs when using LoRA with Qwen3.5 models. The fix is two-fold: first, it correctly aligns the packed_modules_mapping for in_proj_qkvz in Qwen3.5 models to have four entries, which now matches the layer's four output_sizes. Second, it generalizes the layer replacement logic in MergedColumnParallelLinearWithLoRA to be more robust by dynamically checking against the number of output sizes instead of a hardcoded value. These changes are well-reasoned, directly fix the bug, and improve the code's maintainability.
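The mismatch described above can be reproduced in miniature. The sketch below uses made-up sizes and plain lists (not the actual vLLM classes) to show why deriving `n_slices` from `output_sizes` while creating only `len(packed_modules)` adapters must crash:

```python
# Hypothetical stand-in values, not the real Qwen3.5 dimensions.
output_sizes = [128, 128, 256, 256]          # key_dim, key_dim, value_dim, value_dim
packed_modules = ["in_proj_qkv", "in_proj_z"]  # the old 2-entry mapping

n_slices = len(output_sizes)                 # 4 slices expected
lora_a = [None] * len(packed_modules)        # but only 2 adapter slots created

try:
    for i in range(n_slices):
        _ = lora_a[i]                        # fails once i reaches 2
except IndexError:
    print("IndexError at slice", i)
```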
hallerite force-pushed from d179859 to e38dbbd
Hi @hallerite, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
alvinttang left a comment
This is a correct two-part fix: the packed_modules_mapping was misrepresenting in_proj_qkvz as 2 sub-modules when it actually has 4 (matching the 4 output_sizes in create_qkvz_proj), and can_replace_layer was hardcoded to len == 2 instead of dynamically checking against the layer's actual output_sizes. The dynamic check in can_replace_layer is the more important improvement since it makes the validation self-consistent and prevents future regressions if the packing changes again. One thing worth double-checking: are there any serialized/saved LoRA adapters in the wild that used the old 2-module mapping that would now silently fail to load against this updated definition? Overall this is a well-reasoned fix and both changes are necessary together.
hallerite force-pushed from 27fb13a to 67a242a
This has unblocked the seq len error, but it stopped working for me when I load LoRA with 4-bit quant bitsandbytes.
Could you give me a stack trace and, ideally, a way to reproduce this?
https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/lora_with_quantization_inference.py — you can run this example with the base model set to Qwen 3.5 9B, any weight size.
Signed-off-by: hallerite <git@hallerite.com>
Revert `packed_modules_mapping` to the real HF weight names (`in_proj_qkv`, `in_proj_z`) to fix bitsandbytes quant-state stacking, and extend `MergedColumnParallelLinearVariableSliceWithLoRA` to handle the mismatch between the packed module count (2) and the `output_sizes` count (4) in GDN layers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: hallerite <git@hallerite.com>
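The 2-weights-to-4-slices handling this commit describes can be sketched as a greedy match of a LoRA weight's rows against consecutive `output_sizes`. The helper below is a hypothetical illustration (names and signature are assumptions, not the actual vLLM code); it also includes the alignment assertion added later in this PR:

```python
def match_output_sizes(num_rows, output_sizes, start=0):
    """Greedily consume output_sizes from `start` until they sum to
    num_rows; returns the list of matched slice sizes."""
    slices, consumed, i = [], 0, start
    while consumed < num_rows and i < len(output_sizes):
        slices.append(output_sizes[i])
        consumed += output_sizes[i]
        i += 1
    # Guard against silent data corruption if dimensions don't align.
    assert consumed == num_rows, "rows don't align with output_sizes"
    return slices

# With hypothetical sizes [key, key, value, value] = [128, 128, 256, 256]:
# the fused in_proj_qkv weight spans the first three slices,
# and in_proj_z spans the last one.
print(match_output_sizes(128 + 128 + 256, [128, 128, 256, 256]))   # [128, 128, 256]
print(match_output_sizes(256, [128, 128, 256, 256], start=3))      # [256]
```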
hallerite force-pushed from 40cf2b0 to eb7439f
Pushed a potential fix; it at least worked locally for me. The issue was that the … Feel free to try again and tell me if it works now.
Signed-off-by: hallerite <git@hallerite.com>
Assert that consumed dimensions exactly match lora_b's shape after greedily matching output_sizes. Prevents silent data corruption if dimensions don't align.
Signed-off-by: hallerite <git@hallerite.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closing in favor of #37019, which is a superset of this PR — it includes all the changes here (VariableSlice
Summary
Fixes `IndexError: list index out of range` when enabling LoRA with Qwen3.5 models (`Qwen3_5ForCausalLMBase` and `Qwen3_5ForConditionalGeneration`).

Root cause: Qwen3.5's `create_qkvz_proj` overrides the parent (Qwen3Next) to use 4 `output_sizes` `[key_dim, key_dim, value_dim, value_dim]` for correct per-slice TP sharding. However, `packed_modules_mapping` only lists 2 entries, `["in_proj_qkv", "in_proj_z"]`. During LoRA initialization, `MergedColumnParallelLinearWithLoRA` sets `n_slices = len(output_sizes)` (4) but only creates `len(packed_modules)` (2) adapters, so accessing `lora_a[2]`/`lora_a[3]` crashes.

Fix:
1. Expand `packed_modules_mapping` for `in_proj_qkvz` from 2 to 4 entries, `["in_proj_q", "in_proj_k", "in_proj_v", "in_proj_z"]`, matching the 4 `output_sizes`.
2. Generalize `MergedColumnParallelLinearWithLoRA.can_replace_layer` from `len(packed_modules_list) == 2` to `len(packed_modules_list) == len(source_layer.output_sizes)`, so it works for any N-way merged column parallel linear, not just 2-way.

This works for any TP size because each of the 4 packed modules maps to one output_size, preserving correct per-slice sharding.
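The two changes can be sketched as follows. This is a minimal illustration built from the identifiers in this PR description; the `FakeLayer` class and the surrounding scaffolding are assumptions, not the actual vLLM source:

```python
class FakeLayer:
    # Hypothetical stand-in for a merged column-parallel linear layer
    # with 4 output slices (key_dim, key_dim, value_dim, value_dim).
    output_sizes = [128, 128, 256, 256]

packed_modules_mapping = {
    # before: ["in_proj_qkv", "in_proj_z"]  (2 entries — caused the crash)
    "in_proj_qkvz": ["in_proj_q", "in_proj_k", "in_proj_v", "in_proj_z"],
}

def can_replace_layer(source_layer, packed_modules_list):
    # before: len(packed_modules_list) == 2  (hardcoded)
    # after: validate against the layer's actual number of output slices
    return len(packed_modules_list) == len(source_layer.output_sizes)

print(can_replace_layer(FakeLayer(), packed_modules_mapping["in_proj_qkvz"]))  # True
print(can_replace_layer(FakeLayer(), ["in_proj_qkv", "in_proj_z"]))            # False
```

The dynamic check keeps the validation self-consistent: if the packing ever changes again, the mapping and the layer must agree or replacement is refused up front.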
Note: The parent class `Qwen3Next` doesn't have this issue because it uses `output_sizes=[sum([key_dim, key_dim, value_dim, value_dim])]` (1 entry) with `packed_modules=["in_proj_qkvz"]` (1 entry) — they match.

Note: This may not be the globally optimal solution. The 4 packed module names (`in_proj_q`, `in_proj_k`, `in_proj_v`, `in_proj_z`) are synthetic — the actual HF weight names are `in_proj_qkv` (fused Q+K+V) and `in_proj_z`. This means LoRA adapter weights targeting the GDN projections by their real HF names wouldn't be found during loading. In practice this isn't an issue today because nobody LoRAs the GDN layers — only standard attention and MLP layers are targeted. A more complete fix would be to support M packed modules mapping to N output sizes (2 weights → 4 sharding slices) in `MergedColumnParallelLinearWithLoRA`, but that's a larger refactor.

Related: #36372, #36478
Test plan