
[Bugfix] Fix Qwen3.5 LoRA IndexError in packed_modules_mapping#36825

Closed
hallerite wants to merge 5 commits into vllm-project:main from hallerite:fix-qwen35-lora

Conversation

Contributor

@hallerite hallerite commented Mar 11, 2026

Summary

Fixes IndexError: list index out of range when enabling LoRA with Qwen3.5 models (Qwen3_5ForCausalLMBase and Qwen3_5ForConditionalGeneration).

Root cause: Qwen3.5's create_qkvz_proj overrides the parent (Qwen3Next) to use 4 output_sizes [key_dim, key_dim, value_dim, value_dim] for correct per-slice TP sharding. However, packed_modules_mapping only lists 2 entries ["in_proj_qkv", "in_proj_z"]. During LoRA initialization, MergedColumnParallelLinearWithLoRA sets n_slices = len(output_sizes) (4) but only creates len(packed_modules) (2) adapters, so accessing lora_a[2]/lora_a[3] crashes.
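The mismatch can be reproduced in a minimal, self-contained sketch (plain Python, not vLLM's actual classes; the dimensions are made up and only the counts matter):

```python
# Hypothetical dims standing in for key_dim/value_dim; only the counts matter.
output_sizes = [128, 128, 256, 256]            # 4 slices, as in create_qkvz_proj
packed_modules = ["in_proj_qkv", "in_proj_z"]  # but only 2 mapping entries

n_slices = len(output_sizes)                       # LoRA layer assumes 4 slices
lora_a = [f"adapter_{m}" for m in packed_modules]  # but only 2 adapters exist

try:
    for i in range(n_slices):
        _ = lora_a[i]                          # i == 2 is out of range
except IndexError as exc:
    print(f"IndexError at slice {i}: {exc}")   # reproduces the reported crash
```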

Fix:

  1. Expand packed_modules_mapping for in_proj_qkvz from 2 to 4 entries: ["in_proj_q", "in_proj_k", "in_proj_v", "in_proj_z"] — matching the 4 output_sizes
  2. Generalize MergedColumnParallelLinearWithLoRA.can_replace_layer from len(packed_modules_list) == 2 to len(packed_modules_list) == len(source_layer.output_sizes) — so it works for any N-way merged column parallel linear, not just 2-way

This works for any TP size because each of the 4 packed modules maps to one output_size, preserving correct per-slice sharding.
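The generalized check in point 2 can be sketched as follows (hedged: the real can_replace_layer is a classmethod with a different signature; this only shows the predicate change):

```python
def can_replace_layer(output_sizes, packed_modules_list):
    # Before the fix: hardcoded `len(packed_modules_list) == 2`.
    # After: accept any N-way merged column-parallel linear whose packed
    # module count matches its slice count.
    return len(packed_modules_list) == len(output_sizes)

# Qwen3.5's 4-way GDN projection now qualifies:
assert can_replace_layer([128, 128, 256, 256],
                         ["in_proj_q", "in_proj_k", "in_proj_v", "in_proj_z"])
# The common 2-way gate/up MLP case still does:
assert can_replace_layer([512, 512], ["gate_proj", "up_proj"])
```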

Note: The parent class Qwen3Next doesn't have this issue because it uses a single combined output size, output_sizes=[key_dim + key_dim + value_dim + value_dim] (1 entry), which matches packed_modules=["in_proj_qkvz"] (1 entry).

Note: This may not be the globally optimal solution. The 4 packed module names (in_proj_q, in_proj_k, in_proj_v, in_proj_z) are synthetic — the actual HF weight names are in_proj_qkv (fused Q+K+V) and in_proj_z. This means LoRA adapter weights targeting the GDN projections by their real HF names wouldn't be found during loading. In practice this isn't an issue today because nobody LoRAs the GDN layers — only standard attention and MLP layers are targeted. A more complete fix would be to support M packed modules mapping to N output sizes (2 weights → 4 sharding slices) in MergedColumnParallelLinearWithLoRA, but that's a larger refactor.

Related: #36372, #36478

Test plan

  • Verified LoRA training (TP=1) completes successfully with Qwen3.5-9B on 2x RTX PRO 6000 Blackwell GPUs using prime-rl
  • Still to do: test with TP=2 and TP=4

@mergify mergify bot added qwen Related to Qwen models bug Something isn't working labels Mar 11, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses an IndexError that occurs when using LoRA with Qwen3.5 models. The fix is two-fold: first, it correctly aligns the packed_modules_mapping for in_proj_qkvz in Qwen3.5 models to have four entries, which now matches the layer's four output_sizes. Second, it generalizes the layer replacement logic in MergedColumnParallelLinearWithLoRA to be more robust by dynamically checking against the number of output sizes instead of a hardcoded value. These changes are well-reasoned, directly fix the bug, and improve the code's maintainability.


mergify bot commented Mar 11, 2026

Hi @hallerite, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Contributor

@alvinttang alvinttang left a comment


This is a correct two-part fix: the packed_modules_mapping was misrepresenting in_proj_qkvz as 2 sub-modules when it actually has 4 (matching the 4 output_sizes in create_qkvz_proj), and can_replace_layer was hardcoded to len == 2 instead of dynamically checking against the layer's actual output_sizes. The dynamic check in can_replace_layer is the more important improvement since it makes the validation self-consistent and prevents future regressions if the packing changes again. One thing worth double-checking: are there any serialized/saved LoRA adapters in the wild that used the old 2-module mapping that would now silently fail to load against this updated definition? Overall this is a well-reasoned fix and both changes are necessary together.


devlup commented Mar 12, 2026

This has unblocked the seq-len error, but it stopped working for me when I load LoRA with 4-bit bitsandbytes quantization.

Contributor Author

hallerite commented Mar 12, 2026

This has unblocked the seq-len error, but it stopped working for me when I load LoRA with 4-bit bitsandbytes quantization.

Could you give me a stack trace and, ideally, a way to reproduce this?


devlup commented Mar 13, 2026

https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/lora_with_quantization_inference.py, you can run this example with base model as qwen 3.5, 9b, any weight size

hallerite and others added 3 commits March 13, 2026 21:17
Signed-off-by: hallerite <git@hallerite.com>
Signed-off-by: hallerite <git@hallerite.com>
Revert packed_modules_mapping to real HF weight names (in_proj_qkv,
in_proj_z) to fix bitsandbytes quant state stacking, and extend
MergedColumnParallelLinearVariableSliceWithLoRA to handle the mismatch
between packed module count (2) and output_sizes count (4) in GDN layers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: hallerite <git@hallerite.com>
@hallerite
Contributor Author

https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/lora_with_quantization_inference.py, you can run this example with base model as qwen 3.5, 9b, any weight size

Pushed a potential fix; it at least worked locally for me. The issue was that changing packed_modules_mapping from the real HF weight names (in_proj_qkv, in_proj_z) to synthetic names (in_proj_q, in_proj_k, in_proj_v, in_proj_z) broke the bitsandbytes loader.

Feel free to try again and tell me if it works now.


mergify bot commented Mar 13, 2026

Hi @hallerite, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: hallerite <git@hallerite.com>
Assert that consumed dimensions exactly match lora_b's shape after
greedily matching output_sizes. Prevents silent data corruption if
dimensions don't align.

Signed-off-by: hallerite <git@hallerite.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
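The greedy matching described in that commit can be sketched as follows (a hypothetical, simplified helper, not the actual vLLM code; dims are illustrative). One packed LoRA weight may span several consecutive output_sizes, and the trailing assert fails loudly on misalignment instead of silently mis-slicing adapter data:

```python
def split_packed_lora_b(lora_b_out_dim, output_sizes):
    """Greedily assign consecutive output_sizes slices to one packed weight.

    Returns (start, end) ranges into the packed weight's output dimension.
    """
    slices, consumed = [], 0
    for size in output_sizes:
        if consumed == lora_b_out_dim:
            break  # this packed weight is fully covered
        slices.append((consumed, consumed + size))
        consumed += size
    # Consumed dimensions must exactly match lora_b's shape; anything else
    # would mean corrupt slicing, so fail hard here.
    assert consumed == lora_b_out_dim, (consumed, lora_b_out_dim)
    return slices

# A fused in_proj_qkv weight (Q+K+V) covers the first 3 of the 4 GDN slices;
# the remaining [256] slice would belong to in_proj_z's weight.
print(split_packed_lora_b(128 + 128 + 256, [128, 128, 256, 256]))
# → [(0, 128), (128, 256), (256, 512)]
```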
@hallerite
Contributor Author

Closing in favor of #37019 which is a superset of this PR — it includes all the changes here (VariableSlice can_replace_layer generalization, greedy expansion in set_lora) plus generalizes slice_lora_a in the sharded LoRA path for any N subloras.

@hallerite hallerite closed this Mar 15, 2026