[Bugfix][LoRA] Fix Qwen35 LoRA by jeejeelee · Pull Request #36976 · vllm-project/vllm

jeejeelee · 2026-03-13T11:19:28Z

Purpose

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

gemini-code-assist

Code Review

This pull request aims to fix LoRA support for Qwen3.5 models. The main change is to split the fused in_proj_qkvz layer into separate in_proj_qkv and in_proj_z layers when LoRA is enabled. This required modifications to layer initialization, the forward pass, weight loading, and the packed_modules_mapping for LoRA. While the overall approach is sound, I've identified a critical issue in Qwen3_5ForConditionalGeneration where the packed_modules_mapping is not correctly initialized, which would likely prevent LoRA from working correctly for that model. I have provided a specific code suggestion to address this.
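The split described in the review can be pictured with a toy example. This is only a sketch in plain Python, not the code from this PR: the sizes, weights, row layout, and helper names (`linear`, `lora_delta`) are all assumptions made for illustration. It shows that slicing a fused projection into two smaller ones reproduces the fused output, and that a low-rank adapter can then be attached to one of the split layers on its own.

```python
# Illustrative only: a plain-Python stand-in for a fused linear projection.
# All names, sizes, and values are hypothetical; they do not mirror vLLM's classes.

def linear(x, weight):
    """Compute y = W @ x for a weight stored as a list of output rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def lora_delta(x, a, b):
    """Low-rank update B @ (A @ x), the usual LoRA form."""
    return linear(linear(x, a), b)

# Assumed layout: the first 3 output rows belong to qkv, the last 2 to z.
QKV_ROWS = 3

fused_weight = [
    [1, 0, 2, 0],
    [0, 1, 0, 2],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [2, 0, 0, 1],
]

# Splitting is a row slice, so each part is its own smaller projection.
qkv_weight = fused_weight[:QKV_ROWS]
z_weight = fused_weight[QKV_ROWS:]

x = [1, 2, 3, 4]

fused_out = linear(x, fused_weight)
qkv_out = linear(x, qkv_weight)
z_out = linear(x, z_weight)

# Concatenating the split outputs reproduces the fused output.
assert fused_out == qkv_out + z_out

# A rank-1 adapter applied to the qkv layer only (hypothetical values).
a_qkv = [[1, 0, 0, 1]]      # shape: rank x in_features
b_qkv = [[1], [0], [2]]     # shape: qkv_rows x rank
qkv_with_lora = [y + d for y, d in zip(qkv_out, lora_delta(x, a_qkv, b_qkv))]
```

With separate layers, each adapter only needs to match the output width of the layer it targets, which is the property the split relies on in this sketch.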

vllm/model_executor/models/qwen3_5.py

musab-mk

I tested this PR and can confirm that it fixes the IndexError in _capture_cudagraphs when using a LoRA adapter with Qwen/Qwen3.5-397B-A17B-FP8 as the base model.

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

dcmaddix · 2026-03-17T17:36:19Z

@jeejeelee I think we can proceed with merging this fix. Thanks!

nole70 · 2026-03-17T19:53:14Z

@dcmaddix I think we are waiting for @sighingnow to approve.

mergify · 2026-03-18T10:53:42Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jeejeelee.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2026-03-20T00:10:40Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jeejeelee.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

hjh0119 · 2026-03-20T03:38:32Z

Will this PR be included in v0.18? Installing vLLM from source is somewhat cumbersome.

DarkLight1337 · 2026-03-20T03:42:21Z

No, we already cut the branch for v0.18 a few days ago.

hjh0119 · 2026-03-20T03:44:40Z

Got it. Thanks anyway.

nikhilesh-csa · 2026-03-23T15:58:50Z

I tried building from source with:

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .  # This may take 5-10 minutes.

but the build does not complete successfully:

`ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279629; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279757; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279761; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279765; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279769; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279773; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279777; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279781; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279785; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas fatal : Ptx assembly aborted due to errors
1499.2 [7/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_float16_fe2m1f_float16.cu.o
1499.2 [8/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_float16_fe4m3fn_float16.cu.o
1499.2 [9/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_s8_u4_float16.cu.o
1499.2 [10/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_s8_u4b8_float16.cu.o
1499.2 [11/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_float16_u4_float16.cu.o
1499.2 [12/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_float16_u4b8_float16.cu.o
1499.2 [13/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_bfloat16_fe4m3fn_bfloat16.cu.o
1499.2 [14/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/cutlass/scaled_mm_entry.cu.o
1499.2 [15/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_quant_entry.cu.o
1499.2 [16/175] Building CUDA object CMakeFiles/_C.dir/csrc/cuda_view.cu.o
1499.2 [17/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/int8/per_token_group_quant.cu.o
1499.2 [18/175] Building CUDA object CMakeFiles/_C.dir/csrc/pos_encoding_kernels.cu.o
1499.2 [19/175] Building CUDA object CMakeFiles/_C.dir/csrc/attention/vertical_slash_index.cu.o
1499.2 [20/175] Building CUDA object CMakeFiles/_C.dir/csrc/attention/merge_attn_states.cu.o
1499.2 [21/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/awq/gemm_kernels.cu.o
1499.2 [22/175] Building CUDA object CMakeFiles/_C.dir/csrc/sparse/cutlass/sparse_scaled_mm_entry.cu.o
1499.2 [23/175] Building CUDA object CMakeFiles/_C.dir/csrc/topk.cu.o
1499.2 [24/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_scaled_mm_entry.cu.o
1499.2 [25/175] Building CUDA object CMakeFiles/_C.dir/csrc/custom_all_reduce.cu.o
1499.2 [26/175] Building CUDA object CMakeFiles/_C.dir/csrc/activation_kernels.cu.o
1499.2 [27/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_float16_u8b128_float16.cu.o
1499.2 [28/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_bfloat16_fe2m1f_bfloat16.cu.o
1499.2 [29/175] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels_fused.cu.o
1499.2 /vllm/csrc/quantization/utils.cuh:41:1: warning: ‘__host__’ attribute directive ignored [-Wattributes]
1499.2 41 | MAYBE_HOST_DEVICE static constexpr T quant_type_max_v =
1499.2 | ^~~~~~~~
1499.2 [30/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_s8_u4_bfloat16.cu.o
1499.2 [31/175] Building CUDA object CMakeFiles/_C.dir/csrc/fused_qknorm_rope_kernel.cu.o
1499.2 [32/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/activation_kernels.cu.o
1499.2 /vllm/csrc/quantization/utils.cuh:41:1: warning: ‘__host__’ attribute directive ignored [-Wattributes]
1499.2 41 | MAYBE_HOST_DEVICE static constexpr T quant_type_max_v =
1499.2 | ^~
1499.2 [33/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_s8_u4b8_bfloat16.cu.o
1499.2 [34/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/int8/scaled_quant.cu.o
1499.2 [35/175] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_quant_kernels.cu.o
1499.2 /vllm/csrc/quantization/utils.cuh:41:1: warning: ‘__host__’ attribute directive ignored [-Wattributes]
1499.2 41 | MAYBE_HOST_DEVICE static constexpr T quant_type_max_v =
1499.2 | ^~
1499.2 [36/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/fp8/common.cu.o
1499.2 /vllm/csrc/quantization/utils.cuh:41:1: warning: ‘__host__’ attribute directive ignored [-Wattributes]
1499.2 41 | MAYBE_HOST_DEVICE static constexpr T quant_type_max_v =
1499.2 | ^~
1499.2 [37/175] Building CUDA object CMakeFiles/_C.dir/csrc/sampler.cu.o
1499.2 [38/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/fp8/per_token_group_quant.cu.o
1499.2 [39/175] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_kernels.cu.o
1499.2 [40/175] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
1499.2 [41/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o
1499.2 [42/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fused_kernels/fused_layernorm_dynamic_per_token_quant.cu.o
1499.2 /vllm/csrc/quantization/utils.cuh:41:1: warning: ‘__host__’ attribute directive ignored [-Wattributes]
1499.2 41 | MAYBE_HOST_DEVICE static constexpr T quant_type_max_v =
1499.2 | ^~~~~~~~~~
1499.2 [43/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_bfloat16_u4_bfloat16.cu.o
1499.2 [44/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gguf/gguf_kernel.cu.o
1499.2 [45/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_bfloat16_u4b8_bfloat16.cu.o
1499.2 [46/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_bfloat16_u8b128_bfloat16.cu.o
1499.2 [47/175] Building CUDA object CMakeFiles/_C.dir/csrc/mamba/mamba_ssm/selective_scan_fwd.cu.o
1499.2 [48/175] Building CUDA object CMakeFiles/_C.dir/csrc/attention/paged_attention_v1.cu.o
1499.2 [49/175] Building CUDA object CMakeFiles/_C.dir/csrc/attention/paged_attention_v2.cu.o
1499.2 ninja: build stopped: subcommand failed.
1499.2 Traceback (most recent call last):
1499.2 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
1499.2 main()
1499.2 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
1499.2 json_out["return_val"] = hook(hook_input["kwargs"])
1499.2 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 303, in build_editable
1499.2 return hook(wheel_directory, config_settings, metadata_directory)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 472, in build_editable
1499.2 return self._build_with_temp_dir(
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 408, in _build_with_temp_dir
1499.2 self.run_setup()
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 317, in run_setup
1499.2 exec(code, locals())
1499.2 File "<string>", line 976, in <module>
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 117, in setup
1499.2 return distutils.core.setup(**attrs) # type: ignore[return-value]
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 186, in setup
1499.2 return run_commands(dist)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 202, in run_commands
1499.2 dist.run_commands()
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1002, in run_commands
1499.2 self.run_command(cmd)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1107, in run_command
1499.2 super().run_command(command)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1021, in run_command
1499.2 cmd_obj.run()
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 140, in run
1499.2 self._create_wheel_file(bdist_wheel)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 350, in _create_wheel_file
1499.2 files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 273, in _run_build_commands
1499.2 self._run_build_subcommands()
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 300, in _run_build_subcommands
1499.2 self.run_command(name)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 357, in run_command
1499.2 self.distribution.run_command(command)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1107, in run_command
1499.2 super().run_command(command)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1021, in run_command
1499.2 cmd_obj.run()
1499.2 File "<string>", line 286, in run
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 97, in run
1499.2 _build_ext.run(self)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 368, in run
1499.2 self.build_extensions()
1499.2 File "<string>", line 255, in build_extensions
1499.2 File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
1499.2 raise CalledProcessError(retcode, cmd)
1499.2 subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=48', '--target=_moe_C', '--target=cumem_allocator', '--target=triton_kernels', '--target=_vllm_fa2_C', '--target=_vllm_fa4_cutedsl_C', '--target=_C', '--target=_C_stable_libtorch']' returned non-zero exit status 255.
1499.2 [end of output]
1499.2
1499.2 note: This error originates from a subprocess, and is likely not a problem with pip.
1499.2 ERROR: Failed building editable for vllm
1499.2 Failed to build vllm
1499.3 error: failed-wheel-build-for-install
1499.3
1499.3 × Failed to build installable wheels for some pyproject.toml based projects
1499.3 ╰─> vllm

Dockerfile-outputter:41`

Any suggestions, @DarkLight1337 @jeejeelee?

jeejeelee added 2 commits March 13, 2026 11:17

Init

449ed9e

Merge branch 'vllm-project:main' into fix-qwen35-lora

80ae3b0

jeejeelee requested a review from sighingnow as a code owner March 13, 2026 11:19

jeejeelee marked this pull request as draft March 13, 2026 11:19

mergify bot added the qwen (Related to Qwen models) and bug (Something isn't working) labels Mar 13, 2026

gemini-code-assist bot reviewed Mar 13, 2026

vllm/model_executor/models/qwen3_5.py

jeejeelee mentioned this pull request Mar 13, 2026

[BugFix] Fix Qwen3.5 LoRA IndexError in GDN fused projections #36309

Closed

musab-mk approved these changes Mar 14, 2026

jeejeelee added 3 commits March 14, 2026 17:29

Move forward

f79deed

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Fix format

736b71a

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Merge branch 'main' into fix-qwen35-lora

f0d725a

hallerite mentioned this pull request Mar 15, 2026

fix: generalize LoRA layer handling for N-way fused projections #37019

Open

8 tasks

jeejeelee mentioned this pull request Mar 16, 2026

[Bugfix] LoRA: extend expert base_layer loading to Qwen3.5 and Step3.x #37114

Open

5 tasks

This was referenced Mar 16, 2026

[megatron] fix: Qwen3.5 LoRA & MTP support (with Megatron-Bridge) verl-project/verl#5599

Open

LoRA bridge & merge for Qwen3.5 NVIDIA-NeMo/Megatron-Bridge#2736

Merged

Merge branch 'main' into fix-qwen35-lora

840cf61

mergify bot added the needs-rebase label Mar 18, 2026

Nero10578 mentioned this pull request Mar 19, 2026

Very small LoRA adapter size training Qwen3.5 MoE models axolotl-ai-cloud/axolotl#3514

Closed

8 tasks

Add dense model testing

5941f90

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

mergify bot added the ci/build label Mar 19, 2026

Address conflict

035f1a8

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

jeejeelee marked this pull request as ready for review March 19, 2026 16:46

jeejeelee added the ready (ONLY add when PR is ready to merge/full CI is needed) label Mar 19, 2026

mergify bot removed the needs-rebase label Mar 19, 2026

jeejeelee added 2 commits March 19, 2026 17:45

Fix

49c7735

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Fix

1a1c491

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

jeejeelee force-pushed the fix-qwen35-lora branch from 0dcdf60 to 1a1c491 March 19, 2026 18:04

Merge branch 'main' into fix-qwen35-lora

3447cde

mergify bot added the needs-rebase label Mar 20, 2026

Fix

e198400

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

mergify bot removed the needs-rebase label Mar 20, 2026

jeejeelee requested review from DarkLight1337 and Isotr0py March 20, 2026 03:08

DarkLight1337 approved these changes Mar 20, 2026

DarkLight1337 merged commit 8fbe3f3 into vllm-project:main Mar 20, 2026
60 checks passed

jeejeelee deleted the fix-qwen35-lora branch March 20, 2026 03:25

chooper26 pushed a commit to intellistream/vllm-hust that referenced this pull request Mar 21, 2026

[Bugfix][LoRA] Fix Qwen35 LoRA (vllm-project#36976)

4502e1f

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

jeejeelee mentioned this pull request Mar 21, 2026

[Bug]: Qwen3.5-MoE failed with enable_lora #35286

Closed

1 task

Isotr0py mentioned this pull request Mar 23, 2026

[Bugfix] Fuse Qwen3.5 in_qkvz_proj forwarding with LoRA enabled #37912

Open

5 tasks

DarkLight1337 merged 12 commits into vllm-project:main from jeejeelee:fix-qwen35-lora

7 participants
