[Bugfix][LoRA] Fix Qwen35 LoRA #36976

Merged
DarkLight1337 merged 12 commits into vllm-project:main from jeejeelee:fix-qwen35-lora on Mar 20, 2026

@jeejeelee
Collaborator

@jeejeelee jeejeelee commented Mar 13, 2026

Purpose

Test Plan

Test Result

@jeejeelee requested a review from sighingnow as a code owner March 13, 2026 11:19
@jeejeelee marked this pull request as draft March 13, 2026 11:19
@mergify bot added the qwen and bug labels Mar 13, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to fix LoRA support for Qwen3.5 models. The main change is to split the fused in_proj_qkvz layer into separate in_proj_qkv and in_proj_z layers when LoRA is enabled. This required modifications to layer initialization, the forward pass, weight loading, and the packed_modules_mapping for LoRA. While the overall approach is sound, I've identified a critical issue in Qwen3_5ForConditionalGeneration where the packed_modules_mapping is not correctly initialized, which would likely prevent LoRA from working correctly for that model. I have provided a specific code suggestion to address this.
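
For readers following the review summary above, here is a minimal sketch of why splitting a fused projection into two separate layers can leave the forward pass unchanged. It uses toy weights in plain Python. The helper names (`matvec`, `fused_forward`, `split_forward`) are hypothetical and are not vLLM APIs, and the sketch assumes the fused weight is a simple row-wise concatenation of the qkv and z weights, which may not match the real layout in the model.

```python
# Illustrative sketch only: toy weights and hypothetical helper names,
# not the vLLM implementation touched by this PR.

def matvec(weight, x):
    """Multiply a row-major weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def fused_forward(w_qkvz, x, qkv_rows):
    """One fused projection, then slice the output into (qkv, z)."""
    out = matvec(w_qkvz, x)
    return out[:qkv_rows], out[qkv_rows:]


def split_forward(w_qkv, w_z, x):
    """Two separate projections, one per logical sub-layer."""
    return matvec(w_qkv, x), matvec(w_z, x)


# Assumption: the fused weight is a plain row-wise concatenation [qkv; z].
W_QKVZ = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [3.0, -1.0]]
QKV_ROWS = 3
W_QKV, W_Z = W_QKVZ[:QKV_ROWS], W_QKVZ[QKV_ROWS:]
X = [2.0, 5.0]

fused = fused_forward(W_QKVZ, X, QKV_ROWS)
split = split_forward(W_QKV, W_Z, X)
assert fused == split  # same outputs, but the split form exposes two modules
```

Under that assumption, the split form computes the same values while exposing `in_proj_qkv` and `in_proj_z` as distinct modules, which is presumably what lets a LoRA adapter address each one separately.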

@musab-mk musab-mk left a comment

I tested this PR and can confirm that it fixes the IndexError in _capture_cudagraphs when using a LoRA adapter with Qwen/Qwen3.5-397B-A17B-FP8 as the base model.

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@dcmaddix
Contributor

@jeejeelee I think we can proceed with merging this fix, thanks!

@nole70

nole70 commented Mar 17, 2026

@dcmaddix I think we are waiting for @sighingnow to approve

@mergify

mergify bot commented Mar 18, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jeejeelee.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@mergify mergify bot added the ci/build label Mar 19, 2026
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@jeejeelee jeejeelee marked this pull request as ready for review March 19, 2026 16:46
@jeejeelee added the ready label Mar 19, 2026
@mergify mergify bot removed the needs-rebase label Mar 19, 2026
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@mergify

mergify bot commented Mar 20, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jeejeelee.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 20, 2026
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@DarkLight1337 DarkLight1337 merged commit 8fbe3f3 into vllm-project:main Mar 20, 2026
60 checks passed
@jeejeelee jeejeelee deleted the fix-qwen35-lora branch March 20, 2026 03:25
@hjh0119
Contributor

hjh0119 commented Mar 20, 2026

Will this PR be included in v0.18? Installing vLLM from source is somewhat cumbersome

@DarkLight1337
Member

No, we already cut the branch for v0.18 a few days ago.

@hjh0119
Contributor

hjh0119 commented Mar 20, 2026

Got it. Thanks anyway

chooper26 pushed a commit to intellistream/vllm-hust that referenced this pull request Mar 21, 2026
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@nikhilesh-csa

I tried building from source with:

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .  # This may take 5-10 minutes.

but the build does not complete successfully:

`ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279629; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279757; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279761; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279765; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279769; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279773; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279777; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279781; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas /tmp/tmpxft_00000521_00000000-6_sm89_kernel_fe4m3fn_u4_bfloat16.ptx, line 279785; error : Unexpected instruction types specified for 'mma'
1499.2 ptxas fatal : Ptx assembly aborted due to errors
1499.2 [7/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_float16_fe2m1f_float16.cu.o
1499.2 [8/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_float16_fe4m3fn_float16.cu.o
1499.2 [9/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_s8_u4_float16.cu.o
1499.2 [10/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_s8_u4b8_float16.cu.o
1499.2 [11/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_float16_u4_float16.cu.o
1499.2 [12/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_float16_u4b8_float16.cu.o
1499.2 [13/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_bfloat16_fe4m3fn_bfloat16.cu.o
1499.2 [14/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/cutlass/scaled_mm_entry.cu.o
1499.2 [15/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_quant_entry.cu.o
1499.2 [16/175] Building CUDA object CMakeFiles/_C.dir/csrc/cuda_view.cu.o
1499.2 [17/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/int8/per_token_group_quant.cu.o
1499.2 [18/175] Building CUDA object CMakeFiles/_C.dir/csrc/pos_encoding_kernels.cu.o
1499.2 [19/175] Building CUDA object CMakeFiles/_C.dir/csrc/attention/vertical_slash_index.cu.o
1499.2 [20/175] Building CUDA object CMakeFiles/_C.dir/csrc/attention/merge_attn_states.cu.o
1499.2 [21/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/awq/gemm_kernels.cu.o
1499.2 [22/175] Building CUDA object CMakeFiles/_C.dir/csrc/sparse/cutlass/sparse_scaled_mm_entry.cu.o
1499.2 [23/175] Building CUDA object CMakeFiles/_C.dir/csrc/topk.cu.o
1499.2 [24/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_scaled_mm_entry.cu.o
1499.2 [25/175] Building CUDA object CMakeFiles/_C.dir/csrc/custom_all_reduce.cu.o
1499.2 [26/175] Building CUDA object CMakeFiles/_C.dir/csrc/activation_kernels.cu.o
1499.2 [27/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_float16_u8b128_float16.cu.o
1499.2 [28/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_bfloat16_fe2m1f_bfloat16.cu.o
1499.2 [29/175] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels_fused.cu.o
1499.2 /vllm/csrc/quantization/utils.cuh:41:1: warning: ‘__host__’ attribute directive ignored [-Wattributes]
1499.2 41 | MAYBE_HOST_DEVICE static constexpr T quant_type_max_v =
1499.2 | ^~~~~~~~~~~~~~~~
1499.2 [30/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_s8_u4_bfloat16.cu.o
1499.2 [31/175] Building CUDA object CMakeFiles/_C.dir/csrc/fused_qknorm_rope_kernel.cu.o
1499.2 [32/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/activation_kernels.cu.o
1499.2 /vllm/csrc/quantization/utils.cuh:41:1: warning: ‘__host__’ attribute directive ignored [-Wattributes]
1499.2 41 | MAYBE_HOST_DEVICE static constexpr T quant_type_max_v =
1499.2 | ^~~~~~~~~~~~~~~~
1499.2 [33/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_s8_u4b8_bfloat16.cu.o
1499.2 [34/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/int8/scaled_quant.cu.o
1499.2 [35/175] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_quant_kernels.cu.o
1499.2 /vllm/csrc/quantization/utils.cuh:41:1: warning: ‘__host__’ attribute directive ignored [-Wattributes]
1499.2 41 | MAYBE_HOST_DEVICE static constexpr T quant_type_max_v =
1499.2 | ^~~~~~~~~~~~~~~~
1499.2 [36/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/fp8/common.cu.o
1499.2 /vllm/csrc/quantization/utils.cuh:41:1: warning: ‘__host__’ attribute directive ignored [-Wattributes]
1499.2 41 | MAYBE_HOST_DEVICE static constexpr T quant_type_max_v =
1499.2 | ^~~~~~~~~~~~~~~~
1499.2 [37/175] Building CUDA object CMakeFiles/_C.dir/csrc/sampler.cu.o
1499.2 [38/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/fp8/per_token_group_quant.cu.o
1499.2 [39/175] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_kernels.cu.o
1499.2 [40/175] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
1499.2 [41/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o
1499.2 [42/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fused_kernels/fused_layernorm_dynamic_per_token_quant.cu.o
1499.2 /vllm/csrc/quantization/utils.cuh:41:1: warning: ‘__host__’ attribute directive ignored [-Wattributes]
1499.2 41 | MAYBE_HOST_DEVICE static constexpr T quant_type_max_v =
1499.2 | ^~~~~~~~~~~~~~~~
1499.2 [43/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_bfloat16_u4_bfloat16.cu.o
1499.2 [44/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gguf/gguf_kernel.cu.o
1499.2 [45/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_bfloat16_u4b8_bfloat16.cu.o
1499.2 [46/175] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sm80_kernel_bfloat16_u8b128_bfloat16.cu.o
1499.2 [47/175] Building CUDA object CMakeFiles/_C.dir/csrc/mamba/mamba_ssm/selective_scan_fwd.cu.o
1499.2 [48/175] Building CUDA object CMakeFiles/_C.dir/csrc/attention/paged_attention_v1.cu.o
1499.2 [49/175] Building CUDA object CMakeFiles/_C.dir/csrc/attention/paged_attention_v2.cu.o
1499.2 ninja: build stopped: subcommand failed.
1499.2 Traceback (most recent call last):
1499.2 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
1499.2 main()
1499.2 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
1499.2 json_out["return_val"] = hook(**hook_input["kwargs"])
1499.2 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 303, in build_editable
1499.2 return hook(wheel_directory, config_settings, metadata_directory)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 472, in build_editable
1499.2 return self._build_with_temp_dir(
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 408, in _build_with_temp_dir
1499.2 self.run_setup()
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 317, in run_setup
1499.2 exec(code, locals())
1499.2 File "<string>", line 976, in <module>
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 117, in setup
1499.2 return distutils.core.setup(**attrs) # type: ignore[return-value]
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 186, in setup
1499.2 return run_commands(dist)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 202, in run_commands
1499.2 dist.run_commands()
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1002, in run_commands
1499.2 self.run_command(cmd)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1107, in run_command
1499.2 super().run_command(command)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1021, in run_command
1499.2 cmd_obj.run()
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 140, in run
1499.2 self._create_wheel_file(bdist_wheel)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 350, in _create_wheel_file
1499.2 files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 273, in _run_build_commands
1499.2 self._run_build_subcommands()
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 300, in _run_build_subcommands
1499.2 self.run_command(name)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 357, in run_command
1499.2 self.distribution.run_command(command)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1107, in run_command
1499.2 super().run_command(command)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1021, in run_command
1499.2 cmd_obj.run()
1499.2 File "<string>", line 286, in run
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 97, in run
1499.2 _build_ext.run(self)
1499.2 File "/tmp/pip-build-env-e4e96b9x/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 368, in run
1499.2 self.build_extensions()
1499.2 File "<string>", line 255, in build_extensions
1499.2 File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
1499.2 raise CalledProcessError(retcode, cmd)
1499.2 subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=48', '--target=_moe_C', '--target=cumem_allocator', '--target=triton_kernels', '--target=_vllm_fa2_C', '--target=_vllm_fa4_cutedsl_C', '--target=_C', '--target=_C_stable_libtorch']' returned non-zero exit status 255.
1499.2 [end of output]
1499.2
1499.2 note: This error originates from a subprocess, and is likely not a problem with pip.
1499.2 ERROR: Failed building editable for vllm
1499.2 Failed to build vllm
1499.3 error: failed-wheel-build-for-install
1499.3
1499.3 × Failed to build installable wheels for some pyproject.toml based projects
1499.3 ╰─> vllm

Dockerfile-outputter:41`

Any suggestions, @DarkLight1337 @jeejeelee?

Labels

bug (Something isn't working)
ci/build
qwen (Related to Qwen models)
ready (ONLY add when PR is ready to merge/full CI is needed)