
[BugFix] Fix precision issue for LoRA feature #4141

Merged
paulyu12 merged 5 commits into vllm-project:main from hukongyi:lora_fix
Dec 19, 2025

Conversation

@hukongyi
Contributor

@hukongyi hukongyi commented Nov 12, 2025

vLLM version: v0.11.0
vLLM main: vllm-project/vllm

What this PR does / why we need it?

Fix the precision issue of the LoRA feature in vllm-ascend.

Does this PR introduce any user-facing change?

How was this patch tested?

pytest tests/lora/test_llama_tp.py::test_llama_lora -s
(screenshot: lora_test results)

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a commit message that fulfills the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix a precision issue with the LoRA feature. The change in vllm_ascend/lora/punica_npu.py correctly casts an input tensor to float32 to match the kernel's expectation, resolving a data type mismatch.

However, the changes across the four C++ kernel files (bgmv_expand.cpp, bgmv_shrink.cpp, sgmv_expand.cpp, sgmv_shrink.cpp) introduce a critical issue. By commenting out the #if (__CCE_AICORE__ >= 220) directives at the kernel call sites, you are making the bfloat16_t kernel calls unconditional. But the kernel declarations themselves remain inside the conditional compilation blocks. This will lead to compilation errors on any platform where __CCE_AICORE__ < 220. I have left specific comments on each file with details on how to resolve this. These issues must be addressed to avoid breaking builds for other hardware targets.

Comment thread csrc/kernels/bgmv_expand.cpp Outdated
Comment thread csrc/kernels/bgmv_shrink.cpp Outdated
Comment thread csrc/kernels/sgmv_expand.cpp Outdated
Comment thread csrc/kernels/sgmv_shrink.cpp Outdated
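The guard mismatch the bot describes can be sketched in a few lines. Everything below is illustrative, not the real kernel code: `SKETCH_AICORE` stands in for the actual `__CCE_AICORE__` macro so the example compiles anywhere, and the function bodies are placeholders for the Ascend kernels in csrc/kernels/.

```cpp
#include <string>

// Stand-in for __CCE_AICORE__; defaults to an A2-class core here.
#ifndef SKETCH_AICORE
#define SKETCH_AICORE 220
#endif

#if (SKETCH_AICORE >= 220)
// The bfloat16 kernel is only declared on cores >= 220.
std::string bgmv_expand_bf16() { return "bf16 kernel"; }
#endif

std::string fp16_fallback() { return "fp16 fallback"; }

std::string dispatch() {
#if (SKETCH_AICORE >= 220)
    // The call site must stay under the SAME guard as the declaration.
    // Commenting out this #if (as the original patch did) makes the call
    // unconditional and breaks builds where SKETCH_AICORE < 220, because
    // bgmv_expand_bf16 was never declared there.
    return bgmv_expand_bf16();
#else
    return fp16_fallback();
#endif
}
```

Compiling with `-DSKETCH_AICORE=100` exercises the fallback path; if the call site's `#if` were removed, that build would fail with an undeclared-identifier error, which is exactly the breakage the review flags.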
@paulyu12
Collaborator

LGTM. This PR fixes two bugs:

@paulyu12 paulyu12 added ready read for review ready-for-test start test by label for PR labels Nov 13, 2025
@paulyu12 paulyu12 added ready-for-test start test by label for PR and removed ready-for-test start test by label for PR labels Nov 14, 2025
@hukongyi hukongyi force-pushed the lora_fix branch 2 times, most recently from a04fe60 to 32563d6, on December 2, 2025 09:47
@paulyu12 paulyu12 added ready read for review ready-for-test start test by label for PR and removed ready read for review ready-for-test start test by label for PR labels Dec 2, 2025
@github-actions
Contributor

github-actions bot commented Dec 3, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@hukongyi hukongyi force-pushed the lora_fix branch 3 times, most recently from 9c25df1 to b3485ac, on December 9, 2025 01:20
@paulyu12 paulyu12 added ready read for review and removed ready read for review ready-for-test start test by label for PR labels Dec 15, 2025
@paulyu12 paulyu12 added the ready-for-test start test by label for PR label Dec 15, 2025
@paulyu12 paulyu12 added ready read for review ready-for-test start test by label for PR and removed ready read for review ready-for-test start test by label for PR labels Dec 16, 2025
@paulyu12 paulyu12 self-requested a review December 16, 2025 11:16
Collaborator

@paulyu12 paulyu12 left a comment


LGTM. Actually, we worked on this PR together.

…n vllm-ascend.

Co-authored-by: liuchenbing <chenliumail@163.com>
Co-authored-by: guanyuzhu <zhuguanyu@huawei.com>
vLLM version: v0.11.0
vLLM main: vllm-project/vllm
signed-off-by: hukongyi <hukongyi@cmbchina.com>

Signed-off-by: hukongyi <hukongyi@cmbchina.com>
…n vllm-ascend

Co-authored-by: liuchenbing <chenliumail@163.com>
Co-authored-by: guanyuzhu <zhuguanyu@huawei.com>
vLLM version: v0.11.0
vLLM main: vllm-project/vllm
signed-off-by: hukongyi <hukongyi@cmbchina.com>

Signed-off-by: hukongyi <hukongyi@cmbchina.com>
…n vllm-ascend.

Co-authored-by: liuchenbing <chenliumail@163.com>
Co-authored-by: guanyuzhu <zhuguanyu@huawei.com>
vLLM version: v0.11.0
vLLM main: vllm-project/vllm
signed-off-by: hukongyi <hukongyi@cmbchina.com>

Signed-off-by: hukongyi <hukongyi@cmbchina.com>
Signed-off-by: hukongyi <hukongyi@cmbchina.com>
…M_OP_EXCLUDE

Signed-off-by: hukongyi <hukongyi@cmbchina.com>
@paulyu12 paulyu12 merged commit ea8f544 into vllm-project:main Dec 19, 2025
20 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Dec 19, 2025
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend: (52 commits)
  [Doc]Add the user_guide doc file regarding fine-grained TP. (vllm-project#5084)
  [pref] qwen3_next add triton ops : fused_sigmoid_gating_delta_rule_update (vllm-project#4818)
  [Feature] Add token mask for DispatchGmmCombineDecode operator (vllm-project#5171)
  [CI] Improve CI (vllm-project#5078)
  [Refactor] remove some metadata variables in attention_v1. (vllm-project#5160)
  Add Qwen3-VL-235B-A22B-Instruct tutorials (vllm-project#5167)
  [Doc] Add a perf tune section (vllm-project#5127)
  [Image] Refactor image build (vllm-project#5175)
  [refactor] refactor weight trans nz and transpose (vllm-project#4878)
  [BugFix]Fix precision issue for LoRA feature (vllm-project#4141)
  【Doc】Deepseekv3.1/R1 doc enhancement (vllm-project#4827)
  support basic long_seq feature st (vllm-project#5140)
  [Bugfix] install trition for test_custom_op (vllm-project#5112)
  [2/N][Pangu][MoE] Remove Pangu Related Code (vllm-project#5130)
  [bugfix] Use FUSED_MC2 MoE comm path for the op `dispatch_ffn_combine` (vllm-project#5156)
  [BugFix] Fix top_p,top_k issue with EAGLE and add top_p,top_k in EAGLE e2e (vllm-project#5131)
  [Doc][P/D] Fix MooncakeConnector's name (vllm-project#5172)
  [Bugfix] Fix in_profile_run in mtp_proposer dummy_run (vllm-project#5165)
  [Doc] Refact benchmark doc (vllm-project#5173)
  [Nightly]  Avoid max_model_len being smaller than the decoder prompt to prevent single-node-accuray-tests from failing (vllm-project#5174)
  ...

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
vLLM version: v0.11.0
vLLM main: vllm-project/vllm

### What this PR does / why we need it?
   Fix the precision issue of the LoRA feature in vllm-ascend.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
```bash
pytest tests/lora/test_llama_tp.py::test_llama_lora -s
```
<img width="1319" height="879" alt="lora_test"
src="https://github.com/user-attachments/assets/2a0b2325-5b05-4bbc-ac03-a7c9f0ad9d4c"
/>


- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

---------

Signed-off-by: hukongyi <hukongyi@cmbchina.com>
@Yikun
Member

Yikun commented Dec 28, 2025

Thanks for your first contribution! Your awesome first PR has been included in the vLLM Ascend v0.13.0rc1 release.

[1] https://github.com/vllm-project/vllm-ascend/releases/tag/v0.13.0rc1
[2] https://mp.weixin.qq.com/s/3Psz3mYFTLktgSEDGqM9wQ

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026

Labels

ready (read for review), ready-for-test (start test by label for PR)


3 participants