[BugFix] Fix precision issue for LoRA feature #4141
paulyu12 merged 5 commits into vllm-project:main from
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request fixes a precision issue in the LoRA feature. The change in vllm_ascend/lora/punica_npu.py correctly casts an input tensor to float32 to match the kernel's expectation, resolving a data type mismatch.
However, the changes across the four C++ kernel files (bgmv_expand.cpp, bgmv_shrink.cpp, sgmv_expand.cpp, sgmv_shrink.cpp) introduce a critical issue. Commenting out the #if (__CCE_AICORE__ >= 220) directives at the kernel call sites makes the bfloat16_t kernel calls unconditional, while the kernel declarations themselves remain inside the conditional compilation blocks. This will cause compilation errors on any platform where __CCE_AICORE__ < 220. I have left specific comments on each file with details on how to resolve this; these issues must be addressed to avoid breaking builds for other hardware targets.
LGTM. This PR fixes two bugs:
Force-pushed from a04fe60 to 32563d6.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from 9c25df1 to b3485ac.
paulyu12 left a comment:
LGTM. Actually, we worked on this PR together.
…n vllm-ascend. Co-authored-by: liuchenbing <chenliumail@163.com> Co-authored-by: guanyuzhu <zhuguanyu@huawei.com> vLLM version: v0.11.0 vLLM main: vllm-project/vllm signed-off-by: hukongyi <hukongyi@cmbchina.com> Signed-off-by: hukongyi <hukongyi@cmbchina.com>
…n vllm-ascend Co-authored-by: liuchenbing <chenliumail@163.com> Co-authored-by: guanyuzhu <zhuguanyu@huawei.com> vLLM version: v0.11.0 vLLM main: vllm-project/vllm signed-off-by: hukongyi <hukongyi@cmbchina.com> Signed-off-by: hukongyi <hukongyi@cmbchina.com>
…n vllm-ascend. Co-authored-by: liuchenbing <chenliumail@163.com> Co-authored-by: guanyuzhu <zhuguanyu@huawei.com> vLLM version: v0.11.0 vLLM main: vllm-project/vllm signed-off-by: hukongyi <hukongyi@cmbchina.com> Signed-off-by: hukongyi <hukongyi@cmbchina.com>
Signed-off-by: hukongyi <hukongyi@cmbchina.com>
…M_OP_EXCLUDE Signed-off-by: hukongyi <hukongyi@cmbchina.com>
vLLM version: v0.11.0
vLLM main: vllm-project/vllm

### What this PR does / why we need it?
Fix the precision issue of the LoRA feature in vllm-ascend.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
```bash
pytest tests/lora/test_llama_tp.py::test_llama_lora -s
```
<img width="1319" height="879" alt="lora_test" src="https://github.com/user-attachments/assets/2a0b2325-5b05-4bbc-ac03-a7c9f0ad9d4c" />

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: hukongyi <hukongyi@cmbchina.com>
Thanks for your first contribution! Your awesome first PR has been included in the vLLM Ascend v0.13.0rc1 release. [1] https://github.com/vllm-project/vllm-ascend/releases/tag/v0.13.0rc1