[BugFix] Fix precision issue for LoRA feature #4141
paulyu12 merged 5 commits into vllm-project:main from
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request fixes a precision issue in the LoRA feature. The change in vllm_ascend/lora/punica_npu.py correctly casts an input tensor to float32 to match the kernel's expectation, resolving a data type mismatch.
However, the changes across the four C++ kernel files (bgmv_expand.cpp, bgmv_shrink.cpp, sgmv_expand.cpp, sgmv_shrink.cpp) introduce a critical issue. Commenting out the #if (__CCE_AICORE__ >= 220) directives at the kernel call sites makes the bfloat16_t kernel calls unconditional, while the kernel declarations themselves remain inside the conditional compilation blocks. This will cause compilation errors on any platform where __CCE_AICORE__ < 220. I have left specific comments on each file with details on how to resolve this; these issues must be addressed to avoid breaking builds for other hardware targets.
LGTM. This PR fixes two bugs:
Force-pushed from a04fe60 to 32563d6.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from 9c25df1 to b3485ac.
paulyu12 left a comment:
LGTM. Actually, we worked on this PR together.
…n vllm-ascend. Co-authored-by: liuchenbing <chenliumail@163.com> Co-authored-by: guanyuzhu <zhuguanyu@huawei.com> vLLM version: v0.11.0 vLLM main: vllm-project/vllm signed-off-by: hukongyi <hukongyi@cmbchina.com> Signed-off-by: hukongyi <hukongyi@cmbchina.com>
…n vllm-ascend Co-authored-by: liuchenbing <chenliumail@163.com> Co-authored-by: guanyuzhu <zhuguanyu@huawei.com> vLLM version: v0.11.0 vLLM main: vllm-project/vllm signed-off-by: hukongyi <hukongyi@cmbchina.com> Signed-off-by: hukongyi <hukongyi@cmbchina.com>
…n vllm-ascend. Co-authored-by: liuchenbing <chenliumail@163.com> Co-authored-by: guanyuzhu <zhuguanyu@huawei.com> vLLM version: v0.11.0 vLLM main: vllm-project/vllm signed-off-by: hukongyi <hukongyi@cmbchina.com> Signed-off-by: hukongyi <hukongyi@cmbchina.com>
Signed-off-by: hukongyi <hukongyi@cmbchina.com>
…M_OP_EXCLUDE Signed-off-by: hukongyi <hukongyi@cmbchina.com>
vLLM version: v0.11.0
vLLM main: vllm-project/vllm

### What this PR does / why we need it?
Fix the precision issue of the LoRA feature in vllm-ascend.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
```bash
pytest tests/lora/test_llama_tp.py::test_llama_lora -s
```
<img width="1319" height="879" alt="lora_test" src="https://github.com/user-attachments/assets/2a0b2325-5b05-4bbc-ac03-a7c9f0ad9d4c" />

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: hukongyi <hukongyi@cmbchina.com>
Thanks for your first contribution! Your awesome first PR has been included in the vLLM Ascend v0.13.0rc1 release. [1] https://github.com/vllm-project/vllm-ascend/releases/tag/v0.13.0rc1