[Model] Remove the unnecessary dtype conversion in MiniCPM#32523
Isotr0py merged 1 commit into vllm-project:main.
Conversation
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Code Review
This pull request removes an explicit dtype conversion to float32 in MiniCPMAttention before applying rotary embeddings. The rationale is that the underlying kernels now handle precision internally. While this is likely true for optimized custom kernels, I've raised a concern about the PyTorch-native fallback path (forward_native), which could suffer from precision loss without this explicit casting. I've recommended restoring the casting to ensure numerical stability across all execution paths.
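To illustrate the precision concern on the native path, here is a minimal NumPy sketch (the `rope_native` helper and its shapes are illustrative stand-ins, not vLLM's actual `forward_native` implementation): it applies the rotation entirely in float16 versus upcasting to float32 first, as the removed code did.

```python
import numpy as np

def rope_native(x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    # Toy stand-in for a PyTorch-native rotary embedding: rotate
    # each (x1, x2) pair of channels by the angles in theta.
    x1, x2 = np.split(x, 2, axis=-1)
    cos, sin = np.cos(theta), np.sin(theta)
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float16)       # half-precision activations
theta = np.linspace(0.0, 3.14, 4).astype(np.float16)

# The removed pattern: upcast to float32 around the op, then cast back.
out_upcast = rope_native(x.astype(np.float32),
                         theta.astype(np.float32)).astype(np.float16)

# Without the explicit cast, every intermediate rounds in float16.
out_half = rope_native(x, theta)

max_diff = np.abs(out_upcast.astype(np.float32) - out_half.astype(np.float32)).max()
```

For fp16/bf16 inputs the rounding difference is typically tiny, which is consistent with the benchmark below showing no accuracy regression; the review concern is that nothing guarantees it stays small on the pure PyTorch fallback path, where no kernel performs the upcast internally.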
@Isotr0py Hi :) Could you please help take a look?
Isotr0py left a comment:
LGTM, but would be better to have some acc benchmark results in case.
@Isotr0py Thanks! Here is the comparison between before and after this change. It didn't cause an accuracy regression. Command: Before: After:
### What this PR does / why we need it?
Part of #5304. After vllm-project/vllm#32523 was merged, we can remove the patch of `MiniCPMAttention`.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Tested locally.
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2c24bc6
---------
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Purpose
In the past, vllm-ascend had to patch `MiniCPMAttention.forward` to work around the unsupported dtype conversion. To remove that patch, we are upstreaming the change into vLLM. The conversion is now unnecessary on all platforms because the kernels handle precision internally. If necessary, I will provide an accuracy test for MiniCPM.
Test Plan
See vllm-project/vllm-ascend#5975.
Test Result