[Model] Remove the unnecessary dtype conversion in MiniCPM#32523
Isotr0py merged 1 commit into vllm-project:main.
Conversation
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Code Review
This pull request removes an explicit dtype conversion to float32 in MiniCPMAttention before applying rotary embeddings. The rationale is that the underlying kernels now handle precision internally. While this is likely true for optimized custom kernels, I've raised a concern about the PyTorch-native fallback path (forward_native), which could suffer from precision loss without this explicit casting. I've recommended restoring the casting to ensure numerical stability across all execution paths.
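To illustrate the precision concern on the native path, here is a minimal NumPy sketch (the `rope_native` helper and its shapes are illustrative stand-ins, not vLLM's actual `forward_native` implementation): it applies the rotation entirely in float16 versus upcasting to float32 first, as the removed code did.

```python
import numpy as np

def rope_native(x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    # Toy stand-in for a PyTorch-native rotary embedding: rotate
    # each (x1, x2) pair of channels by the angles in theta.
    x1, x2 = np.split(x, 2, axis=-1)
    cos, sin = np.cos(theta), np.sin(theta)
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float16)       # half-precision activations
theta = np.linspace(0.0, 3.14, 4).astype(np.float16)

# The removed pattern: upcast to float32 around the op, then cast back.
out_upcast = rope_native(x.astype(np.float32),
                         theta.astype(np.float32)).astype(np.float16)

# Without the explicit cast, every intermediate rounds in float16.
out_half = rope_native(x, theta)

max_diff = np.abs(out_upcast.astype(np.float32) - out_half.astype(np.float32)).max()
```

For fp16/bf16 inputs the rounding difference is typically tiny, which is consistent with the benchmark below showing no accuracy regression; the review concern is that nothing guarantees it stays small on the pure PyTorch fallback path, where no kernel performs the upcast internally.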
@Isotr0py Hi :) Could you please help take a look?
Isotr0py left a comment:
LGTM, but would be better to have some acc benchmark results in case.
@Isotr0py Thanks! Here is the comparison between before and after this change. It didn't cause an accuracy regression. Command: Before: After:
### What this PR does / why we need it?
Part of #5304. After vllm-project/vllm#32523 was merged, we can remove the patch of `MiniCPMAttention`.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Tested locally.
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2c24bc6
---------
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Purpose
In the past, vllm-ascend had to patch `MiniCPMAttention.forward` to work around the unsupported dtype conversion. To remove that patch, we are upstreaming the change into vLLM. The conversion is now unnecessary on all platforms because the kernels handle precision internally. If necessary, I will provide an accuracy test for MiniCPM.
Test Plan
See vllm-project/vllm-ascend#5975.
Test Result