[ROCm] Cast score correction bias tensor during model construction for DeepSeek/Kimi-K2#39999
Conversation
Code Review
This pull request improves the efficiency of the DeepSeek-V2 model by pre-casting the e_score_correction_bias during the initialization phase, which avoids repeated type conversions during each forward pass. Additionally, it adds assertions to the ROCm Aiter fused MoE layers to verify that the bias and gating output types match. Feedback highlights a potential issue where direct mutation of the parameter's data and the use of a static type attribute could cause the new assertions to fail if the model is cast to a different precision after initialization.
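To make the flagged failure mode concrete, here is a minimal toy sketch (the `Gate` class and its attributes are stand-ins for illustration, not the actual vLLM modules): once the bias has been cast in place at construction, casting the whole model to another precision moves the Parameter but not a statically stored `out_dtype`, so a dtype assertion comparing the two would trip.

```python
import torch
import torch.nn as nn

# Toy stand-in for a gate module that records its output dtype once at
# construction time (hypothetical names, not the actual vLLM classes).
class Gate(nn.Module):
    def __init__(self, out_dtype: torch.dtype = torch.float32) -> None:
        super().__init__()
        self.out_dtype = out_dtype  # static attribute, fixed at construction
        self.e_score_correction_bias = nn.Parameter(
            torch.zeros(8, dtype=torch.float32), requires_grad=False
        )

gate = Gate()

# Init-time cast: mutate .data so every holder of the Parameter sees it.
gate.e_score_correction_bias.data = gate.e_score_correction_bias.data.to(gate.out_dtype)

# If the model is later cast to a different precision, the Parameter follows
# the cast but the static out_dtype attribute does not...
gate.half()

# ...so an assertion like the ones added in the kernels would now fail.
assert gate.e_score_correction_bias.dtype == gate.out_dtype  # raises AssertionError
```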
@bnellnm could you take a final look?
```python
# Pre-cast the bias to match the gate output dtype so the
# conversion is not repeated on every forward pass. All
# downstream references (FusedMoE, router) share the same
# nn.Parameter object, so mutating .data propagates everywhere.
# Weight loading uses copy_(), which handles the dtype conversion.
# Only needed on ROCm where the aiter biased_grouped_topk kernel
# requires the bias dtype to match the gating output dtype.
if (
    self.is_rocm_aiter_moe_enabled
    and self.gate.e_score_correction_bias is not None
):
    self.gate.e_score_correction_bias.data = (
        self.gate.e_score_correction_bias.data.to(self.gate.out_dtype)
    )
```
I think this block of code could live in fused_moe/layer.py (with any additional appropriate checks, e.g. routing type)
I already explained in my previous comment why that's a harder change that I decided to skip. Let me elaborate with some details here:
Moving the bias pre-cast (lines 354-367) into `FusedMoE.__init__()` isn't standalone: it depends on `gate.set_out_dtype()`, which is called just above it, and that call relies on `self.experts.quant_method.is_monolithic` and `self.experts.routing_method_type`, both of which are only available after `FusedMoE.__init__()` completes. So both blocks (the `set_out_dtype()` call and the new bias dtype cast) would need to move together to the end of `FusedMoE.__init__()`.
The concern is that this becomes more invasive: every model passing `gate=` to `FusedMoE`, including qwen3_moe, qwen3_next, step3p5, and AXK1, would now have `set_out_dtype` called automatically in `FusedMoE.__init__()`, which changes their gate output dtype behavior even though they don't currently call `set_out_dtype` at all.
If this is not a big concern, I would like to leave this section as is to minimise the impact.
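For concreteness, a rough self-contained sketch of the relocation being discussed, using toy stand-ins (ToyGate, ToyFusedMoE, is_rocm_aiter_moe_enabled, and the set_out_dtype signature are simplified here and do not match the real vLLM code): both the out-dtype resolution and the bias cast would sit at the end of the FusedMoE-style constructor, which is why they would start running for every model that passes gate=.

```python
from typing import Optional

import torch
import torch.nn as nn

def is_rocm_aiter_moe_enabled() -> bool:
    # Placeholder for the real platform/feature check.
    return True

class ToyGate(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.out_dtype: Optional[torch.dtype] = None
        self.e_score_correction_bias = nn.Parameter(
            torch.zeros(8, dtype=torch.float32), requires_grad=False
        )

    def set_out_dtype(self, out_dtype: torch.dtype) -> None:
        self.out_dtype = out_dtype

class ToyFusedMoE(nn.Module):
    def __init__(self, gate: Optional[ToyGate] = None) -> None:
        super().__init__()
        # ... quant method / routing method would be set up here, so the
        # information needed by set_out_dtype() only exists from this point on.
        self.routing_method_type = "grouped_topk"
        if gate is not None:
            # Relocated block: runs for *every* model that passes gate=,
            # which is the blast-radius concern raised above.
            gate.set_out_dtype(torch.bfloat16)
            if is_rocm_aiter_moe_enabled() and gate.e_score_correction_bias is not None:
                gate.e_score_correction_bias.data = (
                    gate.e_score_correction_bias.data.to(gate.out_dtype)
                )

gate = ToyGate()
moe = ToyFusedMoE(gate=gate)
assert gate.e_score_correction_bias.dtype == torch.bfloat16
```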
bnellnm left a comment
LGTM but I think the initial casting code could probably live in layer.py since it seems like it is generally applicable to ROCm MoE and particular routing methods.
@tjtanaa there's a failing unit test, but from the looks of it I think it's unrelated to the changes in this PR. Let me know if I should investigate further.
Purpose
The MoE score correction bias tensor was being cast to the gate output dtype on every forward pass. The dtype this tensor needs to be cast to is known at model construction time and never changes afterwards, so the repeated cast is redundant work that launches an extra GPU kernel per MoE layer per forward call.
This PR moves the cast to model construction, eliminating the per-forward-pass overhead.
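As a rough, toy-level illustration of the before/after pattern (plain tensors standing in for the real router code, not the actual vLLM call sites): the per-call `.to()` launches an elementwise cast kernel on every forward, while casting once up front removes it from the hot path and leaves only a cheap dtype check.

```python
import torch

bias = torch.zeros(256, dtype=torch.float32)       # stand-in for e_score_correction_bias
gating_output = torch.randn(4, 256, dtype=torch.bfloat16)

# Before: the cast sits on the forward path, so an elementwise kernel is
# launched on every call.
def route_before(gating_output: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    return gating_output + bias.to(gating_output.dtype)

# After: cast once, up front (in the PR this happens at model construction);
# the forward path only asserts that the dtypes already match.
bias = bias.to(torch.bfloat16)

def route_after(gating_output: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    assert bias.dtype == gating_output.dtype
    return gating_output + bias
```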
Summary
Before / After: profiler trace screenshots (in the original PR).
The trace shows the elementwise kernel responsible for this typecast running before the grouped-topk operation in every forward pass. With the following changes the typecast is moved to model construction, avoiding the call on every forward pass:
- `vllm/model_executor/models/deepseek_v2.py`: Pre-cast `e_score_correction_bias` to match `gate.out_dtype` during `DeepseekV2MoE` construction. Since all downstream consumers (FusedMoE, router) share the same `nn.Parameter` object, this single mutation propagates everywhere (a small sketch of this shared-Parameter behavior follows the list).
- `vllm/model_executor/layers/fused_moe/rocm_aiter_fused_moe.py`: Replace the runtime `.to()` cast with an assert that the bias dtype already matches the gating output dtype, catching any future regression where the init-time cast is missed.
- `vllm/model_executor/layers/fused_moe/router/fused_topk_bias_router.py`: Same change, replacing the runtime `.to()` cast with a matching assert.
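A tiny check of the shared-Parameter claim in the first bullet, using toy modules only (not the actual vLLM classes): because every holder references the same `nn.Parameter`, a single in-place `.data` mutation is visible to all of them.

```python
import torch
import torch.nn as nn

class Holder(nn.Module):
    """Toy module that simply holds a shared Parameter."""
    def __init__(self, bias: nn.Parameter) -> None:
        super().__init__()
        self.e_score_correction_bias = bias

shared_bias = nn.Parameter(torch.zeros(4, dtype=torch.float32), requires_grad=False)
gate, experts = Holder(shared_bias), Holder(shared_bias)

# One in-place mutation of .data ...
shared_bias.data = shared_bias.data.to(torch.bfloat16)

# ... is observed by every module holding the same Parameter object.
assert gate.e_score_correction_bias.dtype == torch.bfloat16
assert experts.e_score_correction_bias.dtype == torch.bfloat16
```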
Test Result
Accuracy
Performance
Test Plan