[Bugfix][Kernel] fix bias adding in triton kernel implemented fused moe#31676
Conversation
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Originated from PR #29008.
Code Review
This pull request addresses a correctness issue in the fused MoE Triton kernel by ensuring that bias addition and routed weight multiplication are performed after dequantization. The change correctly moves these operations to follow the scaling of the accumulator, which aligns with the standard mathematical formulation for quantized operations. This fix is crucial for numerical accuracy and appears to be implemented correctly.
@mgoin PTAL. Thank you.
@mgoin Hey, could you take a look at this PR? Thanks!
This PR breaks ROCm at:
…oe (vllm-project#31676) Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Since the bias is typically not quantized, it must be added after dequantization, as the final step:
y = s_x * s_w * (Wq - zw) * (xq - zx) + bias
where:
s_x, s_w: scaling factors for activation and weight
Wq, xq: quantized weight and activation
zw, zx: zero points for weight and activation
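The formula above can be illustrated with a minimal NumPy sketch (not the actual Triton kernel; names and the symmetric int8 scheme with zx = zw = 0 are assumptions for illustration). It contrasts the correct order, where the bias is added to the dequantized accumulator, with the buggy order this PR fixes, where the bias is added before scaling and therefore gets multiplied by s_x * s_w:

```python
import numpy as np

# Hypothetical illustration only: per-tensor symmetric int8 quantization,
# so the zero points zx and zw are both 0.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)   # activation
W = rng.standard_normal((8, 3)).astype(np.float32)   # weight
bias = rng.standard_normal(3).astype(np.float32)     # unquantized bias

s_x = np.abs(x).max() / 127.0                        # activation scale
s_w = np.abs(W).max() / 127.0                        # weight scale
xq = np.round(x / s_x).astype(np.int32)              # quantized activation
Wq = np.round(W / s_w).astype(np.int32)              # quantized weight

acc = xq @ Wq                                        # int32 accumulator

# Correct order: dequantize the accumulator first, then add the bias.
y_correct = s_x * s_w * acc + bias

# Buggy order: bias added before scaling, so it is wrongly
# multiplied by s_x * s_w during dequantization.
y_buggy = s_x * s_w * (acc + bias)
```

The same ordering argument applies to the routed-weight multiplication mentioned in the review: any unquantized term must be applied after the accumulator has been scaled back to the floating-point domain.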
Test Plan
Test Result
Example (TP=2):
