[Bugfix][Kernel] fix bias adding in triton kernel implemented fused moe#31676

Merged
mgoin merged 3 commits into vllm-project:main from xuebwang-amd:xuebin_fix_triton_fused_moe_with_bias
Jan 7, 2026
Conversation

Contributor

@xuebwang-amd xuebwang-amd commented Jan 4, 2026

Purpose

Since the bias is typically not quantized, it should be added after dequantization, as the final step:

y = s_x * s_w * (W_q - z_w) * (x_q - z_x) + bias

where:
s_x, s_w: scaling factors for the activation and weight
W_q, x_q: quantized weight and activation
z_w, z_x: zero points for the weight and activation
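The ordering matters numerically: if the unquantized bias is added to the integer accumulator, it gets multiplied by s_x * s_w during dequantization and effectively vanishes. A minimal NumPy sketch of this effect (illustrative only; symmetric quantization with zero points of 0, not the actual Triton kernel):

```python
import numpy as np

def quantize(t, scale):
    # Symmetric round-to-nearest quantization (zero point = 0).
    return np.clip(np.round(t / scale), -127, 127)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)   # activations
w = rng.standard_normal((8, 3)).astype(np.float32)   # weights
bias = np.array([1.0, -2.0, 0.5], dtype=np.float32)  # unquantized bias

s_x, s_w = 0.05, 0.05                                # per-tensor scales
xq, wq = quantize(x, s_x), quantize(w, s_w)

# Correct (this PR): dequantize the accumulator first, then add the bias.
y_correct = s_x * s_w * (xq @ wq) + bias

# Wrong: a bias added inside the accumulator is scaled by s_x * s_w
# during dequantization, so its contribution almost disappears.
y_wrong = s_x * s_w * (xq @ wq + bias)

ref = x @ w + bias
err_correct = np.abs(y_correct - ref).max()  # small quantization error
err_wrong = np.abs(y_wrong - ref).max()      # dominated by the lost bias
```

With the correct ordering, the residual error is only quantization noise; with the wrong ordering, the error is roughly the magnitude of the bias itself.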

Test Plan

  • (end-to-end): amd/gpt-oss-20b-WFP8-AFP8-KVFP8, see results below
    • quantization scheme:
      • all weights (both attention and MoE): FP8 per-tensor
      • all activations (both attention and MoE): FP8 per-tensor
      • KV cache: FP8 per-tensor
  • (TODO?) may need a unit test on the Triton-implemented fused MoE kernel

Test Result

[results screenshot]

Example (TP=2): [screenshot]

Signed-off-by: xuebwang-amd <xuebwang@amd.com>
@xuebwang-amd
Contributor Author

Originated from PR #29008.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a correctness issue in the fused MoE Triton kernel by ensuring that bias addition and routed weight multiplication are performed after dequantization. The change correctly moves these operations to follow the scaling of the accumulator, which aligns with the standard mathematical formulation for quantized operations. This fix is crucial for numerical accuracy and appears to be implemented correctly.
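Concretely, the epilogue ordering the review describes can be sketched as follows. This is a hypothetical Python sketch of the dequantize-first ordering; names such as `epilogue` and `topk_weight` are illustrative, not the kernel's actual symbols:

```python
import numpy as np

def epilogue(acc, a_scale, b_scale, bias, topk_weight):
    # Hypothetical sketch of the fixed epilogue ordering: everything that
    # is not quantized is applied after the accumulator is dequantized.
    y = acc * (a_scale * b_scale)  # dequantize the integer accumulator
    y = y + bias                   # bias in full precision (this PR's fix)
    y = y * topk_weight            # routed expert weight, also post-dequant
    return y

acc = np.array([[100.0, -50.0]])   # toy integer accumulator
y = epilogue(acc, a_scale=0.1, b_scale=0.2,
             bias=np.array([1.0, 1.0]), topk_weight=2.0)
```

For the toy inputs above, the accumulator dequantizes to [2, -1], the bias lifts it to [3, 0], and the routed weight scales it to [6, 0].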

Contributor Author

xuebwang-amd commented Jan 4, 2026

@tjtanaa @gshtras @BowenBao Please help to review this PR. Thanks.
BTW, Happy new year 2026!

Collaborator

tjtanaa commented Jan 5, 2026

@mgoin PTAL . Thank you.

Collaborator

ApostaC commented Jan 6, 2026

@mgoin Hey, could you take a look at this PR? Thanks!

Member

@mgoin mgoin left a comment


Thanks, this makes sense to me

@mgoin mgoin enabled auto-merge (squash) January 7, 2026 03:32
@mgoin mgoin added labels: bug (Something isn't working), ready (ONLY add when PR is ready to merge/full CI is needed), gpt-oss (Related to GPT-OSS models) Jan 7, 2026
@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Jan 7, 2026
@mgoin mgoin merged commit 0dd5dee into vllm-project:main Jan 7, 2026
51 of 52 checks passed
@AndreasKaratzas
Collaborator

@xuebwang-amd @tjtanaa

This PR breaks ROCm in the LoRA TP Test (Distributed). Specifically, the failure we are seeing is: FAILED lora/test_olmoe_tp.py::test_olmoe_lora_mixed. I am going to open a PR that fixes this failure and CC you there. Feel free to check it out and see whether it still serves the purpose you were aiming for without introducing any test failures.

yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Labels

  • bug: Something isn't working
  • gpt-oss: Related to GPT-OSS models
  • ready: ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants