
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] for TRTLLM per-tensor FP8 MoE#33620

Merged
robertgshaw2-redhat merged 1 commit into vllm-project:main from
neuralmagic:disable-renormalize-trtllm-tensor-fp8-moe
Feb 3, 2026
Conversation

@mgoin
Member

@mgoin mgoin commented Feb 3, 2026

Purpose

FIX #33532

Test Plan

Tested on B200:

vllm serve RedHatAI/DeepSeek-Coder-V2-Lite-Instruct-FP8
python tests/evals/gsm8k/gsm8k_eval.py --port 8000

Test Result

# BEFORE: Fp8MoeBackend.FLASHINFER_TRTLLM
python tests/evals/gsm8k/gsm8k_eval.py --port 8000
Running GSM8K evaluation: 1319 questions, 5-shot
Evaluating: 100%|█████████████████████████████████████████████████████| 1319/1319 [00:36<00:00, 36.20it/s]
Results:
Accuracy: 0.016
Invalid responses: 0.041
Total latency: 36.448 s
Questions per second: 36.188
Total output tokens: 272255
Output tokens per second: 7469.620

# AFTER: Fp8MoeBackend.FLASHINFER_CUTLASS
python tests/evals/gsm8k/gsm8k_eval.py --port 8000
Running GSM8K evaluation: 1319 questions, 5-shot
Evaluating: 100%|█████████████████████████████████████████████████████| 1319/1319 [00:30<00:00, 43.25it/s]
Results:
Accuracy: 0.780
Invalid responses: 0.000
Total latency: 30.510 s
Questions per second: 43.232
Total output tokens: 172465
Output tokens per second: 5652.764


per-tensor FP8 MoE

Signed-off-by: mgoin <mgoin64@gmail.com>
@mgoin mgoin requested a review from pavanimajety as a code owner February 3, 2026 00:35
@mergify mergify bot added the nvidia and bug (Something isn't working) labels Feb 3, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request disables the Renormalize and RenormalizeNaive routing methods for TRT-LLM per-tensor FP8 MoE by commenting them out. This is a temporary measure to address accuracy issues as noted in the associated issue. The change is clear, well-commented, and appears to be a reasonable and safe approach to mitigate the bug. I have no further recommendations.
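The change described above removes the two routing methods from the set the TRT-LLM per-tensor FP8 MoE kernel is allowed to handle, so affected models fall back to another FP8 MoE backend (here, FLASHINFER_CUTLASS). As a rough illustration only (every name below is a hypothetical stand-in, not vLLM's real identifiers), the gating pattern looks like:

```python
from enum import Enum, auto

# Hypothetical stand-ins for vLLM's actual types; names are illustrative only.
class RoutingMethodType(Enum):
    Renormalize = auto()
    RenormalizeNaive = auto()
    DeepSeekV3 = auto()

# Routing methods the TRT-LLM per-tensor FP8 MoE kernel may handle.
# Renormalize and RenormalizeNaive are deliberately excluded (commented
# out) until the accuracy bug is resolved.
TRTLLM_PER_TENSOR_FP8_SUPPORTED = {
    # RoutingMethodType.Renormalize,       # disabled: accuracy bug
    # RoutingMethodType.RenormalizeNaive,  # disabled: accuracy bug
    RoutingMethodType.DeepSeekV3,
}

def can_use_trtllm_per_tensor_fp8(routing: RoutingMethodType) -> bool:
    """Return True only for routing methods still supported by the kernel;
    unsupported ones fall back to a different FP8 MoE backend."""
    return routing in TRTLLM_PER_TENSOR_FP8_SUPPORTED

print(can_use_trtllm_per_tensor_fp8(RoutingMethodType.Renormalize))  # False
print(can_use_trtllm_per_tensor_fp8(RoutingMethodType.DeepSeekV3))   # True
```

Keeping the disabled entries as comments (rather than deleting them) documents the intended support matrix and makes the revert trivial once the kernel bug is fixed.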

@mgoin mgoin added this to the v0.15.1 Hotfix milestone Feb 3, 2026
@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Feb 3, 2026
@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) February 3, 2026 01:19
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 3, 2026
@mgoin mgoin changed the title [Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE [Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] for TRTLLM per-tensor FP8 MoE Feb 3, 2026
@robertgshaw2-redhat robertgshaw2-redhat merged commit e346e2d into vllm-project:main Feb 3, 2026
53 of 54 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Feb 3, 2026
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request Feb 3, 2026
…LLM per-tensor FP8 MoE (vllm-project#33620)

Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Pai <416932041@qq.com>
robertgshaw2-redhat pushed a commit that referenced this pull request Feb 4, 2026
…LLM per-tensor FP8 MoE (#33620)

Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit e346e2d)

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
gameofdimension pushed a commit to gameofdimension/vllm that referenced this pull request Feb 5, 2026
…LLM per-tensor FP8 MoE (vllm-project#33620)

Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…LLM per-tensor FP8 MoE (vllm-project#33620)

Signed-off-by: mgoin <mgoin64@gmail.com>

Labels

  • bug (Something isn't working)
  • nvidia
  • ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[CI Failure]: DeepSeek V2 Lite FP8 0% Accuracy [NIGHTLY]

2 participants