
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] for TRTLLM per-tensor FP8 MoE#33620

Merged
robertgshaw2-redhat merged 1 commit into vllm-project:main from
neuralmagic:disable-renormalize-trtllm-tensor-fp8-moe
Feb 3, 2026
Conversation

@mgoin
Member

@mgoin mgoin commented Feb 3, 2026

Purpose

FIX #33532

Test Plan

Tested on B200:

vllm serve RedHatAI/DeepSeek-Coder-V2-Lite-Instruct-FP8
python tests/evals/gsm8k/gsm8k_eval.py --port 8000

Test Result

# BEFORE: Fp8MoeBackend.FLASHINFER_TRTLLM
python tests/evals/gsm8k/gsm8k_eval.py --port 8000
Running GSM8K evaluation: 1319 questions, 5-shot
Evaluating: 100%|█████████████████████████████████████████████████████| 1319/1319 [00:36<00:00, 36.20it/s]
Results:
Accuracy: 0.016
Invalid responses: 0.041
Total latency: 36.448 s
Questions per second: 36.188
Total output tokens: 272255
Output tokens per second: 7469.620

# AFTER: Fp8MoeBackend.FLASHINFER_CUTLASS
python tests/evals/gsm8k/gsm8k_eval.py --port 8000
Running GSM8K evaluation: 1319 questions, 5-shot
Evaluating: 100%|█████████████████████████████████████████████████████| 1319/1319 [00:30<00:00, 43.25it/s]
Results:
Accuracy: 0.780
Invalid responses: 0.000
Total latency: 30.510 s
Questions per second: 43.232
Total output tokens: 172465
Output tokens per second: 5652.764


per-tensor FP8 MoE

Signed-off-by: mgoin <mgoin64@gmail.com>
@mgoin mgoin requested a review from pavanimajety as a code owner February 3, 2026 00:35
@mergify mergify bot added the nvidia and bug (Something isn't working) labels Feb 3, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request disables the Renormalize and RenormalizeNaive routing methods for TRT-LLM per-tensor FP8 MoE by commenting them out. This is a temporary measure to address accuracy issues as noted in the associated issue. The change is clear, well-commented, and appears to be a reasonable and safe approach to mitigate the bug. I have no further recommendations.
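The change described above removes the two routing methods from the set the TRT-LLM per-tensor FP8 MoE kernel is allowed to handle, so affected models fall back to another FP8 MoE backend (here, FLASHINFER_CUTLASS). As a rough illustration only (every name below is a hypothetical stand-in, not vLLM's real identifiers), the gating pattern looks like:

```python
from enum import Enum, auto

# Hypothetical stand-ins for vLLM's actual types; names are illustrative only.
class RoutingMethodType(Enum):
    Renormalize = auto()
    RenormalizeNaive = auto()
    DeepSeekV3 = auto()

# Routing methods the TRT-LLM per-tensor FP8 MoE kernel may handle.
# Renormalize and RenormalizeNaive are deliberately excluded (commented
# out) until the accuracy bug is resolved.
TRTLLM_PER_TENSOR_FP8_SUPPORTED = {
    # RoutingMethodType.Renormalize,       # disabled: accuracy bug
    # RoutingMethodType.RenormalizeNaive,  # disabled: accuracy bug
    RoutingMethodType.DeepSeekV3,
}

def can_use_trtllm_per_tensor_fp8(routing: RoutingMethodType) -> bool:
    """Return True only for routing methods still supported by the kernel;
    unsupported ones fall back to a different FP8 MoE backend."""
    return routing in TRTLLM_PER_TENSOR_FP8_SUPPORTED

print(can_use_trtllm_per_tensor_fp8(RoutingMethodType.Renormalize))  # False
print(can_use_trtllm_per_tensor_fp8(RoutingMethodType.DeepSeekV3))   # True
```

Keeping the disabled entries as comments (rather than deleting them) documents the intended support matrix and makes the revert trivial once the kernel bug is fixed.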

@mgoin mgoin added this to the v0.15.1 Hotfix milestone Feb 3, 2026
@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Feb 3, 2026
@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) February 3, 2026 01:19
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 3, 2026
@mgoin mgoin changed the title [Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE [Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] for TRTLLM per-tensor FP8 MoE Feb 3, 2026
@robertgshaw2-redhat robertgshaw2-redhat merged commit e346e2d into vllm-project:main Feb 3, 2026
53 of 54 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Feb 3, 2026
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request Feb 3, 2026
…LLM per-tensor FP8 MoE (vllm-project#33620)

Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Pai <416932041@qq.com>
robertgshaw2-redhat pushed a commit that referenced this pull request Feb 4, 2026
…LLM per-tensor FP8 MoE (#33620)

Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit e346e2d)

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
gameofdimension pushed a commit to gameofdimension/vllm that referenced this pull request Feb 5, 2026
…LLM per-tensor FP8 MoE (vllm-project#33620)

Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…LLM per-tensor FP8 MoE (vllm-project#33620)

Signed-off-by: mgoin <mgoin64@gmail.com>

Labels

  • bug (Something isn't working)
  • nvidia
  • ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[CI Failure]: DeepSeek V2 Lite FP8 0% Accuracy [NIGHTLY]

2 participants