Revert "[NemotronH] Do not force router to run in fp32 (#34582)" by roikoren755 · Pull Request #34808 · vllm-project/vllm

roikoren755 · 2026-02-18T14:44:12Z

Purpose

#34582 introduced an accuracy degradation. Working with @robertgshaw2-redhat on a better implementation of this performance improvement in #34302.

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

gemini-code-assist

Code Review

This pull request reverts a previous change that introduced an accuracy degradation in the NemotronH model. The changes force the Mixture-of-Experts (MoE) router to operate in float32 precision. This is achieved by explicitly setting the gate layer's parameters to float32 and casting the input hidden states to float32 before the router computation. This is a standard practice to maintain numerical stability in MoE routing and is a correct fix for the reported accuracy issue. The changes are clear, concise, and effectively address the problem. I find no issues with this revert.
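As a toy illustration of why router precision can matter, the sketch below rounds a set of made-up router logits to bfloat16 and compares the selected expert with the one chosen at higher precision. It is not the vLLM or NemotronH code: `to_bf16`, `top_k`, and the logit values are hypothetical, chosen only to show how two nearby logits can collapse to the same bfloat16 value and flip the routing decision.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Toy helper for finite inputs only; NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of the float32 encoding.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def top_k(logits, k):
    """Return the indices of the k largest logits; ties go to the lower index."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]


# Hypothetical router logits for four experts (illustrative values only).
high_precision_logits = [1.0, 1.003, 0.5, -0.2]
bf16_logits = [to_bf16(v) for v in high_precision_logits]

# Near 1.0 the bfloat16 spacing is 2**-7, so 1.003 rounds down to 1.0.
print(bf16_logits[:2])                  # [1.0, 1.0]
print(top_k(high_precision_logits, 1))  # [1]
print(top_k(bf16_logits, 1))            # [0]
```

In this contrived case the higher-precision router picks expert 1, while the bfloat16 logits tie and expert 0 wins. Whether and how often this happens in a real model depends on the actual logit distribution and kernels involved.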

mgoin · 2026-02-18T21:26:15Z

What accuracy regression did it cause and for what model?

roikoren755 · 2026-02-19T08:31:22Z

> What accuracy regression did it cause and for what model?

#34704 caught the accuracy degradation for Nemotron 3 Nano FP8, which dropped 4-5%. In a "standalone" GSM8K run, the score drops by about 2%. The BF16 model's accuracy also seems to have taken a hit, but it stayed within the test's tolerance.

roikoren755 added 2 commits February 18, 2026 16:47

Revert "[NemotronH] Do not force router to run in fp32 (vllm-project#…

a57c2eb

…34582)" This reverts commit 3b30e61. Signed-off-by: Roi Koren <roik@nvidia.com>

Leave this in

c7c5142

Signed-off-by: Roi Koren <roik@nvidia.com>

roikoren755 force-pushed the feat/nemotronh-router-revert branch from 916b6e6 to c7c5142 Compare February 18, 2026 14:47

gemini-code-assist bot reviewed Feb 18, 2026

edwinlim0919 mentioned this pull request Feb 18, 2026

[Bug]: Set env ROCP_TOOL_ATTACH=1 caused vllm server stopped #34205

Closed

1 task

mgoin added the bug, ready, and nvidia labels Feb 19, 2026

github-project-automation bot added this to NVIDIA Feb 19, 2026

mgoin approved these changes Feb 19, 2026

github-project-automation bot moved this to Ready in NVIDIA Feb 19, 2026

Merge branch 'main' into feat/nemotronh-router-revert

1eac481

vllm-bot merged commit 3eff45d into vllm-project:main Feb 19, 2026
49 of 56 checks passed

github-project-automation bot moved this from Ready to Done in NVIDIA Feb 19, 2026

roikoren755 deleted the feat/nemotronh-router-revert branch February 22, 2026 09:33

jmamou pushed a commit to jmamou/vllm that referenced this pull request Feb 23, 2026

Revert "[NemotronH] Do not force router to run in fp32 (vllm-project#…

b8db2c6

…34582)" (vllm-project#34808) Signed-off-by: Roi Koren <roik@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>

llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026

Revert "[NemotronH] Do not force router to run in fp32 (vllm-project#…

352efbd

…34582)" (vllm-project#34808) Signed-off-by: Roi Koren <roik@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>

Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026

Revert "[NemotronH] Do not force router to run in fp32 (vllm-project#…

053fa67

…34582)" (vllm-project#34808) Signed-off-by: Roi Koren <roik@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>

vllm-bot merged 3 commits into vllm-project:main from roikoren755:feat/nemotronh-router-revert
