
Revert "[NemotronH] Do not force router to run in fp32 (#34582)" #34808

Merged
vllm-bot merged 3 commits into vllm-project:main from roikoren755:feat/nemotronh-router-revert on Feb 19, 2026

Conversation

roikoren755 (Contributor) commented Feb 18, 2026

Purpose

#34582 introduced an accuracy degradation, so this PR reverts it. I am working with @robertgshaw2-redhat on a better implementation of this performance improvement in #34302.

Test Plan

Test Result



…34582)"

This reverts commit 3b30e61.

Signed-off-by: Roi Koren <roik@nvidia.com>
roikoren755 force-pushed the feat/nemotronh-router-revert branch from 916b6e6 to c7c5142 on February 18, 2026 14:47
gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request reverts a previous change that introduced an accuracy degradation in the NemotronH model. The changes force the Mixture-of-Experts (MoE) router to operate in float32 precision. This is achieved by explicitly setting the gate layer's parameters to float32 and casting the input hidden states to float32 before the router computation. This is a standard practice to maintain numerical stability in MoE routing and is a correct fix for the reported accuracy issue. The changes are clear, concise, and effectively address the problem. I find no issues with this revert.
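To make the numerical-stability point concrete, here is a toy sketch in pure Python of how rounding router logits to bfloat16 can change which expert wins. It is not the vLLM or NemotronH code: the helper names `to_bf16` and `top_k`, the logit values, and the tie-breaking rule are all assumptions chosen for illustration.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Handles ordinary finite values only; NaN and overflow are not treated.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even neighbour
    bits = (bits + bias) & 0xFFFF0000  # keep sign, exponent, top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def top_k(logits: list[float], k: int) -> list[int]:
    """Indices of the k largest logits; ties keep the lower index (stable sort)."""
    return sorted(range(len(logits)), key=lambda i: -logits[i])[:k]


# Hypothetical router logits for three experts; experts 0 and 1 are nearly tied.
fp32_logits = [1.0, 1.001, 0.5]
bf16_logits = [to_bf16(v) for v in fp32_logits]

fp32_choice = top_k(fp32_logits, 1)  # expert 1 has the larger logit
bf16_choice = top_k(bf16_logits, 1)  # 1.001 rounds to 1.0, so experts 0 and 1 tie
```

Near 1.0, adjacent bfloat16 values are 2^-7 apart, so a gap of 0.001 between two logits disappears after rounding and the selection falls back to whatever tie-breaking the implementation uses. Keeping the gate weights and its input in float32 preserves such small gaps.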

mgoin (Member) commented Feb 18, 2026

What accuracy regression did it cause and for what model?

roikoren755 (Contributor, Author)

> What accuracy regression did it cause and for what model?

#34704 caught the accuracy degradation for Nemotron 3 Nano FP8, which dropped 4-5%. In a "standalone" GSM8K run, the score drops by about 2%. The BF16 model's accuracy also appears to have taken a hit, but it stayed within the tolerance of the test.

@mgoin added the bug, ready, and nvidia labels Feb 19, 2026
@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Feb 19, 2026
@vllm-bot vllm-bot merged commit 3eff45d into vllm-project:main Feb 19, 2026
49 of 56 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Feb 19, 2026
@roikoren755 roikoren755 deleted the feat/nemotronh-router-revert branch February 22, 2026 09:33