Revert "[NemotronH] Do not force router to run in fp32 (#34582)"#34808
vllm-bot merged 3 commits into vllm-project:main
Signed-off-by: Roi Koren <roik@nvidia.com>
Code Review
This pull request reverts a previous change that introduced an accuracy degradation in the NemotronH model. The changes force the Mixture-of-Experts (MoE) router to operate in float32 precision. This is achieved by explicitly setting the gate layer's parameters to float32 and casting the input hidden states to float32 before the router computation. This is a standard practice to maintain numerical stability in MoE routing and is a correct fix for the reported accuracy issue. The changes are clear, concise, and effectively address the problem. I find no issues with this revert.
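To illustrate why router precision can change results, here is a minimal, standard-library-only sketch. It is not vLLM's or NemotronH's code: the helper names (`to_bf16`, `router_logits`, `top_k`), the toy weights, and the lowest-index tie-break are all assumptions made for illustration. It emulates bfloat16 rounding of the router output and shows two experts whose logits differ at full precision but collapse to a tie once rounded, which changes the top-1 expert.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps only the top 16 bits of the float32 encoding.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def router_logits(hidden, gate_weight, low_precision):
    """Dot each expert's gate row with the hidden state.

    With low_precision=True the result is rounded to bfloat16, standing in
    for a router whose output dtype is bf16 (hypothetical simplification).
    """
    logits = []
    for row in gate_weight:
        acc = sum(h * w for h, w in zip(hidden, row))
        logits.append(to_bf16(acc) if low_precision else acc)
    return logits


def top_k(logits, k):
    """Indices of the k largest logits; ties go to the lower index (assumed)."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]


# Toy values chosen to be exactly representable in bfloat16.
hidden = [1.0, 1.0]
gate_weight = [
    [1.0, 0.0],          # expert 0: logit 1.0
    [1.0, 0.001953125],  # expert 1: logit 1.0 + 2**-9
    [0.5, 0.0],          # expert 2: logit 0.5
]

full_choice = top_k(router_logits(hidden, gate_weight, low_precision=False), 1)
bf16_choice = top_k(router_logits(hidden, gate_weight, low_precision=True), 1)
print("full precision top-1:", full_choice)
print("bf16 output top-1:", bf16_choice)
```

Near 1.0, adjacent bfloat16 values are 2**-7 apart, so a gap of 2**-9 between two logits disappears after rounding. Real kernels may break ties differently, but the loss of distinction between close logits is the same.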
What accuracy regression did it cause, and for which model?
#34704 caught the accuracy degradation for Nemotron 3 Nano FP8, which dropped 4-5%. In a "standalone" GSM8K run, the score drops by about 2%. The BF16 model's accuracy also appears to have taken a hit, but it stayed within the test's tolerance.
…34582)" (vllm-project#34808) Signed-off-by: Roi Koren <roik@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>
…34582)" (vllm-project#34808) Signed-off-by: Roi Koren <roik@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com>
…34582)" (vllm-project#34808) Signed-off-by: Roi Koren <roik@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>
Purpose
#34582 introduced an accuracy degradation. We are working with @robertgshaw2-redhat on a better implementation of this performance improvement in #34302.
Test Plan
Test Result