Add the missing router_logits_dtype for nemotron_h #32875
cjluo-nv wants to merge 1 commit into vllm-project:main
Nemotron H uses FP32 linear for gate and the router logits are FP32 instead of the default BF16. This fixes the error triggered

```
(EngineCore_DP2 pid=147) return self.forward_impl_chunked(
(EngineCore_DP2 pid=147) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP2 pid=147) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1780, in forward_impl_chunked
(EngineCore_DP2 pid=147) assert self.batched_router_logits.dtype == full_router_logits.dtype
(EngineCore_DP2 pid=147) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```

Signed-off-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
Code Review
This pull request correctly addresses a dtype mismatch in the Nemotron-H model's MoE layers. By explicitly setting router_logits_dtype to self.gate.params_dtype (which is torch.float32), it ensures that the internal router logits buffer in SharedFusedMoE has the same data type as the computed router logits, thus fixing the assertion error. The change is minimal, targeted, and well-justified by the PR description. The code quality is excellent, and I have no further recommendations.
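The mismatch described in this review can be illustrated with a minimal, dependency-free sketch. It is not vLLM's code: `ToyTensor`, `ToyGate`, `ToyFusedMoE`, and `run` are hypothetical stand-ins, and dtypes are modelled as plain strings rather than `torch` dtypes. It only assumes what the PR states, namely that the layer's router-logits buffer defaults to BF16 while the gate produces FP32 logits.

```python
from dataclasses import dataclass

# Toy stand-ins for dtypes (assumption: real code uses torch.float32 / torch.bfloat16).
FP32 = "float32"
BF16 = "bfloat16"


@dataclass
class ToyTensor:
    """Hypothetical tensor stand-in that only tracks a dtype."""
    dtype: str


class ToyGate:
    """Hypothetical router gate whose parameters are kept in FP32."""

    def __init__(self, params_dtype: str = FP32):
        self.params_dtype = params_dtype

    def __call__(self, hidden_states: ToyTensor) -> ToyTensor:
        # The gate emits logits in its own parameter dtype,
        # regardless of the dtype of the incoming activations.
        return ToyTensor(dtype=self.params_dtype)


class ToyFusedMoE:
    """Hypothetical MoE layer with a preallocated router-logits buffer."""

    def __init__(self, router_logits_dtype: str = BF16):
        # Buffer dtype is fixed at construction time; BF16 is the assumed default.
        self.batched_router_logits = ToyTensor(dtype=router_logits_dtype)

    def forward_chunked(self, full_router_logits: ToyTensor) -> str:
        # Mirrors the assertion quoted in the PR description.
        assert self.batched_router_logits.dtype == full_router_logits.dtype
        return "ok"


def run(pass_gate_dtype: bool) -> str:
    """Build the toy layer with or without the gate's dtype and run one step."""
    gate = ToyGate()
    if pass_gate_dtype:
        # The shape of the fix: forward the gate's dtype to the MoE layer.
        moe = ToyFusedMoE(router_logits_dtype=gate.params_dtype)
    else:
        moe = ToyFusedMoE()
    try:
        return moe.forward_chunked(gate(ToyTensor(dtype=BF16)))
    except AssertionError:
        return "dtype mismatch"
```

Calling `run(False)` reproduces the failing configuration in miniature, while `run(True)` corresponds to passing `self.gate.params_dtype` as `router_logits_dtype`, as the review describes.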
Covered by #32669.
Purpose
Nemotron H uses an FP32 linear layer for the gate, so the router logits are FP32 instead of the default BF16.
This fixes the dtype assertion error raised in forward_impl_chunked.
Test Plan
Deploy the Nemotron Nano v3 NVFP4 checkpoint and confirm that the error no longer occurs.
Test Result
The error is gone.