Skip to content

fix: correct mcore dtype + assertion on activation_func#572

Merged
terrykong merged 1 commit intomainfrom
tk/fix-mcore-types
Jun 28, 2025
Merged

fix: correct mcore dtype + assertion on activation_func#572
terrykong merged 1 commit intomainfrom
tk/fix-mcore-types

Conversation

@terrykong
Copy link
Collaborator

This is to address a bug found when testing Nemotron-H where the params_dtype gets set incorrectly and add an assert to give a hint of what might have happened if activation_func is empty.

The issue was during conversion we use the default value of the optimizer.params_dtype as float32 here https://github.com/NVIDIA/NeMo/blob/bab66472d2f2eb05ab621dbad66ad6031e4ee19e/nemo/tron/converter/common.py#L217. But the params_dtype is supposed to be set by the bf16 and fp16 args according to how it's handled in mcore: https://github.com/NVIDIA/Megatron-LM/blob/1ab876ddc4c1893c76f26d775226a8d1dcdfb3d2/megatron/training/arguments.py#L676

This change just respects that logic.

Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong added this pull request to the merge queue Jun 27, 2025
Merged via the queue into main with commit 8771995 Jun 28, 2025
13 of 14 checks passed
@terrykong terrykong deleted the tk/fix-mcore-types branch June 28, 2025 00:58
xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jun 28, 2025
xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jun 30, 2025
xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jun 30, 2025
)

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jun 30, 2025
)

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
therealnaveenkamal pushed a commit to therealnaveenkamal/RL that referenced this pull request Jul 7, 2025
YzjiaoNvd pushed a commit to YzjiaoNvd/NeMo-RL that referenced this pull request Jul 14, 2025
)

Signed-off-by: Terry Kong <terryk@nvidia.com>
KiddoZhu pushed a commit that referenced this pull request Jul 28, 2025
Signed-off-by: Terry Kong <terryk@nvidia.com>
FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants