fix: allow zero grad norm in dtensor policies for consistency with Megatron (#1618)
smahdavi4 wants to merge 4 commits into NVIDIA-NeMo:main
Conversation
ℹ️ File Consistency Check (based on commit 8c1ab0a, PR #1618)
✅ DTensor Policy Worker Synchronization Check: both DTensor policy worker files were modified in this PR.
Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
📝 Walkthrough: Two policy worker files are updated to add an additional guard condition to their gradient clipping logic.
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Force-pushed from 8c1ab0a to 6ceb345.
joyang-nv
left a comment
Thanks for unifying. Could you add a unit test for this in a follow-up PR?
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Force-pushed from 316127f to 030e6e0.
Thank you! I just fixed the linting issue, and the PR is ready to merge. I'll send a follow-up PR with test cases later today once this is merged.
What does this PR do?
Currently, Megatron only accepts a float/int for the max grad norm, so disabling grad-norm clipping requires zero, while DTensor requires None. This PR makes the DTensor policies also accept zero, allowing a single, consistent grad-norm clipping configuration across both backends.
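In effect, the change treats both `None` and zero as "clipping disabled" in the DTensor path. A minimal sketch of such a guard condition, assuming a hypothetical helper (the function name and signature are illustrative, not the actual NeMo-RL code):

```python
def should_clip(max_grad_norm):
    # Guard sketched from the PR description: treat both None
    # (DTensor's prior convention) and 0 (Megatron's convention)
    # as "grad-norm clipping disabled". Illustrative only.
    return max_grad_norm is not None and max_grad_norm > 0

print(should_clip(None), should_clip(0), should_clip(0.0), should_clip(1.0))
# → False False False True
```

With a guard like this, a config that sets the max grad norm to `0` would behave the same under both the Megatron and DTensor policy workers, which is the consistency the PR is after.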