fix: allow zero grad norm in dtensor policies for consistency with Megatron (#1618)
smahdavi4 wants to merge 4 commits into NVIDIA-NeMo:main
Conversation
ℹ️ File Consistency Check (based on commit 8c1ab0a, PR #1618)
✅ DTensor Policy Worker Synchronization Check: both DTensor policy worker files were modified in this PR.
Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
📝 Walkthrough: Two policy worker files are updated to add an additional guard condition to their gradient clipping logic.
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Force-pushed from 8c1ab0a to 6ceb345.
joyang-nv
left a comment
Thanks for unifying. Could you add a unit test for this in a follow-up PR?
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Force-pushed from 316127f to 030e6e0.
Thank you! I just fixed the linting issue, and the PR is ready to merge. I'll send a follow-up PR with test cases later today once this is merged.
What does this PR do?
Currently, Megatron only accepts a float/int for the max grad norm, so disabling grad-norm clipping requires zero, while DTensor requires None. This PR makes the DTensor policies also accept zero, allowing a single, consistent grad-norm clipping configuration across both backends.
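In effect, the change treats both `None` and zero as "clipping disabled" in the DTensor path. A minimal sketch of such a guard condition, assuming a hypothetical helper (the function name and signature are illustrative, not the actual NeMo-RL code):

```python
def should_clip(max_grad_norm):
    # Guard sketched from the PR description: treat both None
    # (DTensor's prior convention) and 0 (Megatron's convention)
    # as "grad-norm clipping disabled". Illustrative only.
    return max_grad_norm is not None and max_grad_norm > 0

print(should_clip(None), should_clip(0), should_clip(0.0), should_clip(1.0))
# → False False False True
```

With a guard like this, a config that sets the max grad norm to `0` would behave the same under both the Megatron and DTensor policy workers, which is the consistency the PR is after.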