Update Qwen3 235B A22B MXFP8 GB200/300 recipe and resolve NaN grad norm #2209
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
Walkthrough
This PR updates the Qwen3 model pretraining workload configurations: it raises expert model parallelism from 16 to 32, removes the virtual pipeline model parallelism settings, and adds CUDA graph optimization scopes for the attention and mixture-of-experts operations.
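The three changes named in the walkthrough can be sketched as a small transformation on a configuration dictionary. This is only an illustration under assumptions: the key names (`expert_model_parallel_size`, `virtual_pipeline_model_parallel_size`, `cuda_graph_scope`), the scope strings (`"attn"`, `"moe"`), and the helper `apply_walkthrough_changes` are hypothetical and not taken from the PR diff, and the starting virtual pipeline value is a placeholder because the PR page does not state it.

```python
from typing import Any, Dict

# Hypothetical key names; the actual recipe fields may be named differently.
EP_KEY = "expert_model_parallel_size"
VP_KEY = "virtual_pipeline_model_parallel_size"
GRAPH_KEY = "cuda_graph_scope"


def apply_walkthrough_changes(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of cfg with the three changes the walkthrough describes."""
    updated = dict(cfg)  # shallow copy so the input stays untouched
    updated[EP_KEY] = 32  # expert model parallelism: 16 -> 32
    updated.pop(VP_KEY, None)  # drop the virtual pipeline setting if present
    # Scope labels are assumed; they stand for attention and mixture-of-experts.
    updated[GRAPH_KEY] = ["attn", "moe"]
    return updated


# The value 2 below is a placeholder; the PR page does not give the old setting.
before = {EP_KEY: 16, VP_KEY: 2}
after = apply_walkthrough_changes(before)
print(after)
```

Returning a copy rather than mutating in place keeps the old and new settings available side by side, which is convenient when comparing runs before and after the change.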
Estimated code review effort: 1 (Trivial), ~4 minutes
Pre-merge checks: 3 passed, 1 failed (1 warning)
/ok to test 0a8447a
…rm (#2209) Signed-off-by: Dingqing Yang <dingqingy@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
…rm (NVIDIA-NeMo#2209) Signed-off-by: Dingqing Yang <dingqingy@nvidia.com> Signed-off-by: sowmen <sowmendipta@gmail.com>
…rm (NVIDIA-NeMo#2209) Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>