cp: Update Qwen3 235B A22B MXFP8 GB200/300 recipe and resolve NaN grad norm (2209) into r0.3.0#2210

Merged
ko3n1g merged 1 commit into r0.3.0 from
cherry-pick-2209-r0.3.0
Feb 6, 2026

Conversation


@ko3n1g ko3n1g commented Feb 4, 2026

beep boop [🤖]: Hi @dingqingy-nv 👋,

we've cherry-picked #2209 into `r0.3.0` for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • Chores
    • Updated pretraining configurations for model training to adjust parallelization and GPU execution parameters, optimizing training efficiency.

…rm (#2209)

Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

ko3n1g commented Feb 4, 2026

/ok to test cecb77e


copy-pr-bot bot commented Feb 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Feb 4, 2026

📝 Walkthrough

Modified Qwen3 model pretraining configurations on GB300 and GB200 GPUs. Removed virtual pipeline model parallel sizing, increased expert model parallel size to 32, and enabled CUDA graph optimization for attention, MoE router, and MoE preprocessing operations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Qwen3 Config Updates: `scripts/performance/configs/qwen/qwen3_workload_base_configs.py` | Removed `virtual_pipeline_model_parallel_size=12`, increased `expert_model_parallel_size` from 16 to 32 for the GB300 config, set `expert_model_parallel_size=32` for the GB200 config, and added `cuda_graph_scope=["attn", "moe_router", "moe_preprocess"]` for GPU optimization. |
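
The GB300 change summarized above can be sketched roughly as follows. This is only an illustration: `WorkloadConfig` is a hypothetical stand-in dataclass, not the class used in `qwen3_workload_base_configs.py`, and only the three fields named in the summary are modeled. The "before" values are the ones the summary reports for GB300; the real config carries many more settings.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class WorkloadConfig:
    """Hypothetical stand-in for a workload base config (not the repo's class)."""

    expert_model_parallel_size: int
    virtual_pipeline_model_parallel_size: Optional[int] = None
    cuda_graph_scope: Optional[List[str]] = None


# GB300 values as the summary describes them before this PR.
gb300_before = WorkloadConfig(
    expert_model_parallel_size=16,
    virtual_pipeline_model_parallel_size=12,
)

# After the PR: virtual pipeline parallelism dropped, expert parallelism
# doubled, and CUDA graphs scoped to attention and the MoE router/preprocess.
gb300_after = replace(
    gb300_before,
    expert_model_parallel_size=32,
    virtual_pipeline_model_parallel_size=None,
    cuda_graph_scope=["attn", "moe_router", "moe_preprocess"],
)

print(gb300_after)
```

Using `replace` on a frozen dataclass keeps the "before" object intact, which makes it easy to diff the two configurations when reviewing a recipe change like this one.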

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • Megatron-Bridge#2209: Modifies identical QWEN3 configurations in the same file with overlapping parameter changes (removing virtual pipeline parallelism, increasing expert model parallelism to 32, and adding CUDA graph scope).

Suggested reviewers

  • dingqingy-nv
  • thomasdhc
🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Results For Major Changes | ⚠️ Warning | PR contains major changes affecting model numerics (NaN gradient norm resolution) and performance (expert parallelism and cuda_graph optimization) but lacks test results and validation metrics in the description. | Add test results, performance metrics, and numerical stability validation demonstrating the NaN issue resolution to the PR description. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main change: updating Qwen3 235B A22B MXFP8 recipe for GB200/300 and resolving a NaN grad norm issue, which aligns with the configuration modifications shown in the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@ko3n1g ko3n1g merged commit d7a13b1 into r0.3.0 Feb 6, 2026
18 of 20 checks passed
@ko3n1g ko3n1g deleted the cherry-pick-2209-r0.3.0 branch February 6, 2026 18:15
ko3n1g added a commit that referenced this pull request Feb 7, 2026
…e NaN grad norm (2209)` into `r0.3.0` (#2210)"

This reverts commit d7a13b1.
ko3n1g added a commit that referenced this pull request Feb 8, 2026
…ve NaN grad norm (2209)` into `r0.3.0` (#2210)"

This reverts commit 34aec47.