fix: all2all for qwen3-next H100 #2479

Merged
ko3n1g merged 1 commit into main from ko3n1g/fix/qwen3-next
Feb 23, 2026
Conversation

@ko3n1g (Contributor) commented Feb 22, 2026

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Chores
    • Updated model configuration to optimize token distribution for H100 hardware variants, enhancing efficiency during large-scale pretraining operations.

Signed-off-by: oliver könig <okoenig@nvidia.com>
@coderabbitai bot (Contributor) commented Feb 22, 2026

Walkthrough

A configuration parameter cfg.model.moe_token_dispatcher_type is set to "alltoall" for the H100 variant of the qwen3_next_80b_a3b pretrain configuration. This adds a single line to an existing configuration file.

Changes

Cohort / File(s): Qwen3 Pretrain Configuration (scripts/performance/configs/qwen/qwen3_llm_pretrain.py)
Summary: Added a moe_token_dispatcher_type assignment set to "alltoall" for the H100 variant configuration path.
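In essence, the diff adds one attribute assignment inside the H100 branch of the config builder. A minimal sketch follows; the function name and the `SimpleNamespace` scaffolding are illustrative stand-ins, not the repository's actual code — only the attribute name and value come from this PR:

```python
from types import SimpleNamespace

def qwen3_next_80b_a3b_h100_pretrain_config():
    # Hypothetical stand-in for the real builder in
    # scripts/performance/configs/qwen/qwen3_llm_pretrain.py.
    cfg = SimpleNamespace(model=SimpleNamespace())
    # The single line this PR adds: use the all-to-all MoE token
    # dispatcher on H100 rather than relying on an inherited default.
    cfg.model.moe_token_dispatcher_type = "alltoall"
    return cfg

cfg = qwen3_next_80b_a3b_h100_pretrain_config()
```

After the change, any H100 run built from this path sees the dispatcher set explicitly rather than through the base config's default.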

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

  • fix: perf configs 2 #2393: Modifies the same config field cfg.model.moe_token_dispatcher_type in the same file but sets it to "flex" for multiple Qwen3 variants, indicating related MOE dispatcher configuration updates.

Suggested reviewers

  • yaoyu-33
  • malay-nagda
🚥 Pre-merge checks: ✅ 4 passed

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'fix: all2all for qwen3-next H100' directly matches the main change: adding the 'alltoall' moe_token_dispatcher_type configuration for the H100 variant of qwen3-next.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Test Results For Major Changes: ✅ Passed. Single-line configuration fix aligning the H100 variant with existing patterns in related configurations; minor changes do not require test result documentation per the check criteria.


@coderabbitai bot left a comment

🧹 Nitpick comments (1)
scripts/performance/configs/qwen/qwen3_llm_pretrain.py (1)

337-344: Explicitly set moe_token_dispatcher_type for all qwen3_next_80b_a3b GPU variants for consistency.

All qwen3_235b_a22b and qwen3_30b_a3b per-GPU configs explicitly set moe_token_dispatcher_type. The four qwen3_next_80b_a3b variants (gb200, b300, b200, gb300) silently rely on the default "alltoall" from qwen3_next_80b_a3b_pretrain_config(). While the H100 variant does override it explicitly, adding explicit assignments to the other four variants improves consistency, auditability, and makes platform-specific choices visible at the config level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/qwen/qwen3_llm_pretrain.py` around lines 337 -
344, The qwen3_next_80b_a3b GPU variant configs should explicitly set
cfg.moe_token_dispatcher_type instead of relying on the default from
qwen3_next_80b_a3b_pretrain_config(); update each qwen3_next_80b_a3b variant
function (the gb200, b300, b200, gb300 variants that build cfg via
qwen3_next_80b_a3b_pretrain_config()) to assign cfg.moe_token_dispatcher_type =
"<appropriate_dispatcher>" (use the same string used by the H100 variant or the
intended dispatcher like "alltoall" or whichever platform-specific choice is
required) before calling set_qwen3_next_common_configs(cfg) /
set_workload_base_configs(cfg, base_cfg) and returning cfg so the dispatcher
choice is explicit and consistent across all variants.
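The reviewer's suggestion above can be sketched as follows. The helper names and the `SimpleNamespace` scaffolding are hypothetical stand-ins for the real config builders; only the `moe_token_dispatcher_type = "alltoall"` assignment is taken from this PR:

```python
from types import SimpleNamespace

# Hypothetical stand-in for qwen3_next_80b_a3b_pretrain_config(), whose
# default dispatcher the GPU variants currently inherit implicitly.
def qwen3_next_80b_a3b_pretrain_config():
    return SimpleNamespace(moe_token_dispatcher_type="alltoall")

def make_variant_config(dispatcher="alltoall"):
    cfg = qwen3_next_80b_a3b_pretrain_config()
    # Explicit per-variant assignment, as the review suggests, so the
    # platform-specific choice is visible at the config level instead
    # of being an invisible inherited default.
    cfg.moe_token_dispatcher_type = dispatcher
    return cfg

# One explicit config per GPU variant named in the review.
variants = {name: make_variant_config()
            for name in ("gb200", "b300", "b200", "gb300", "h100")}
```

The point is auditability: a reader of any single variant function can see its dispatcher choice without tracing back through the shared base config.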

@ko3n1g ko3n1g added the r0.3.0 Cherry-pick label for r0.3.0 release branch label Feb 23, 2026
@ko3n1g ko3n1g merged commit b9c6b53 into main Feb 23, 2026
56 checks passed
@ko3n1g ko3n1g deleted the ko3n1g/fix/qwen3-next branch February 23, 2026 08:08
ko3n1g added a commit that referenced this pull request Feb 24, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g ko3n1g mentioned this pull request Feb 24, 2026
5 tasks
pengdurice pushed a commit to pengdurice/Megatron-Bridge that referenced this pull request Feb 24, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
copy-pr-bot bot pushed a commit that referenced this pull request Mar 19, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
