Signed-off-by: oliver könig <okoenig@nvidia.com>
📝 Walkthrough
This pull request adds token-level MOE dispatcher configuration to multiple Qwen3 pretraining model configurations.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/performance/configs/qwen/qwen3_llm_pretrain.py (1)
78-84: ⚠️ Potential issue | 🟠 Major
`moe_token_dispatcher_type` is not set in the gb300 config for qwen3_235b, unlike all other sibling configs.
Every other `qwen3_235b_a22b_pretrain_config_*` and `qwen3_30b_a3b_pretrain_config_*` function explicitly sets `cfg.model.moe_token_dispatcher_type` (to either `"flex"` or `"alltoall"`). This function sets `moe_flex_dispatcher_backend` but omits the dispatcher type, so it will silently use whatever default the framework provides.
If this is intentional, a comment explaining the omission would help. Otherwise, it likely needs `"flex"` to match the gb200 variant:
Proposed fix

     cfg.model.moe_flex_dispatcher_backend = base_cfg.moe_flex_dispatcher_backend
    +cfg.model.moe_token_dispatcher_type = "flex"
     set_qwen3_common_configs(cfg)

As per coding guidelines: "Do not add arbitrary defaults for configs, be as explicit as possible."
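One way to catch this kind of omission mechanically is a small check that a config object carries an explicitly assigned dispatcher type. The following is a minimal sketch, not code from this repository: `make_gb300_like_cfg` and `missing_dispatcher_type` are hypothetical helpers, the config is modelled with `SimpleNamespace` rather than the real config classes, and the allowed values are only the two named in the comment above.

```python
from types import SimpleNamespace

# Values taken from the review comment; the framework may accept others.
ALLOWED_DISPATCHERS = {"flex", "alltoall"}


def missing_dispatcher_type(cfg) -> bool:
    """Return True when cfg.model has no explicitly assigned dispatcher type."""
    value = getattr(cfg.model, "moe_token_dispatcher_type", None)
    return value not in ALLOWED_DISPATCHERS


def make_gb300_like_cfg(set_dispatcher: bool):
    """Hypothetical stand-in for a pretrain config builder (not the real one)."""
    cfg = SimpleNamespace(model=SimpleNamespace())
    # Placeholder backend value, used only for illustration.
    cfg.model.moe_flex_dispatcher_backend = "placeholder-backend"
    if set_dispatcher:
        cfg.model.moe_token_dispatcher_type = "flex"
    return cfg


if __name__ == "__main__":
    for explicit in (False, True):
        cfg = make_gb300_like_cfg(set_dispatcher=explicit)
        print(f"explicit={explicit} missing={missing_dispatcher_type(cfg)}")
```

A check along these lines could run over every config builder in a unit test, so that a missing assignment fails loudly instead of falling back to a framework default.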
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information
Summary by CodeRabbit