…retrain recipes

- DSv3: set `moe_flex_dispatcher_backend='hybridep'`, call `apply_flex_dispatcher_backend` at end of pretrain config for hardware validation and proper flex dispatcher setup
- Qwen3 MoE (30B, 235B): set `moe_flex_dispatcher_backend='deepep'`, call `apply_flex_dispatcher_backend` at end of pretrain config
- Qwen3-Next: set `moe_flex_dispatcher_backend='deepep'`, call `apply_flex_dispatcher_backend` at end of pretrain config
- Add fallback comments noting `moe_token_dispatcher_type` and `moe_shared_expert_overlap` may be overridden by `apply_flex_dispatcher_backend`

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
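The ordering change described in the commit message can be sketched as below. This is a hypothetical minimal reconstruction: the field names come from the commit, but the `MoEConfig` dataclass and the body of `apply_flex_dispatcher_backend` are assumptions for illustration, not the repository's actual implementation (which also performs GPU hardware validation).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoEConfig:
    # Fallback defaults; may be overridden by apply_flex_dispatcher_backend.
    moe_token_dispatcher_type: str = "alltoall"
    moe_shared_expert_overlap: bool = True
    moe_flex_dispatcher_backend: Optional[str] = None


def apply_flex_dispatcher_backend(cfg: MoEConfig) -> MoEConfig:
    """Switch the config to the flex dispatcher when a backend is requested."""
    if cfg.moe_flex_dispatcher_backend is None:
        # No flex backend requested: leave the fallback defaults in place.
        return cfg
    # The real implementation would also validate GPU hardware compatibility here.
    cfg.moe_token_dispatcher_type = "flex"
    cfg.moe_shared_expert_overlap = False
    return cfg


def pretrain_config() -> MoEConfig:
    cfg = MoEConfig()
    cfg.moe_flex_dispatcher_backend = "hybridep"  # 'hybridep' for DSv3 per the commit
    # Called at the *end* of the config so it can override the fallback defaults.
    return apply_flex_dispatcher_backend(cfg)
```

Calling the apply step last guarantees that no later assignment silently reverts `moe_token_dispatcher_type` back to `"alltoall"`.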
📝 Walkthrough

Three recipe configuration files are updated to defer MoE dispatcher backend application to the end of the configuration process instead of early initialization. Default backend values are shifted from None/deepep to hybridep, with centralized application of the dispatcher settings.
…retrain recipes (#2288) Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Summary
Fixes the flex dispatcher backend setup in MoE pretrain recipes. Previously,
`apply_flex_dispatcher_backend` was either called with `None` (which returned early, doing nothing) or not called at all in pretrain configs, leaving `moe_token_dispatcher_type` as `"alltoall"` instead of `"flex"`.

Changes

- DeepSeek-V3 (`deepseek_v3.py`): set `moe_flex_dispatcher_backend="hybridep"` and call `apply_flex_dispatcher_backend` at the end of both pretrain configs
- Qwen3 MoE (`qwen3_moe.py`): set `moe_flex_dispatcher_backend="deepep"` and call `apply_flex_dispatcher_backend` at the end of both pretrain configs (30B and 235B)
- Qwen3-Next (`qwen3_next.py`): set `moe_flex_dispatcher_backend="deepep"` and call `apply_flex_dispatcher_backend` at the end of the pretrain config
- Overrides (`overrides.py`): ensure `moe_token_dispatcher_type="flex"` and `moe_shared_expert_overlap=False` when a flex backend is set

The `apply_flex_dispatcher_backend` call is placed at the end of each config so it can override the fallback defaults (`moe_token_dispatcher_type="alltoall"`, `moe_shared_expert_overlap`) while also performing GPU hardware compatibility validation.

Backend assignments per Malay's guidance
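The bug being fixed is purely one of ordering: calling the apply step before the backend is chosen hits the `None` early-return and leaves the fallback dispatcher in place. A hypothetical sketch of that contrast (the function body here is assumed from the description of `overrides.py`, not copied from it):

```python
def apply_flex_dispatcher_backend(cfg: dict) -> dict:
    """Apply flex dispatcher settings if a backend has been selected."""
    if cfg.get("moe_flex_dispatcher_backend") is None:
        # Old buggy path: called with no backend set, returns without changes.
        return cfg
    # overrides.py ensures these two settings whenever a flex backend is set;
    # the real code also validates GPU hardware compatibility for the backend.
    cfg["moe_token_dispatcher_type"] = "flex"
    cfg["moe_shared_expert_overlap"] = False
    return cfg


# Old (buggy) ordering: apply runs before any backend is selected,
# so the dispatcher silently stays "alltoall".
buggy = apply_flex_dispatcher_backend({"moe_token_dispatcher_type": "alltoall"})

# Fixed ordering: the backend is set first and apply runs at the end of
# the config, overriding the fallback defaults.
fixed = apply_flex_dispatcher_backend({
    "moe_token_dispatcher_type": "alltoall",
    "moe_shared_expert_overlap": True,
    "moe_flex_dispatcher_backend": "deepep",  # 'deepep' for the Qwen3 recipes
})
```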