
fix: call apply_flex_dispatcher_backend with correct backend in MoE pretrain recipes#2288

Merged
ko3n1g merged 1 commit into main from yuya/fix-flex-dispatcher-backend-recipes
Feb 11, 2026
Merged

fix: call apply_flex_dispatcher_backend with correct backend in MoE pretrain recipes#2288
ko3n1g merged 1 commit intomainfrom
yuya/fix-flex-dispatcher-backend-recipes

Conversation

yaoyu-33 (Contributor) commented Feb 9, 2026

Summary

Fixes the flex dispatcher backend setup in the MoE pretrain recipes. Previously, apply_flex_dispatcher_backend was either called with None (which returned early, doing nothing) or not called at all in the pretrain configs, leaving moe_token_dispatcher_type as "alltoall" instead of "flex".

Changes

  • DSv3 (deepseek_v3.py): Set moe_flex_dispatcher_backend="hybridep" and call apply_flex_dispatcher_backend at end of both pretrain configs
  • Qwen3 MoE (qwen3_moe.py): Set moe_flex_dispatcher_backend="deepep" and call apply_flex_dispatcher_backend at end of both pretrain configs (30B and 235B)
  • Qwen3-Next (qwen3_next.py): Set moe_flex_dispatcher_backend="deepep" and call apply_flex_dispatcher_backend at end of pretrain config
  • Performance overrides (overrides.py): Ensure moe_token_dispatcher_type="flex" and moe_shared_expert_overlap=False when flex backend is set

The apply_flex_dispatcher_backend call is placed at the end of each config so it can override the fallback defaults (moe_token_dispatcher_type="alltoall", moe_shared_expert_overlap) while also performing GPU hardware compatibility validation.
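The pattern described above can be sketched in a few lines. This is illustrative only: the class and field defaults are hypothetical stand-ins mirroring the names in this PR, not the real Megatron-Bridge config objects.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the real recipe config; field names mirror the PR.
@dataclass
class MoEConfig:
    moe_flex_dispatcher_backend: Optional[str] = None
    moe_token_dispatcher_type: str = "alltoall"  # fallback default
    moe_shared_expert_overlap: bool = True       # fallback default

def apply_flex_dispatcher_backend(cfg: MoEConfig) -> None:
    # Sketch of the helper's contract as described above; the real helper
    # also performs GPU hardware compatibility validation at this point.
    if cfg.moe_flex_dispatcher_backend is None:
        return  # pre-fix behavior: called with backend None, this did nothing
    cfg.moe_token_dispatcher_type = "flex"
    cfg.moe_shared_expert_overlap = False

def pretrain_config() -> MoEConfig:
    cfg = MoEConfig(moe_flex_dispatcher_backend="hybridep")  # e.g. DSv3
    # ... all other recipe fields are assigned here ...
    # Called last so it can override the fallback defaults set above.
    apply_flex_dispatcher_backend(cfg)
    return cfg
```

Placing the call at the very end means no later assignment can silently undo the flex dispatcher setup.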

Backend assignments per Malay's guidance

  • hybridep: all DSv3 configs
  • deepep: Qwen3 MoE and Qwen3-Next pretrain configs

Summary by CodeRabbit

Release Notes

  • Chores
    • Improved consistency in Mixture of Experts (MoE) token dispatcher configuration across model recipes (DeepSeek V3, Qwen3, and Qwen3-MoE).
    • Optimized the timing of dispatcher backend application in the model configuration initialization flow for better reliability.

…retrain recipes

- DSv3: set moe_flex_dispatcher_backend='hybridep', call apply_flex_dispatcher_backend
  at end of pretrain config for hardware validation and proper flex dispatcher setup
- Qwen3 MoE (30B, 235B): set moe_flex_dispatcher_backend='deepep', call
  apply_flex_dispatcher_backend at end of pretrain config
- Qwen3-Next: set moe_flex_dispatcher_backend='deepep', call
  apply_flex_dispatcher_backend at end of pretrain config
- Add fallback comments noting moe_token_dispatcher_type and moe_shared_expert_overlap
  may be overridden by apply_flex_dispatcher_backend

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot bot commented Feb 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Feb 9, 2026

📝 Walkthrough

Three recipe configuration files are updated to defer MoE dispatcher backend application to the end of the configuration process instead of early initialization. Default moe_flex_dispatcher_backend values are changed from None to "hybridep" (DSv3) or "deepep" (Qwen3), with apply_flex_dispatcher_backend calls added at the end of the configuration functions and inline comments added to clarify the override behavior.

Changes

Cohort: MoE Dispatcher Backend Refactoring
  • Files: src/megatron/bridge/recipes/deepseek/deepseek_v3.py, src/megatron/bridge/recipes/qwen/qwen3_moe.py, src/megatron/bridge/recipes/qwen/qwen3_next.py
  • Summary: Removed early apply_flex_dispatcher_backend calls and deferred backend application to the end of configuration. Updated default moe_flex_dispatcher_backend values and added inline comments explaining potential override behavior and configuration application order.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 passed
  • Description Check — ✅ Passed: Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The PR title directly and clearly describes the main change: calling apply_flex_dispatcher_backend with the correct backend in MoE pretrain recipes across multiple model configs.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient (required threshold: 80.00%).
  • Test Results For Major Changes — ✅ Passed: PR applies the established apply_flex_dispatcher_backend function (22 existing uses) across 3 recipe files (~26 lines changed), constituting a low-risk, targeted bug fix of configuration omissions.



No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
src/megatron/bridge/recipes/deepseek/deepseek_v3.py (1)

171-172: Note: moe_shared_expert_overlap = True will be silently overridden to False.

apply_flex_dispatcher_backend (Line 196) unconditionally sets moe_shared_expert_overlap = False. The comment on Line 171 hints at this, but the initial True value on Line 172 is effectively dead code on compatible GPUs. Consider setting it to False here as well to avoid confusion for readers who don't trace through the override, or add a more explicit comment.

This is a readability nit — no functional issue.

Also applies to: 196-197
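The override the nitpick describes boils down to a few lines. The class below is a hypothetical stand-in for the DSv3 recipe config (only the one flag is modeled); the field name is copied from the review comment.

```python
# Hypothetical stand-in for the DSv3 recipe config; only the flag under
# discussion is modeled here.
class DSv3Config:
    def __init__(self):
        # Initial True (Line 172 in the real file): effectively dead code,
        # because the helper below unconditionally flips it.
        self.moe_shared_expert_overlap = True

def apply_flex_dispatcher_backend(cfg):
    # Mirrors the behavior noted at Line 196: forces the flag off
    # regardless of its initial value.
    cfg.moe_shared_expert_overlap = False

cfg = DSv3Config()
apply_flex_dispatcher_backend(cfg)
# The initial True never survives on compatible GPUs.
```

This is why the reviewer suggests either initializing the flag to False or documenting the override more explicitly.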



@ko3n1g ko3n1g merged commit 7b5a60a into main Feb 11, 2026
2 checks passed
@ko3n1g ko3n1g deleted the yuya/fix-flex-dispatcher-backend-recipes branch February 11, 2026 19:04
@coderabbitai coderabbitai bot mentioned this pull request Feb 12, 2026
5 tasks
ko3n1g pushed a commit that referenced this pull request Feb 24, 2026
…retrain recipes (#2288)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@ko3n1g ko3n1g mentioned this pull request Feb 24, 2026
5 tasks
