…retrain recipes

- DSv3: set `moe_flex_dispatcher_backend='hybridep'`, call `apply_flex_dispatcher_backend` at end of pretrain config for hardware validation and proper flex dispatcher setup
- Qwen3 MoE (30B, 235B): set `moe_flex_dispatcher_backend='deepep'`, call `apply_flex_dispatcher_backend` at end of pretrain config
- Qwen3-Next: set `moe_flex_dispatcher_backend='deepep'`, call `apply_flex_dispatcher_backend` at end of pretrain config
- Add fallback comments noting `moe_token_dispatcher_type` and `moe_shared_expert_overlap` may be overridden by `apply_flex_dispatcher_backend`

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
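The ordering change described in the commit message can be sketched as below. This is a hypothetical minimal reconstruction: the field names come from the commit, but the `MoEConfig` dataclass and the body of `apply_flex_dispatcher_backend` are assumptions for illustration, not the repository's actual implementation (which also performs GPU hardware validation).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoEConfig:
    # Fallback defaults; may be overridden by apply_flex_dispatcher_backend.
    moe_token_dispatcher_type: str = "alltoall"
    moe_shared_expert_overlap: bool = True
    moe_flex_dispatcher_backend: Optional[str] = None


def apply_flex_dispatcher_backend(cfg: MoEConfig) -> MoEConfig:
    """Switch the config to the flex dispatcher when a backend is requested."""
    if cfg.moe_flex_dispatcher_backend is None:
        # No flex backend requested: leave the fallback defaults in place.
        return cfg
    # The real implementation would also validate GPU hardware compatibility here.
    cfg.moe_token_dispatcher_type = "flex"
    cfg.moe_shared_expert_overlap = False
    return cfg


def pretrain_config() -> MoEConfig:
    cfg = MoEConfig()
    cfg.moe_flex_dispatcher_backend = "hybridep"  # 'hybridep' for DSv3 per the commit
    # Called at the *end* of the config so it can override the fallback defaults.
    return apply_flex_dispatcher_backend(cfg)
```

Calling the apply step last guarantees that no later assignment silently reverts `moe_token_dispatcher_type` back to `"alltoall"`.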
📝 Walkthrough

Three recipe configuration files are updated to defer MoE dispatcher backend application to the end of the configuration process instead of early initialization. Default backend values are shifted from None/deepep to hybridep, with centralized application of the dispatcher settings.
…retrain recipes (#2288) Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Summary
Fixes the flex dispatcher backend setup in MoE pretrain recipes. Previously,
`apply_flex_dispatcher_backend` was either called with `None` (which returned early, doing nothing) or not called at all in pretrain configs, leaving `moe_token_dispatcher_type` as `"alltoall"` instead of `"flex"`.

Changes

- DeepSeek-V3 (`deepseek_v3.py`): set `moe_flex_dispatcher_backend="hybridep"` and call `apply_flex_dispatcher_backend` at the end of both pretrain configs
- Qwen3 MoE (`qwen3_moe.py`): set `moe_flex_dispatcher_backend="deepep"` and call `apply_flex_dispatcher_backend` at the end of both pretrain configs (30B and 235B)
- Qwen3-Next (`qwen3_next.py`): set `moe_flex_dispatcher_backend="deepep"` and call `apply_flex_dispatcher_backend` at the end of the pretrain config
- Overrides (`overrides.py`): ensure `moe_token_dispatcher_type="flex"` and `moe_shared_expert_overlap=False` when a flex backend is set

The `apply_flex_dispatcher_backend` call is placed at the end of each config so it can override the fallback defaults (`moe_token_dispatcher_type="alltoall"`, `moe_shared_expert_overlap`) while also performing GPU hardware compatibility validation.

Backend assignments per Malay's guidance
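The bug being fixed is purely one of ordering: calling the apply step before the backend is chosen hits the `None` early-return and leaves the fallback dispatcher in place. A hypothetical sketch of that contrast (the function body here is assumed from the description of `overrides.py`, not copied from it):

```python
def apply_flex_dispatcher_backend(cfg: dict) -> dict:
    """Apply flex dispatcher settings if a backend has been selected."""
    if cfg.get("moe_flex_dispatcher_backend") is None:
        # Old buggy path: called with no backend set, returns without changes.
        return cfg
    # overrides.py ensures these two settings whenever a flex backend is set;
    # the real code also validates GPU hardware compatibility for the backend.
    cfg["moe_token_dispatcher_type"] = "flex"
    cfg["moe_shared_expert_overlap"] = False
    return cfg


# Old (buggy) ordering: apply runs before any backend is selected,
# so the dispatcher silently stays "alltoall".
buggy = apply_flex_dispatcher_backend({"moe_token_dispatcher_type": "alltoall"})

# Fixed ordering: the backend is set first and apply runs at the end of
# the config, overriding the fallback defaults.
fixed = apply_flex_dispatcher_backend({
    "moe_token_dispatcher_type": "alltoall",
    "moe_shared_expert_overlap": True,
    "moe_flex_dispatcher_backend": "deepep",  # 'deepep' for the Qwen3 recipes
})
```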