
fix(mimo): adapt model layer for MCore submodule bump #2978

Merged
aroshanghias-nvd merged 1 commit into NVIDIA-NeMo:mimo/phase4-training-rebuild from yashaswikarnati:mimo/phase4-mcore-bump-fixes
Mar 25, 2026

Conversation

@yashaswikarnati
Contributor

Summary

  • Add create_pg(["tp", "cp", "ep", "pp", "dp"]) for MimoOptimizer's intra_dist_opt process group
  • Remap Bridge "llm" key to MCore MIMO_LANGUAGE_MODULE_KEY ("language") in module_to_grid_map when constructing MimoModelConfig
  • Remove language_module_key kwarg (removed from MimoModelConfig in MCore)
  • Update test assertion for remapped key
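
For reviewers, the key remap in the second bullet can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `remap_module_to_grid_map` and `BRIDGE_LLM_KEY` are hypothetical names, and `MIMO_LANGUAGE_MODULE_KEY` is redefined locally with the value quoted above rather than imported from MCore.

```python
from typing import Dict, TypeVar

G = TypeVar("G")  # stands in for whatever grid object the map holds

# Value quoted in the PR description; in MCore this is a named constant.
MIMO_LANGUAGE_MODULE_KEY = "language"
# Key used on the Bridge side, per the PR description (name is hypothetical).
BRIDGE_LLM_KEY = "llm"


def remap_module_to_grid_map(bridge_map: Dict[str, G]) -> Dict[str, G]:
    """Return a copy of the map with the Bridge "llm" key renamed.

    Hypothetical helper: all other entries are passed through unchanged
    and insertion order is preserved.
    """
    if BRIDGE_LLM_KEY in bridge_map and MIMO_LANGUAGE_MODULE_KEY in bridge_map:
        # Refuse to silently overwrite an existing "language" entry.
        raise ValueError("map contains both 'llm' and 'language' keys")
    return {
        (MIMO_LANGUAGE_MODULE_KEY if name == BRIDGE_LLM_KEY else name): grid
        for name, grid in bridge_map.items()
    }


if __name__ == "__main__":
    # Placeholder strings stand in for real grid objects.
    print(remap_module_to_grid_map({"llm": "grid_a", "vision": "grid_b"}))
```

Renaming at the boundary where `MimoModelConfig` is constructed would keep the Bridge-side "llm" naming intact elsewhere, which is consistent with the test assertion change in the last bullet.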

Test plan

  • Existing MIMO unit tests pass (159/162, 3 pre-existing failures)
  • E2e training test passes on 8 GPUs

🤖 Generated with Claude Code

@copy-pr-bot

copy-pr-bot bot commented Mar 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Bump 3rdparty/Megatron-LM to combined MiMo commit (10b3ddd4d) which
includes MimoOptimizer, distributed checkpoint, and bridge 2D tensor fix.

Adapt Bridge to MCore API changes:
- Migrate all "llm" keys to MIMO_LANGUAGE_MODULE_KEY ("language")
- Add create_pg(["tp", "cp", "ep", "pp", "dp"]) for MimoOptimizer
- Add module_output_ndim for correct 2D/3D tensor routing
- Remove language_module_key kwarg (removed from MimoModelConfig)
- Use MIMO_LANGUAGE_MODULE_KEY instead of removed role.language_module_name
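
To illustrate what a process group over all five dimensions means, here is a small pure-Python sketch that does not call MCore. `enumerate_groups` is a hypothetical helper, the 8-rank grid shape is made up, and the rank layout (first dimension varying fastest) is an assumption that may not match the actual grid implementation.

```python
from itertools import product
from typing import Dict, List, Sequence


def enumerate_groups(shape: Dict[str, int], dims: Sequence[str]) -> List[List[int]]:
    """List the rank groups formed over `dims` in a grid of the given shape.

    Ranks in one group share the same coordinate on every dimension NOT in
    `dims`. Assumed layout: the first dimension in `shape` varies fastest.
    """
    # Stride of each dimension under the assumed layout.
    strides: Dict[str, int] = {}
    stride = 1
    for name, size in shape.items():
        strides[name] = stride
        stride *= size

    fixed = [n for n in shape if n not in dims]
    selected = [n for n in shape if n in dims]

    groups = []
    for fixed_coords in product(*(range(shape[n]) for n in fixed)):
        base = sum(c * strides[n] for n, c in zip(fixed, fixed_coords))
        ranks = sorted(
            base + sum(c * strides[n] for n, c in zip(selected, sel_coords))
            for sel_coords in product(*(range(shape[n]) for n in selected))
        )
        groups.append(ranks)
    return sorted(groups)


if __name__ == "__main__":
    # Hypothetical 8-rank grid, for illustration only.
    shape = {"tp": 2, "cp": 1, "ep": 1, "pp": 2, "dp": 2}
    print(enumerate_groups(shape, ["tp"]))
    print(enumerate_groups(shape, ["tp", "cp", "ep", "pp", "dp"]))
```

Under these assumptions, selecting every dimension leaves no fixed coordinates, so the result is a single group containing every rank in the grid.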

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yashaswikarnati force-pushed the mimo/phase4-mcore-bump-fixes branch from a1cca33 to d60201f on March 25, 2026 07:31
@aroshanghias-nvd merged commit f845bf5 into NVIDIA-NeMo:mimo/phase4-training-rebuild on Mar 25, 2026
2 checks passed


2 participants