
Add MimoOptimizer for heterogeneous parallelism#4019

Merged
yashaswikarnati merged 1 commit into NVIDIA:main from yashaswikarnati:yash/mimo-optimizer-pr
Mar 25, 2026

Conversation

@yashaswikarnati
Contributor

Summary

Replaces #3212, which was closed when its base branch pull-request/3211 was deleted after #3211 was merged.

Adds optimizer support for MIMO models where different modules (encoder, LLM) can have different DP/TP/PP configurations.

  • MimoOptimizer class managing per-module MegatronOptimizer instances
  • Global gradient norm via all_reduce MAX across module boundaries
  • Module-aware gradient clipping using the global norm
  • Module-keyed state dicts for checkpointing
  • intra_dist_opt group spans full module world ["tp","cp","ep","pp","dp"] matching standard Megatron's intra_distributed_optimizer_instance_group
  • Assert num_distributed_optimizer_instances == 1 (multi-instance not yet supported)
  • HyperCommGrid.is_current_rank_in_grid() helper
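
The pattern the bullets above describe can be sketched framework-agnostically. This is an illustrative sketch under stated assumptions, not the PR's actual implementation: `ModuleOptimizer`, its methods, and `all_reduce_max` are hypothetical stand-ins for the real per-module `MegatronOptimizer` instances and for `torch.distributed.all_reduce(op=ReduceOp.MAX)` across module boundaries; the sketch assumes the norm being reduced is an inf-norm, for which MAX is the natural reduction.

```python
class ModuleOptimizer:
    """Hypothetical stand-in for a per-module MegatronOptimizer."""

    def __init__(self, name, grads):
        self.name = name
        self.grads = grads  # flat list of this module's gradient values

    def local_grad_norm(self):
        # inf-norm of this module's gradients; taking MAX across
        # modules then yields the global inf-norm
        return max(abs(g) for g in self.grads)

    def clip_grads(self, clip_coeff):
        self.grads = [g * clip_coeff for g in self.grads]

    def state_dict(self):
        return {"grads": list(self.grads)}


def all_reduce_max(values):
    """Simulates torch.distributed.all_reduce with ReduceOp.MAX."""
    return max(values)


class MimoOptimizer:
    """Wraps one optimizer per module (e.g. encoder, llm)."""

    def __init__(self, optimizers):
        self.optimizers = {opt.name: opt for opt in optimizers}

    def get_global_grad_norm(self):
        # each module computes its own norm; a MAX all-reduce across
        # module boundaries produces one shared global norm
        return all_reduce_max(
            [o.local_grad_norm() for o in self.optimizers.values()]
        )

    def clip_grad_norm(self, max_norm):
        norm = self.get_global_grad_norm()
        coeff = min(1.0, max_norm / (norm + 1e-6))
        for opt in self.optimizers.values():
            opt.clip_grads(coeff)  # every module clips with the same global norm
        return norm

    def state_dict(self):
        # module-keyed state dicts for checkpointing
        return {name: opt.state_dict() for name, opt in self.optimizers.items()}
```

For example, wrapping an "encoder" optimizer holding gradients `[0.5, -2.0]` and an "llm" optimizer holding `[4.0, 1.0]` gives a global norm of 4.0, and `state_dict()` returns a dict keyed by module name.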

Test plan

  • Unit tests pass (test_mimo_optimizer.py)
  • 2-GPU integration test (test_baseline_2gpu)
  • 4-GPU integration test (test_lm_pp3_4gpu)
  • 8-GPU integration tests (test_encoder_tp2_llm_tp2_pp3_8gpu, test_full_pp_8gpu)
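
A minimal illustration of the invariant such tests can exercise (not the repo's actual test code, and the gradient values are made up): the MAX of per-module inf-norms equals the inf-norm over all modules' gradients combined, no matter how unevenly the parameters are partitioned across modules.

```python
def test_max_of_module_norms_is_global_inf_norm():
    # hypothetical gradient shards for two modules with different
    # parallelism layouts (sizes deliberately unequal)
    module_grads = {
        "encoder": [0.5, -2.0, 1.5],
        "llm": [4.0, -1.0, 0.25, 3.0],
    }
    per_module = {m: max(abs(g) for g in grads) for m, grads in module_grads.items()}
    # MAX across module boundaries, as an all_reduce(op=MAX) would compute
    global_norm = max(per_module.values())
    # reference: inf-norm over every gradient from every module
    flat = [g for grads in module_grads.values() for g in grads]
    assert global_norm == max(abs(g) for g in flat)


test_max_of_module_norms_is_global_inf_norm()
```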

🤖 Generated with Claude Code

@yashaswikarnati yashaswikarnati requested review from a team as code owners March 24, 2026 18:30
@copy-pr-bot

copy-pr-bot bot commented Mar 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft March 24, 2026 18:30
@github-actions
Contributor

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

@yashaswikarnati yashaswikarnati force-pushed the yash/mimo-optimizer-pr branch from 9659951 to 6a8ff0c Compare March 24, 2026 18:46
@yashaswikarnati yashaswikarnati marked this pull request as ready for review March 24, 2026 18:54
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team March 24, 2026 18:54
@svcnvidia-nemo-ci svcnvidia-nemo-ci added Final Review PR is in the "final review" stage complexity: medium labels Mar 24, 2026
@yashaswikarnati
Contributor Author

/claude review

@yashaswikarnati
Contributor Author

/ok to test e48ecde

@svcnvidia-nemo-ci svcnvidia-nemo-ci added this to the Core 0.16 milestone Mar 24, 2026
@yashaswikarnati yashaswikarnati force-pushed the yash/mimo-optimizer-pr branch from e48ecde to 6b743fd Compare March 24, 2026 19:33
@yashaswikarnati
Contributor Author

/claude review

Contributor

@claude claude bot left a comment


LGTM

@svcnvidia-nemo-ci svcnvidia-nemo-ci added Approved All necessary approvals have been made and removed Final Review PR is in the "final review" stage labels Mar 24, 2026
Adds optimizer support for MIMO models where different modules
(encoder, LLM) can have different DP/TP/PP configurations.

- MimoOptimizer class managing per-module MegatronOptimizer instances
- Global gradient norm via all_reduce MAX across module boundaries
- Module-aware gradient clipping using the global norm
- Module-keyed state dicts for checkpointing
- intra_dist_opt group spans full module world ["tp","cp","ep","pp","dp"]
  matching standard Megatron's intra_distributed_optimizer_instance_group
- Assert num_distributed_optimizer_instances == 1 (multi-instance not yet supported)
- HyperCommGrid.is_current_rank_in_grid() helper
- Optimizer integrated into existing 1F1B schedule tests (8-GPU)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yashaswikarnati
Contributor Author

/ok to test bfb508d

@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23518683621

Merged via the queue into NVIDIA:main with commit d86ba0b Mar 25, 2026
64 checks passed
@yashaswikarnati yashaswikarnati deleted the yash/mimo-optimizer-pr branch March 25, 2026 00:53

Labels

Approved All necessary approvals have been made complexity: medium


4 participants