Introduce refactored model builder abstractions #2241
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Signed-off-by: Maanu Grover <maanug@nvidia.com>
```python
pg_collection: ProcessGroupCollection,
ddp_config: DistributedDataParallelConfig | None = None,
overlap_param_gather_with_optimizer_step: bool = False,
use_megatron_fsdp: bool = False,
use_torch_fsdp2: bool = False,
wrap_with_ddp: bool = True,
data_parallel_random_init: bool = True,
mixed_precision_wrapper: Callable[[Any, MegatronModule], MegatronModule] | None = Float16Module,
model_type: ModelType = ModelType.encoder_or_decoder,
```
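For context, the parameters under review correspond to a builder entry point roughly like the sketch below. Only the parameter names come from the diff; the stand-in types, the function name `build_model`, and the placeholder body are assumptions, since the real `ProcessGroupCollection`, `DistributedDataParallelConfig`, and wrapping logic live in Megatron-LM.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Optional

# Stand-in types so the sketch is self-contained; the real classes live in Megatron-LM.
class ModelType(Enum):
    encoder_or_decoder = "encoder_or_decoder"

@dataclass
class ProcessGroupCollection:
    groups: dict = field(default_factory=dict)

@dataclass
class DistributedDataParallelConfig:
    overlap_grad_reduce: bool = False

def build_model(
    pg_collection: ProcessGroupCollection,
    ddp_config: Optional[DistributedDataParallelConfig] = None,
    overlap_param_gather_with_optimizer_step: bool = False,
    use_megatron_fsdp: bool = False,
    use_torch_fsdp2: bool = False,
    wrap_with_ddp: bool = True,
    data_parallel_random_init: bool = True,
    mixed_precision_wrapper: Optional[Callable[..., Any]] = None,
    model_type: ModelType = ModelType.encoder_or_decoder,
) -> str:
    # Placeholder body: a real builder would construct the module, apply the
    # mixed-precision wrapper, then wrap with DDP or FSDP as requested.
    if use_megatron_fsdp or use_torch_fsdp2:
        return "fsdp"
    return "ddp" if wrap_with_ddp else "bare"

print(build_model(ProcessGroupCollection()))                        # ddp
print(build_model(ProcessGroupCollection(), use_torch_fsdp2=True))  # fsdp
```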
I see that this is the translation of the existing setup, but how do these args generalize for the MIMO use case?
MIMO could take in the pg collection and DDP config as dicts keyed by submodule? I don't want to over-design to fit MIMO.
I think the signature can stay a little flexible until it's integrated into the training loop, which will be in follow-up PR(s).
@yashaswikarnati do you have any feedback on how this signature could be made a bit better for MIMO without changing too much?
Sorry, just coming back to this. I see that in the signature we accept only one pg_collection. Do we plan to have a different signature for multimodal?
I don't think that's over-design; I'd say that's the bare minimum to achieve the functionality.
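To make the dict-of-submodules idea from this thread concrete, here is a hedged sketch of a helper that accepts either a single collection or a per-submodule dict. The function `normalize_pg_collections` and all names in it are hypothetical, not part of this PR's actual API.

```python
from typing import Dict, List, Union

class ProcessGroupCollection:
    """Stand-in for the Megatron-LM type; the real one carries process groups."""

def normalize_pg_collections(
    pg_collection: Union[ProcessGroupCollection, Dict[str, ProcessGroupCollection]],
    submodule_names: List[str],
) -> Dict[str, ProcessGroupCollection]:
    # A dict is passed through after checking every submodule is covered;
    # a single collection is broadcast to all submodules (the unimodal case).
    if isinstance(pg_collection, dict):
        missing = [n for n in submodule_names if n not in pg_collection]
        if missing:
            raise ValueError(f"missing pg_collection for submodules: {missing}")
        return dict(pg_collection)
    return {name: pg_collection for name in submodule_names}

shared = ProcessGroupCollection()
per_module = normalize_pg_collections(shared, ["vision", "language"])
print(sorted(per_module))  # ['language', 'vision']
```

A unimodal caller would never notice the generalization, which may be one way to support MIMO without changing the single-collection signature much.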
What does this PR do?
This PR refactors the `ModelProviderMixin` to split up responsibilities. The refactor preserves the existing level of model customizability as well as conversion compatibility, while separating model configuration and initialization/building into separate components. This architecture was determined to be more comprehensible based on several sources of feedback.
In particular, a custom model type can define one or both of a model config (describing the model) and a model builder (constructing the model). Base model architectures like GPT and Mamba will follow this design.
These new abstractions are all new classes rather than direct modifications to `ModelProviderMixin`, to ease migration. Additionally, `ModelProviderMixin` relied on inheritance from `TransformerConfig`, modified some config attributes during model building, and contained some unused code. I therefore think we should deprecate `ModelProviderMixin` after integrating the new abstractions across the codebase.

The new classes are:
- `ModelConfig`: has some required attributes, including the import path of the builder. Supports serialization to and from a dictionary.
- `ModelBuilder`: contains stub methods for implementing (distributed) model init. Also keeps the pre-wrap and post-wrap hook registration from `ModelProviderMixin`, which is used by PEFT. Usage of the hooks is left up to the child class.
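The described pair of abstractions could look roughly like the following sketch. Only the class names, the builder-import-path attribute, the dict round-trip, and the hook registration are taken from the PR text; every other name and signature here is an assumption.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Callable, List

@dataclass
class ModelConfig:
    # Required attribute named in the PR: the import path of the builder.
    builder_import_path: str

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, d: dict) -> "ModelConfig":
        # Ignore unknown keys so serialized configs stay forward-compatible.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in d.items() if k in names})

class ModelBuilder:
    def __init__(self) -> None:
        # Pre-/post-wrap hook registration carried over from ModelProviderMixin
        # (used by PEFT); when the hooks run is left to child classes.
        self._pre_wrap_hooks: List[Callable[[Any], Any]] = []
        self._post_wrap_hooks: List[Callable[[Any], Any]] = []

    def register_pre_wrap_hook(self, fn: Callable[[Any], Any]) -> None:
        self._pre_wrap_hooks.append(fn)

    def register_post_wrap_hook(self, fn: Callable[[Any], Any]) -> None:
        self._post_wrap_hooks.append(fn)

    def build(self, config: ModelConfig) -> Any:
        raise NotImplementedError("child classes implement (distributed) model init")

cfg = ModelConfig(builder_import_path="builders.mamba.MambaModelBuilder")
print(ModelConfig.from_dict(cfg.to_dict()) == cfg)  # True
```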
This PR also includes the necessary changes for the Mamba base model to support this new interface; the refactor was easier to demonstrate on Mamba than on GPT. `mamba/mamba_builder.py` contains:
- `MambaModelConfig`, which encapsulates all config settings used to create an MCore Mamba model.
- `MambaModelBuilder`, which defines how a distributed Mamba model should be built.

Since distributed model initialization is identical for GPT and Mamba models, the logic is extracted into a separate function under `unimodal.py`, which can be called by `GPTModelBuilder` in the future. The logic is also split into helpers for readability.
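The shared-init split described above might look like this hedged sketch: one public build function delegating to small helpers. The helper names and the dict "modules" standing in for real torch modules are assumptions, not the PR's actual code.

```python
from typing import Dict

# Toy dict-based "modules" stand in for real torch modules.
def _create_module(config: Dict) -> Dict:
    return {"layers": config.get("num_layers", 1)}

def _apply_mixed_precision(model: Dict, fp16: bool) -> Dict:
    return {**model, "fp16": fp16}

def _wrap_distributed(model: Dict, wrap_with_ddp: bool) -> Dict:
    return {**model, "ddp": wrap_with_ddp}

def build_unimodal_model(config: Dict, fp16: bool = True, wrap_with_ddp: bool = True) -> Dict:
    # The same three steps for GPT and Mamba: create, cast, wrap.
    model = _create_module(config)
    model = _apply_mixed_precision(model, fp16)
    return _wrap_distributed(model, wrap_with_ddp)

print(build_unimodal_model({"num_layers": 4}))
# {'layers': 4, 'fp16': True, 'ddp': True}
```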
I plan to upstream all new files under src/ in this PR to Megatron-LM in the near future. It may also be worthwhile to eventually push the base model configs, e.g. `MambaModelConfig`, to MCore for use directly with the MCore model. Additionally, if `TransformerConfig` is broken up in the future, the model-specific config can remain at the top level of the hierarchy.

Some additional notes:
- Other than `unimodal.py`, these abstractions should appear as a simple layer on top of existing MLM code like `get_model()` and `mamba_builders.py`.

GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information