Conversation
This refactoring centralizes model-specific configurations within the provider_bridge method of each model bridge.

Changes:
- Add MoE-related field mappings to the base class CONFIG_MAPPING:
  - num_experts -> num_moe_experts
  - num_experts_per_tok -> moe_router_topk
  - moe_intermediate_size -> moe_ffn_hidden_size
- Refactor LlamaBridge:
  - Use MEGATRON_DEFAULTS and HF_DEFAULTS class attributes
  - Override provider_bridge only for RoPE scaling (Llama 3.1/3.2)
- Refactor Qwen2Bridge:
  - Use MEGATRON_DEFAULTS (add_qkv_bias=True) and HF_DEFAULTS
  - No provider_bridge override needed
- Refactor Qwen3Bridge:
  - Use MEGATRON_DEFAULTS (qk_layernorm=True) and HF_DEFAULTS
  - No provider_bridge override needed
- Refactor Qwen3MoEBridge:
  - Use MEGATRON_DEFAULTS with MoE settings and HF_DEFAULTS
  - No provider_bridge override needed
- Update tests to expect GPTModelProvider instead of model-specific providers
- Add verification scripts for both the Llama and Qwen bridges

Verified on remote server:
- Qwen/Qwen2-0.5B: PASS
- Qwen/Qwen2-7B: PASS
- Qwen/Qwen3-0.6B: PASS
- Qwen/Qwen3-1.7B: PASS
- Qwen/Qwen3-30B-A3B: PASS
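As a rough illustration of the pattern these changes introduce (a sketch, assuming a dict-based CONFIG_MAPPING and class-attribute defaults; everything beyond the names listed in the commit message is hypothetical):

```python
# Sketch of the refactored bridge pattern. Only the field names listed in
# the commit message are taken from the actual change; the class structure
# here is an assumption for illustration.

class MegatronModelBridge:
    # HF config field -> Megatron provider field, shared by all bridges.
    CONFIG_MAPPING = {
        # ... pre-existing mappings ...
        "num_experts": "num_moe_experts",
        "num_experts_per_tok": "moe_router_topk",
        "moe_intermediate_size": "moe_ffn_hidden_size",
    }


class Qwen2Bridge(MegatronModelBridge):
    # Megatron-side settings the HF config does not carry.
    MEGATRON_DEFAULTS = {"add_qkv_bias": True}
    # HF-side fallbacks used when the checkpoint config omits a field.
    HF_DEFAULTS = {}
    # No provider_bridge override needed; the shared mapping suffices.
```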
…dels
- Add MLAModelProvider as a unified base for Multi-Latent Attention models
- Refactor DeepSeek V2/V3 bridges to use MLAModelProvider
- Refactor Kimi K2 bridge to use MLAModelProvider
- Move model-specific defaults from providers to MEGATRON_DEFAULTS in bridges
- Add model_type parameter to the @register_bridge decorator for automatic HF config resolution
- Simplify provider files to deprecated backward-compatible aliases

Verified: DeepSeek-V2-Lite, DeepSeek-V2, DeepSeek-V3, Moonlight-16B, Kimi-K2
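A minimal sketch of how the new model_type parameter might look in use (the decorator signature and registry internals are assumptions inferred from the commit message, not the library's confirmed API):

```python
# Hypothetical registry sketch: model_type lets the decorator resolve the
# matching HF config class automatically instead of each bridge wiring it up.

_BRIDGE_REGISTRY = {}

def register_bridge(model_type: str):
    def decorator(cls):
        cls.model_type = model_type          # used to look up the HF config
        _BRIDGE_REGISTRY[model_type] = cls
        return cls
    return decorator

@register_bridge(model_type="deepseek_v3")
class DeepSeekV3Bridge:
    # Per this PR, DeepSeek V2/V3 and Kimi K2 bridges build on a shared
    # MLAModelProvider; model-specific defaults live in MEGATRON_DEFAULTS.
    MEGATRON_DEFAULTS = {}
```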
- Register GemmaModelProvider, Gemma2ModelProvider, Gemma3ModelProvider via decorator
- Add MEGATRON_DEFAULTS to Gemma/Gemma2 bridges for explicit config defaults
- Add gelu_pytorch_tanh -> fast_gelu to ACTIVATION_MAPPING in model_bridge.py (see the sketch after this list)
- Add verification script for Gemma provider refactoring
Verified: gemma-2b, gemma-7b, gemma-2-2b, gemma-2-9b, gemma-2-27b,
gemma-3-4b-it, gemma-3-12b-it, gemma-3-27b-it
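For the ACTIVATION_MAPPING change called out above, a sketch (only the gelu_pytorch_tanh entry is from this PR; the surrounding entries are illustrative):

```python
# model_bridge.py (sketch): HF activation names -> Megatron activation names.
ACTIVATION_MAPPING = {
    "silu": "silu",                    # illustrative, not from this PR
    "gelu": "gelu",                    # illustrative, not from this PR
    "gelu_pytorch_tanh": "fast_gelu",  # added here: Gemma's tanh-approximated GELU
}
```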
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
The OLMoE HF config doesn't have a head_dim attribute, so kv_channels was left as None. This fix calculates it as hidden_size // num_attention_heads (2048 // 16 = 128 for OLMoE-1B-7B), following the pattern used by MistralBridge and NemotronHBridge.
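A minimal sketch of that fallback, assuming the standard HF config attribute names (the actual bridge code may differ):

```python
def resolve_kv_channels(hf_config) -> int:
    """Use head_dim when the HF config provides it; otherwise fall back to
    hidden_size // num_attention_heads (2048 // 16 = 128 for OLMoE-1B-7B)."""
    head_dim = getattr(hf_config, "head_dim", None)
    if head_dim is not None:
        return head_dim
    return hf_config.hidden_size // hf_config.num_attention_heads
```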
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
cuichenx approved these changes on Feb 12, 2026
What does this PR do?
Centralizes model-specific configuration in the model bridges: shared CONFIG_MAPPING entries and per-bridge MEGATRON_DEFAULTS/HF_DEFAULTS replace scattered provider-side defaults across the Llama, Qwen, DeepSeek, Kimi K2, and Gemma families.
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information
Summary by CodeRabbit
Bug Fixes
- generation_config propagation from model provider bridges, preventing configuration conflicts.

New Features
- squared_relu activation function support for enhanced model conversion.

Refactor

Tests