
[model, refactor] refactor: Centralize provider_bridge config mapping in base class for VLM models #2250

Merged
yaoyu-33 merged 44 commits into main from feature/provider-bridge-refactor-3 on Feb 12, 2026

Conversation


yaoyu-33 (Contributor) commented on Feb 5, 2026

What does this PR do?

Centralizes the HF-to-Megatron provider config mapping in the base bridge class so that individual model bridges (Llama, Qwen, DeepSeek, Gemma, Nemotron, etc.) only declare their defaults and model-specific overrides.

Changelog

  • Move shared HF -> Megatron config field mappings (including the MoE fields num_experts, num_experts_per_tok, moe_intermediate_size) into the base class CONFIG_MAPPING.
  • Introduce MEGATRON_DEFAULTS / HF_DEFAULTS class attributes so most bridges (Llama, Qwen2, Qwen3, Qwen3-MoE, Gemma) no longer override provider_bridge, or override it only for model-specific logic such as RoPE scaling.
  • Add MLAModelProvider as a unified base for Multi-Latent Attention models and migrate the DeepSeek V2/V3 and Kimi K2 bridges to it.
  • Add a model_type parameter to @register_bridge for automatic HF config resolution; register the Gemma providers via the decorator.
  • Extend ACTIVATION_MAPPING (gelu_pytorch_tanh -> fast_gelu), add squared_relu support, and remove generation_config propagation from provider bridges.
  • Fix OLMoE kv_channels by falling back to hidden_size // num_attention_heads; update tests to expect GPTModelProvider.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Bug Fixes

    • Removed generation_config propagation from model provider bridges to prevent configuration conflicts.
  • New Features

    • Added squared_relu activation function support for enhanced model conversion.
  • Refactor

    • Standardized model bridge registration patterns across provider implementations.
    • Refactored provider inheritance hierarchies for improved consistency.
    • Enhanced model-to-provider configuration mapping for Nemotron, Qwen, and Gemma model variants.
  • Tests

    • Updated test coverage to reflect provider and bridge architecture changes.

yaoyu-33 and others added 30 commits January 23, 2026 09:29
This refactoring centralizes model-specific configuration mapping in the base bridge class, so each model bridge declares its defaults as class attributes and rarely needs to override provider_bridge (a sketch of the resulting bridge shape follows this commit message).

Changes:
- Add MoE-related field mappings to base class CONFIG_MAPPING:
  - num_experts -> num_moe_experts
  - num_experts_per_tok -> moe_router_topk
  - moe_intermediate_size -> moe_ffn_hidden_size

- Refactor LlamaBridge:
  - Use MEGATRON_DEFAULTS and HF_DEFAULTS class attributes
  - Override provider_bridge only for RoPE scaling (Llama 3.1/3.2)

- Refactor Qwen2Bridge:
  - Use MEGATRON_DEFAULTS (add_qkv_bias=True) and HF_DEFAULTS
  - No provider_bridge override needed

- Refactor Qwen3Bridge:
  - Use MEGATRON_DEFAULTS (qk_layernorm=True) and HF_DEFAULTS
  - No provider_bridge override needed

- Refactor Qwen3MoEBridge:
  - Use MEGATRON_DEFAULTS with MoE settings and HF_DEFAULTS
  - No provider_bridge override needed

- Update tests to expect GPTModelProvider instead of model-specific providers
- Add verification scripts for both Llama and Qwen bridges

Verified on remote server:
- Qwen/Qwen2-0.5B: PASS
- Qwen/Qwen2-7B: PASS
- Qwen/Qwen3-0.6B: PASS
- Qwen/Qwen3-1.7B: PASS
- Qwen/Qwen3-30B-A3B: PASS
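
To make the target shape concrete, here is a minimal sketch of the pattern this commit describes, assuming a plain dict-based mapping and simple kwargs. Only the names CONFIG_MAPPING, MEGATRON_DEFAULTS, HF_DEFAULTS, and provider_bridge come from the commit message; everything else is illustrative, not the actual Megatron Bridge implementation.

```python
# Hypothetical sketch of the refactor: the base class owns the generic
# HF -> Megatron field mapping, subclasses only declare defaults.
class MegatronModelBridge:
    # Generic HF config field -> Megatron provider field mapping,
    # including the MoE fields this PR adds to the base class.
    CONFIG_MAPPING = {
        "hidden_size": "hidden_size",
        "num_attention_heads": "num_attention_heads",
        "num_experts": "num_moe_experts",
        "num_experts_per_tok": "moe_router_topk",
        "moe_intermediate_size": "moe_ffn_hidden_size",
    }
    MEGATRON_DEFAULTS: dict = {}  # static Megatron-side settings for this architecture
    HF_DEFAULTS: dict = {}        # fallbacks when the HF config omits a field

    def provider_bridge(self, hf_config):
        """Build provider kwargs from the HF config plus class-level defaults."""
        kwargs = dict(self.MEGATRON_DEFAULTS)
        for hf_field, megatron_field in self.CONFIG_MAPPING.items():
            value = getattr(hf_config, hf_field, self.HF_DEFAULTS.get(hf_field))
            if value is not None:
                kwargs[megatron_field] = value
        return kwargs


class Qwen2Bridge(MegatronModelBridge):
    # Per the commit message: only defaults, no provider_bridge override needed.
    MEGATRON_DEFAULTS = {"add_qkv_bias": True}


class LlamaBridge(MegatronModelBridge):
    def provider_bridge(self, hf_config):
        kwargs = super().provider_bridge(hf_config)
        # Llama 3.1/3.2 still override provider_bridge, but only for RoPE scaling.
        if getattr(hf_config, "rope_scaling", None):
            kwargs["rope_scaling"] = True
        return kwargs
```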
…dels

- Add MLAModelProvider as unified base for Multi-Latent Attention models
- Refactor DeepSeek V2/V3 bridges to use MLAModelProvider
- Refactor Kimi K2 bridge to use MLAModelProvider
- Move model-specific defaults from providers to MEGATRON_DEFAULTS in bridges
- Add model_type parameter to @register_bridge decorator for auto HF config (see the sketch below)
- Simplify provider files to deprecated backward-compatible aliases

Verified: DeepSeek-V2-Lite, DeepSeek-V2, DeepSeek-V3, Moonlight-16B, Kimi-K2
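
A rough sketch of the decorator-based registration and the unified MLA base described above; the registry internals, the PROVIDER_CLS attribute, and the example default values are assumptions, not the real API:

```python
# Illustrative only: the real @register_bridge and MLAModelProvider live in
# Megatron Bridge; this sketch just mirrors the shape named in the commit.
_BRIDGE_REGISTRY: dict[str, type] = {}

def register_bridge(model_type: str):
    """Register a bridge class under an HF model_type so the HF config can be
    resolved automatically instead of being wired up per bridge."""
    def decorator(cls):
        _BRIDGE_REGISTRY[model_type] = cls
        cls.hf_model_type = model_type
        return cls
    return decorator

class MLAModelProvider:
    """Unified provider base for Multi-Latent Attention models
    (DeepSeek V2/V3, Kimi K2)."""
    multi_latent_attention: bool = True

@register_bridge(model_type="deepseek_v3")
class DeepSeekV3Bridge:
    PROVIDER_CLS = MLAModelProvider
    MEGATRON_DEFAULTS = {"moe_router_topk": 8}  # placeholder value, not the real default
```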
- Register GemmaModelProvider, Gemma2ModelProvider, Gemma3ModelProvider via decorator
- Add MEGATRON_DEFAULTS to Gemma/Gemma2 bridges for explicit config defaults
- Add gelu_pytorch_tanh -> fast_gelu to ACTIVATION_MAPPING in model_bridge.py (sketched below)
- Add verification script for Gemma provider refactoring

Verified: gemma-2b, gemma-7b, gemma-2-2b, gemma-2-9b, gemma-2-27b,
         gemma-3-4b-it, gemma-3-12b-it, gemma-3-27b-it
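
For context on the activation change, here is a minimal sketch of what an ACTIVATION_MAPPING entry could look like, including the squared_relu support mentioned in the summary; the module structure and the HF activation-name keys other than gelu_pytorch_tanh are assumptions:

```python
# Sketch only: ACTIVATION_MAPPING is named in the commit, but the surrounding
# code and the Megatron-side callables here are illustrative.
import torch
import torch.nn.functional as F

def fast_gelu(x: torch.Tensor) -> torch.Tensor:
    # tanh-approximated GELU, matching HF's gelu_pytorch_tanh
    return F.gelu(x, approximate="tanh")

def squared_relu(x: torch.Tensor) -> torch.Tensor:
    # squared ReLU, the activation the summary says this PR adds support for
    return torch.pow(F.relu(x), 2)

# HF hidden_act string -> activation callable
ACTIVATION_MAPPING = {
    "gelu_pytorch_tanh": fast_gelu,  # added for Gemma in this PR
    "relu2": squared_relu,           # assumed HF-side name for squared ReLU
    "silu": F.silu,
}
```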
OLMoE HF config doesn't have head_dim attribute, so kv_channels was
left as None. This fix calculates it as hidden_size // num_attention_heads
(2048 // 16 = 128 for OLMoE-1B-7B).

This follows the pattern used by MistralBridge and NemotronHBridge.
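
The fallback is a one-line computation; the helper below only illustrates the arithmetic (2048 // 16 = 128 for OLMoE-1B-7B) and is not the actual bridge code:

```python
def resolve_kv_channels(hf_config) -> int:
    """Fall back to hidden_size // num_attention_heads when the HF config has no
    head_dim attribute, mirroring the pattern used by MistralBridge and NemotronHBridge."""
    head_dim = getattr(hf_config, "head_dim", None)
    if head_dim is not None:
        return head_dim
    return hf_config.hidden_size // hf_config.num_attention_heads

# For OLMoE-1B-7B: 2048 // 16 == 128
```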
