[model, refactor] refactor: Centralize provider_bridge config mapping in base class for Nemotron models#2225

Merged
yaoyu-33 merged 34 commits into main from feature/provider-bridge-refactor-2
Feb 10, 2026
Conversation

yaoyu-33 (Contributor) commented Feb 5, 2026

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • New Features

    • Enhanced bidirectional configuration translation between model formats
    • Expanded bridge registration API with provider and model-type parameters
    • Improved rotary position embedding (RoPE) scaling support across models
    • Better activation function mapping capabilities
  • Refactor

    • Unified provider architecture across multiple model implementations
    • Streamlined model-to-framework configuration handling
  • Chores

    • Removed deprecated test files validating provider equivalence

yaoyu-33 and others added 27 commits January 23, 2026 09:29
This refactoring centralizes model-specific configurations within the
provider_bridge method of each model bridge.

Changes:
- Add MoE-related field mappings to base class CONFIG_MAPPING:
  - num_experts -> num_moe_experts
  - num_experts_per_tok -> moe_router_topk
  - moe_intermediate_size -> moe_ffn_hidden_size

- Refactor LlamaBridge:
  - Use MEGATRON_DEFAULTS and HF_DEFAULTS class attributes
  - Override provider_bridge only for RoPE scaling (Llama 3.1/3.2)

- Refactor Qwen2Bridge:
  - Use MEGATRON_DEFAULTS (add_qkv_bias=True) and HF_DEFAULTS
  - No provider_bridge override needed

- Refactor Qwen3Bridge:
  - Use MEGATRON_DEFAULTS (qk_layernorm=True) and HF_DEFAULTS
  - No provider_bridge override needed

- Refactor Qwen3MoEBridge:
  - Use MEGATRON_DEFAULTS with MoE settings and HF_DEFAULTS
  - No provider_bridge override needed

- Update tests to expect GPTModelProvider instead of model-specific providers
- Add verification scripts for both Llama and Qwen bridges

Verified on remote server:
- Qwen/Qwen2-0.5B: PASS
- Qwen/Qwen2-7B: PASS
- Qwen/Qwen3-0.6B: PASS
- Qwen/Qwen3-1.7B: PASS
- Qwen/Qwen3-30B-A3B: PASS
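The declarative field mappings described in this commit can be sketched as a simple lookup-table translation. The helper below is illustrative only (the mapping names follow the commit message, but the function and config shapes are assumptions, not the bridge's actual API):

```python
# Minimal sketch of a declarative HF -> Megatron config mapping, assuming the
# table is a list of (hf_name, megatron_name) tuples as in the commit above.
CONFIG_MAPPING = [
    ("num_hidden_layers", "num_layers"),
    ("hidden_size", "hidden_size"),
    # MoE-related entries added by this PR:
    ("num_experts", "num_moe_experts"),
    ("num_experts_per_tok", "moe_router_topk"),
    ("moe_intermediate_size", "moe_ffn_hidden_size"),
]

def hf_config_to_kwargs(hf_config: dict) -> dict:
    """Translate HF config fields to provider kwargs, skipping absent fields."""
    kwargs = {}
    for hf_key, megatron_key in CONFIG_MAPPING:
        value = hf_config.get(hf_key)
        if value is not None:
            kwargs[megatron_key] = value
    return kwargs

hf = {"num_hidden_layers": 24, "hidden_size": 2048, "num_experts": 64}
print(hf_config_to_kwargs(hf))
# {'num_layers': 24, 'hidden_size': 2048, 'num_moe_experts': 64}
```

With this shape, model-specific bridges only need to extend the table or override a handful of fields, which is the centralization this PR aims for.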
…dels

- Add MLAModelProvider as unified base for Multi-Latent Attention models
- Refactor DeepSeek V2/V3 bridges to use MLAModelProvider
- Refactor Kimi K2 bridge to use MLAModelProvider
- Move model-specific defaults from providers to MEGATRON_DEFAULTS in bridges
- Add model_type parameter to @register_bridge decorator for auto HF config
- Simplify provider files to deprecated backward-compatible aliases

Verified: DeepSeek-V2-Lite, DeepSeek-V2, DeepSeek-V3, Moonlight-16B, Kimi-K2
- Register GemmaModelProvider, Gemma2ModelProvider, Gemma3ModelProvider via decorator
- Add MEGATRON_DEFAULTS to Gemma/Gemma2 bridges for explicit config defaults
- Add gelu_pytorch_tanh -> fast_gelu to ACTIVATION_MAPPING in model_bridge.py
- Add verification script for Gemma provider refactoring

Verified: gemma-2b, gemma-7b, gemma-2-2b, gemma-2-9b, gemma-2-27b,
         gemma-3-4b-it, gemma-3-12b-it, gemma-3-27b-it
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
OLMoE HF config doesn't have head_dim attribute, so kv_channels was
left as None. This fix calculates it as hidden_size // num_attention_heads
(2048 // 16 = 128 for OLMoE-1B-7B).

This follows the pattern used by MistralBridge and NemotronHBridge.
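The fallback described in this commit can be expressed in a few lines. The config object below is a stand-in for the real OLMoE HF config class, and the helper name is illustrative:

```python
# Sketch of the kv_channels fallback: when the HF config has no head_dim
# attribute, derive it as hidden_size // num_attention_heads.
from types import SimpleNamespace

def resolve_kv_channels(hf_config) -> int:
    head_dim = getattr(hf_config, "head_dim", None)
    if head_dim is not None:
        return head_dim
    return hf_config.hidden_size // hf_config.num_attention_heads

olmoe_like = SimpleNamespace(hidden_size=2048, num_attention_heads=16)
print(resolve_kv_channels(olmoe_like))  # 2048 // 16 = 128
```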
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
copy-pr-bot bot commented Feb 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai bot (Contributor) commented Feb 5, 2026

📝 Walkthrough

Walkthrough

This PR introduces bidirectional HuggingFace↔Megatron configuration translation support in MegatronModelBridge, refactors model bridges to use unified provider classes (GPTModelProvider/MLAModelProvider), removes generation_config from providers, and removes provider configuration equivalence tests.

Changes

Cohort / File(s) Summary
Core Bridge Infrastructure
src/megatron/bridge/models/conversion/model_bridge.py
Adds CONFIG_MAPPING, rope scaling mappings (YARN/MLA), activation function mappings, bidirectional config conversion methods (hf_config_to_provider_kwargs, megatron_to_hf_config), PROVIDER_CLASS attribute, SOURCE_NAME/MODEL_TYPE class attributes. Enhanced register_bridge decorator to accept provider and model_type parameters. Updated provider_bridge to use hf_config_to_provider_kwargs.
DeepSeek Model Bridges
src/megatron/bridge/models/deepseek/deepseek_provider.py, src/megatron/bridge/models/deepseek/deepseek_v2_bridge.py, src/megatron/bridge/models/deepseek/deepseek_v3_bridge.py, src/megatron/bridge/models/deepseek/common.py
DeepSeek providers now inherit MLAModelProvider with deprecation warnings; bridges return MLAModelProvider instead of model-specific providers. Added provider and model_type metadata to bridge registrations. Removed generation_config propagation.
Gemma Model Bridges
src/megatron/bridge/models/gemma/gemma_bridge.py, src/megatron/bridge/models/gemma/gemma2_bridge.py, src/megatron/bridge/models/gemma/gemma3_bridge.py
Bridges now obtain provider via super().provider_bridge() and configure in-place with Gemma-specific attributes (normalization, activation, position embedding, dropout). Added provider and model_type to decorator. Removed explicit provider construction.
GLM/Nemotron Model Bridges
src/megatron/bridge/models/glm/glm45_bridge.py, src/megatron/bridge/models/nemotron/nemotron_bridge.py, src/megatron/bridge/models/nemotronh/nemotron_h_bridge.py, src/megatron/bridge/models/nemotron_vl/nemotron_vl_provider.py, src/megatron/bridge/models/nemotronh/__init__.py
Bridges refactored to use GPTModelProvider/MambaModelProvider; added provider and model_type to registrations. In-place provider configuration replaces explicit provider construction. Added CONFIG_MAPPING for field translations.
GPT/Llama Model Bridges
src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py, src/megatron/bridge/models/llama/llama_bridge.py, src/megatron/bridge/models/llama_nemotron/llama_nemotron_bridge.py
Bridges updated to return GPTModelProvider with in-place configuration. Added megatron_to_hf_config classmethod to LlamaBridge. Replaced rope scaling custom logic with unified Megatron Core support. Updated rope_scaling_factor naming.
Qwen Model Bridges
src/megatron/bridge/models/qwen/qwen2_bridge.py, src/megatron/bridge/models/qwen/qwen3_bridge.py, src/megatron/bridge/models/qwen/qwen3_moe_bridge.py, src/megatron/bridge/models/qwen/qwen3_next_bridge.py, src/megatron/bridge/models/qwen_vl/qwen25_vl_bridge.py, src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
Bridges refactored to use super().provider_bridge() and configure providers in-place. Added model_type to registrations. Removed generation_config from provider instantiations.
Provider Classes
src/megatron/bridge/models/gpt_provider.py, src/megatron/bridge/models/mla_provider.py, src/megatron/bridge/models/llama/llama_provider.py, src/megatron/bridge/models/olmoe/olmoe_bridge.py
Added rope scaling fields (rope_scaling, rope_scaling_factor, rotary_scaling_factor) to GPTModelProvider. Created new MLAModelProvider combining MLATransformerConfig and GPTModelProvider. Removed generation_config fields. Simplified rope scaling logic in Llama providers.
Test Removals (Functional)
tests/functional_tests/models/deepseek/test_deepseek_provider_mapping.py, tests/functional_tests/models/gemma/test_gemma{2,3}_provider.py, tests/functional_tests/models/gemma/test_gemma_provider.py, tests/functional_tests/models/glm/test_glm45_provider.py, tests/functional_tests/models/llama_nemotron/test_llama_nemotron_provider.py, tests/functional_tests/models/qwen/test_qwen{2,3}_provider.py, tests/functional_tests/models/qwen/test_qwen3_moe_provider.py
Removed provider configuration equivalence test files that validated bridge-generated vs predefined provider configs across multiple model families.
Test Updates (Unit)
tests/unit_tests/models/deepseek/test_deepseek_{bridges,provider}.py, tests/unit_tests/models/glm/test_glm45_bridge.py, tests/unit_tests/models/gpt_oss/test_gpt_oss_bridges.py, tests/unit_tests/models/llama/test_llama_bridge.py, tests/unit_tests/models/llama_nemotron/test_llama_nemotron_bridge.py, tests/unit_tests/models/olmoe/test_olmoe_bridge.py, tests/unit_tests/models/qwen/test_qwen{3,3_moe}_bridge.py
Updated provider type expectations from model-specific providers (e.g., Qwen3ModelProvider) to GPTModelProvider/MLAModelProvider. Added extensive tests for HF↔Megatron config conversion in LlamaBridge. Strengthened mocks with spec-based construction. Updated rope_scaling_factor references.
Test Updates (Deepseek CLI)
tests/functional_tests/models/deepseek/test_deepseek_conversion.py
Added --trust-remote-code CLI flag to conversion test invocation.

Sequence Diagram(s)

sequenceDiagram
    participant HF as HuggingFace Config
    participant Bridge as MegatronModelBridge
    participant Translator as Config Translators
    participant Provider as Provider (GPT/MLA)
    
    HF->>Bridge: hf_config_to_provider_kwargs(hf_config)
    activate Bridge
    Bridge->>Translator: Map HF fields via CONFIG_MAPPING
    Translator-->>Bridge: Basic config dict
    Bridge->>Translator: Convert activations via hf_to_megatron_activation
    Translator-->>Bridge: Mapped activation functions
    Bridge->>Translator: Handle rope scaling (YARN/MLA)
    Translator-->>Bridge: Rope scaling params
    Bridge-->>Provider: provider_kwargs dict
    deactivate Bridge
    Provider->>Provider: Instantiate with translated kwargs
    
    Provider->>Bridge: megatron_to_hf_config(provider)
    activate Bridge
    Bridge->>Translator: Extract rope scaling from provider
    Translator-->>Bridge: Rope scaling config
    Bridge->>Translator: Convert activation via megatron_to_hf_activation
    Translator-->>Bridge: HF activation name
    Bridge->>Translator: Reverse map config fields
    Translator-->>Bridge: HF config dict
    Bridge-->>HF: Complete HF config
    deactivate Bridge
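The round trip in the sequence diagram can be sketched as two inverse lookups over one mapping table. Function and field names here are assumptions for illustration, not the bridge's actual API:

```python
# Illustrative bidirectional translation: forward (HF -> provider kwargs)
# and reverse (provider -> HF dict) driven by a single mapping table.
CONFIG_MAPPING = [
    ("num_hidden_layers", "num_layers"),
    ("rope_theta", "rotary_base"),
    ("rms_norm_eps", "layernorm_epsilon"),
]

def hf_to_provider(hf_cfg: dict) -> dict:
    return {mk: hf_cfg[hk] for hk, mk in CONFIG_MAPPING if hk in hf_cfg}

def provider_to_hf(provider_cfg: dict) -> dict:
    return {hk: provider_cfg[mk] for hk, mk in CONFIG_MAPPING if mk in provider_cfg}

hf = {"num_hidden_layers": 32, "rope_theta": 500000.0, "rms_norm_eps": 1e-5}
assert provider_to_hf(hf_to_provider(hf)) == hf  # lossless for mapped fields
```

Keeping both directions derived from the same table is what makes the translation stay in sync as fields are added.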

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • cuichenx
  • ananthsub
🚥 Pre-merge checks | ✅ 1 | ❌ 3
❌ Failed checks (3 warnings)
Check name Status Explanation Resolution
Title check ⚠️ Warning The title mentions 'Nemotron models' but the changeset encompasses a much broader refactoring affecting Llama, Qwen, Gemma, DeepSeek, GLM, and other model families. Revise the title to reflect the scope: e.g., 'Refactor: Centralize provider_bridge config mapping in base class across model families' or similar to accurately convey the comprehensive nature of changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 78.23% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes ⚠️ Warning PR contains major refactoring of configuration mapping system and provider initialization across 15+ bridges with new public API surface, but PR description lacks test results, testing information, or documentation demonstrating no regressions. Update PR description to include test results showing no failures, verification of provider configuration equivalence, confirmation of no performance regressions, and summary of new test coverage added.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/provider-bridge-refactor-2


coderabbitai bot (Contributor) left a comment
Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@src/megatron/bridge/models/conversion/model_bridge.py`:
- Around line 388-395: The yarn_params dict is populated with entries from
YARN_ROPE_SCALING_MAPPING even when rope_scaling.get(hf_key) is None, so filter
out None values: when iterating over self.YARN_ROPE_SCALING_MAPPING only assign
megatron_key -> value into yarn_params if value is not None (leave
position_embedding_type and the "truncate" -> yarn_correction_range_round_to_int
mapping as-is), then attach provider_kwargs["_yarn_params"] only when
yarn_params contains at least the position_embedding_type or other non-None
entries.
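The None-filtering this comment asks for can be sketched as follows. The mapping entries are illustrative (taken loosely from the names in this PR), not the file's exact contents:

```python
# Only copy YARN rope-scaling entries whose HF value is present, instead of
# storing None values in yarn_params.
YARN_ROPE_SCALING_MAPPING = [
    ("factor", "rotary_scaling_factor"),
    ("original_max_position_embeddings", "yarn_original_max_position_embeddings"),
    ("beta_fast", "yarn_beta_fast"),
]

def build_yarn_params(rope_scaling: dict) -> dict:
    yarn_params = {}
    for hf_key, megatron_key in YARN_ROPE_SCALING_MAPPING:
        value = rope_scaling.get(hf_key)
        if value is not None:  # skip absent entries rather than assign None
            yarn_params[megatron_key] = value
    return yarn_params

print(build_yarn_params({"factor": 8.0, "beta_fast": None}))
# {'rotary_scaling_factor': 8.0}
```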

In `@src/megatron/bridge/models/gemma/gemma3_bridge.py`:
- Around line 63-67: The code assigns provider.rope_scaling_factor using
hf_config.rope_scaling["factor"] which will raise a KeyError if rope_scaling is
a dict missing "factor"; update the assignment in gemma3_bridge.py to
defensively access the key (e.g., use hf_config.rope_scaling.get("factor", 1.0)
or check for the key/None first) so provider.rope_scaling_factor defaults to 1.0
when hf_config.rope_scaling is falsy or lacks "factor".

In `@src/megatron/bridge/models/gpt_provider.py`:
- Around line 139-144: The config class currently defines an unused field
rotary_scaling_factor while only rope_scaling_factor is passed into
MCoreGPTModel; either remove rotary_scaling_factor from the class to avoid
dead/unused configuration, or retain it but add a clear comment above the field
explaining its intended purpose and when subclasses (e.g., DeepSeek/Kimi) should
override it for MLA-specific behavior; update references in the class docstring
and ensure MCoreGPTModel instantiation (where rope_scaling_factor is passed)
remains consistent with the chosen approach.

In `@tests/unit_tests/models/llama/test_llama_bridge.py`:
- Around line 386-390: Rename the misnamed test method to include the missing
underscore so pytest discovers it: change the function name
testhf_config_to_provider_kwargs_from_base_class to
test_hf_config_to_provider_kwargs_from_base_class; ensure the assertions still
reference LlamaBridge and MegatronModelBridge and the
hf_config_to_provider_kwargs attribute to verify the inherited instance method.
- Around line 398-430: Rename the test function
testhf_config_to_provider_kwargs_returns_correct_mappings to
test_hf_config_to_provider_kwargs_returns_correct_mappings so pytest will
discover it; locate the method in
tests/unit_tests/models/llama/test_llama_bridge.py (the current def
testhf_config_to_provider_kwargs_returns_correct_mappings) and simply insert the
missing underscore in the function name while leaving the body and the call to
LlamaBridge.hf_config_to_provider_kwargs unchanged.
🧹 Nitpick comments (20)
tests/unit_tests/models/olmoe/test_olmoe_bridge.py (2)

484-485: Simplify boolean comparison.

Per ruff E712, avoid equality comparisons to True. Use the boolean value directly.

Proposed fix
-            assert AutoBridge.supports(config) == True
+            assert AutoBridge.supports(config)

488-489: Same simplification for False comparison.

Proposed fix
-        assert AutoBridge.supports(non_causal_config) == False
+        assert not AutoBridge.supports(non_causal_config)
tests/unit_tests/models/qwen/test_qwen3_bridge.py (2)

455-455: Simplify boolean comparison.

Same pattern as other test files - avoid == True.

Proposed fix
-            assert AutoBridge.supports(config) == True
+            assert AutoBridge.supports(config)

460-460: Simplify False comparison.

Proposed fix
-        assert AutoBridge.supports(non_causal_config) == False
+        assert not AutoBridge.supports(non_causal_config)
src/megatron/bridge/models/qwen/qwen3_moe_bridge.py (1)

43-45: Add explicit type hints to provider_bridge.

The method lost its argument/return type annotations.

🛠️ Suggested fix
 from megatron.bridge.models.conversion.param_mapping import (
     AutoMapping,
     GatedMLPMapping,
     QKVMapping,
 )
+from megatron.bridge.models.gpt_provider import GPTModelProvider
+from megatron.bridge.models.hf_pretrained.causal_lm import PreTrainedCausalLM
@@
-    def provider_bridge(self, hf_pretrained):
+    def provider_bridge(self, hf_pretrained: PreTrainedCausalLM) -> GPTModelProvider:

As per coding guidelines: Use type hints for function arguments and return types.

tests/unit_tests/models/qwen/test_qwen3_moe_bridge.py (1)

94-99: Docstring mismatch: spec is [], not list.

Align the comment with the actual Mock spec.

✏️ Suggested fix
-        Uses spec=list to make getattr return None for undefined attributes
+        Uses spec=[] to make getattr return None for undefined attributes
         instead of Mock objects, which would incorrectly be passed to GPTModelProvider.
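The behavior the docstring is describing can be shown directly. With a plain `Mock`, attribute access auto-creates child mocks; with `spec=[]`, undefined attributes raise `AttributeError`, so `getattr(cfg, name, None)` falls back to `None`:

```python
# Why spec=[] matters for these config mocks:
from unittest.mock import Mock

plain = Mock()
restricted = Mock(spec=[])

print(getattr(plain, "head_dim", None) is None)       # False: auto-created Mock
print(getattr(restricted, "head_dim", None) is None)  # True: AttributeError -> default
```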
src/megatron/bridge/models/qwen/qwen3_bridge.py (1)

43-45: Restore type hints on provider_bridge.

The method should keep explicit argument/return types.

🛠️ Suggested fix
 from megatron.bridge.models.conversion.param_mapping import (
     AutoMapping,
     GatedMLPMapping,
     QKVMapping,
 )
+from megatron.bridge.models.gpt_provider import GPTModelProvider
+from megatron.bridge.models.hf_pretrained.causal_lm import PreTrainedCausalLM
@@
-    def provider_bridge(self, hf_pretrained):
+    def provider_bridge(self, hf_pretrained: PreTrainedCausalLM) -> GPTModelProvider:

As per coding guidelines: Use type hints for function arguments and return types.

src/megatron/bridge/models/deepseek/deepseek_provider.py (1)

16-16: Use T | None instead of Optional[T].

Convert Optional annotations to the Python 3.10+ union syntax.

🛠️ Suggested fix
-from typing import Callable, List, Optional, Union
+from typing import Callable, List, Union
@@
-    q_lora_rank: Optional[int] = 1536
+    q_lora_rank: int | None = 1536
@@
-    q_lora_rank: Optional[int] = None
+    q_lora_rank: int | None = None

As per coding guidelines: Use 'T | None' for nullable types instead of 'Optional[T]'.

Also applies to: 57-58, 104-105

src/megatron/bridge/models/gemma/gemma2_bridge.py (1)

46-48: Use Google-style docstring here.

Please add Args/Returns so the method docstring is Sphinx-parseable.

Proposed docstring shape
-    """Convert HuggingFace config to Gemma2ModelProvider."""
+    """Convert HuggingFace config to Gemma2ModelProvider.
+
+    Args:
+        hf_pretrained: HuggingFace pretrained model wrapper.
+
+    Returns:
+        Gemma2ModelProvider: Configured provider for Megatron model.
+    """
As per coding guidelines: Use Google style docstrings (parseable by Sphinx) for classes and functions.
src/megatron/bridge/models/llama/llama_bridge.py (1)

31-31: Rename global logger to follow G_ convention.*

Global module variables should use the G_ prefix.

Suggested rename
-logger = logging.getLogger(__name__)
+G_LOGGER = logging.getLogger(__name__)
As per coding guidelines: Use upper snake_case and prefix 'G' for global variables (e.g., G_MY_GLOBAL).
src/megatron/bridge/models/nemotron/nemotron_bridge.py (2)

29-31: Add type hints + Google-style docstring for squared_relu.

This keeps helper utilities consistent with the repo’s typing and docstring conventions.

Proposed update
-def squared_relu(x):
-    """Squared ReLU activation function."""
-    return torch.pow(torch.nn.functional.relu(x), 2)
+def squared_relu(x: torch.Tensor) -> torch.Tensor:
+    """Squared ReLU activation function.
+
+    Args:
+        x: Input tensor.
+
+    Returns:
+        Output tensor after applying squared ReLU.
+    """
+    return torch.pow(torch.nn.functional.relu(x), 2)
As per coding guidelines: Use type hints for function arguments and return types; Use Google style docstrings (parseable by Sphinx) for classes and functions.

52-55: Optional: use iterable unpacking for CONFIG_MAPPING.

This avoids list concatenation and aligns with ruff’s style suggestion.

Example change
-    CONFIG_MAPPING = MegatronModelBridge.CONFIG_MAPPING + [
-        # Nemotron uses norm_eps instead of rms_norm_eps
-        ("norm_eps", "layernorm_epsilon"),
-    ]
+    CONFIG_MAPPING = [
+        *MegatronModelBridge.CONFIG_MAPPING,
+        # Nemotron uses norm_eps instead of rms_norm_eps
+        ("norm_eps", "layernorm_epsilon"),
+    ]
src/megatron/bridge/models/qwen/qwen2_bridge.py (1)

46-58: Consider adding type hints for consistency.

Other bridges like GemmaBridge include type hints on provider_bridge. Adding them here would improve IDE support and documentation.

-    def provider_bridge(self, hf_pretrained):
-        """Convert HuggingFace Qwen2 config to GPTModelProvider."""
+    def provider_bridge(self, hf_pretrained: "PreTrainedCausalLM") -> "GPTModelProvider":
+        """Convert HuggingFace Qwen2 config to GPTModelProvider."""
src/megatron/bridge/models/glm/glm45_bridge.py (1)

34-39: Remove unused noqa directive.

Per static analysis, the # noqa: F401 directive is unnecessary since the F401 rule is not enabled in the ruff configuration.

Suggested fix
 try:
-    import transformer_engine  # noqa: F401
+    import transformer_engine

     HAVE_TE = True
 except (ImportError, ModuleNotFoundError):
     HAVE_TE = False
src/megatron/bridge/models/deepseek/deepseek_v2_bridge.py (1)

27-32: Remove unused noqa directive.

Same issue as in glm45_bridge.py - the # noqa: F401 directive is unnecessary.

Suggested fix
 try:
-    import transformer_engine  # noqa: F401
+    import transformer_engine

     HAVE_TE = True
 except (ImportError, ModuleNotFoundError):
     HAVE_TE = False
src/megatron/bridge/models/nemotronh/nemotron_h_bridge.py (1)

57-70: Consider using iterable unpacking for CONFIG_MAPPING.

Per static analysis (RUF005), unpacking is preferred over concatenation for better readability and slight performance improvement.

Suggested refactor
-    CONFIG_MAPPING = MegatronModelBridge.CONFIG_MAPPING + [
+    CONFIG_MAPPING = [
+        *MegatronModelBridge.CONFIG_MAPPING,
         # Mamba-specific fields
         ("mamba_head_dim", "mamba_head_dim"),
         ("mamba_num_heads", "mamba_num_heads"),
         ("n_groups", "mamba_num_groups"),
         ("ssm_state_size", "mamba_state_dim"),
         ("hybrid_override_pattern", "hybrid_override_pattern"),
         ("residual_in_fp32", "fp32_residual_connection"),
         ("use_bias", "add_bias_linear"),
         ("layer_norm_epsilon", "layernorm_epsilon"),
         # MoE-specific fields (already in base but with different HF names)
         ("moe_shared_expert_intermediate_size", "moe_shared_expert_intermediate_size"),
     ]
src/megatron/bridge/models/deepseek/deepseek_v3_bridge.py (2)

31-31: Remove unused noqa directive.

Static analysis indicates the # noqa: F401 is unnecessary since no F401 rule violation exists here. The import is used for the side-effect check.

Proposed fix
-    import transformer_engine  # noqa: F401
+    import transformer_engine

47-94: Significant code duplication with DeepSeek V2 bridge.

Comparing with deepseek_v2_bridge.py (from relevant snippets), approximately 40+ lines in provider_bridge are identical between V2 and V3 (lines 51-84 mirror the V2 implementation). The only differences appear to be:

  • V3 adds moe_router_enable_expert_bias = True (line 65)
  • V3 adds moe_shared_expert_overlap = True (line 64)

Consider extracting common DeepSeek provider configuration to a shared helper (e.g., in common.py) to reduce duplication and ensure consistency.

src/megatron/bridge/models/conversion/model_bridge.py (2)

244-285: Annotate mutable class attributes with ClassVar.

Per static analysis (RUF012), mutable class attributes like CONFIG_MAPPING should be annotated with typing.ClassVar to indicate they are class-level rather than instance-level attributes.

Proposed fix
+from typing import ClassVar
+
 class MegatronModelBridge(...):
     ...
-    CONFIG_MAPPING = [
+    CONFIG_MAPPING: ClassVar[list[tuple[str, str]]] = [
         # Core architecture
         ("num_hidden_layers", "num_layers"),
         ...
     ]

Apply similar annotations to YARN_ROPE_SCALING_MAPPING, MLA_ROPE_SCALING_MAPPING, and ACTIVATION_MAPPING.


368-368: Local import to avoid circular dependency.

The import of MLAModelProvider inside hf_config_to_provider_kwargs is likely to prevent circular imports. Consider adding a brief comment explaining this.

Proposed documentation
+        # Import here to avoid circular dependency (MLAModelProvider imports from this module)
         from megatron.bridge.models.mla_provider import MLAModelProvider

Comment on lines +63 to +67
# Gemma3-specific features not in CONFIG_MAPPING
provider.window_size = hf_config.sliding_window
provider.rotary_base = (hf_config.rope_local_base_freq, hf_config.rope_theta)
provider.softmax_scale = 1.0 / math.sqrt(hf_config.query_pre_attn_scalar)
provider.rope_scaling_factor = hf_config.rope_scaling["factor"] if hf_config.rope_scaling else 1.0

⚠️ Potential issue | 🟡 Minor

Consider defensive access for rope_scaling["factor"].

If hf_config.rope_scaling is a dict but doesn't contain the "factor" key, this will raise a KeyError. Consider using .get() with a default value.

Suggested fix
-        provider.rope_scaling_factor = hf_config.rope_scaling["factor"] if hf_config.rope_scaling else 1.0
+        provider.rope_scaling_factor = hf_config.rope_scaling.get("factor", 1.0) if hf_config.rope_scaling else 1.0

Comment on lines +253 to +264
def test_provider_bridge_backward_compatibility(self, mock_pretrained_llama):
"""Test that provider_bridge still works as an alias for provider_bridge."""
bridge = LlamaBridge()

# Pass model only
# Both methods should return equivalent results
result_config = bridge.provider_bridge(mock_pretrained_llama)
result_provider = bridge.provider_bridge(mock_pretrained_llama)

# They should have the same architecture
assert result_config.num_layers == result_provider.num_layers
assert result_config.hidden_size == result_provider.hidden_size
assert result_config.normalization == result_provider.normalization

⚠️ Potential issue | 🟡 Minor

Test is a tautology - calls same method twice.

This test calls bridge.provider_bridge(mock_pretrained_llama) twice and compares the results. Since both calls create new provider instances, the comparison is always true for equivalent configurations. The docstring says it tests "provider_bridge still works as an alias for provider_bridge" which doesn't make sense.

Consider either removing this test or clarifying its intent (e.g., testing idempotency or determinism).

Suggested clarification
     def test_provider_bridge_backward_compatibility(self, mock_pretrained_llama):
-        """Test that provider_bridge still works as an alias for provider_bridge."""
+        """Test that provider_bridge produces consistent results across multiple calls."""
         bridge = LlamaBridge()

-        # Both methods should return equivalent results
+        # Multiple calls should return equivalent results (determinism check)
         result_config = bridge.provider_bridge(mock_pretrained_llama)
         result_provider = bridge.provider_bridge(mock_pretrained_llama)

-        # They should have the same architecture
+        # They should have the same architecture (consistency check)
         assert result_config.num_layers == result_provider.num_layers

Comment on lines +386 to +390
def testhf_config_to_provider_kwargs_from_base_class(self):
"""Test that hf_config_to_provider_kwargs is inherited from MegatronModelBridge."""
assert hasattr(LlamaBridge, "hf_config_to_provider_kwargs")
# It's an instance method, check it exists on the class
assert hasattr(MegatronModelBridge, "hf_config_to_provider_kwargs")

⚠️ Potential issue | 🟡 Minor

Fix test method naming - missing underscore.

The method name testhf_config_to_provider_kwargs_from_base_class is missing an underscore after test, which means pytest won't discover it as a test.

Proposed fix
-    def testhf_config_to_provider_kwargs_from_base_class(self):
+    def test_hf_config_to_provider_kwargs_from_base_class(self):

Comment on lines +398 to +430
def testhf_config_to_provider_kwargs_returns_correct_mappings(self):
"""Test that hf_config_to_provider_kwargs correctly maps HF config to provider kwargs."""
# Create a mock HF config
mock_hf_config = Mock()
mock_hf_config.num_hidden_layers = 32
mock_hf_config.hidden_size = 4096
mock_hf_config.intermediate_size = 14336
mock_hf_config.num_attention_heads = 32
mock_hf_config.num_key_value_heads = 8
mock_hf_config.vocab_size = 128256
mock_hf_config.max_position_embeddings = 8192
mock_hf_config.rope_theta = 500000.0
mock_hf_config.rms_norm_eps = 1e-05
mock_hf_config.initializer_range = 0.02
mock_hf_config.hidden_act = "silu"
mock_hf_config.torch_dtype = "bfloat16"
mock_hf_config.attention_dropout = 0.0
mock_hf_config.tie_word_embeddings = False
mock_hf_config.attention_bias = False
mock_hf_config.mlp_bias = False

bridge = LlamaBridge()
kwargs = bridge.hf_config_to_provider_kwargs(mock_hf_config)

assert kwargs["num_layers"] == 32
assert kwargs["hidden_size"] == 4096
assert kwargs["ffn_hidden_size"] == 14336
assert kwargs["num_attention_heads"] == 32
assert kwargs["num_query_groups"] == 8
assert kwargs["vocab_size"] == 128256
assert kwargs["seq_length"] == 8192
assert kwargs["rotary_base"] == 500000.0
assert kwargs["activation_func"] == F.silu

⚠️ Potential issue | 🟡 Minor

Fix test method naming - missing underscore.

Same issue: testhf_config_to_provider_kwargs_returns_correct_mappings should be test_hf_config_to_provider_kwargs_returns_correct_mappings.

Proposed fix
-    def testhf_config_to_provider_kwargs_returns_correct_mappings(self):
+    def test_hf_config_to_provider_kwargs_returns_correct_mappings(self):

A contributor commented:

why is this test removed?

@yaoyu-33 yaoyu-33 merged commit 841b531 into main Feb 10, 2026
84 of 86 checks passed
@ko3n1g ko3n1g mentioned this pull request Feb 24, 2026
5 tasks
