[model, refactor] refactor: Centralize provider_bridge config mapping in base class #2052
Conversation
This refactoring centralizes model-specific configurations within the provider_bridge method of each model bridge.

Changes:
- Add MoE-related field mappings to base class CONFIG_MAPPING:
  - num_experts -> num_moe_experts
  - num_experts_per_tok -> moe_router_topk
  - moe_intermediate_size -> moe_ffn_hidden_size
- Refactor LlamaBridge:
  - Use MEGATRON_DEFAULTS and HF_DEFAULTS class attributes
  - Override provider_bridge only for RoPE scaling (Llama 3.1/3.2)
- Refactor Qwen2Bridge:
  - Use MEGATRON_DEFAULTS (add_qkv_bias=True) and HF_DEFAULTS
  - No provider_bridge override needed
- Refactor Qwen3Bridge:
  - Use MEGATRON_DEFAULTS (qk_layernorm=True) and HF_DEFAULTS
  - No provider_bridge override needed
- Refactor Qwen3MoEBridge:
  - Use MEGATRON_DEFAULTS with MoE settings and HF_DEFAULTS
  - No provider_bridge override needed
- Update tests to expect GPTModelProvider instead of model-specific providers
- Add verification scripts for both Llama and Qwen bridges

Verified on remote server:
- Qwen/Qwen2-0.5B: PASS
- Qwen/Qwen2-7B: PASS
- Qwen/Qwen3-0.6B: PASS
- Qwen/Qwen3-1.7B: PASS
- Qwen/Qwen3-30B-A3B: PASS
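The MoE field mappings above can be sketched as a simple list of (HF field, Megatron field) pairs applied by the base class. This is an illustrative stand-in, not the project's actual helper; only the field names come from the commit message:

```python
from types import SimpleNamespace

# (HF config field, Megatron provider field) pairs from the commit message;
# the apply_config_mapping helper below is a hypothetical stand-in
CONFIG_MAPPING = [
    ("num_experts", "num_moe_experts"),
    ("num_experts_per_tok", "moe_router_topk"),
    ("moe_intermediate_size", "moe_ffn_hidden_size"),
]

def apply_config_mapping(hf_config, provider_kwargs):
    """Copy every mapped HF field that exists on hf_config into provider kwargs."""
    for hf_field, megatron_field in CONFIG_MAPPING:
        if hasattr(hf_config, hf_field):
            provider_kwargs[megatron_field] = getattr(hf_config, hf_field)
    return provider_kwargs

hf = SimpleNamespace(num_experts=128, num_experts_per_tok=8, moe_intermediate_size=768)
kwargs = apply_config_mapping(hf, {})  # {"num_moe_experts": 128, ...}
```

Because the loop checks `hasattr`, dense models whose HF configs lack the MoE fields simply get no MoE kwargs, which is what lets one mapping table serve every bridge.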
…dels
- Add MLAModelProvider as unified base for Multi-Latent Attention models
- Refactor DeepSeek V2/V3 bridges to use MLAModelProvider
- Refactor Kimi K2 bridge to use MLAModelProvider
- Move model-specific defaults from providers to MEGATRON_DEFAULTS in bridges
- Add model_type parameter to @register_bridge decorator for auto HF config
- Simplify provider files to deprecated backward-compatible aliases

Verified: DeepSeek-V2-Lite, DeepSeek-V2, DeepSeek-V3, Moonlight-16B, Kimi-K2
- Register GemmaModelProvider, Gemma2ModelProvider, Gemma3ModelProvider via decorator
- Add MEGATRON_DEFAULTS to Gemma/Gemma2 bridges for explicit config defaults
- Add gelu_pytorch_tanh -> fast_gelu to ACTIVATION_MAPPING in model_bridge.py
- Add verification script for Gemma provider refactoring
Verified: gemma-2b, gemma-7b, gemma-2-2b, gemma-2-9b, gemma-2-27b,
gemma-3-4b-it, gemma-3-12b-it, gemma-3-27b-it
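The `gelu_pytorch_tanh -> fast_gelu` mapping above can be illustrated with a minimal lookup. `hf_to_megatron_activation` is a helper named in this PR, but the body below, and the `fast_gelu` tanh-approximation stand-in, are assumptions for illustration only:

```python
import math

def fast_gelu(x: float) -> float:
    # tanh-approximation GELU, the form HF calls gelu_pytorch_tanh
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

ACTIVATION_MAPPING = {
    "gelu_pytorch_tanh": fast_gelu,
    # other entries (silu, gelu, ...) elided
}

def hf_to_megatron_activation(name: str):
    """Resolve an HF activation name to a Megatron activation callable."""
    try:
        return ACTIVATION_MAPPING[name]
    except KeyError:
        raise ValueError(f"Unsupported HF activation: {name}")
```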
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
/ok to test ca54e4f
📝 Walkthrough

This PR introduces bidirectional HuggingFace-to-Megatron configuration translation.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/megatron/bridge/models/deepseek/deepseek_provider.py (2)
85-85: Type annotation mismatch: `int = None` should be `Optional[int]`.

The annotation `q_lora_rank: int = None` is incorrect since `None` is not a valid `int`. This should be `Optional[int] = None` or `int | None = None`.

Proposed fix

```diff
- q_lora_rank: int = None
+ q_lora_rank: Optional[int] = None
```

Also update the imports at line 16:

```diff
-from typing import List, Union
+from typing import List, Optional, Union
```
165-165: Same type annotation issue: `int = None` should be `Optional[int]`.

Same issue as line 85: `q_lora_rank: int = None` should use `Optional[int]`.
🤖 Fix all issues with AI agents
In `@src/megatron/bridge/models/gemma/gemma3_bridge.py`:
- Around line 56-65: After you set
provider.fp16/provider.bf16/provider.params_dtype from dtype_from_hf, also set
provider.autocast_dtype to the same derived dtype so autocasting matches the VL
precision override; locate where provider.fp16/bf16/params_dtype are assigned
(using dtype_from_hf with hf_vl_config) and set provider.autocast_dtype =
self.dtype_from_hf(hf_vl_config, default=torch.float32) (or reuse the computed
params dtype) to keep autocast consistent with params.
In `@src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py`:
- Around line 83-86: When forcing BF16 in gpt_oss_bridge, clear the FP16 flag to
avoid conflicting dtype settings: in the same location where you set
provider.bf16 = True and provider.params_dtype = torch.bfloat16 (near
provider.hidden_dropout), also set provider.fp16 = False so the provider does
not have both fp16 and bf16 enabled; update the assignment in the GPT-OSS bridge
initialization (the code that modifies provider.hidden_dropout / provider.bf16 /
provider.params_dtype) to explicitly clear provider.fp16 when enabling BF16.
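The fix amounts to keeping the two dtype flags mutually exclusive. A minimal sketch with a stand-in provider dataclass (the real provider stores torch dtypes, e.g. `torch.bfloat16`, in `params_dtype`; strings stand in here):

```python
from dataclasses import dataclass

@dataclass
class Provider:
    # stand-in for the real provider config
    fp16: bool = False
    bf16: bool = False
    params_dtype: str = "float32"  # torch.float32 in the real code

def force_bf16(provider: Provider) -> Provider:
    """Enable BF16 and explicitly clear FP16 so the flags never conflict."""
    provider.bf16 = True
    provider.fp16 = False  # the missing line the review comment asks for
    provider.params_dtype = "bfloat16"  # torch.bfloat16 in the real bridge
    return provider
```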
In `@src/megatron/bridge/models/gpt_provider.py`:
- Around line 139-145: The parameters rope_scaling, rope_scaling_factor, and
seq_len_interpolation_factor are being passed unconditionally to Megatron APIs;
to match the existing defensive pattern used for mtp_block_spec, check the
target function/class signatures (use inspect.signature where mtp_block_spec is
handled) before adding these kwargs, and only include them in the kwargs dict if
the inspected signature has those parameters; remove direct positional/keyword
assignments for rope_scaling/rope_scaling_factor/seq_len_interpolation_factor
and instead add them to the **kwargs conditionally (referencing the same
inspection logic used around mtp_block_spec).
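The defensive pattern this comment references can be sketched as follows; `make_model` is a hypothetical stand-in for a Megatron constructor whose signature may lack the newer rope parameters:

```python
import inspect

def build_kwargs(target, candidate_kwargs):
    """Keep only the kwargs that actually appear in target's signature."""
    params = inspect.signature(target).parameters
    return {k: v for k, v in candidate_kwargs.items() if k in params}

# hypothetical stand-in: an older constructor without seq_len_interpolation_factor
def make_model(hidden_size, rope_scaling_factor=1.0):
    return {"hidden_size": hidden_size, "rope_scaling_factor": rope_scaling_factor}

kwargs = build_kwargs(
    make_model,
    {
        "hidden_size": 4096,
        "rope_scaling_factor": 8.0,
        "seq_len_interpolation_factor": 32,  # dropped: not in the signature
    },
)
model = make_model(**kwargs)
```

Passing the full dict directly would raise `TypeError: unexpected keyword argument` on older Megatron versions; filtering through `inspect.signature` keeps the bridge forward- and backward-compatible.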
In `@src/megatron/bridge/models/kimi/kimi_bridge.py`:
- Around line 33-75: The Kimi bridge's provider_bridge (method provider_bridge)
currently omits an explicit moe_aux_loss_coeff; add an explicit assignment
setting provider.moe_aux_loss_coeff = 1e-3 in provider_bridge (alongside the
other provider.* assignments) so the KimiK2Provider's auxiliary MoE loss
coefficient is set consistently with GLM45/Qwen3/DeepSeek V3 bridges.
In `@src/megatron/bridge/models/qwen/qwen2_bridge.py`:
- Around line 37-39: Update the class docstring in Qwen2 (which currently
mentions MEGATRON_DEFAULTS) to instead state that model-specific settings are
applied in provider_bridge; locate the docstring on the Qwen2 class in
qwen2_bridge.py (and any nearby reference to MegatronModelBridge’s
CONFIG_MAPPING/ACTIVATION_MAPPING) and replace the outdated reference to
MEGATRON_DEFAULTS with a concise note that provider_bridge handles
model-specific defaults.
In `@tests/unit_tests/models/llama/test_llama_bridge.py`:
- Around line 218-227: The test function
test_provider_bridge_rope_scaling_params currently accepts an unused fixture
parameter llama_config; remove that parameter from the test signature so it
becomes def test_provider_bridge_rope_scaling_params(self,
mock_pretrained_llama): and ensure any references to llama_config inside the
test are not present, then run the linter to confirm Ruff no longer flags the
unused argument; locate this test in
tests/unit_tests/models/llama/test_llama_bridge.py and update the function
signature accordingly (function name: test_provider_bridge_rope_scaling_params,
class uses LlamaBridge and mock_pretrained_llama).
🧹 Nitpick comments (1)
src/megatron/bridge/models/conversion/model_bridge.py (1)
244-310: Consider annotating mutable class attributes with `ClassVar`.

`CONFIG_MAPPING`, `YARN_ROPE_SCALING_MAPPING`, and `ACTIVATION_MAPPING` are mutable class attributes that should ideally be annotated with `typing.ClassVar` to indicate they are class-level and not instance-level attributes.

Proposed fix

Add `ClassVar` to imports and annotate:

```diff
 from typing import (
     Callable,
+    ClassVar,
     Dict,
     Generic,
     ...
 )

-CONFIG_MAPPING = [
+CONFIG_MAPPING: ClassVar[list[tuple[str, str]]] = [
     # Core architecture
     ...
 ]

-YARN_ROPE_SCALING_MAPPING = [
+YARN_ROPE_SCALING_MAPPING: ClassVar[list[tuple[str, str]]] = [
     ...
 ]

-ACTIVATION_MAPPING = {
+ACTIVATION_MAPPING: ClassVar[dict[str, Callable]] = {
     ...
 }
```
/ok to test 1ed2069

/ok to test ab24acf

/ok to test b6f8e29

/ok to test fa03b2c
@yaoyu-33, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

/ok to test 167055a

/ok to test f9a3231
OLMoE HF config doesn't have head_dim attribute, so kv_channels was left as None. This fix calculates it as hidden_size // num_attention_heads (2048 // 16 = 128 for OLMoE-1B-7B). This follows the pattern used by MistralBridge and NemotronHBridge.
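The fallback described above as a small sketch; `derive_kv_channels` is an illustrative name (the real bridges assign `provider.kv_channels` inline), and the numbers are the OLMoE-1B-7B values quoted in the comment:

```python
from types import SimpleNamespace

def derive_kv_channels(hf_config) -> int:
    """Use head_dim when the HF config provides it, else derive it."""
    head_dim = getattr(hf_config, "head_dim", None)
    if head_dim is None:
        head_dim = hf_config.hidden_size // hf_config.num_attention_heads
    return head_dim

# OLMoE-1B-7B: 2048 // 16 == 128
olmoe = SimpleNamespace(hidden_size=2048, num_attention_heads=16)
```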
/ok to test d5b7890

/ok to test ada7d05

/ok to test 5f24f9b

/ok to test 057175f
[model, refactor] refactor: Centralize provider_bridge config mapping in base class
Summary
This PR implements the `provider_bridge` refactoring proposal (docs/proposals/provider_bridge_refactor.md). It centralizes common HF to Megatron configuration mappings in the `MegatronModelBridge` base class and refactors model bridges to use the new pattern with direct property assignment.

This is partial work: model-specific provider classes (e.g., `LlamaModelProvider`, `Qwen2ModelProvider`) are NOT removed yet. That cleanup will come in a follow-up PR.

Motivation
Before (scattered model-specific logic):
After (centralized mapping + direct property assignment):
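A minimal sketch of the centralized pattern, using a dict as a stand-in for the real provider object (class names follow this PR; the bodies and the two-entry mapping are illustrative):

```python
from types import SimpleNamespace

class MegatronModelBridge:
    # common (HF field, Megatron field) pairs; the real list is much longer
    CONFIG_MAPPING = [
        ("hidden_size", "hidden_size"),
        ("num_attention_heads", "num_attention_heads"),
    ]

    def provider_bridge(self, hf_config):
        provider = {}  # the real base class instantiates PROVIDER_CLASS
        for hf_field, mg_field in self.CONFIG_MAPPING:
            if hasattr(hf_config, hf_field):
                provider[mg_field] = getattr(hf_config, hf_field)
        return provider

class Qwen2Bridge(MegatronModelBridge):
    pass  # no override needed: common mapping plus class defaults suffice

class LlamaBridge(MegatronModelBridge):
    def provider_bridge(self, hf_config):
        provider = super().provider_bridge(hf_config)
        # override only for the model-specific bit: RoPE scaling (Llama 3.1/3.2)
        if getattr(hf_config, "rope_scaling", None):
            provider["rope_scaling_factor"] = hf_config.rope_scaling["factor"]
        return provider

hf = SimpleNamespace(hidden_size=4096, num_attention_heads=32, rope_scaling={"factor": 8.0})
p = LlamaBridge().provider_bridge(hf)
```

The point of the pattern: a bridge overrides `provider_bridge` only when a model deviates from the common mapping, and always starts from `super().provider_bridge()` so shared fields stay in one place.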
Key Changes
Base Class Enhancements (`MegatronModelBridge`)

Added centralized mappings and helper methods:

- `CONFIG_MAPPING`
- `ACTIVATION_MAPPING`
- `YARN_ROPE_SCALING_MAPPING`
- `hf_config_to_provider_kwargs()`
- `megatron_to_hf_config()`
- `hf_to_megatron_activation()`
- `megatron_to_hf_activation()`
- Default `provider_bridge()` driven by `CONFIG_MAPPING`
- `PROVIDER_CLASS` support via `@register_bridge(provider=...)`

New:
`MLAModelProvider`

Added a minimal MLA (Multi-Latent Attention) provider class that combines `MLATransformerConfig` with `GPTModelProvider`. Used by DeepSeek V2/V3 and Kimi K2.

Refactored Bridges
All refactored bridges now call `super().provider_bridge()` to get a provider with common settings from `CONFIG_MAPPING`.

| Bridge | Provider | Notes |
|---|---|---|
| `LlamaBridge` | `GPTModelProvider` | |
| `Qwen2Bridge` | `GPTModelProvider` | `add_qkv_bias=True` |
| `Qwen3Bridge` | `GPTModelProvider` | `qk_layernorm=True`, no QKV bias |
| `Qwen3MoEBridge` | `GPTModelProvider` | |
| `DeepSeekV2Bridge` | `MLAModelProvider` | |
| `DeepSeekV3Bridge` | `MLAModelProvider` | |
| `KimiK2Bridge` | `MLAModelProvider` | |
| `GemmaBridge` | `GPTModelProvider` | |
| `Gemma2Bridge` | `Gemma2ModelProvider` | |
| `Gemma3Bridge` | `Gemma2ModelProvider` | |
| `GLM45Bridge` | `GPTModelProvider` | |
| `GPTOSSBridge` | `GPTModelProvider` | |

Simplified Providers
- `DeepSeekModelProvider` family
- `LlamaModelProvider`

Not Included in This PR (Future Work)
- `Llama2ModelProvider7B`, `Qwen2ModelProvider7B`

Files Changed
Core
- `src/megatron/bridge/models/conversion/model_bridge.py` - Added `CONFIG_MAPPING`, `ACTIVATION_MAPPING`, `YARN_ROPE_SCALING_MAPPING`, helper methods, default `provider_bridge()`, `PROVIDER_CLASS` support
- `src/megatron/bridge/models/mla_provider.py` - New `MLAModelProvider` for MLA-based models
- `src/megatron/bridge/models/kimi/__init__.py` - Kimi module init
- `src/megatron/bridge/models/kimi/kimi_bridge.py` - New Kimi K2 bridge

Refactored Bridges
- `src/megatron/bridge/models/llama/llama_bridge.py`
- `src/megatron/bridge/models/qwen/qwen2_bridge.py`
- `src/megatron/bridge/models/qwen/qwen3_bridge.py`
- `src/megatron/bridge/models/qwen/qwen3_moe_bridge.py`
- `src/megatron/bridge/models/deepseek/deepseek_v2_bridge.py`
- `src/megatron/bridge/models/deepseek/deepseek_v3_bridge.py`
- `src/megatron/bridge/models/gemma/gemma_bridge.py`
- `src/megatron/bridge/models/gemma/gemma2_bridge.py`
- `src/megatron/bridge/models/gemma/gemma3_bridge.py`
- `src/megatron/bridge/models/glm/glm45_bridge.py`
- `src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py`

Simplified Providers
- `src/megatron/bridge/models/deepseek/deepseek_provider.py`
- `src/megatron/bridge/models/llama/llama_provider.py`

Tests
- `tests/unit_tests/models/llama/test_llama_bridge.py` - Updated to expect `GPTModelProvider`
- `tests/unit_tests/models/qwen/test_qwen3_bridge.py` - Updated
- `tests/unit_tests/models/qwen/test_qwen3_moe_bridge.py` - Updated

Design Principles
Following docs/proposals/provider_bridge_refactor.md:

- `CONFIG_MAPPING` - Common field mappings are handled automatically
- `@register_bridge(provider=MLAModelProvider)` for MLA models

Breaking Changes
- `GPTModelProvider` instead of some model-specific providers

Checklist
- `CONFIG_MAPPING` covers common HF to Megatron field mappings
- `ACTIVATION_MAPPING` covers common activation functions
- `MLAModelProvider` added for MLA-based models
- `LlamaBridge` refactored
- `Qwen2Bridge` refactored
- `Qwen3Bridge` refactored
- `Qwen3MoEBridge` refactored
- `DeepSeekV2Bridge` refactored
- `DeepSeekV3Bridge` refactored
- `KimiK2Bridge` added (new)
- `GemmaBridge` refactored
- `Gemma2Bridge` refactored
- `Gemma3Bridge` refactored
- `GLM45Bridge` refactored
- `GPTOSSBridge` refactored

Related
- docs/proposals/provider_bridge_refactor.md

Summary by CodeRabbit
New Features
Improvements
Deprecations