
[model, refactor] refactor: Centralize provider_bridge config mapping in base class#2052

Merged
cuichenx merged 23 commits into main from feature/provider-bridge-refactor on Feb 4, 2026
Conversation

@yaoyu-33 yaoyu-33 commented Jan 23, 2026


Summary

This PR implements the provider_bridge refactoring proposal (docs/proposals/provider_bridge_refactor.md). It centralizes common HF to Megatron configuration mappings in the MegatronModelBridge base class and refactors model bridges to use the new pattern with direct property assignment.

This is partial work - model-specific provider classes (e.g., LlamaModelProvider, Qwen2ModelProvider) are NOT removed yet. That cleanup will come in a follow-up PR.

Motivation

Before (scattered model-specific logic):

```python
class LlamaBridge(MegatronModelBridge):
    def provider_bridge(self, hf_pretrained):
        return LlamaModelProvider(
            num_layers=hf_pretrained.config.num_hidden_layers,
            hidden_size=hf_pretrained.config.hidden_size,
            ffn_hidden_size=hf_pretrained.config.intermediate_size,
            # ... 20+ more field mappings duplicated across bridges
        )
```

After (centralized mapping + direct property assignment):

```python
class LlamaBridge(MegatronModelBridge):
    def provider_bridge(self, hf_pretrained):
        provider = super().provider_bridge(hf_pretrained)  # Returns GPTModelProvider

        # Only model-specific settings
        provider.normalization = "RMSNorm"
        provider.gated_linear_unit = True
        provider.position_embedding_type = "rope"

        return provider
```

Key Changes

Base Class Enhancements (MegatronModelBridge)

Added centralized mappings and helper methods:

| Addition | Description |
|---|---|
| `CONFIG_MAPPING` | 30+ bidirectional field mappings (HF ↔ Megatron), including MoE and MLA fields |
| `ACTIVATION_MAPPING` | Common activation function mapping (silu, gelu, relu, tanh, gelu_pytorch_tanh) |
| `YARN_ROPE_SCALING_MAPPING` | YARN rope scaling field mappings |
| `hf_config_to_provider_kwargs()` | Converts an HF config to Megatron provider kwargs |
| `megatron_to_hf_config()` | Converts a Megatron provider to an HF config dict (for export) |
| `hf_to_megatron_activation()` | Activation string to function |
| `megatron_to_hf_activation()` | Activation function to string |
| Default `provider_bridge()` | Creates a provider using `CONFIG_MAPPING` |
| `PROVIDER_CLASS` | Supports a custom provider class via `@register_bridge(provider=...)` |
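As a rough sketch of how `CONFIG_MAPPING` and `hf_config_to_provider_kwargs()` could interact (field names are taken from this description; the actual implementation in `model_bridge.py` likely differs):

```python
from types import SimpleNamespace

# (HF field, Megatron field); a few rows as examples, 30+ in the real base class.
CONFIG_MAPPING = [
    ("num_hidden_layers", "num_layers"),
    ("hidden_size", "hidden_size"),
    ("intermediate_size", "ffn_hidden_size"),
    ("num_attention_heads", "num_attention_heads"),
]

def hf_config_to_provider_kwargs(hf_config) -> dict:
    """Translate HF config attributes into Megatron provider kwargs."""
    return {
        megatron_field: getattr(hf_config, hf_field)
        for hf_field, megatron_field in CONFIG_MAPPING
        if hasattr(hf_config, hf_field)
    }

cfg = SimpleNamespace(num_hidden_layers=32, hidden_size=4096, intermediate_size=11008)
print(hf_config_to_provider_kwargs(cfg))
# {'num_layers': 32, 'hidden_size': 4096, 'ffn_hidden_size': 11008}
```

Fields absent from a given HF config are simply skipped, which is what lets one mapping table serve dense, MoE, and MLA models alike.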

New: MLAModelProvider

Added a minimal MLA (Multi-Latent Attention) provider class that combines MLATransformerConfig with GPTModelProvider. Used by DeepSeek V2/V3 and Kimi K2.
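The combination can be pictured as multiple inheritance over two config classes; a minimal sketch with stand-in dataclasses (none of these stub names exist in the repo):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GPTModelProviderStub:
    # stand-in for GPTModelProvider fields
    num_layers: int = 32
    hidden_size: int = 4096

@dataclass
class MLATransformerConfigStub:
    # stand-in for MLA-specific fields discussed in this PR's review
    q_lora_rank: Optional[int] = None
    kv_lora_rank: int = 512

@dataclass
class MLAModelProviderStub(MLATransformerConfigStub, GPTModelProviderStub):
    """Exposes both the GPT fields and the MLA fields on one provider."""

provider = MLAModelProviderStub(hidden_size=5120, kv_lora_rank=1536)
print(provider.num_layers, provider.hidden_size, provider.kv_lora_rank)
```

The point of the pattern is that DeepSeek V2/V3 and Kimi K2 bridges can set MLA fields and GPT fields on the same object without a per-model provider class.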

Refactored Bridges

All refactored bridges now:

  1. Call super().provider_bridge() to get a provider with common settings from CONFIG_MAPPING
  2. Set model-specific properties directly on the provider
  3. Return the appropriate provider type (GPTModelProvider or MLAModelProvider)
| Bridge | Status | Provider | Notes |
|---|---|---|---|
| LlamaBridge | Refactored | GPTModelProvider | RoPE scaling for Llama 3.1/3.2 |
| Qwen2Bridge | Refactored | GPTModelProvider | add_qkv_bias=True |
| Qwen3Bridge | Refactored | GPTModelProvider | qk_layernorm=True, no QKV bias |
| Qwen3MoEBridge | Refactored | GPTModelProvider | MoE settings |
| DeepSeekV2Bridge | Refactored | MLAModelProvider | MLA + MoE settings |
| DeepSeekV3Bridge | Refactored | MLAModelProvider | MLA + MoE + expert bias |
| KimiK2Bridge | New | MLAModelProvider | MLA + MoE (similar to DeepSeek V3) |
| GemmaBridge | Refactored | GPTModelProvider | Embedding scaling |
| Gemma2Bridge | Refactored | Gemma2ModelProvider | Logit softcapping, sliding window |
| Gemma3Bridge | Refactored | Gemma2ModelProvider | Gemma3-specific settings |
| GLM45Bridge | Refactored | GPTModelProvider | MTP mappings, MoE layer freq |
| GPTOSSBridge | Refactored | GPTModelProvider | Expert weight handling |
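The three steps above can be sketched with stand-in classes (hypothetical stubs, not the real bridge API):

```python
from types import SimpleNamespace

class MegatronModelBridgeStub:
    # Two example rows; the real CONFIG_MAPPING has 30+.
    CONFIG_MAPPING = [
        ("num_hidden_layers", "num_layers"),
        ("hidden_size", "hidden_size"),
    ]

    def provider_bridge(self, hf_pretrained):
        # Step 1: build a provider from the shared CONFIG_MAPPING fields.
        kwargs = {
            megatron_field: getattr(hf_pretrained.config, hf_field)
            for hf_field, megatron_field in self.CONFIG_MAPPING
            if hasattr(hf_pretrained.config, hf_field)
        }
        return SimpleNamespace(**kwargs)

class Qwen3BridgeStub(MegatronModelBridgeStub):
    def provider_bridge(self, hf_pretrained):
        provider = super().provider_bridge(hf_pretrained)
        # Step 2: set only the model-specific properties.
        provider.qk_layernorm = True
        provider.add_qkv_bias = False
        # Step 3: return the configured provider.
        return provider

hf = SimpleNamespace(config=SimpleNamespace(num_hidden_layers=28, hidden_size=1024))
provider = Qwen3BridgeStub().provider_bridge(hf)
print(provider.num_layers, provider.qk_layernorm)  # 28 True
```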

Simplified Providers

| Provider | Changes |
|---|---|
| DeepSeekModelProvider family | Simplified; model-specific configs moved to bridges |
| LlamaModelProvider | Simplified; model-specific configs moved to bridge |

Not Included in This PR (Future Work)

| Item | Status | Notes |
|---|---|---|
| Remove unused provider size variants | Deferred | e.g., Llama2ModelProvider7B, Qwen2ModelProvider7B |
| Refactor remaining bridges | Deferred | Mistral, Phi, Nemotron, VLM bridges, etc. |

Files Changed

Core

  • src/megatron/bridge/models/conversion/model_bridge.py - Added CONFIG_MAPPING, ACTIVATION_MAPPING, YARN_ROPE_SCALING_MAPPING, helper methods, default provider_bridge(), PROVIDER_CLASS support

New Files

  • src/megatron/bridge/models/mla_provider.py - New MLAModelProvider for MLA-based models
  • src/megatron/bridge/models/kimi/__init__.py - Kimi module init
  • src/megatron/bridge/models/kimi/kimi_bridge.py - New Kimi K2 bridge

Refactored Bridges

  • src/megatron/bridge/models/llama/llama_bridge.py
  • src/megatron/bridge/models/qwen/qwen2_bridge.py
  • src/megatron/bridge/models/qwen/qwen3_bridge.py
  • src/megatron/bridge/models/qwen/qwen3_moe_bridge.py
  • src/megatron/bridge/models/deepseek/deepseek_v2_bridge.py
  • src/megatron/bridge/models/deepseek/deepseek_v3_bridge.py
  • src/megatron/bridge/models/gemma/gemma_bridge.py
  • src/megatron/bridge/models/gemma/gemma2_bridge.py
  • src/megatron/bridge/models/gemma/gemma3_bridge.py
  • src/megatron/bridge/models/glm/glm45_bridge.py
  • src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py

Simplified Providers

  • src/megatron/bridge/models/deepseek/deepseek_provider.py
  • src/megatron/bridge/models/llama/llama_provider.py

Tests

  • tests/unit_tests/models/llama/test_llama_bridge.py - Updated to expect GPTModelProvider
  • tests/unit_tests/models/qwen/test_qwen3_bridge.py - Updated
  • tests/unit_tests/models/qwen/test_qwen3_moe_bridge.py - Updated

Design Principles

Following docs/proposals/provider_bridge_refactor.md:

  1. Use base class CONFIG_MAPPING - Common field mappings are handled automatically
  2. Direct property assignment - Set model-specific config directly on provider
  3. Upstream common mappings - Add new field mappings to base class when they apply to multiple models
  4. Minimal overrides - Only set properties that differ from base class defaults
  5. Custom provider via decorator - Use @register_bridge(provider=MLAModelProvider) for MLA models
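Principle 5 can be sketched as a registry decorator; this is a guess at the shape, not the real `register_bridge` implementation:

```python
BRIDGE_REGISTRY: dict = {}

def register_bridge(provider=None, model_type=None):
    """Hypothetical stand-in for the decorator described above."""
    def decorator(bridge_cls):
        if provider is not None:
            bridge_cls.PROVIDER_CLASS = provider  # mirrors the PROVIDER_CLASS hook
        if model_type is not None:
            BRIDGE_REGISTRY[model_type] = bridge_cls  # lookup by HF model_type
        return bridge_cls
    return decorator

class MLAModelProviderStub:
    pass

@register_bridge(provider=MLAModelProviderStub, model_type="deepseek_v3")
class DeepSeekV3BridgeStub:
    pass

print(BRIDGE_REGISTRY["deepseek_v3"].PROVIDER_CLASS.__name__)
```

The default `provider_bridge()` can then instantiate `PROVIDER_CLASS` instead of `GPTModelProvider` whenever a bridge registered one.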

Breaking Changes

  • None for public API
  • Tests updated to expect GPTModelProvider instead of some model-specific providers

Checklist

  • Base class CONFIG_MAPPING covers common HF to Megatron field mappings
  • Base class ACTIVATION_MAPPING covers common activation functions
  • MoE-related mappings added to base class
  • MLA-related mappings added to base class
  • YARN rope scaling mappings added to base class
  • MLAModelProvider added for MLA-based models
  • LlamaBridge refactored
  • Qwen2Bridge refactored
  • Qwen3Bridge refactored
  • Qwen3MoEBridge refactored
  • DeepSeekV2Bridge refactored
  • DeepSeekV3Bridge refactored
  • KimiK2Bridge added (new)
  • GemmaBridge refactored
  • Gemma2Bridge refactored
  • Gemma3Bridge refactored
  • GLM45Bridge refactored
  • GPTOSSBridge refactored
  • Unit tests updated
  • Run full CI test suite
  • Follow-up PR: Remove unused model-specific provider size variants

Related

  • Design document: docs/proposals/provider_bridge_refactor.md

Summary by CodeRabbit

  • New Features

    • Added bidirectional configuration translation between HuggingFace and Megatron formats with nested field support.
    • Introduced YARN and RoPE scaling position embedding support.
    • Added activation function mapping for silu, gelu, relu, and tanh across frameworks.
    • Enabled model-type specification for bridge registration.
  • Improvements

    • Unified provider interfaces across multiple model architectures.
    • Enhanced model configuration handling with vocab sizing and dtype adjustments.
  • Deprecations

    • Legacy model provider classes now issue deprecation warnings.


This refactoring centralizes model-specific configurations within the
provider_bridge method of each model bridge.

Changes:
- Add MoE-related field mappings to base class CONFIG_MAPPING:
  - num_experts -> num_moe_experts
  - num_experts_per_tok -> moe_router_topk
  - moe_intermediate_size -> moe_ffn_hidden_size

- Refactor LlamaBridge:
  - Use MEGATRON_DEFAULTS and HF_DEFAULTS class attributes
  - Override provider_bridge only for RoPE scaling (Llama 3.1/3.2)

- Refactor Qwen2Bridge:
  - Use MEGATRON_DEFAULTS (add_qkv_bias=True) and HF_DEFAULTS
  - No provider_bridge override needed

- Refactor Qwen3Bridge:
  - Use MEGATRON_DEFAULTS (qk_layernorm=True) and HF_DEFAULTS
  - No provider_bridge override needed

- Refactor Qwen3MoEBridge:
  - Use MEGATRON_DEFAULTS with MoE settings and HF_DEFAULTS
  - No provider_bridge override needed

- Update tests to expect GPTModelProvider instead of model-specific providers
- Add verification scripts for both Llama and Qwen bridges

Verified on remote server:
- Qwen/Qwen2-0.5B: PASS
- Qwen/Qwen2-7B: PASS
- Qwen/Qwen3-0.6B: PASS
- Qwen/Qwen3-1.7B: PASS
- Qwen/Qwen3-30B-A3B: PASS
copy-pr-bot bot commented Jan 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…dels

- Add MLAModelProvider as unified base for Multi-Latent Attention models
- Refactor DeepSeek V2/V3 bridges to use MLAModelProvider
- Refactor Kimi K2 bridge to use MLAModelProvider
- Move model-specific defaults from providers to MEGATRON_DEFAULTS in bridges
- Add model_type parameter to @register_bridge decorator for auto HF config
- Simplify provider files to deprecated backward-compatible aliases

Verified: DeepSeek-V2-Lite, DeepSeek-V2, DeepSeek-V3, Moonlight-16B, Kimi-K2
- Register GemmaModelProvider, Gemma2ModelProvider, Gemma3ModelProvider via decorator
- Add MEGATRON_DEFAULTS to Gemma/Gemma2 bridges for explicit config defaults
- Add gelu_pytorch_tanh -> fast_gelu to ACTIVATION_MAPPING in model_bridge.py
- Add verification script for Gemma provider refactoring

Verified: gemma-2b, gemma-7b, gemma-2-2b, gemma-2-9b, gemma-2-27b,
         gemma-3-4b-it, gemma-3-12b-it, gemma-3-27b-it
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
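The `gelu_pytorch_tanh -> fast_gelu` entry mentioned above can be pictured like this; these activation functions are illustrative stand-ins, not Megatron's fused kernels:

```python
import math

def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))

def fast_gelu(x: float) -> float:
    # tanh-approximated GELU, the behaviour HF calls gelu_pytorch_tanh
    return 0.5 * x * (1.0 + math.tanh(0.7978845608 * (x + 0.044715 * x ** 3)))

ACTIVATION_MAPPING = {
    "silu": silu,
    "gelu_pytorch_tanh": fast_gelu,
}

def hf_to_megatron_activation(name: str):
    """Resolve an HF activation string to a callable, failing loudly."""
    try:
        return ACTIVATION_MAPPING[name]
    except KeyError:
        raise ValueError(f"Unsupported activation: {name}") from None

print(hf_to_megatron_activation("gelu_pytorch_tanh").__name__)  # fast_gelu
```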
@yaoyu-33 yaoyu-33 changed the title Refactor provider_bridge for Llama and Qwen models [model, refactor] refactor: Centralize provider_bridge config mapping in base class Jan 27, 2026
@yaoyu-33 yaoyu-33 marked this pull request as ready for review January 27, 2026 01:14
@yaoyu-33

/ok to test ca54e4f

@coderabbitai

coderabbitai bot commented Jan 27, 2026

📝 Walkthrough

This PR introduces bidirectional HuggingFace-to-Megatron configuration translation on MegatronModelBridge through new class attributes and methods, consolidates multiple model providers into shared base classes (GPTModelProvider, MLAModelProvider), and refactors bridge registration to accept provider and model_type parameters while replacing inline provider construction with attribute configuration patterns.

Changes

  • Core Bridge Infrastructure (`src/megatron/bridge/models/conversion/model_bridge.py`): Added bidirectional config translation: CONFIG_MAPPING, ACTIVATION_MAPPING, YARN_ROPE_SCALING_MAPPING class attributes; new methods hf_config_to_provider_kwargs(), megatron_to_hf_config(), hf_to_megatron_activation(), megatron_to_hf_activation(). Extended register_bridge() and create_bridge_decorator() to accept optional provider and model_type parameters. Updated provider_bridge() to leverage the new translation helpers.
  • Base Provider Updates (`src/megatron/bridge/models/gpt_provider.py`): Added position_embedding_type field supporting "yarn" mode; added rope_scaling, rope_scaling_factor, and rotary_scaling_factor fields. Updated provide() to pass rope scaling parameters to MCoreGPTModel.
  • New MLA Provider (`src/megatron/bridge/models/mla_provider.py`): New class MLAModelProvider combining MLATransformerConfig and GPTModelProvider for multi-latent attention model support.
  • DeepSeek Provider Consolidation (`src/megatron/bridge/models/deepseek/deepseek_provider.py`): Refactored DeepSeekModelProvider, DeepSeekV2ModelProvider, DeepSeekV2LiteModelProvider, DeepSeekV3ModelProvider, and MoonlightModelProvider16B to subclass MLAModelProvider with deprecation warnings. Added backward-compatibility aliases (DeepSeekProvider, DeepSeekV2Provider, etc.).
  • DeepSeek Bridge Updates (`src/megatron/bridge/models/deepseek/deepseek_v2_bridge.py`, `src/megatron/bridge/models/deepseek/deepseek_v3_bridge.py`): Updated bridge decorators to use provider=MLAModelProvider and model_type parameters. Replaced inline provider construction with super().provider_bridge() followed by explicit attribute configuration (normalization, attention, MoE settings, fusion flags). Added AutoMapping entry for expert bias in v3 bridge.
  • Gemma Bridge Updates (`src/megatron/bridge/models/gemma/gemma_bridge.py`, `src/megatron/bridge/models/gemma/gemma2_bridge.py`, `src/megatron/bridge/models/gemma/gemma3_bridge.py`): Updated bridge decorators with provider and model_type parameters. Refactored provider_bridge() to call super().provider_bridge() and configure attributes in place rather than constructing providers with parameter maps. Added imports for activation functions where needed.
  • GLM45 Bridge (`src/megatron/bridge/models/glm/glm45_bridge.py`): Replaced GLMMoEModelProvider with GPTModelProvider; added model_type="glm4_moe" to decorator. Updated provider_bridge() to use the superclass method and configure MoE/optimization flags on the returned provider. Added self._hf_config storage in build_conversion_tasks().
  • GPT-OSS Bridge (`src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py`): Changed return type from GPTOSSProvider to GPTModelProvider. Added model_type="gpt_oss" decorator parameter. Introduced fallback for quick_gelu import; removed GenerationConfig, GptOssConfig imports. Replaced provider construction with super().provider_bridge() and attribute configuration including yarn-related settings.
  • Llama Bridge and Provider (`src/megatron/bridge/models/llama/llama_bridge.py`, `src/megatron/bridge/models/llama/llama_provider.py`): Updated bridge to return GPTModelProvider and added model_type="llama". Added megatron_to_hf_config() classmethod. Refactored provider_bridge() to use the superclass method with RoPE scaling detection. Updated provider classes to use rope_scaling/rope_scaling_factor instead of NTK-specific parameters; removed apply_rope_scaling() function.
  • Qwen Bridge Updates (`src/megatron/bridge/models/qwen/qwen2_bridge.py`, `src/megatron/bridge/models/qwen/qwen3_bridge.py`, `src/megatron/bridge/models/qwen/qwen3_moe_bridge.py`): Added model_type parameters to decorators. Refactored provider_bridge() methods to use super().provider_bridge() and configure attributes instead of constructing new providers. Removed explicit type annotations and provider class imports.
  • Kimi Bridge (`src/megatron/bridge/models/kimi/kimi_bridge.py`, `src/megatron/bridge/models/kimi/__init__.py`): New KimiK2Bridge class implementing provider_bridge() and mapping_registry() for MLA-based Kimi K2 support. Exported KimiK2Bridge from module __init__.py.
  • Llama Bridge Tests (`tests/unit_tests/models/llama/test_llama_bridge.py`): Extensively rewritten test suite: replaced LlamaModelProvider with GPTModelProvider; added tests for registration, RoPE scaling, CONFIG_MAPPING, ACTIVATION_MAPPING, bidirectional config conversion, and AutoBridge integration.
  • Qwen Bridge Tests (`tests/unit_tests/models/qwen/test_qwen3_bridge.py`, `tests/unit_tests/models/qwen/test_qwen3_moe_bridge.py`): Updated tests to expect GPTModelProvider instead of model-specific providers; replaced Qwen3ModelProvider and Qwen3MoEModelProvider imports with GPTModelProvider; tightened mock configurations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Results For Major Changes | ❓ Inconclusive | PR description includes unit test updates but lacks documented test execution results, regression analysis, or convergence validation. PR objectives explicitly state "CI full run remain[s] TODO", indicating testing was incomplete at submission. | Provide full CI test execution results, regression analysis showing numerical outputs unchanged, and clarify whether deferred testing is acceptable as follow-up work. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped; CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly identifies the main change as centralizing provider_bridge config mapping in the base class, which accurately reflects the core purpose of this refactoring. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 83.15%, which meets the required 80.00% threshold. |




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/megatron/bridge/models/deepseek/deepseek_provider.py (2)

85-85: Type annotation mismatch: int = None should be Optional[int].

The annotation q_lora_rank: int = None is incorrect since None is not a valid int. This should be Optional[int] = None or int | None = None.

Proposed fix:

```diff
-    q_lora_rank: int = None
+    q_lora_rank: Optional[int] = None
```

Also update the imports at line 16:

```diff
-from typing import List, Union
+from typing import List, Optional, Union
```

165-165: Same type annotation issue: int = None should be Optional[int].

Same issue as line 85 - q_lora_rank: int = None should use Optional[int].

🤖 Fix all issues with AI agents
In `@src/megatron/bridge/models/gemma/gemma3_bridge.py`:
- Around line 56-65: After you set
provider.fp16/provider.bf16/provider.params_dtype from dtype_from_hf, also set
provider.autocast_dtype to the same derived dtype so autocasting matches the VL
precision override; locate where provider.fp16/bf16/params_dtype are assigned
(using dtype_from_hf with hf_vl_config) and set provider.autocast_dtype =
self.dtype_from_hf(hf_vl_config, default=torch.float32) (or reuse the computed
params dtype) to keep autocast consistent with params.

In `@src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py`:
- Around line 83-86: When forcing BF16 in gpt_oss_bridge, clear the FP16 flag to
avoid conflicting dtype settings: in the same location where you set
provider.bf16 = True and provider.params_dtype = torch.bfloat16 (near
provider.hidden_dropout), also set provider.fp16 = False so the provider does
not have both fp16 and bf16 enabled; update the assignment in the GPT-OSS bridge
initialization (the code that modifies provider.hidden_dropout / provider.bf16 /
provider.params_dtype) to explicitly clear provider.fp16 when enabling BF16.

In `@src/megatron/bridge/models/gpt_provider.py`:
- Around line 139-145: The parameters rope_scaling, rope_scaling_factor, and
seq_len_interpolation_factor are being passed unconditionally to Megatron APIs;
to match the existing defensive pattern used for mtp_block_spec, check the
target function/class signatures (use inspect.signature where mtp_block_spec is
handled) before adding these kwargs, and only include them in the kwargs dict if
the inspected signature has those parameters; remove direct positional/keyword
assignments for rope_scaling/rope_scaling_factor/seq_len_interpolation_factor
and instead add them to the **kwargs conditionally (referencing the same
inspection logic used around mtp_block_spec).

In `@src/megatron/bridge/models/kimi/kimi_bridge.py`:
- Around line 33-75: The Kimi bridge's provider_bridge (method provider_bridge)
currently omits an explicit moe_aux_loss_coeff; add an explicit assignment
setting provider.moe_aux_loss_coeff = 1e-3 in provider_bridge (alongside the
other provider.* assignments) so the KimiK2Provider's auxiliary MoE loss
coefficient is set consistently with GLM45/Qwen3/DeepSeek V3 bridges.

In `@src/megatron/bridge/models/qwen/qwen2_bridge.py`:
- Around line 37-39: Update the class docstring in Qwen2 (which currently
mentions MEGATRON_DEFAULTS) to instead state that model-specific settings are
applied in provider_bridge; locate the docstring on the Qwen2 class in
qwen2_bridge.py (and any nearby reference to MegatronModelBridge’s
CONFIG_MAPPING/ACTIVATION_MAPPING) and replace the outdated reference to
MEGATRON_DEFAULTS with a concise note that provider_bridge handles
model-specific defaults.

In `@tests/unit_tests/models/llama/test_llama_bridge.py`:
- Around line 218-227: The test function
test_provider_bridge_rope_scaling_params currently accepts an unused fixture
parameter llama_config; remove that parameter from the test signature so it
becomes def test_provider_bridge_rope_scaling_params(self,
mock_pretrained_llama): and ensure any references to llama_config inside the
test are not present, then run the linter to confirm Ruff no longer flags the
unused argument; locate this test in
tests/unit_tests/models/llama/test_llama_bridge.py and update the function
signature accordingly (function name: test_provider_bridge_rope_scaling_params,
class uses LlamaBridge and mock_pretrained_llama).
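The defensive-kwargs pattern requested for `gpt_provider.py` above can be sketched as follows; `build_model` and `OldGPTModel` are hypothetical names, not repo code:

```python
import inspect

def build_model(target_cls, **candidate_kwargs):
    """Forward only the kwargs that target_cls actually accepts."""
    accepted = inspect.signature(target_cls).parameters
    kwargs = {k: v for k, v in candidate_kwargs.items() if k in accepted}
    return target_cls(**kwargs)

class OldGPTModel:
    # Stands in for an older Megatron API with no rope_scaling parameter.
    def __init__(self, num_layers):
        self.num_layers = num_layers

model = build_model(OldGPTModel, num_layers=4, rope_scaling=True, rope_scaling_factor=8.0)
print(model.num_layers)  # 4
```

Inspecting the signature first keeps the bridge compatible with Megatron versions that predate the new rope-scaling kwargs, matching the existing mtp_block_spec handling.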
🧹 Nitpick comments (1)
src/megatron/bridge/models/conversion/model_bridge.py (1)

244-310: Consider annotating mutable class attributes with ClassVar.

The CONFIG_MAPPING, YARN_ROPE_SCALING_MAPPING, and ACTIVATION_MAPPING are mutable class attributes that should ideally be annotated with typing.ClassVar to indicate they are class-level and not instance-level attributes.

Proposed fix

Add ClassVar to imports and annotate:

```diff
 from typing import (
     Callable,
+    ClassVar,
     Dict,
     Generic,
     ...
 )
-    CONFIG_MAPPING = [
+    CONFIG_MAPPING: ClassVar[list[tuple[str, str]]] = [
         # Core architecture
         ...
     ]

-    YARN_ROPE_SCALING_MAPPING = [
+    YARN_ROPE_SCALING_MAPPING: ClassVar[list[tuple[str, str]]] = [
         ...
     ]

-    ACTIVATION_MAPPING = {
+    ACTIVATION_MAPPING: ClassVar[dict[str, Callable]] = {
         ...
     }
```

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

/ok to test 1ed2069

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

/ok to test ab24acf

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

/ok to test b6f8e29

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

/ok to test fa03b2c

@copy-pr-bot

copy-pr-bot bot commented Jan 28, 2026

/ok to test fa03b2c

@yaoyu-33, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

@yaoyu-33

/ok to test 167055a

@yaoyu-33

yaoyu-33 commented Feb 3, 2026

/ok to test f9a3231

OLMoE HF config doesn't have head_dim attribute, so kv_channels was
left as None. This fix calculates it as hidden_size // num_attention_heads
(2048 // 16 = 128 for OLMoE-1B-7B).

This follows the pattern used by MistralBridge and NemotronHBridge.
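The fallback described in that commit can be sketched as follows (`infer_kv_channels` is an illustrative name, not the actual bridge code):

```python
from types import SimpleNamespace

def infer_kv_channels(hf_config) -> int:
    """Prefer head_dim; fall back to hidden_size // num_attention_heads."""
    head_dim = getattr(hf_config, "head_dim", None)
    if head_dim is not None:
        return head_dim
    # OLMoE-style configs lack head_dim, so derive it from the model width.
    return hf_config.hidden_size // hf_config.num_attention_heads

olmoe_like = SimpleNamespace(hidden_size=2048, num_attention_heads=16)  # no head_dim
print(infer_kv_channels(olmoe_like))  # 128
```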
@yaoyu-33

yaoyu-33 commented Feb 3, 2026

/ok to test d5b7890

@yaoyu-33

yaoyu-33 commented Feb 3, 2026

/ok to test ada7d05

@yaoyu-33

yaoyu-33 commented Feb 3, 2026

/ok to test 5f24f9b

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

yaoyu-33 commented Feb 3, 2026

/ok to test 057175f


@cuichenx cuichenx left a comment


LGTM
