Conversation
… in base class (#2052) Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
…config logging (#2184) Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
… in base class for Nemotron models (#2225) Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
…retrain recipes (#2288) Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com> Signed-off-by: malay-nagda <malayn@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
📝 Walkthrough

This PR refactors Megatron Bridge recipes and model providers to use parameterless APIs with post-instantiation configuration mutation. Major changes include centralizing HF↔Megatron config translation on MegatronModelBridge, updating all model bridges to use super().provider_bridge() with in-place provider configuration, removing generation_config propagation, adding rope_scaling support, introducing new Kimi K2 and Nemotron 3 Nano configurations, and updating optimizer epsilon defaults from 1e-5 to 1e-8.
Sequence Diagrams

```mermaid
sequenceDiagram
    participant Client as Config Builder
    participant Recipe as pretrain_config()
    participant Common as _pretrain_common()
    participant HFBridge as AutoBridge
    participant Provider as GPTModelProvider
    participant Config as ConfigContainer
    Client->>Recipe: Call pretrain_config() [no args]
    Recipe->>Common: cfg = _pretrain_common()
    Common->>Config: Create ConfigContainer w/ defaults
    Common-->>Recipe: Return cfg
    Recipe->>HFBridge: Load HF model (if hf_path provided)
    HFBridge->>Provider: Convert to Megatron provider
    Provider-->>Recipe: Return provider
    Recipe->>Config: cfg.model = provider
    Recipe->>Config: cfg.mixed_precision = precision_config
    Recipe->>Config: cfg.dataset = {...}
    Recipe->>Config: cfg.train = {...}
    Recipe-->>Client: Return fully configured cfg
```
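For orientation, here is a minimal sketch of the post-instantiation pattern the diagram describes. It assumes the parameterless builder is importable directly from the recipe module (as with `qwen3_8b_pretrain_config` in this PR); the dataset tweaks at the end are illustrative, not required:

```python
from megatron.bridge.models import AutoBridge
from megatron.bridge.recipes.qwen.qwen3 import qwen3_8b_pretrain_config

# Parameterless builder: returns a ConfigContainer pre-populated with defaults.
cfg = qwen3_8b_pretrain_config()

# Post-instantiation mutation replaces the old keyword-argument API.
cfg.model = AutoBridge.from_hf_pretrained("Qwen/Qwen3-8B").to_megatron_provider(load_weights=False)
cfg.dataset.num_workers = 0
cfg.dataset.pin_memory = False
```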
```mermaid
sequenceDiagram
    participant HFModel as HF Pretrained
    participant Bridge as Model Bridge
    participant SuperClass as MegatronModelBridge
    participant Provider as GPTModelProvider
    participant Translator as Config Translator
    HFModel->>Bridge: Load model & config
    Bridge->>SuperClass: Call super().provider_bridge()
    SuperClass->>Translator: Apply hf_config_to_provider_kwargs()
    Translator->>Translator: Map CONFIG_MAPPING
    Translator->>Translator: Apply ACTIVATION_MAPPING
    Translator->>Translator: Handle rope_scaling variants
    Translator-->>SuperClass: Return provider_kwargs dict
    SuperClass->>Provider: Instantiate with base defaults
    Provider-->>Bridge: Return provider
    Bridge->>Provider: Set rope_scaling = ...
    Bridge->>Provider: Set activation_func = ...
    Bridge->>Provider: Set model-specific fields
    Bridge-->>HFModel: Return configured provider
```
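A corresponding sketch of the bridge side; the subclass name and the two provider fields are placeholders for illustration and do not come from this PR:

```python
from megatron.bridge.models.conversion.model_bridge import MegatronModelBridge


class ExampleBridge(MegatronModelBridge):  # hypothetical bridge subclass
    def provider_bridge(self, hf_pretrained):
        # The base class now owns HF -> Megatron config translation and returns
        # a provider already built from the translated kwargs.
        provider = super().provider_bridge(hf_pretrained)
        # Model-specific adjustments happen in place rather than via constructor kwargs.
        provider.add_qkv_bias = True   # illustrative field
        provider.qk_layernorm = False  # illustrative field
        return provider
```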
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 8
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (14)
tests/unit_tests/recipes/test_deepseek_recipes.py (1)
87-98: ⚠️ Potential issue | 🟡 Minor — Add a unit category marker for this test.
The updated test should be tagged to satisfy test categorization requirements.
✅ Suggested change
```diff
 @pytest.mark.parametrize("recipe_func", _DEEPSEEK_RECIPE_FUNCS)
+@pytest.mark.unit
 def test_each_deepseek_recipe_builds_config(recipe_func: Callable, monkeypatch: pytest.MonkeyPatch):
```

As per coding guidelines, "Use pytest.mark to categorize tests (unit, integration, system)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/recipes/test_deepseek_recipes.py` around lines 87 - 98, This test needs a pytest unit marker: add `@pytest.mark.unit` above the test function that contains the code using cfg and _assert_basic_config (the test in tests/unit_tests/recipes/test_deepseek_recipes.py that asserts tokenizer behavior), and ensure pytest is imported (import pytest) at the top of the file if not already present; keep the existing assertions and logic unchanged.

tests/functional_tests/models/deepseek/test_deepseek_conversion.py (5)
96-97: ⚠️ Potential issue | 🟡 Minor — Remove the duplicate `save_pretrained` call.

`model.save_pretrained(model_dir, safe_serialization=True)` is called identically on two consecutive lines — the second write is redundant and re-serialises all weights for no benefit.

🐛 Proposed fix
```diff
-model.save_pretrained(model_dir, safe_serialization=True)
 model.save_pretrained(model_dir, safe_serialization=True)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/functional_tests/models/deepseek/test_deepseek_conversion.py` around lines 96 - 97, The test contains a duplicated call to model.save_pretrained(model_dir, safe_serialization=True): remove the redundant second invocation so the model is only serialized once; keep the remaining call to model.save_pretrained with safe_serialization=True to preserve the intended behavior.
96-97: ⚠️ Potential issue | 🟡 Minor — Remove the duplicate `save_pretrained` call.

`model.save_pretrained(model_dir, safe_serialization=True)` is called identically on two consecutive lines. The second call writes the same files again, wasting I/O and potentially overwriting any intermediate state.

🐛 Proposed fix
```diff
-model.save_pretrained(model_dir, safe_serialization=True)
 model.save_pretrained(model_dir, safe_serialization=True)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/functional_tests/models/deepseek/test_deepseek_conversion.py` around lines 96 - 97, The test contains a duplicate call to model.save_pretrained(model_dir, safe_serialization=True); remove the redundant second invocation so the model is only saved once (keep the first call that uses model.save_pretrained with model_dir and safe_serialization=True) to avoid unnecessary I/O and potential overwrites.
92-93: ⚠️ Potential issue | 🟡 Minor — Narrow the `except` clause — swallowing all exceptions hides real failures.

Per coding guidelines, the `except` clause should be limited to the smallest set of expected errors. A bare `except Exception: pass` will silently absorb `OSError`, `ConnectionError`, permission errors, etc., making CI failures hard to diagnose.

♻️ Suggested fix
```diff
-    except Exception:
-        pass
+    except OSError:
+        # Non-fatal: fallback tokenizer is optional for conversion tests
+        pass
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/functional_tests/models/deepseek/test_deepseek_conversion.py` around lines 92 - 93, The bare "except Exception: pass" in the test should be narrowed: replace it with explicit expected exceptions (for example FileNotFoundError, OSError, ConnectionError or the specific exception your try block can raise) and handle them (or log them) instead of silently swallowing; for anything truly unexpected either re-raise or log and fail the test. Locate the try/... except block that currently uses "except Exception: pass" and change it to "except <ExpectedError1>, <ExpectedError2> as e: ..." (or catch Exception as e and re-raise unless isinstance(e, (ExpectedError1, ...))) so only anticipated errors are suppressed and unexpected failures surface.
92-93: ⚠️ Potential issue | 🟡 Minor — Narrow the `except` clause — a bare `except Exception: pass` silently swallows all failures.

This violates the project guideline: "limit the except clause to the smallest set of errors possible." A broad silent swallow hides connectivity issues, permission errors, etc. At minimum, catch only the expected failure mode and log it.
♻️ Suggested fix
```diff
-    except Exception:
-        pass
+    except OSError:
+        # Network or file-system error; tokenizer is optional for conversion tests
+        pass
```
69-80: ⚠️ Potential issue | 🟡 Minor — Replace deprecated `use_auth_token` with the `token` parameter.

The `use_auth_token` parameter is deprecated in transformers (removed in v5.0.0). Replace with `token` on line 76 (`token=None,`). This parameter appears in multiple test files and should be updated consistently across the codebase.
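For illustration only — the class reference and directory below are hypothetical — the call-site change looks roughly like this:

```python
from transformers.dynamic_module_utils import get_class_from_dynamic_module

# Before: get_class_from_dynamic_module(..., use_auth_token=None)  # deprecated kwarg
# After: pass the credential via the `token` keyword instead.
model_cls = get_class_from_dynamic_module(
    "modeling_deepseek.DeepseekV2ForCausalLM",  # hypothetical class reference
    "/tmp/deepseek-checkpoint",                 # hypothetical model directory
    token=None,
)
```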
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/functional_tests/models/deepseek/test_deepseek_conversion.py` around lines 69 - 80, Update the deprecated parameter in the get_class_from_dynamic_module call: replace the removed use_auth_token argument with the new token argument (i.e., change use_auth_token=None to token=None) in the invocation of get_class_from_dynamic_module so the call signature matches transformers v5+; search other test files for additional uses of use_auth_token and update them similarly to token to keep the codebase consistent.

tests/unit_tests/models/qwen/test_qwen3_moe_bridge.py (1)
118-126: ⚠️ Potential issue | 🟡 Minor — Add a pytest.mark category to these unit tests.
Please mark these tests as unit tests to align with repo standards.
✅ Suggested change
```diff
+@pytest.mark.unit
 class TestQwen3MoEBridge:
```

As per coding guidelines: Use 'pytest.mark' to categorize tests (unit, integration, system).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/models/qwen/test_qwen3_moe_bridge.py` around lines 118 - 126, Mark the test function test_provider_bridge_basic with the unit category using pytest.mark: add `@pytest.mark.unit` above the test definition, ensure pytest is imported in the test module, and apply the same decoration to other tests in this file that are unit tests (e.g., any tests exercising Qwen3MoEBridge.provider_bridge) so they comply with the repo's pytest.mark categorization.

tests/unit_tests/recipes/utils/test_optimizer_utils.py (1)
15-15: ⚠️ Potential issue | 🟡 Minor — Typo in module docstring: "optinizer_utils" → "optimizer_utils".
✏️ Fix typo
-"""Tests for optinizer_utils module.""" +"""Tests for optimizer_utils module."""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/recipes/utils/test_optimizer_utils.py` at line 15, Fix the typo in the module docstring of the test file test_optimizer_utils.py: change the word "optinizer_utils" to "optimizer_utils" in the top-level docstring so the module name is spelled correctly.

src/megatron/bridge/recipes/qwen/qwen2.py (1)
16-16: 🛠️ Refactor suggestion | 🟠 Major — Use modern union syntax per coding guidelines.
This file uses `Optional[T]`, `Union[X, Y]`, and `List[T]` from `typing` throughout (e.g., `Qwen2CommonKwargs`, `Qwen2FinetuneKwargs`, the `_qwen2_finetune_common` signature), while `llama3.py` in the same PR correctly uses `T | None`, `X | Y`, and `list[T]`. The coding guidelines require the modern syntax.

♻️ Example fix for the import and TypedDict
```diff
-from typing import List, Optional, Union
+# No typing import needed for these; use built-in generics and | syntax

 class Qwen2CommonKwargs(TypedDict, total=False):
-    dir: Optional[str]
+    dir: str | None
-    data_paths: Optional[List[str]]
+    data_paths: list[str] | None
-    pipeline_dtype: Optional[torch.dtype]
+    pipeline_dtype: torch.dtype | None
-    precision_config: Optional[Union[MixedPrecisionConfig, str]]
+    precision_config: MixedPrecisionConfig | str | None
     # ... apply same pattern throughout
```

As per coding guidelines: "Use 'T | None' for nullable types instead of 'Optional[T]'", "Use 'X | Y' for union types instead of 'Union[X, Y]'", and "Use built-in generics (list, dict, tuple) instead of typing equivalents".
Also applies to: 40-78
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/recipes/qwen/qwen2.py` at line 16, Replace usages of typing generics with modern built-in and PEP 604 union syntax across this module: remove List, Optional, Union imports and use list[T], T | None, and X | Y respectively. Update the TypedDicts Qwen2CommonKwargs and Qwen2FinetuneKwargs and any function signatures such as _qwen2_finetune_common to use list[...] instead of List[...], T | None instead of Optional[T], and A | B instead of Union[A, B]; ensure any default None annotations remain compatible. Keep only necessary imports from typing (e.g., TypedDict) and run a quick typecheck to confirm no remaining typing.List/Optional/Union usages.

tests/unit_tests/models/llama/test_llama_bridge.py (1)
37-122: ⚠️ Potential issue | 🟡 Minor — Add type hints to fixtures and test methods.
The new fixtures/tests omit argument and return type hints. Please annotate fixtures with concrete return types and tests with
`-> None` (and type fixture parameters) to satisfy typing rules; apply this consistently across the module.

💡 Example (apply broadly)

```diff
-    def llama_3_2_1b_config_dict(self):
+    def llama_3_2_1b_config_dict(self) -> dict[str, object]:
         """Create a sample Llama 3.2 1B configuration with RoPE scaling."""
         return {
             "architectures": ["LlamaForCausalLM"],
             ...
         }

-    def test_provider_bridge_returns_gpt_provider(self, mock_pretrained_llama_2):
+    def test_provider_bridge_returns_gpt_provider(
+        self, mock_pretrained_llama_2: PreTrainedCausalLM
+    ) -> None:
         """Test that provider_bridge returns a GPTModelProvider for Llama 2."""
         bridge = LlamaBridge()
```

As per coding guidelines, use type hints for function arguments and return types.
Also applies to: 123-341, 343-458, 461-515, 535-598, 601-638, 706-855
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/models/llama/test_llama_bridge.py` around lines 37 - 122, Several fixtures and test functions in this module (e.g., llama_3_2_1b_config_dict, llama_2_7b_config_dict, llama_config, llama_2_config, mock_pretrained_llama, mock_pretrained_llama_2 and the test functions referenced later) are missing type hints; update each fixture signature to include parameter types for injected fixtures and an explicit return type (e.g., -> Dict[str, Any] or -> LlamaConfig or -> Mock[PreTrainedCausalLM] as appropriate), and annotate all test functions with -> None; ensure imports for typing (Dict, Any, Optional, Iterator, etc.) are present if needed and apply the same pattern consistently across the module.

src/megatron/bridge/recipes/kimi/kimi_k2.py (1)
25-46: ⚠️ Potential issue | 🟡 Minor — Fix `_get_kimi_k2_pipeline_layout` typing and docstring.

`vp_size` is treated as nullable, but the signature doesn't reflect that, and the return type is missing. Also, the docstring should follow Google style.

💡 Suggested change
```diff
-def _get_kimi_k2_pipeline_layout(pp_size: int, vp_size: int):
-    """Get pipeline layout for Kimi-K2 based on PP and VP size."""
+def _get_kimi_k2_pipeline_layout(pp_size: int, vp_size: int | None) -> list[list[str]] | None:
+    """Get pipeline layout for Kimi-K2 based on PP and VP size.
+
+    Args:
+        pp_size: Pipeline parallel size.
+        vp_size: Virtual pipeline size (defaults to 1 when None).
+
+    Returns:
+        Pipeline layout per stage, or None when no layout is required.
+    """
```

As per coding guidelines, use type hints for function arguments and return types; use 'T | None' for nullable types instead of 'Optional[T]'; use Google style docstrings (parseable by Sphinx) for classes and functions.
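A minimal body sketch showing how the new annotation interacts with the `None` handling; the layout table here is an empty placeholder, not the real Kimi-K2 stage layout:

```python
def _get_kimi_k2_pipeline_layout(pp_size: int, vp_size: int | None) -> list[list[str]] | None:
    """Get pipeline layout for Kimi-K2 based on PP and VP size."""
    vp_size = 1 if vp_size is None else vp_size
    # Placeholder table keyed by (pp, vp); the real recipe maps these to per-stage layouts.
    layouts: dict[tuple[int, int], list[list[str]]] = {}
    return layouts.get((pp_size, vp_size))
```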
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/recipes/kimi/kimi_k2.py` around lines 25 - 46, Update the _get_kimi_k2_pipeline_layout signature to use explicit nullable typing for vp_size (use int | None) and add a return type of list[list[str]] | None; update the docstring to Google style with a brief description, Args (pp_size, vp_size), and Returns (pipeline layout or None). Keep the existing logic that normalizes vp_size to 1 when None (vp_size = 1 if vp_size is None else vp_size) and ensure the map and returned layout types match the annotated return type. Use the 'T | None' form (not Optional[T]) and ensure the function name _get_kimi_k2_pipeline_layout is the one you modify.

src/megatron/bridge/models/llama/llama_bridge.py (1)
15-31: ⚠️ Potential issue | 🟡 Minor — Rename module logger to comply with global naming rule.

Module-level globals should be upper snake case with a `G_` prefix.

Suggested fix
```diff
-logger = logging.getLogger(__name__)
+G_LOGGER = logging.getLogger(__name__)
```

As per coding guidelines: Use upper snake_case and prefix 'G' for global variables (e.g., G_MY_GLOBAL).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/models/llama/llama_bridge.py` around lines 15 - 31, The module-level logger variable `logger` violates the global naming rule; rename it to an upper snake-case global prefixed with G (e.g., `G_LOGGER`) and update all references in this module (e.g., any use of `logger` in this file) to the new name; ensure you only change the variable name and not the logging import or behavior so existing logging calls (info/debug/error) continue to work with `G_LOGGER`.

src/megatron/bridge/models/deepseek/deepseek_provider.py (1)
16-109: ⚠️ Potential issue | 🟡 Minor — Use PEP 604 unions and built-in generics in type hints.

Replace `Optional`, `Union`, and `List` with `T | None`, `X | Y`, and `list[T]` respectively, and remove unused typing imports.

Suggested fix
```diff
-from typing import Callable, List, Optional, Union
+from typing import Callable
@@
-    q_lora_rank: Optional[int] = 1536
+    q_lora_rank: int | None = 1536
@@
-    moe_layer_freq: Union[int, List[int]] = field(default_factory=lambda: [0] + [1] * 59)  # first layer is dense
+    moe_layer_freq: int | list[int] = field(default_factory=lambda: [0] + [1] * 59)  # first layer is dense
@@
-    q_lora_rank: Optional[int] = None
+    q_lora_rank: int | None = None
@@
-    moe_layer_freq: Union[int, List[int]] = field(default_factory=lambda: [0] + [1] * 26)  # first layer is dense
+    moe_layer_freq: int | list[int] = field(default_factory=lambda: [0] + [1] * 26)  # first layer is dense
@@
-    moe_layer_freq: Union[int, List[int]] = field(
+    moe_layer_freq: int | list[int] = field(
         default_factory=lambda: [0] * 3 + [1] * 58
     )  # first three layers are dense
@@
-    moe_layer_freq: Union[int, List[int]] = field(default_factory=lambda: [0] * 1 + [1] * 26)  # first layer is dense
+    moe_layer_freq: int | list[int] = field(default_factory=lambda: [0] * 1 + [1] * 26)  # first layer is dense
```
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/models/deepseek/deepseek_provider.py` around lines 16 - 109, Update type hints to modern PEP 604 and builtin generics and remove unused typing imports: replace Optional[int] with int | None for fields like DeepSeekModelProvider.q_lora_rank and DeepSeekV2LiteModelProvider.q_lora_rank, replace Union[int, List[int]] with int | list[int] for DeepSeekV2ModelProvider.moe_layer_freq, and replace List[int] with list[int] wherever used (e.g., the lambda default factories). Remove unused imports Optional, Union, List from the typing import list and tidy up the imports accordingly.

src/megatron/bridge/models/qwen/qwen2_bridge.py (1)
28-57: ⚠️ Potential issue | 🟡 Minor — Add type hints and a Google-style docstring to `provider_bridge`.

The method currently lacks type annotations and a complete docstring. Per coding guidelines, all public methods must include type hints for arguments and return types, plus Google-style docstrings with Args and Returns sections.
Suggested fix
```diff
+from megatron.bridge.models.gpt_provider import GPTModelProvider
+from megatron.bridge.models.hf_pretrained.causal_lm import PreTrainedCausalLM
 from megatron.bridge.models.conversion.mapping_registry import MegatronMappingRegistry
 from megatron.bridge.models.conversion.model_bridge import MegatronModelBridge
 from megatron.bridge.models.conversion.param_mapping import (
     AutoMapping,
     GatedMLPMapping,
     QKVMapping,
 )
@@
-    def provider_bridge(self, hf_pretrained):
-        """Convert HuggingFace Qwen2 config to GPTModelProvider."""
+    def provider_bridge(self, hf_pretrained: PreTrainedCausalLM) -> GPTModelProvider:
+        """Convert HuggingFace Qwen2 config to GPTModelProvider.
+
+        Args:
+            hf_pretrained: HuggingFace PreTrainedCausalLM instance.
+
+        Returns:
+            GPTModelProvider configured for Qwen2.
+        """
         provider = super().provider_bridge(hf_pretrained)
```
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/models/qwen/qwen2_bridge.py` around lines 28 - 57, The provider_bridge method in Qwen2Bridge lacks type annotations and a Google-style docstring; update the signature to include type hints (e.g., def provider_bridge(self, hf_pretrained: Union[str, "PreTrainedModel"]) -> "GPTModelProvider") and add a Google-style docstring with Args and Returns sections describing hf_pretrained and the returned GPTModelProvider, and import/forward-reference typing symbols (Union, PreTrainedModel, GPTModelProvider) as needed to satisfy annotations.
♻️ Duplicate comments (7)
scripts/performance/configs/nemotronh/nemotronh_llm_pretrain.py (1)
34-35: Same unused `mock` parameter as in the llama31 configs.

Same issue as flagged in `llama31_llm_pretrain.py` — the `mock: bool = True` parameter is declared but unused in all five function signatures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/configs/nemotronh/nemotronh_llm_pretrain.py` around lines 34 - 35, The parameter mock: bool = True is declared but never used; remove the unused mock parameter from the function signature(s) (e.g., nemotronh_56b_pretrain_config_gb300) and any other pretrain config functions in this file so signatures no longer include mock, or alternatively wire the mock flag into the function logic if a mocked behavior is intended—update all referenced function definitions and any callers to match the new signature (adjust imports/usage accordingly).

src/megatron/bridge/recipes/qwen/qwen3.py (5)
257-335: Same long-line issue repeats in this recipe block.

As per coding guidelines, "Maximum line length is 119 characters (matching ruff configuration)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/recipes/qwen/qwen3.py` around lines 257 - 335, The qwen3_4b_pretrain_config function contains lines longer than the 119-character limit (e.g., the AutoBridge.from_hf_pretrained assignment, long comment/documentation strings, and some cfg assignments like cfg.tokenizer.tokenizer_model and dataset.blend); shorten them by breaking long string literals and expressions across multiple lines (use implicit continuation with parentheses or assign parts to intermediate variables), wrap long comments to multiple shorter lines, and split chained attribute assignments into multiple statements where needed so every line ≤119 chars while keeping the same identifiers (qwen3_4b_pretrain_config, AutoBridge.from_hf_pretrained, cfg.tokenizer.tokenizer_model, cfg.dataset.blend, etc.).
500-570: Same long-line issue repeats in this recipe block.

As per coding guidelines, "Maximum line length is 119 characters (matching ruff configuration)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/recipes/qwen/qwen3.py` around lines 500 - 570, The qwen3_32b_pretrain_config function contains several lines exceeding the 119-character limit (e.g., the long assignment/comment lines such as cfg.model = AutoBridge.from_hf_pretrained("Qwen/Qwen3-32B").to_megatron_provider(load_weights=False), the cfg.dataset.blend explanatory comment, and the checkpoint override comment block); wrap these long expressions and comments across multiple shorter strings/lines and/or split chained calls and long comments into concatenated pieces so each source line stays <=119 characters while preserving the same values and wording (edit the qwen3_32b_pretrain_config function, focusing on the long lines noted above and similar long comments/assignments).
419-497: Same long-line issue repeats in this recipe block.

As per coding guidelines, "Maximum line length is 119 characters (matching ruff configuration)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/recipes/qwen/qwen3.py` around lines 419 - 497, The function qwen3_14b_pretrain_config contains multiple lines exceeding the 119-char limit (e.g., the AutoBridge.from_hf_pretrained assignment, long comment lines, and long cfg.* assignment/comments). Fix by wrapping long string literals and long chained assignments across lines (use implicit continuation with parentheses for the AutoBridge.from_hf_pretrained call and split long comments into multiple shorter comment lines), and break any long cfg.* comments or example lines into shorter lines; target symbols to edit include qwen3_14b_pretrain_config, cfg.model = AutoBridge.from_hf_pretrained("Qwen/Qwen3-14B"), cfg.tokenizer.tokenizer_model, and long comment lines shown in the block so every line <= 119 chars.
338-416: Same long-line issue repeats in this recipe block.

As per coding guidelines, "Maximum line length is 119 characters (matching ruff configuration)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/recipes/qwen/qwen3.py` around lines 338 - 416, The qwen3_8b_pretrain_config function contains lines exceeding the 119-char limit (repeat violation). Fix by breaking long comment strings and long assignments inside qwen3_8b_pretrain_config: wrap/line-break long comments into multiple shorter comment lines, split long expressions (e.g., the cfg.dataset.blend example tuple, long inline comments for FP8/mixed_precision) into multiple concatenated strings or separate local variables, and prefer short helper variables for long values (e.g., model_id = "Qwen/Qwen3-8B") used in cfg.model and cfg.tokenizer assignments so no single source line exceeds 119 chars; update any other long cfg.* assignment lines similarly. Ensure each modified line is <=119 chars and preserves the same values and behavior.
176-255: Same long-line issue repeats in this recipe block.

As per coding guidelines, "Maximum line length is 119 characters (matching ruff configuration)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/recipes/qwen/qwen3.py` around lines 176 - 255, Several lines in qwen3_1p7b_pretrain_config exceed the 119-character limit; locate and break up long statements such as the chained call cfg.model = AutoBridge.from_hf_pretrained("Qwen/Qwen3-1.7B").to_megatron_provider(load_weights=False), the long comment/assignment for cfg.dataset.blend, and the long checkpoint example comment lines so each physical line is <=119 chars. Fix by splitting long strings/comments across multiple concatenated literals or by introducing a short local variable (e.g., pretrained_id = "Qwen/Qwen3-1.7B") and using that in calls, and wrap long comments into multiple lines; ensure no functional change to qwen3_1p7b_pretrain_config, cfg.dataset.blend, or the checkpoint comment examples.

src/megatron/bridge/models/deepseek/deepseek_v3_bridge.py (1)
112-113: Same docstring-style guidance as above applies here.

As per coding guidelines, "Use Google style docstrings (parseable by Sphinx) for classes and functions".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/models/deepseek/deepseek_v3_bridge.py` around lines 112 - 113, The function-level docstring above the line "global_name = task.global_param_name" is not Google-style; update the docstring to a parseable Google style (short description, blank line, Args: with parameter descriptions, Returns: with return type/meaning, and Raises: if applicable) for the function in deepseek_v3_bridge.py that handles adding the rotary embedding inverse frequency parameter (the routine that references task.global_param_name / local variable global_name). Ensure the docstring explains the purpose, documents parameters (e.g., task, any flags), describes the return value (if any), and notes exceptions so Sphinx can parse it.
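As a generic illustration of the requested format — the function name and arguments here are hypothetical, not the actual deepseek_v3_bridge helper:

```python
def register_inv_freq_param(task, include_bias: bool = False) -> None:
    """Register the rotary-embedding inverse-frequency parameter for conversion.

    Args:
        task: Conversion task exposing ``global_param_name`` for the weight.
        include_bias: Whether an accompanying bias parameter should also be registered.

    Returns:
        None. The task is updated in place.
    """
```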
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (129)
- examples/quantization/pretrain_quantized_llama3_8b.py
- scripts/performance/configs/deepseek/deepseek_llm_pretrain.py
- scripts/performance/configs/deepseek/deepseek_workload_base_configs.py
- scripts/performance/configs/gpt_oss/gpt_oss_llm_pretrain.py
- scripts/performance/configs/kimi/__init__.py
- scripts/performance/configs/kimi/kimi_llm_pretrain.py
- scripts/performance/configs/kimi/kimi_workload_base_configs.py
- scripts/performance/configs/llama/llama31_llm_pretrain.py
- scripts/performance/configs/llama/llama3_llm_pretrain.py
- scripts/performance/configs/nemotronh/__init__.py
- scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py
- scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py
- scripts/performance/configs/nemotronh/nemotronh_llm_pretrain.py
- scripts/performance/configs/qwen/qwen3_llm_pretrain.py
- scripts/performance/configs/qwen/qwen3_workload_base_configs.py
- scripts/performance/utils/overrides.py
- scripts/performance/utils/utils.py
- src/megatron/bridge/models/conversion/model_bridge.py
- src/megatron/bridge/models/deepseek/common.py
- src/megatron/bridge/models/deepseek/deepseek_provider.py
- src/megatron/bridge/models/deepseek/deepseek_v2_bridge.py
- src/megatron/bridge/models/deepseek/deepseek_v3_bridge.py
- src/megatron/bridge/models/gemma/gemma2_bridge.py
- src/megatron/bridge/models/gemma/gemma3_bridge.py
- src/megatron/bridge/models/gemma/gemma_bridge.py
- src/megatron/bridge/models/glm/glm45_bridge.py
- src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py
- src/megatron/bridge/models/gpt_provider.py
- src/megatron/bridge/models/llama/llama_bridge.py
- src/megatron/bridge/models/llama/llama_provider.py
- src/megatron/bridge/models/llama_nemotron/llama_nemotron_bridge.py
- src/megatron/bridge/models/mistral/mistral_bridge.py
- src/megatron/bridge/models/mla_provider.py
- src/megatron/bridge/models/nemotron/nemotron_bridge.py
- src/megatron/bridge/models/nemotron_vl/nemotron_vl_provider.py
- src/megatron/bridge/models/nemotronh/__init__.py
- src/megatron/bridge/models/nemotronh/nemotron_h_bridge.py
- src/megatron/bridge/models/olmoe/olmoe_bridge.py
- src/megatron/bridge/models/qwen/qwen2_bridge.py
- src/megatron/bridge/models/qwen/qwen3_bridge.py
- src/megatron/bridge/models/qwen/qwen3_moe_bridge.py
- src/megatron/bridge/models/qwen/qwen3_next_bridge.py
- src/megatron/bridge/models/qwen_vl/qwen25_vl_bridge.py
- src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
- src/megatron/bridge/recipes/common.py
- src/megatron/bridge/recipes/deepseek/deepseek_v2.py
- src/megatron/bridge/recipes/deepseek/deepseek_v3.py
- src/megatron/bridge/recipes/gemma/gemma2.py
- src/megatron/bridge/recipes/gemma/gemma3.py
- src/megatron/bridge/recipes/glm/glm45.py
- src/megatron/bridge/recipes/gpt/gpt3_175b.py
- src/megatron/bridge/recipes/gpt_oss/gpt_oss.py
- src/megatron/bridge/recipes/kimi/kimi_k2.py
- src/megatron/bridge/recipes/llama/llama2.py
- src/megatron/bridge/recipes/llama/llama3.py
- src/megatron/bridge/recipes/moonlight/moonlight_16b.py
- src/megatron/bridge/recipes/nemotronh/nemotron_3_nano.py
- src/megatron/bridge/recipes/nemotronh/nemotron_nano_v2.py
- src/megatron/bridge/recipes/nemotronh/nemotronh.py
- src/megatron/bridge/recipes/olmoe/olmoe_7b.py
- src/megatron/bridge/recipes/qwen/qwen2.py
- src/megatron/bridge/recipes/qwen/qwen3.py
- src/megatron/bridge/recipes/qwen/qwen3_moe.py
- src/megatron/bridge/recipes/qwen/qwen3_next.py
- src/megatron/bridge/recipes/utils/optimizer_utils.py
- src/megatron/bridge/training/config.py
- src/megatron/bridge/training/setup.py
- tests/functional_tests/models/deepseek/test_deepseek_conversion.py
- tests/functional_tests/models/deepseek/test_deepseek_provider_mapping.py
- tests/functional_tests/models/gemma/test_gemma2_provider.py
- tests/functional_tests/models/gemma/test_gemma3_provider.py
- tests/functional_tests/models/gemma/test_gemma_provider.py
- tests/functional_tests/models/glm/test_glm45_provider.py
- tests/functional_tests/models/llama_nemotron/test_llama_nemotron_provider.py
- tests/functional_tests/models/nemotronh/test_nemotron_h_provider.py
- tests/functional_tests/models/qwen/test_qwen2_provider.py
- tests/functional_tests/models/qwen/test_qwen3_moe_provider.py
- tests/functional_tests/models/qwen/test_qwen3_provider.py
- tests/functional_tests/quantization/test_qat_workflow.py
- tests/functional_tests/recipes/test_llama_recipes_distill_3b-1b.py
- tests/functional_tests/recipes/test_perf_config_integration.py
- tests/functional_tests/recipes/utils.py
- tests/functional_tests/training/test_callbacks.py
- tests/functional_tests/training/test_decentralized_pg.py
- tests/functional_tests/training/test_finetune_dora.py
- tests/functional_tests/training/test_finetune_lora.py
- tests/functional_tests/training/test_inprocess_restart.py
- tests/functional_tests/training/test_nvrx_straggler.py
- tests/functional_tests/training/test_pretrain.py
- tests/functional_tests/training/test_pretrain_resume.py
- tests/functional_tests/training/test_sft.py
- tests/functional_tests/training/test_tensor_inspect.py
- tests/unit_tests/models/deepseek/test_deepseek_bridges.py
- tests/unit_tests/models/deepseek/test_deepseek_provider.py
- tests/unit_tests/models/gemma/test_gemma2_bridge.py
- tests/unit_tests/models/gemma/test_gemma3_bridge.py
- tests/unit_tests/models/gemma/test_gemma_bridge.py
- tests/unit_tests/models/glm/test_glm45_bridge.py
- tests/unit_tests/models/gpt_oss/test_gpt_oss_bridges.py
- tests/unit_tests/models/llama/test_llama_bridge.py
- tests/unit_tests/models/llama_nemotron/test_llama_nemotron_bridge.py
- tests/unit_tests/models/mistral/test_mistral_model_bridge.py
- tests/unit_tests/models/nemotron/test_nemotron_bridge.py
- tests/unit_tests/models/nemotronh/test_nemotron_h_bridge.py
- tests/unit_tests/models/olmoe/test_olmoe_bridge.py
- tests/unit_tests/models/qwen/test_qwen3_bridge.py
- tests/unit_tests/models/qwen/test_qwen3_moe_bridge.py
- tests/unit_tests/models/qwen/test_qwen3_next_bridge.py
- tests/unit_tests/models/qwen_vl/test_qwen25_vl_bridge.py
- tests/unit_tests/models/test_gpt_provider.py
- tests/unit_tests/recipes/gemma/test_gemma2_recipes.py
- tests/unit_tests/recipes/gpt/test_gpt3_175b.py
- tests/unit_tests/recipes/kimi/test_kimi_k2.py
- tests/unit_tests/recipes/nemotronh/test_nemotron_3_nano.py
- tests/unit_tests/recipes/nemotronh/test_nemotron_nano_v2.py
- tests/unit_tests/recipes/nemotronh/test_nemotronh.py
- tests/unit_tests/recipes/qwen/test_qwen2_recipes.py
- tests/unit_tests/recipes/test_deepseek_recipes.py
- tests/unit_tests/recipes/test_gemma3_recipes.py
- tests/unit_tests/recipes/test_glm45_recipes.py
- tests/unit_tests/recipes/test_llama_recipes.py
- tests/unit_tests/recipes/test_moonlight_recipes.py
- tests/unit_tests/recipes/test_nemotronh_recipes.py
- tests/unit_tests/recipes/test_olmoe_recipes.py
- tests/unit_tests/recipes/test_qwen_recipes.py
- tests/unit_tests/recipes/test_run_plugins.py
- tests/unit_tests/recipes/utils/test_optimizer_utils.py
- tests/unit_tests/training/test_log_non_default_values.py
- tests/unit_tests/training/test_setup.py
💤 Files with no reviewable changes (23)
- src/megatron/bridge/models/qwen/qwen3_next_bridge.py
- src/megatron/bridge/models/nemotron_vl/nemotron_vl_provider.py
- tests/unit_tests/models/gemma/test_gemma2_bridge.py
- tests/functional_tests/models/glm/test_glm45_provider.py
- src/megatron/bridge/models/mistral/mistral_bridge.py
- tests/functional_tests/models/qwen/test_qwen3_moe_provider.py
- src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
- tests/functional_tests/models/gemma/test_gemma3_provider.py
- tests/unit_tests/models/gemma/test_gemma3_bridge.py
- tests/functional_tests/models/qwen/test_qwen2_provider.py
- tests/functional_tests/models/llama_nemotron/test_llama_nemotron_provider.py
- tests/functional_tests/models/gemma/test_gemma_provider.py
- src/megatron/bridge/models/qwen_vl/qwen25_vl_bridge.py
- tests/unit_tests/models/test_gpt_provider.py
- tests/functional_tests/models/deepseek/test_deepseek_provider_mapping.py
- src/megatron/bridge/models/deepseek/common.py
- tests/functional_tests/models/nemotronh/test_nemotron_h_provider.py
- tests/unit_tests/models/mistral/test_mistral_model_bridge.py
- tests/functional_tests/models/gemma/test_gemma2_provider.py
- tests/functional_tests/models/qwen/test_qwen3_provider.py
- tests/unit_tests/models/gemma/test_gemma_bridge.py
- tests/unit_tests/models/qwen_vl/test_qwen25_vl_bridge.py
- tests/unit_tests/models/qwen/test_qwen3_next_bridge.py
examples/quantization/pretrain_quantized_llama3_8b.py (around lines 160-165, excerpt under review):

```python
    if args.hf_path:
        logger.info(f"Using custom HuggingFace path: {args.hf_path}")
        recipe_kwargs["hf_path"] = args.hf_path
        # Import AutoBridge to create a new model provider with the custom HF path
        from megatron.bridge.models import AutoBridge

    cfg: ConfigContainer = pretrain_config(**recipe_kwargs)
    logger.info("Loaded base configuration")
    cfg.model = AutoBridge.from_hf_pretrained(args.hf_path).to_megatron_provider(load_weights=False)
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for AutoBridge class and its methods
find . -type f -name "*.py" -path "*/megatron/bridge/models/*" | head -20
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1120
🏁 Script executed:
#!/bin/bash
# Look for AutoBridge definition and from_hf_config method
rg "class AutoBridge|def from_hf_config|def from_hf_pretrained" --type=py -nRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1977
🏁 Script executed:
#!/bin/bash
# Check the specific file for import placement
cat -n examples/quantization/pretrain_quantized_llama3_8b.py | sed -n '150,170p'
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1095
🏁 Script executed:
#!/bin/bash
# Examine the from_hf_pretrained and from_hf_config implementations
sed -n '148,230p' src/megatron/bridge/models/conversion/auto_bridge.py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 3593
🏁 Script executed:
#!/bin/bash
# Also check the to_megatron_provider implementation
rg "def to_megatron_provider" -A 20 --type=pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 32005
🏁 Script executed:
#!/bin/bash
# Check what PreTrainedCausalLM.from_pretrained does and if it loads weights
rg "class AutoBridge" -A 50 src/megatron/bridge/models/conversion/auto_bridge.py | head -80Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 2443
🏁 Script executed:
#!/bin/bash
# Get the full implementation of from_hf_pretrained
sed -n '194,250p' src/megatron/bridge/models/conversion/auto_bridge.py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 2566
🏁 Script executed:
#!/bin/bash
# Check if there's any rank guard usage in the AutoBridge or related code
rg "get_rank|rank_zero|dist.get_rank" src/megatron/bridge/models/conversion/ -A 2 -B 2Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1787
🏁 Script executed:
#!/bin/bash
# Get the full to_megatron_provider implementation
sed -n '238,320p' src/megatron/bridge/models/conversion/auto_bridge.py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 3635
🏁 Script executed:
#!/bin/bash
# Check if there are any attribute accesses on hf_pretrained that would fail if it's just a config
rg "self.hf_pretrained\." src/megatron/bridge/models/conversion/auto_bridge.py | head -20Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 528
🏁 Script executed:
#!/bin/bash
# Search for to_megatron_provider in AutoBridge specifically
rg "def to_megatron_provider" src/megatron/bridge/models/conversion/auto_bridge.py -A 80Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 4006
🏁 Script executed:
#!/bin/bash
# Check the config property and how hf_pretrained.config is accessed
rg "\.config" src/megatron/bridge/models/conversion/auto_bridge.py -B 2 -A 2Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 506
🏁 Script executed:
#!/bin/bash
# Check the beginning of the file to see how imports are organized
head -30 examples/quantization/pretrain_quantized_llama3_8b.py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1327
🏁 Script executed:
#!/bin/bash
# Verify if AutoBridge is already imported at the top
rg "^import|^from" examples/quantization/pretrain_quantized_llama3_8b.py | head -20Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 684
All distributed ranks load the full HF model unnecessarily — OOM risk.
AutoBridge.from_hf_pretrained(args.hf_path) loads the entire model with all weights via PreTrainedCausalLM.from_pretrained(), and because there is no rank guard, every torchrun worker independently loads the full model in parallel. For LLaMA 3-8B that is ~16 GB per rank; with 8 ranks this consumes ~128 GB total VRAM and will cause OOM before pretraining starts.
Use AutoBridge.from_hf_config() instead to load only the configuration, then call to_megatron_provider(load_weights=False) to defer weight loading until after distributed initialization:
🛡️ Proposed fix
- # Import AutoBridge to create a new model provider with the custom HF path
- from megatron.bridge.models import AutoBridge
-
- cfg.model = AutoBridge.from_hf_pretrained(args.hf_path).to_megatron_provider(load_weights=False)
+ # Import AutoBridge to create a new model provider with the custom HF path
+ from megatron.bridge.models import AutoBridge
+ from transformers import AutoConfig
+
+ hf_cfg = AutoConfig.from_pretrained(args.hf_path)
+ cfg.model = AutoBridge.from_hf_config(hf_cfg).to_megatron_provider(load_weights=False)

Also move the AutoBridge import to the top of the file with other megatron.bridge.* imports to follow the import organization guideline.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/quantization/pretrain_quantized_llama3_8b.py` around lines 160 -
165, Replace the blocking call to AutoBridge.from_hf_pretrained(args.hf_path)
(which loads full weights on every rank) with
AutoBridge.from_hf_config(args.hf_path) so only the model config is read and
then call .to_megatron_provider(load_weights=False) to defer weight loading
until after distributed init; update the code that currently does
AutoBridge.from_hf_pretrained(...).to_megatron_provider(load_weights=False) to
use AutoBridge.from_hf_config(...) instead, and move the `from
megatron.bridge.models import AutoBridge` import to the top with other
megatron.bridge imports to follow import organization.
scripts/performance/configs/kimi/kimi_llm_pretrain.py (lines 45-83, excerpt under review):

```python
def kimi_k2_pretrain_config_gb300(
    precision: str = "bf16", mock: bool = True, config_variant: str = "v1"
) -> ConfigContainer:
    """GB300, baseline config."""
    base_cfg = get_workload_base_config(
        model_family_name="kimi",
        model_recipe_name="kimi_k2",
        gpu="gb300",
        compute_dtype=precision.upper(),
        task="pretrain",
        config_variant=config_variant,
    )

    cfg = pretrain_config()
    precision_config = get_precision_config(precision)
    cfg.mixed_precision = precision_config

    if base_cfg.moe_flex_dispatcher_backend is not None:
        cfg.model.moe_flex_dispatcher_backend = base_cfg.moe_flex_dispatcher_backend

    if base_cfg.pp_layout:
        cfg.model.pipeline_model_parallel_layout = base_cfg.pp_layout
    else:
        # Recompute layout based on updated PP/VP sizes
        pp_size = base_cfg.pipeline_model_parallel_size
        vp_size = base_cfg.virtual_pipeline_model_parallel_size
        layout = _get_kimi_k2_pipeline_layout(pp_size, vp_size)
        cfg.model.pipeline_model_parallel_layout = layout

    set_kimi_k2_common_configs(cfg)
    set_workload_base_configs(cfg, base_cfg)

    cfg.comm_overlap.overlap_grad_reduce = True

    # Setting num_workers and pin_memory to 0 and False respectively gives better performance.
    # we are debugging this and might change this in the future.
    cfg.dataset.num_workers = 0
    cfg.dataset.pin_memory = False
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the file and search for pretrain_config() definition and comm_overlap usage
find . -name "kimi_llm_pretrain.py" -type f | head -5Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 126
🏁 Script executed:
#!/bin/bash
# Search for pretrain_config function definition
rg "def pretrain_config" -A 10 -B 2Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 53
🏁 Script executed:
#!/bin/bash
# Check for comm_overlap initialization in the file
rg "comm_overlap" -n scripts/performance/configs/kimi/kimi_llm_pretrain.pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 278
🏁 Script executed:
#!/bin/bash
# Check all 4 function signatures to see if mock is used
rg "def kimi_k2_pretrain_config" -A 40 scripts/performance/configs/kimi/kimi_llm_pretrain.py | head -150Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 5316
🏁 Script executed:
#!/bin/bash
# Search where pretrain_config is defined - likely in another module
rg "def pretrain_config\(" --type pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 53
🏁 Script executed:
#!/bin/bash
# Get the top of the file to see imports
head -50 scripts/performance/configs/kimi/kimi_llm_pretrain.py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1890
🏁 Script executed:
#!/bin/bash
# Check if mock is used in any of the functions
rg "mock" scripts/performance/configs/kimi/kimi_llm_pretrain.pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 370
🏁 Script executed:
#!/bin/bash
# Find pretrain_config definition in the entire codebase
rg "def pretrain_config\(" -B 2 -A 5 --type pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 53
🏁 Script executed:
#!/bin/bash
# Check what module pretrain_config likely comes from
rg "pretrain_config" -B 3 scripts/performance/configs/kimi/kimi_llm_pretrain.py | head -20Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 494
🏁 Script executed:
#!/bin/bash
# Find the kimi_k2.py file
find . -name "kimi_k2.py" -type fRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 116
🏁 Script executed:
#!/bin/bash
# Search for kimi_k2_pretrain_config definition
rg "def kimi_k2_pretrain_config\(" -A 20 --type pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1734
🏁 Script executed:
#!/bin/bash
# Check ConfigContainer for comm_overlap field
rg "comm_overlap" -B 2 -A 2 --type py | head -50Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 4103
🏁 Script executed:
#!/bin/bash
# Find _pretrain_common definition
rg "def _pretrain_common\(" -A 30 --type pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 2472
🏁 Script executed:
#!/bin/bash
# Get full content of kimi_k2.py to understand the config flow
wc -l src/megatron/bridge/recipes/kimi/kimi_k2.py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 118
🏁 Script executed:
#!/bin/bash
# Check what _pretrain_common returns and if it initializes comm_overlap
rg "_pretrain_common" -B 2 -A 2 --type pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 50383
🏁 Script executed:
#!/bin/bash
# Search for comm_overlap initialization in common.py
rg "comm_overlap" src/megatron/bridge/recipes/common.py -B 2 -A 2Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 251
🏁 Script executed:
#!/bin/bash
# Get the full ConfigContainer initialization in _pretrain_common
sed -n '/def _pretrain_common/,/return cfg/p' src/megatron/bridge/recipes/common.py | head -100
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 3456
🏁 Script executed:
#!/bin/bash
# Check the entire file structure of kimi_llm_pretrain.py to see all 4 functions
cat scripts/performance/configs/kimi/kimi_llm_pretrain.py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 7142
Guard comm_overlap access and remove unused mock parameter
cfg.comm_overlap.overlap_grad_reduce = ... will raise AttributeError since _pretrain_common() initializes comm_overlap=None. Guard with a null check before assignment. Also, mock is unused across all four config builder functions—remove the parameter to avoid a misleading API signature.
Suggested fix (apply to each config builder)
- cfg.comm_overlap.overlap_grad_reduce = True
+ if cfg.comm_overlap is not None:
+     cfg.comm_overlap.overlap_grad_reduce = True

Also applies to lines 87-125 (gb200), 129-163 (b200), 166-200 (h100). Remove the mock: bool = True parameter from all four functions.
🧰 Tools
🪛 Ruff (0.15.2)
[warning] 46-46: Unused function argument: mock
(ARG001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/performance/configs/kimi/kimi_llm_pretrain.py` around lines 45 - 83,
The cfg.comm_overlap assignment can raise AttributeError because
_pretrain_common() may set comm_overlap=None; update the config builders (e.g.,
kimi_k2_pretrain_config_gb300) to check if cfg.comm_overlap is not None before
setting cfg.comm_overlap.overlap_grad_reduce, and remove the unused mock: bool =
True parameter from this function and the other three config builders (the
GB200, B200, H100 variants) so the signatures no longer include mock and callers
reflect the updated signature.
| moe_flex_dispatcher_backend="hybridep", | ||
| moe_a2a_overlap=False, | ||
| cuda_graph_impl="transformer_engine", | ||
| cuda_graph_scope=["attn", "moe_router", "moe_preprocess"], |
🧩 Analysis chain
🏁 Script executed:
rg -n --type=py "cuda_graph_scope" -C 3 -g '!*test*'
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 50383
🏁 Script executed:
# Check the WorkloadBaseConfig definition
rg -A 20 "class WorkloadBaseConfig" scripts/performance/utils/utils.pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1046
🏁 Script executed:
# Get more lines from WorkloadBaseConfig to see cuda_graph_scope definition
sed -n '40,70p' scripts/performance/utils/utils.py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1065
🏁 Script executed:
# Check the exact class definition and decorator
sed -n '30,60p' scripts/performance/utils/utils.py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1386
🏁 Script executed:
# Check lines before the class definition to see if it's a dataclass
sed -n '25,35p' scripts/performance/utils/utils.py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 622
Fix type annotation for cuda_graph_scope in WorkloadBaseConfig.
The cuda_graph_scope field is declared as Optional[str] = None in scripts/performance/utils/utils.py line 55, but lines 37 and 57 of this file assign list values like ["attn", "moe_router", "moe_preprocess"]. The downstream override handler in scripts/performance/utils/overrides.py accepts Optional[str | List[str]] and converts strings to lists, indicating the field should support both types. Update the type annotation in WorkloadBaseConfig to Optional[str | List[str]] = None to reflect actual usage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/performance/configs/kimi/kimi_workload_base_configs.py` at line 37,
The WorkloadBaseConfig field cuda_graph_scope is declared as Optional[str] but
is assigned lists (e.g. ["attn", "moe_router", "moe_preprocess"]) and must
accept both strings and lists; update the type annotation on the
cuda_graph_scope attribute in the WorkloadBaseConfig class to Optional[str |
List[str]] = None so it matches usage and the downstream override handler that
accepts Optional[str | List[str]].
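A sketch of the widened field, written in the PEP 604 style preferred elsewhere in this review; the dataclass is shown in isolation for illustration:

```python
from dataclasses import dataclass


@dataclass
class WorkloadBaseConfig:  # excerpt for illustration; other fields omitted
    # Accept either a single scope name or a list of scopes, matching how the
    # kimi/qwen workload configs and the override handler use the field.
    cuda_graph_scope: str | list[str] | None = None
```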
src/megatron/bridge/models/conversion/model_bridge.py (lines 371-396, excerpt under review):

```python
is_mla_provider = self.PROVIDER_CLASS is not None and issubclass(self.PROVIDER_CLASS, MLAModelProvider)
rope_scaling = getattr(hf_config, "rope_scaling", None)

if rope_scaling is not None and isinstance(rope_scaling, dict):
    rope_type = rope_scaling.get("type") or rope_scaling.get("rope_type")
    if rope_type == "yarn":
        # Check if this is an MLA provider (uses direct field names)
        # or a GPT provider (uses yarn_ prefixed field names)
        if is_mla_provider:
            # MLA models: use direct field names (mscale, rotary_scaling_factor, etc.)
            mla_params = {}
            for hf_key, megatron_key in self.MLA_ROPE_SCALING_MAPPING:
                value = rope_scaling.get(hf_key)
                if value is not None:
                    mla_params[megatron_key] = value
            if mla_params:
                provider_kwargs["_mla_rope_params"] = mla_params
        else:
            # GPT models: use yarn_ prefixed field names
            yarn_params = {"position_embedding_type": "yarn"}
            for hf_key, megatron_key in self.YARN_ROPE_SCALING_MAPPING:
                yarn_params[megatron_key] = rope_scaling.get(hf_key)
            if "truncate" in rope_scaling:
                yarn_params["yarn_correction_range_round_to_int"] = rope_scaling["truncate"]
            if yarn_params:
                provider_kwargs["_yarn_params"] = yarn_params
```
Skip None values when building YARN params.
Right now missing YARN keys get set to None, which can overwrite provider defaults and break config validation downstream.
🐛 Proposed fix
- for hf_key, megatron_key in self.YARN_ROPE_SCALING_MAPPING:
- yarn_params[megatron_key] = rope_scaling.get(hf_key)
+ for hf_key, megatron_key in self.YARN_ROPE_SCALING_MAPPING:
+ value = rope_scaling.get(hf_key)
+ if value is not None:
+ yarn_params[megatron_key] = value

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/megatron/bridge/models/conversion/model_bridge.py` around lines 371 -
396, The YARN param builder currently assigns None for missing keys (in the loop
over YARN_ROPE_SCALING_MAPPING), which can overwrite provider defaults; change
the loop in model_bridge.py that iterates self.YARN_ROPE_SCALING_MAPPING to only
set yarn_params[megatron_key] = value when value is not None (mirroring the MLA
branch), and only add provider_kwargs["_yarn_params"] when yarn_params contains
actual scaling entries (i.e., beyond the initial "position_embedding_type") or
when a non-None truncate is present, so missing HF keys do not produce
None-valued overrides.
src/megatron/bridge/models/conversion/model_bridge.py (lines 504-519, excerpt under review):

```python
# Handle YARN rope scaling: check if provider has yarn_* params and build rope_scaling dict
yarn_rotary_scaling_factor = getattr(provider, "yarn_rotary_scaling_factor", None)
if yarn_rotary_scaling_factor is not None:
    if "rope_scaling" not in hf_config:
        hf_config["rope_scaling"] = {}
    hf_config["rope_scaling"]["rope_type"] = "yarn"

    for hf_key, megatron_key in cls.YARN_ROPE_SCALING_MAPPING:
        value = getattr(provider, megatron_key, None)
        if value is not None:
            hf_config["rope_scaling"][hf_key] = value

    yarn_correction_range_round_to_int = getattr(provider, "yarn_correction_range_round_to_int", None)
    if yarn_correction_range_round_to_int is not None:
        hf_config["rope_scaling"]["truncate"] = yarn_correction_range_round_to_int
```
Export MLA rope scaling to HF configs.
The new MLA rope-scaling fields are handled when importing HF configs, but they are not emitted on export. That loses critical settings for MLA models.
🐛 Proposed fix
+ from megatron.bridge.models.mla_provider import MLAModelProvider
+
+ if isinstance(provider, MLAModelProvider):
+ mla_rope_scaling: dict[str, float] = {}
+ for hf_key, megatron_key in cls.MLA_ROPE_SCALING_MAPPING:
+ value = getattr(provider, megatron_key, None)
+ if value is not None:
+ mla_rope_scaling[hf_key] = value
+ if mla_rope_scaling:
+ hf_config["rope_scaling"] = {"rope_type": "yarn", **mla_rope_scaling}
+ else:
yarn_rotary_scaling_factor = getattr(provider, "yarn_rotary_scaling_factor", None)
if yarn_rotary_scaling_factor is not None:
if "rope_scaling" not in hf_config:
hf_config["rope_scaling"] = {}
hf_config["rope_scaling"]["rope_type"] = "yarn"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/megatron/bridge/models/conversion/model_bridge.py` around lines 504 -
519, Export currently doesn't emit MLA rope-scaling fields so rope_scaling
settings are lost; in the export path that builds hf_config (the same code area
using cls.YARN_ROPE_SCALING_MAPPING and provider attributes), detect
MLA/yarn-related provider attributes (yarn_rotary_scaling_factor,
yarn_correction_range_round_to_int and the keys listed in
cls.YARN_ROPE_SCALING_MAPPING) and populate hf_config["rope_scaling"] with
rope_type="yarn" and the mapped hf keys and truncate flag when present; update
the export logic that reads from provider into hf_config so it mirrors the
import handling shown here (use the provider attribute names from the mapping
and yarn_* attributes to set hf_config["rope_scaling"]).
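Along the same lines, a rough export-side sketch that mirrors those import-side guards; the attribute names and mapping here are assumptions borrowed from the import mapping above, not the repository's actual export code:

```python
# Sketch under assumed attribute names: emit rope_scaling only from provider
# attributes that are actually set, so the HF config never carries None values.
ROPE_SCALING_EXPORT_MAPPING = [  # placeholder (hf_key, provider_attr) pairs
    ("factor", "yarn_rotary_scaling_factor"),
    ("beta_fast", "yarn_beta_fast"),
    ("beta_slow", "yarn_beta_slow"),
]


def export_rope_scaling(provider, hf_config: dict) -> None:
    rope_scaling = {}
    for hf_key, provider_attr in ROPE_SCALING_EXPORT_MAPPING:
        value = getattr(provider, provider_attr, None)
        if value is not None:
            rope_scaling[hf_key] = value
    if rope_scaling:  # only write the section when there is something to export
        hf_config["rope_scaling"] = {"rope_type": "yarn", **rope_scaling}
```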
```python
cfg.model.tensor_model_parallel_size = 2
cfg.model.pipeline_model_parallel_size = 4
cfg.model.pipeline_model_parallel_layout = None
cfg.model.pipeline_dtype = None
cfg.model.virtual_pipeline_model_parallel_size = None
```
🧩 Analysis chain

🏁 Scripts executed:

```bash
# First, let's find and examine the target file
find . -type f -name "gpt_oss.py" | head -20

# Read the file to see lines 272-276 in context
cat -n ./src/megatron/bridge/recipes/gpt_oss/gpt_oss.py | sed -n '260,290p'

# Search for 20B config and other pipeline_dtype references in the file
cat -n ./src/megatron/bridge/recipes/gpt_oss/gpt_oss.py | grep -E "(20B|pipeline_dtype)"

# Check the 20B pre-training config settings
cat -n ./src/megatron/bridge/recipes/gpt_oss/gpt_oss.py | sed -n '133,170p'

# Check the 120B config context and finetuning configs
cat -n ./src/megatron/bridge/recipes/gpt_oss/gpt_oss.py | sed -n '250,290p'

# Check finetuning configs
cat -n ./src/megatron/bridge/recipes/gpt_oss/gpt_oss.py | sed -n '361,420p'

# Continue reading the _gpt_oss_finetune_common function
cat -n ./src/megatron/bridge/recipes/gpt_oss/gpt_oss.py | sed -n '404,460p'

# Search for other recipe files to see how they handle pipeline_dtype
find ./src/megatron/bridge/recipes -type f -name "*.py" | head -10

# Search for pipeline_dtype in other recipe files
rg "pipeline_dtype" ./src/megatron/bridge/recipes --type py -B 2 -A 2
```
Set pipeline_dtype to torch.bfloat16 for PP=4 configs.
Both the 20B and 120B pre-training configurations use pipeline parallelism (PP=4) but have pipeline_dtype = None, which violates the pattern established across other recipe files. When pipeline model parallelism > 1, pipeline_dtype must be explicitly set to torch.bfloat16.
🔧 Fix for both configs

```diff
- cfg.model.pipeline_dtype = None
+ cfg.model.pipeline_dtype = torch.bfloat16
```

Apply this fix to both:

- Line 156 (20B pre-training config)
- Line 275 (120B pre-training config)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
  cfg.model.tensor_model_parallel_size = 2
  cfg.model.pipeline_model_parallel_size = 4
  cfg.model.pipeline_model_parallel_layout = None
- cfg.model.pipeline_dtype = None
+ cfg.model.pipeline_dtype = torch.bfloat16
  cfg.model.virtual_pipeline_model_parallel_size = None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/megatron/bridge/recipes/gpt_oss/gpt_oss.py` around lines 272 - 276, The
pipeline_dtype is left as None for configs where
cfg.model.pipeline_model_parallel_size == 4; update those occurrences in
src/megatron/bridge/recipes/gpt_oss/gpt_oss.py by setting
cfg.model.pipeline_dtype = torch.bfloat16 (ensure torch is imported/available in
the module) for both the 20B and 120B pre-training config blocks where
cfg.model.pipeline_model_parallel_size is 4 so pipeline_dtype is explicitly set
whenever pipeline parallelism > 1.
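As a side note, the convention the reviewer points to could also be enforced with a small post-build guard; this is only a sketch, assuming the config object exposes the model fields used in the recipes above:

```python
import torch


def ensure_pipeline_dtype(cfg) -> None:
    """Sketch of a guard: when pipeline parallelism is enabled, require an
    explicit pipeline_dtype rather than silently leaving it as None."""
    pp = getattr(cfg.model, "pipeline_model_parallel_size", 1) or 1
    if pp > 1 and getattr(cfg.model, "pipeline_dtype", None) is None:
        # Default to bf16, matching the other recipe files in this repository.
        cfg.model.pipeline_dtype = torch.bfloat16
```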
```python
# Optimizer hyperparameters
cfg.optimizer.lr = 1.6e-3
cfg.optimizer.weight_decay = 0.1
cfg.scheduler.min_lr = 1.6e-5
```
🧩 Analysis chain

🏁 Scripts executed:

```bash
# Check whether min_lr lives on optimizer or scheduler
rg -n "\.(optimizer|scheduler)\.min_lr" --type=py -g 'src/megatron/bridge/recipes/**'

# Look at the context around line 131 in nemotron_3_nano.py
sed -n '120,140p' src/megatron/bridge/recipes/nemotronh/nemotron_3_nano.py

# Search for scheduler and optimizer config definitions and their min_lr usage
rg -n "min_lr" --type=py -B2 -A2 src/megatron/bridge/recipes/gpt/gpt3_175b.py | head -30
```
Change cfg.scheduler.min_lr to cfg.optimizer.min_lr at line 131.
All other recipes (GPT 175B, LLaMA3, GLM45, Nemotron Nano V2 VL) set cfg.optimizer.min_lr. This file is the only outlier using cfg.scheduler.min_lr, which is inconsistent with the established pattern.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/megatron/bridge/recipes/nemotronh/nemotron_3_nano.py` at line 131,
Replace the inconsistent assignment of cfg.scheduler.min_lr with
cfg.optimizer.min_lr in the Nemotron 3 Nano recipe: locate the line setting
cfg.scheduler.min_lr (currently 1.6e-5) and change it to assign that value to
cfg.optimizer.min_lr instead, and scan the file for any other uses of
cfg.scheduler.min_lr to align them with the optimizer config pattern used in
other recipes.
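Spelled out, the consistent form of that block would look like the following (values taken from the snippet quoted below; only the attribute holding min_lr changes):

```python
# Optimizer hyperparameters, following the pattern used by the other recipes
cfg.optimizer.lr = 1.6e-3
cfg.optimizer.weight_decay = 0.1
cfg.optimizer.min_lr = 1.6e-5  # on the optimizer config, not the scheduler
```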
```python
cfg.optimizer.lr = 1.6e-3
cfg.optimizer.weight_decay = 0.1
cfg.scheduler.min_lr = 1.6e-5
cfg.scheduler.warmup_iters = 333
```
🧩 Analysis chain

🏁 Scripts executed:

```bash
# Find SchedulerConfig class definition
find . -name "*.py" -type f | xargs grep -l "class SchedulerConfig" | head -5

# Check warmup_iters vs lr_warmup_iters usage in recipes
rg -n "scheduler\.(lr_)?warmup_iters" --type=py

# Look at the specific file to confirm line 132
sed -n '125,140p' src/megatron/bridge/recipes/nemotronh/nemotron_3_nano.py

# Search for _pretrain_common to understand what it sets
find . -name "*.py" -type f | xargs grep -l "_pretrain_common" | head -5

# View SchedulerConfig class and its attributes
sed -n '1600,1650p' src/megatron/bridge/training/config.py | grep -A 50 "class SchedulerConfig"

# Look for attribute definitions in SchedulerConfig
rg -A 30 "class SchedulerConfig" src/megatron/bridge/training/config.py | head -60
```
Fix incorrect scheduler attribute name on line 132.
Change cfg.scheduler.warmup_iters = 333 to cfg.scheduler.lr_warmup_iters = 333. The SchedulerConfig class defines lr_warmup_iters as the valid attribute (default: 0), not warmup_iters. Using the wrong name silently creates a new attribute without actually setting the scheduler's warmup iterations, leaving it at the default value of 0 instead of the intended 333. All other recipes correctly use cfg.scheduler.lr_warmup_iters.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/megatron/bridge/recipes/nemotronh/nemotron_3_nano.py` at line 132, The
line sets a non-existent scheduler attribute (cfg.scheduler.warmup_iters) so the
scheduler's warmup stays at default; change that assignment to use the
SchedulerConfig's real field name by replacing cfg.scheduler.warmup_iters = 333
with cfg.scheduler.lr_warmup_iters = 333 so the scheduler uses 333 warmup
iterations (refer to cfg.scheduler and SchedulerConfig's lr_warmup_iters).
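To make the silent-attribute failure mode concrete, here is a tiny standalone demonstration (not code from this repository; SchedulerConfigSketch is a hypothetical stand-in for the real SchedulerConfig):

```python
from dataclasses import dataclass, fields


@dataclass
class SchedulerConfigSketch:  # hypothetical stand-in for the real SchedulerConfig
    lr_warmup_iters: int = 0


cfg = SchedulerConfigSketch()
cfg.warmup_iters = 333      # silently creates a new, unused attribute
print(cfg.lr_warmup_iters)  # -> 0: the real field is still at its default

# A cheap guard: flag instance attributes that are not declared dataclass fields.
declared = {f.name for f in fields(cfg)}
unexpected = set(vars(cfg)) - declared
print(unexpected)           # -> {'warmup_iters'}
```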
…rate Moonlight recipe to AutoBridge (#2428) Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
…2571) Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com> Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: oliver könig <okoenig@nvidia.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
/ok to test e39b6da

/ok to test 8fe60aa

Signed-off-by: oliver könig <okoenig@nvidia.com>

/ok to test 44468af
What does this PR do?
Includes:
- Bump NVRX #2563

Does not include:

- feat: default packed_sequence to True in all finetune recipes #2284 (will be handled separately)
- Fix LLAMA3 LoRa TFLOPs Formula #2416 (not ready/reverted)

Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information
Summary by CodeRabbit
New Features
Improvements
Bug Fixes