Merged
101 commits
e82164f
Add anymodel directories to feature/puzzletron
danielkorzekwa Mar 4, 2026
2099df3
Make any_model conversion working.
danielkorzekwa Mar 5, 2026
eb5cf8a
Update child_init.py with anymodel version
danielkorzekwa Mar 5, 2026
c9de41c
fix attention pruning
danielkorzekwa Mar 5, 2026
3c1bc1f
Add trust_remote_code to load_model_config (default to false)
danielkorzekwa Mar 5, 2026
8357136
Make activation scoring working
danielkorzekwa Mar 5, 2026
6cc2194
Comment all tested models aside of llama_3_1_8b_instruct
danielkorzekwa Mar 5, 2026
ee4e1e3
Delete not needed decilm test
danielkorzekwa Mar 5, 2026
449b523
Fix broken tests
danielkorzekwa Mar 5, 2026
fb27bba
Update puzzletron_nas_pluging to any_model version
danielkorzekwa Mar 5, 2026
b350f82
Correct test resources used by tests.
danielkorzekwa Mar 5, 2026
fafe5a3
Disable puzzletron tests (will be enabled after all any_model logic i…
danielkorzekwa Mar 5, 2026
e988248
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa Mar 6, 2026
c717852
Comment out not implemented models.
danielkorzekwa Mar 6, 2026
030f126
format python docs
danielkorzekwa Mar 6, 2026
8dcdfbf
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa Mar 6, 2026
70df0df
Use trust_remote_code in force_cache_dynamic_modules()
danielkorzekwa Mar 6, 2026
bb56662
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa Mar 6, 2026
ecd953e
Fix anymodel pruning
danielkorzekwa Mar 6, 2026
ee8f538
Fix buid docs issue.
danielkorzekwa Mar 6, 2026
c9b76a1
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa Mar 6, 2026
6e3af61
Merge branch 'dkorzekwa/anymodel_activation_scoring' into dkorzekwa/a…
danielkorzekwa Mar 6, 2026
0ad6d92
Merging build_library_and_stats
danielkorzekwa Mar 6, 2026
995eb1a
Merging anymodel: calc_one_block_scores
danielkorzekwa Mar 6, 2026
34081c9
Mering any_model: calc_one_block_scores
danielkorzekwa Mar 6, 2026
ed5c00f
merge any_model: mip_and_realize_models
danielkorzekwa Mar 6, 2026
993b5ec
Add all anymodel models but gptoss
danielkorzekwa Mar 6, 2026
6e9f03b
Make nemotron-nano-12b-v2 to work (set trust_remote_code=true)
danielkorzekwa Mar 9, 2026
e8b7a7d
merge anymodel for nemotron-3-nano-30b-a3b-base-bf16
danielkorzekwa Mar 9, 2026
47414d5
Clarify readme and avoid reusing the same reference in llama_converter.
danielkorzekwa Mar 9, 2026
a8305d8
Fix tied-embedding handling before writing the safetensors index.
danielkorzekwa Mar 9, 2026
68421a5
Fix NaN ranking currently selects NaNs as “best” experts by default.
danielkorzekwa Mar 9, 2026
d6b8028
Code clean up.
danielkorzekwa Mar 9, 2026
ecd2341
Code clean up.
danielkorzekwa Mar 10, 2026
f9d845d
code clean up
danielkorzekwa Mar 10, 2026
d171b01
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa Mar 10, 2026
722da90
Merge branch 'dkorzekwa/anymodel_activation_scoring' into dkorzekwa/a…
danielkorzekwa Mar 10, 2026
934ab2f
code clean up
danielkorzekwa Mar 10, 2026
0f14ec3
Merge branch 'dkorzekwa/anymodel_pruning' into dkorzekwa/anymodel_bui…
danielkorzekwa Mar 10, 2026
dcb9e02
remove not needed comment
danielkorzekwa Mar 10, 2026
0c9ea5d
Merge branch 'dkorzekwa/anymodel_build_library_and_stats' into dkorze…
danielkorzekwa Mar 10, 2026
5b310e2
Merge branch 'dkorzekwa/any_model_calc_one_block_scores' into dkorzek…
danielkorzekwa Mar 10, 2026
4f82b1c
Merge branch 'dkorzekwa/mip_and_realize_models' into dkorzekwa/any_mo…
danielkorzekwa Mar 10, 2026
176a435
Fix a broken test_puzzletron test on 2 gpus.
danielkorzekwa Mar 10, 2026
02e2c9b
Merge branch 'dkorzekwa/anymodel_activation_scoring' into dkorzekwa/a…
danielkorzekwa Mar 10, 2026
92c4419
Merge branch 'dkorzekwa/anymodel_pruning' into dkorzekwa/anymodel_bui…
danielkorzekwa Mar 10, 2026
aa1eb3e
Merge branch 'dkorzekwa/anymodel_build_library_and_stats' into dkorze…
danielkorzekwa Mar 10, 2026
2b84a96
Merge branch 'dkorzekwa/any_model_calc_one_block_scores' into dkorzek…
danielkorzekwa Mar 10, 2026
fb838c0
Merge branch 'dkorzekwa/mip_and_realize_models' into dkorzekwa/any_mo…
danielkorzekwa Mar 10, 2026
13378ff
Add gpt-oss model
danielkorzekwa Mar 11, 2026
47ca0e3
Add comments about a broken test
danielkorzekwa Mar 11, 2026
96112f7
Fix a broken gptoss test
danielkorzekwa Mar 12, 2026
cb6b182
Add mamba to puzzletron dependencies.
danielkorzekwa Mar 12, 2026
670bb34
Update mamba-ssm and casual-conv1d dependences (remove pinpoint versi…
danielkorzekwa Mar 13, 2026
0e1b591
Install mamba-ssm and causal-conv1d in testenv:cuda13-gpu-puzzletron
danielkorzekwa Mar 13, 2026
ca845ec
Fix installing dependencies in testenv:cuda13-gpu-puzzletron
danielkorzekwa Mar 13, 2026
be825bc
Fix anymodel for qwen3 8B in 2 gpus
danielkorzekwa Mar 13, 2026
7fd1afa
Fix pipeline parallelism issue for wen3-vl-30b-a3b-instruct-qwen3_vl-…
danielkorzekwa Mar 13, 2026
7d7b609
Fix multi-gpu issue for nemotron-nano-12b-v2
danielkorzekwa Mar 13, 2026
249af9d
Fix no_op in any_model
danielkorzekwa Mar 13, 2026
b80583c
Merge branch 'feature/puzzletron' into dkorzekwa/any_model_other_models
danielkorzekwa Mar 13, 2026
88b1b13
Merge any_model tutorial
danielkorzekwa Mar 13, 2026
c0da9c0
Merge mbridge distillation for any_model
danielkorzekwa Mar 13, 2026
1dd742e
Fix nemotron_h_model_descriptor.
danielkorzekwa Mar 14, 2026
4a6ebbe
Fix tox -e build-docs
danielkorzekwa Mar 14, 2026
585f0ed
pin mamba/casual-conv1d versions to fix failing assertion for test_pu…
danielkorzekwa Mar 14, 2026
7fb5d9a
Fix for installing mamba-ssm
danielkorzekwa Mar 14, 2026
75d3d69
Fix broken test for nemotron-3-nano-30b-a3b-base-bf16
danielkorzekwa Mar 14, 2026
0e5722d
code clean up
danielkorzekwa Mar 14, 2026
2dd9735
Make test_puzzletron test deterministic
danielkorzekwa Mar 15, 2026
3561de5
Comment out all models but nemotron-3-nano-30b-a3b-base-bf16 to check…
danielkorzekwa Mar 15, 2026
27866de
Implement Qwen3VLRemoveExpertsIndependentHook
danielkorzekwa Mar 15, 2026
a012fe6
Remove not needed nvidia licence header
danielkorzekwa Mar 16, 2026
52922a4
# Initialize weights to ensure all parameters are properly initialized
danielkorzekwa Mar 16, 2026
c234fb4
Fix non-deterministic test_puzzletron test
danielkorzekwa Mar 16, 2026
53dcd10
Fix for unsetting CUDA_VISIBLE_DEVICES
danielkorzekwa Mar 16, 2026
69d9648
increase numeric tolerance for test_puzzletron.py
danielkorzekwa Mar 17, 2026
4a692dc
Disable lm_loss assertion for nemotron-3-nano-30b-a3b-base-bf16 (not …
danielkorzekwa Mar 17, 2026
e795f0c
Removing incorrect licence file. gpt_oss_pruned_to_mxfp4.py was not a…
danielkorzekwa Mar 17, 2026
631306c
Fix hardcoded trust_remote_code
danielkorzekwa Mar 17, 2026
dc77be2
Merge branch 'dkorzekwa/any_model_other_models' into dkorzekwa/anymod…
danielkorzekwa Mar 17, 2026
b76e0ef
Merge branch 'dkorzekwa/anymodel_gptoss' into dkorzekwa/anymodel_tuto…
danielkorzekwa Mar 17, 2026
109b185
Merge branch 'dkorzekwa/anymodel_tutorial' into dkorzekwa/anymodel_mb…
danielkorzekwa Mar 17, 2026
5cadc65
Merge branch 'feature/puzzletron' into dkorzekwa/anymodel_gptoss
danielkorzekwa Mar 17, 2026
151081c
Delete not needed yaml files for test_puzzletron.
danielkorzekwa Mar 17, 2026
36daa6d
Delete not needed mypy exclusion for removed hf_configs files.
danielkorzekwa Mar 17, 2026
960b8ce
Merge branch 'dkorzekwa/anymodel_gptoss' into dkorzekwa/anymodel_tuto…
danielkorzekwa Mar 17, 2026
854d96b
Merge branch 'dkorzekwa/anymodel_tutorial' into dkorzekwa/anymodel_mb…
danielkorzekwa Mar 17, 2026
b47f846
Merge branch 'feature/puzzletron' into dkorzekwa/anymodel_tutorial
danielkorzekwa Mar 17, 2026
13f5edc
Merge branch 'dkorzekwa/anymodel_tutorial' into dkorzekwa/anymodel_mb…
danielkorzekwa Mar 17, 2026
f2c1578
Fix a broken mbridge distillation test for anymodel
danielkorzekwa Mar 17, 2026
3592eec
Code clean up.
danielkorzekwa Mar 17, 2026
f06cb20
Use all available GPUs for test_distill_hf
danielkorzekwa Mar 17, 2026
ad31b09
use extend_cmd_parts
danielkorzekwa Mar 17, 2026
0505916
code clean up.
danielkorzekwa Mar 17, 2026
7016857
Improve naming of --hf_export_path and --hf_export_path
danielkorzekwa Mar 18, 2026
7ede076
Merge branch 'feature/puzzletron' into dkorzekwa/anymodel_mbridgedist
danielkorzekwa Mar 19, 2026
24ba700
Use Nemo container for mbridge distillation test
kevalmorabia97 Mar 19, 2026
44186c7
fix a broken test
danielkorzekwa Mar 20, 2026
81f6d4e
Fix typos in README.
danielkorzekwa Mar 20, 2026
a5b715f
Fix trust_remote_code=true issue
danielkorzekwa Mar 20, 2026
7 changes: 4 additions & 3 deletions .github/workflows/_example_tests_runner.yml
@@ -51,14 +51,15 @@ jobs:
     apt-get update && apt-get install -y git-lfs
     git lfs install --system

-    pip install ".${{ inputs.pip_install_extras }}"
+    # use `python -m pip` instead of `pip` to avoid conflicts with system pip for nemo containers
+    python -m pip install ".${{ inputs.pip_install_extras }}"

     if [[ "${{ inputs.example }}" == *"diffusers"* ]]; then
       echo "Uninstalling apex for diffusers: T5 Int8 (PixArt) + Apex is not supported as per https://github.com/huggingface/transformers/issues/21391"
-      pip uninstall -y apex || true
+      python -m pip uninstall -y apex || true
     fi

-    find examples/${{ inputs.example }} -name "requirements.txt" | while read req_file; do pip install -r "$req_file" || exit 1; done
+    find examples/${{ inputs.example }} -name "requirements.txt" | while read req_file; do python -m pip install -r "$req_file" || exit 1; done
 - name: Run tests
   run: |
     echo "Running tests for: ${{ inputs.example }}"
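The change above replaces the bare `pip` script with `python -m pip`, so each install runs under whichever `python` is first on `PATH` rather than under whichever `pip` executable happens to be found. The same idea in Python, as a minimal sketch: the helper name `pip_command` is hypothetical and not part of this PR.

```python
import sys


def pip_command(*args: str) -> list[str]:
    """Build a pip invocation bound to the running interpreter (hypothetical helper)."""
    # sys.executable is the absolute path of the interpreter executing this code,
    # so the resulting command cannot pick up a pip that belongs to another Python.
    return [sys.executable, "-m", "pip", *args]


# Show the command that would be run for a requirements file.
print(" ".join(pip_command("install", "-r", "requirements.txt")))
```

Passing the returned list to `subprocess.run(..., check=True)` would mirror the `|| exit 1` behaviour of the shell loop.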
10 changes: 5 additions & 5 deletions .github/workflows/example_tests.yml
@@ -56,8 +56,8 @@ jobs:
       match_pattern: "^DCO$|^linux$" # Wait for DCO and Unit tests / linux to pass
       delay: 300s

-  ##### TensorRT-LLM Example Tests #####
-  trtllm-pr:
+  ##### NeMo Example Tests #####
+  nemo-pr:
     needs: [check-file-changes, wait-checks]
     if: startsWith(github.ref, 'refs/heads/pull-request/') && needs.check-file-changes.outputs.any_changed == 'true'
     strategy:
@@ -67,7 +67,7 @@ jobs:
     uses: ./.github/workflows/_example_tests_runner.yml
     secrets: inherit
     with:
-      docker_image: "nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc5"
+      docker_image: "nvcr.io/nvidia/nemo:26.02"
       example: ${{ matrix.example }}
       pip_install_extras: "[hf,puzzletron,dev-test]"
       runner: linux-amd64-gpu-rtxpro6000-latest-2
@@ -76,13 +76,13 @@ jobs:
   example-pr-required-check:
     # Run even if example tests are skipped
     if: ${{ startsWith(github.ref, 'refs/heads/pull-request/') && always() }}
-    needs: [check-file-changes, trtllm-pr]
+    needs: [check-file-changes, nemo-pr]
     runs-on: ubuntu-latest
     steps:
       - name: Required GPU tests did not succeed
         if: |
           needs.check-file-changes.result != 'success' ||
           (needs.check-file-changes.outputs.any_changed == 'true' && (
-            needs.trtllm-pr.result != 'success'
+            needs.nemo-pr.result != 'success'
           ))
         run: exit 1
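The `if:` expression of the required check can be paraphrased in Python to make its truth table easier to see. This is only a sketch of the logic as written in the workflow; the function name `required_check_fails` is hypothetical, and the arguments are strings because that is how job results and outputs are compared in the expression.

```python
def required_check_fails(check_result: str, any_changed: str, nemo_result: str) -> bool:
    """Return True when the 'Required GPU tests did not succeed' step would run."""
    return check_result != "success" or (
        any_changed == "true" and nemo_result != "success"
    )


# No relevant files changed: nemo-pr is skipped, and the required check still passes.
print(required_check_fails("success", "false", "skipped"))
# Relevant files changed and nemo-pr failed: the required check fails.
print(required_check_fails("success", "true", "failure"))
```

The design choice is that a skipped `nemo-pr` job only counts as acceptable when `check-file-changes` reported that nothing relevant changed.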
2 changes: 1 addition & 1 deletion examples/puzzletron/mbridge_distillation/README.md
@@ -90,7 +90,7 @@ torchrun --nproc_per_node=8 examples/puzzletron/mbridge_distillation/distill_hf.

 - Add `--trust_remote_code` if student or teacher checkpoints need HuggingFace custom modeling code.
 - The distilled Megatron-Bridge checkpoint will be saved to `--output_dir/checkpoints/iter_<train_iters>`.
-- Add `--hf-export-path` to automatically export the final checkpoint to HuggingFace format after distillation. When using `--hf-export-path`, you must also provide `--hf-model` to specify the HuggingFace model ID to use as a template for export (e.g., `meta-llama/Llama-3.1-8B-Instruct`). The `--hf-model` should match the base architecture of the student model. The exported model can be evaluated for accuracy using the evaluation tools described in the main [README.md](../README.md#evaluation).
+- Add `--hf-export-path` (or `--hf_export_path`) to automatically export the final checkpoint to HuggingFace format after distillation. When exporting, you must also provide `--hf-model` / `--hf_model` as the HuggingFace model ID for the export template (e.g., `meta-llama/Llama-3.1-8B-Instruct`). It should match the base architecture of the student model. The exported model can be evaluated for accuracy using the evaluation tools described in the main [README.md](../README.md#evaluation).
 - For production use, use larger datasets like [Nemotron-Pretraining-SFT-v1](https://huggingface.co/datasets/nvidia/Nemotron-Pretraining-SFT-v1) and train for more iterations. See the [Megatron-Bridge distillation tutorial](https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/megatron_bridge#distillation) for best practices.

 ## MMLU Evaluation Results
3 changes: 3 additions & 0 deletions examples/puzzletron/mbridge_distillation/distill_hf.py
@@ -144,6 +144,7 @@ def get_args():
     parser.add_argument("--wandb_exp_name", type=str, help="Wandb experiment name (optional)")
     # Export arguments
     parser.add_argument(
+        "--hf_export_path",
         "--hf-export-path",
         type=str,
         default=None,
@@ -153,6 +154,7 @@ def get_args():
         ),
     )
     parser.add_argument(
+        "--hf_model",
         "--hf-model",
         type=str,
         required=True,
@@ -307,6 +309,7 @@ def _build_model_provider(hf_path):
             train_iters=args.train_iters,
             hf_export_path=args.hf_export_path,
             hf_model=args.hf_model,
+            trust_remote_code=args.trust_remote_code,
         )
     except Exception as e:
         print(f"⚠️ Export failed: {e}")
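The two option strings added to each `add_argument` call are aliases for a single argument. With `argparse`, the attribute name is derived from the first long option string, so both spellings populate `args.hf_export_path` and `args.hf_model`. A minimal standalone sketch (the path and the parser itself are illustrative, not the script's full argument list):

```python
import argparse

parser = argparse.ArgumentParser()
# The first long option string determines the attribute name: `hf_export_path`.
parser.add_argument("--hf_export_path", "--hf-export-path", type=str, default=None)
parser.add_argument("--hf_model", "--hf-model", type=str, required=True)

model_id = "meta-llama/Llama-3.1-8B-Instruct"
underscore = parser.parse_args(["--hf_export_path", "/tmp/hf_out", "--hf_model", model_id])
hyphen = parser.parse_args(["--hf-export-path", "/tmp/hf_out", "--hf-model", model_id])

# Both spellings produce the same namespace.
print(underscore == hyphen)
```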
35 changes: 35 additions & 0 deletions modelopt/torch/puzzletron/export/mbridge/__init__.py

Collaborator

Pretty much everything in this PR seems like we should instead merge to M-Bridge. Are we confident enough to upstream these changes?

Contributor Author

We are not confident yet. For example, we would first need to talk to the mbridge/megatron-lm people and align with their plans for heterogeneous support. Let's think about it once puzzletron is in main.

We also still have to add support for gpt-oss and mamba, so it is not the best time to merge it into mcore.

Collaborator

The nemo:26.04 container code freeze is in 2 weeks. Let's make sure we raise a PR for the required changes to M-Bridge before then, so we can see what can and cannot be upstreamed.

Contributor Author

We are unlikely to have time for it in the next 2 weeks.

@@ -0,0 +1,35 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Megatron-Bridge adapters for Puzzletron AnyModel checkpoints.

This module provides bridges for converting Puzzletron AnyModel checkpoints
(heterogeneous layer architectures) to Megatron-Core format via Megatron-Bridge.
"""

# Import to register bridges (side effect)
from modelopt.torch.puzzletron.export.mbridge.base import HeterogeneousBridgeMixin
from modelopt.torch.puzzletron.export.mbridge.llama import (  # noqa: F401
    PuzzletronLlamaAnyModelBridge,
)
from modelopt.torch.puzzletron.export.mbridge.qwen3 import (  # noqa: F401
    PuzzletronQwen3AnyModelBridge,
)

__all__ = [
    "HeterogeneousBridgeMixin",
    "PuzzletronLlamaAnyModelBridge",
    "PuzzletronQwen3AnyModelBridge",
]
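The comment "Import to register bridges (side effect)" relies on a common pattern: a class decorator records the class in a registry when its module is first imported, so the import matters even if the name is never used. The sketch below illustrates that pattern only; `BRIDGE_REGISTRY`, `register_bridge`, and `ToyLlamaBridge` are hypothetical names and are not Megatron-Bridge's actual registration API.

```python
BRIDGE_REGISTRY: dict[str, type] = {}


def register_bridge(model_type: str):
    """Class decorator that records a bridge class under `model_type` (hypothetical)."""

    def decorator(cls: type) -> type:
        BRIDGE_REGISTRY[model_type] = cls
        return cls

    return decorator


# In a real package this class would live in its own module. Importing that
# module executes the decorator, which is the side effect the comment refers to.
@register_bridge("toy_llama")
class ToyLlamaBridge:
    pass


print(sorted(BRIDGE_REGISTRY))
```

This is also why the imports carry `# noqa: F401`: a linter would otherwise flag them as unused.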
142 changes: 142 additions & 0 deletions modelopt/torch/puzzletron/export/mbridge/base.py
@@ -0,0 +1,142 @@
#!/usr/bin/env python3
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Mixin class for bridges that support heterogeneous layer architectures.

This module provides a mixin class for converting models with block_configs
(heterogeneous layer configurations) to Megatron-Core format via Megatron-Bridge.
"""

import dataclasses
import json
from collections.abc import Callable
from dataclasses import dataclass, fields

from megatron.bridge.models.gpt_provider import GPTModelProvider
from megatron.bridge.models.hf_pretrained.causal_lm import PreTrainedCausalLM
from megatron.bridge.models.transformer_config import HeterogeneousTransformerConfig
from megatron.core.models.gpt.heterogeneous.heterogeneous_layer_specs import (
    get_gpt_heterogeneous_layer_spec,
)
from megatron.core.transformer.spec_utils import ModuleSpec


def heterogeneous_layer_spec(config) -> ModuleSpec:
    """Get GPT heterogeneous layer spec using Transformer Engine."""
    return get_gpt_heterogeneous_layer_spec(config, use_te=True)


@dataclass
class GenericHeterogeneousProvider(GPTModelProvider, HeterogeneousTransformerConfig):
    """Generic provider for AnyModel checkpoints with block_configs."""

    # Heterogeneous configuration fields
    heterogeneous_layers_config_path: str | None = None
    heterogeneous_layers_config_encoded_json: str = ""
    transformer_layer_spec: ModuleSpec | Callable = heterogeneous_layer_spec
Comment on lines +47 to +50

Contributor

⚠️ Potential issue | 🟠 Major

Don't let the parent provider overwrite the heterogeneous layer spec.

Line 50 sets GenericHeterogeneousProvider.transformer_layer_spec to heterogeneous_layer_spec, but Lines 93-113 copy the parent's transformer_layer_spec straight back into provider_kwargs. That means the returned provider can still build the vanilla layer layout instead of the heterogeneous one.

Proposed fix
-        provider_kwargs = dataclasses.asdict(parent_provider)
+        provider_kwargs = {
+            field.name: getattr(parent_provider, field.name)
+            for field in fields(parent_provider)
+            if field.init
+        }
@@
         # Only keep kwargs that are valid fields
         provider_kwargs = {k: v for k, v in provider_kwargs.items() if k in valid_fields}
+        provider_kwargs["transformer_layer_spec"] = heterogeneous_layer_spec
 
         provider_kwargs["heterogeneous_layers_config_encoded_json"] = (
             self._build_heterogeneous_config_json(hf_pretrained.config)
         )

Also applies to: 91-113

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/export/mbridge/base.py` around lines 47 - 50, The
parent provider's transformer_layer_spec is being copied back into
provider_kwargs and overwrites the heterogeneous_layer_spec set on
GenericHeterogeneousProvider; update the provider construction logic (the code
that builds provider_kwargs around provider_kwargs / provider_kwargs update in
the factory method that handles parent providers) so it does not copy or
override transformer_layer_spec from the parent into provider_kwargs if
GenericHeterogeneousProvider.transformer_layer_spec was explicitly set to
heterogeneous_layer_spec (i.e., only inherit transformer_layer_spec when the
child has none), and ensure heterogeneous_layers_config_path /
heterogeneous_layers_config_encoded_json are likewise preserved on the returned
provider instead of being clobbered by the parent.
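The mechanism this comment describes can be reproduced with toy dataclasses. The sketch below is not the real provider code: `ToyParentProvider`, `ToyHeterogeneousProvider`, and the string-valued `layer_spec` are hypothetical stand-ins. Whether the real provider is affected depends on what the parent bridge stores in `transformer_layer_spec`, which this sketch does not model.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class ToyParentProvider:
    hidden_size: int = 4096
    layer_spec: str = "vanilla"


@dataclass
class ToyHeterogeneousProvider(ToyParentProvider):
    layer_spec: str = "heterogeneous"  # child overrides the default
    block_configs_json: str = ""


parent = ToyParentProvider()
valid = {f.name for f in fields(ToyHeterogeneousProvider)}
kwargs = {k: v for k, v in asdict(parent).items() if k in valid}

# The parent's value is passed explicitly, so the child's default never applies.
clobbered = ToyHeterogeneousProvider(**kwargs)

# Re-asserting the field after filtering restores the intended spec.
kwargs["layer_spec"] = "heterogeneous"
fixed = ToyHeterogeneousProvider(**kwargs)

print(clobbered.layer_spec, fixed.layer_spec)
```

A class-level default only takes effect when the constructor does not receive that keyword, which is why the proposed fix sets the field after the filtering step.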


    def __getattr__(self, name: str):
        """Handle missing attributes for OmegaConf compatibility.

        Returns empty list for per_block_parameters if not yet initialized (before finalize()).
        This allows OmegaConf to serialize/deserialize configs without errors. Actual usage
        should call finalize() first to set per_block_parameters as a real attribute.
        """
        if name == "per_block_parameters":
            # Return existing attribute if set, otherwise [] for OmegaConf compatibility
            try:
                return object.__getattribute__(self, name)
            except AttributeError:
                return []
        raise AttributeError(f"'{self.__class__.__name__}' object has no attribute '{name}'")


class HeterogeneousBridgeMixin:
    """Mixin for bridges supporting heterogeneous layer architectures (block_configs).

    Must be used with multiple inheritance alongside a model-specific bridge.
    Example: class PuzzletronLlamaAnyModelBridge(HeterogeneousBridgeMixin, LlamaBridge)
    """

    def provider_bridge(self, hf_pretrained: PreTrainedCausalLM) -> GPTModelProvider:
        """Convert HF AnyModel config to Megatron GPTModelProvider.

        This method:
        1. Calls the parent bridge's provider_bridge() to get a GPTModelProvider with all
           model-specific settings (e.g., LlamaBridge sets normalization="RMSNorm", etc.)
        2. Converts the provider to a dict and filters to only fields accepted by
           GenericHeterogeneousProvider (which inherits from GPTModelProvider, so all valid
           GPTModelProvider fields are preserved)
        3. Adds heterogeneous configuration and returns GenericHeterogeneousProvider

        All parameters from the parent bridge (e.g., LlamaBridge) are maintained because
        GenericHeterogeneousProvider inherits from GPTModelProvider, which includes all
        the fields that the parent bridge sets.
        """

        parent_provider = super().provider_bridge(hf_pretrained)  # type: ignore[misc]

        provider_kwargs = dataclasses.asdict(parent_provider)

        # Filter to only fields that GenericHeterogeneousProvider accepts.
        # GenericHeterogeneousProvider inherits from GPTModelProvider, so it includes all
        # GPTModelProvider fields. Model-specific fields from subclasses (e.g., MistralModelProvider,
        # GPTOSSModelProvider) are filtered out because GenericHeterogeneousProvider only inherits
        # from GPTModelProvider, not from model-specific subclasses.
        #
        # Note: This logic may not work for bridges like MistralBridge or GPTOSSBridge if they
        # use model-specific parameters not supported by GenericHeterogeneousProvider (e.g.,
        # scale_factor, yarn_rotary_scaling_factor, moe_* parameters). In such cases, create a
        # model-specific heterogeneous provider that inherits from the model-specific provider.
        valid_fields = {f.name for f in fields(GenericHeterogeneousProvider)}

        # Only keep kwargs that are valid fields
        provider_kwargs = {k: v for k, v in provider_kwargs.items() if k in valid_fields}

        provider_kwargs["heterogeneous_layers_config_encoded_json"] = (
            self._build_heterogeneous_config_json(hf_pretrained.config)
        )
        return GenericHeterogeneousProvider(**provider_kwargs)

    def _build_heterogeneous_config_json(self, hf_config) -> str:
        """Build heterogeneous layers config JSON from HF config."""

        hf_config_dict = json.loads(hf_config.to_json_string())

        mcore_block_configs = [
            self._convert_block_config(block) for block in hf_config_dict["block_configs"]
        ]
        return json.dumps({"block_configs": mcore_block_configs}, ensure_ascii=False)

    def _convert_block_config(self, block: dict) -> dict:
        """Convert a single block config from HF format to MCore format."""
        return {
            "attention": self._convert_attention_config(block["attention"]),
            "ffn": self._convert_ffn_config(block["ffn"]),
        }

    def _convert_attention_config(self, attention_config: dict) -> dict:
        """Convert attention config from HF format to MCore format."""
        attention_config = attention_config.copy()
        attention_config["num_query_groups"] = attention_config.pop("num_key_value_heads")
        return attention_config

    def _convert_ffn_config(self, ffn_config: dict) -> dict:
        """Convert FFN/MLP config from HF format to MCore format."""
        ffn_config = ffn_config.copy()
        ffn_config["ffn_hidden_size"] = ffn_config.pop("intermediate_size")
        return ffn_config
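To make the key renaming in `_convert_block_config` concrete, the standalone sketch below applies the same two renames (`num_key_value_heads` to `num_query_groups`, `intermediate_size` to `ffn_hidden_size`) to a made-up pair of block configs. The function `convert_block_config` and the numeric values are illustrative assumptions, not taken from a real checkpoint.

```python
import json


def convert_block_config(block: dict) -> dict:
    """Rename HF keys to their MCore equivalents without mutating the input."""
    attention = dict(block["attention"])
    attention["num_query_groups"] = attention.pop("num_key_value_heads")
    ffn = dict(block["ffn"])
    ffn["ffn_hidden_size"] = ffn.pop("intermediate_size")
    return {"attention": attention, "ffn": ffn}


# Two layers with different shapes, as in a heterogeneous (pruned) model.
hf_block_configs = [
    {"attention": {"num_key_value_heads": 8}, "ffn": {"intermediate_size": 14336}},
    {"attention": {"num_key_value_heads": 4}, "ffn": {"intermediate_size": 8192}},
]
encoded = json.dumps({"block_configs": [convert_block_config(b) for b in hf_block_configs]})
print(encoded)
```

Because `dict.pop` is called without a default, a block that lacks either key raises `KeyError`; that holds for this sketch and, as written, for the methods above.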
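The `__getattr__` fallback on `GenericHeterogeneousProvider` depends on a Python rule: `__getattr__` is consulted only after normal attribute lookup has failed. The toy class below (hypothetical, unrelated to OmegaConf or Megatron) shows the resulting behaviour before and after the attribute is set for real.

```python
class ToyConfig:
    def __getattr__(self, name: str):
        # Only reached when normal attribute lookup has already failed.
        if name == "per_block_parameters":
            return []
        raise AttributeError(f"'{self.__class__.__name__}' object has no attribute '{name}'")


cfg = ToyConfig()
before = cfg.per_block_parameters      # falls through to __getattr__
cfg.per_block_parameters = ["block0"]  # becomes a real instance attribute
after = cfg.per_block_parameters       # normal lookup succeeds; __getattr__ is skipped

print(before, after, hasattr(cfg, "missing"))
```

Re-raising `AttributeError` for every other name keeps `hasattr` and `getattr(..., default)` working as usual.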