
moe quant patch for merge miss match #3483

Merged
winglian merged 13 commits into axolotl-ai-cloud:main from ved1beta:moe-merge-patch
Mar 16, 2026

Conversation

@ved1beta

@ved1beta ved1beta commented Mar 10, 2026

Member

Description

Training with quantize_moe_experts=true + two lora_target_parameters on the same expert module (e.g. mlp.experts.gate_up_proj and mlp.experts.down_proj) produces a size mismatch when merging the adapter back.

patch_peft_target_parameters_matching() (moe_quant.py:234)
The existing PEFT patch now wraps the original_inject call inside _sorted_named_params_ctx(), so both training and merge paths always process parameters in the same alphabetical order → same nesting → consistent adapter keys.
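A minimal sketch of the idea behind a sorted-order context manager, assuming nothing about the real `_sorted_named_params_ctx()` beyond what the description says. `ToyModule` and `sorted_named_params` are hypothetical names used only for illustration; the actual patch operates on PyTorch modules inside PEFT's injection call.

```python
from contextlib import contextmanager


class ToyModule:
    """Hypothetical stand-in for a module that stores parameters in registration order."""

    def __init__(self, params):
        self._params = dict(params)

    def named_parameters(self):
        yield from self._params.items()


@contextmanager
def sorted_named_params(module):
    """Temporarily make module.named_parameters() yield names alphabetically."""
    original = module.named_parameters  # bound class-level method

    def _sorted():
        return iter(sorted(original(), key=lambda item: item[0]))

    # An instance attribute shadows the class method while the context is active.
    module.named_parameters = _sorted
    try:
        yield module
    finally:
        # Deleting the instance attribute exposes the class method again.
        del module.named_parameters


# Two "paths" that register the same parameters in a different order.
train = ToyModule([("gate_up_proj", 1), ("down_proj", 2)])
merge = ToyModule([("down_proj", 2), ("gate_up_proj", 1)])

with sorted_named_params(train), sorted_named_params(merge):
    train_order = [name for name, _ in train.named_parameters()]
    merge_order = [name for name, _ in merge.named_parameters()]
```

Inside the context both objects report the same order, and the original registration order is restored on exit.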

Motivation and Context

Issue reported on Discord: https://discord.com/channels/1104757954588196865/1111279858136383509/1480085723733561344

How has this been tested?

includes test_adapter_save_load_roundtrip_no_size_mismatch

AI Usage Disclaimer

Claude wrote the tests.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved parameter wrapper handling consistency between training and model merging operations
    • Enhanced MOE expert quantization patch to handle additional configuration scenarios correctly
    • Fixed non-deterministic parameter ordering in certain workflows
  • Tests

    • Added comprehensive tests validating parameter wrapper behavior across different configurations

@coderabbitai

coderabbitai Bot commented Mar 10, 2026

Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 27422268-e16d-40fe-a415-5d85693b9331

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes enhance MOE quantization and PEFT target parameter matching by introducing deterministic parameter sorting, conditional patching logic, and comprehensive tests to ensure consistent ParamWrapper nesting behavior during training and merging operations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Patch Manager Logic (src/axolotl/loaders/patch_manager.py) | Enhanced _apply_moe_expert_quantization_patch with conditional imports and guards against non-quantization configurations. Added logic to patch PEFT parameter targeting when lora_target_parameters is present, regardless of quantization setting. |
| MOE Quantization Monkeypatch (src/axolotl/monkeypatch/moe_quant.py) | Introduced _sorted_named_params_ctx() context manager to ensure deterministic parameter ordering. Extended patch_peft_target_parameters_matching() with enhanced documentation and updated _patched_inject_parameters to use the new context manager for consistent ParamWrapper nesting. |
| Test Suite (tests/utils/schemas/validation/test_moe_quant.py) | Added comprehensive test class TestConsistentParamWrapperNesting with multiple test methods validating consistent ParamWrapper nesting between training and merge paths, including adapter save/load roundtrip and patch idempotency verification. |
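The guard logic summarized for the patch manager can be sketched roughly as below. This is an illustration under assumptions: `select_patches` and the returned labels are hypothetical, and only the config keys `quantize_moe_experts` and `lora_target_parameters` come from this PR.

```python
def select_patches(cfg: dict) -> list[str]:
    """Return hypothetical labels for the patches a config would trigger."""
    patches = []
    # The quantization patch is only relevant when expert quantization is on.
    if cfg.get("quantize_moe_experts"):
        patches.append("moe_expert_quantization")
    # The PEFT matching patch applies whenever target parameters are set,
    # regardless of the quantization setting.
    if cfg.get("lora_target_parameters"):
        patches.append("peft_target_parameters_matching")
    return patches


merge_cfg = {
    "quantize_moe_experts": False,
    "lora_target_parameters": ["mlp.experts.gate_up_proj", "mlp.experts.down_proj"],
}
selected = select_patches(merge_cfg)
```

With quantization off but target parameters set, only the PEFT matching patch is selected, which matches the summary's "regardless of quantization setting".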

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

under review, scheduled_release

Suggested reviewers

  • winglian
  • NanoCode012
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.31% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'moe quant patch for merge miss match' is partially related to the changeset. It references the main area being modified (moe quant patch) and alludes to a merge mismatch issue, but uses vague phrasing ('miss match' appears to be a misspelling of 'mismatch') and lacks specificity about what is being fixed. | Consider revising the title to be more specific and clear, such as 'Fix ParamWrapper nesting inconsistency between training and merge paths for MOE quantization' or 'Ensure consistent parameter ordering in MOE quantization patches for adapter merge'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/utils/schemas/validation/test_moe_quant.py`:
- Around line 279-281: Mypy complains about the dynamic attribute
_axolotl_patched on the function patch_peft_target_parameters_matching; fix it
by adding a mypy ignore for undefined attributes on the assignment/clear sites:
when setting patch_peft_target_parameters_matching._axolotl_patched = True and
when clearing it (patch_peft_target_parameters_matching._axolotl_patched =
False) add a trailing comment "# type: ignore[attr-defined]". Apply the same
ignore at every place this dynamic attribute is assigned (the earlier set and
the finally/cleanup clear) so mypy stops reporting the attribute-defined error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d27ce30a-15f3-4581-8dab-11034b582189

📥 Commits

Reviewing files that changed from the base of the PR and between cf4d550 and a7fa611.

📒 Files selected for processing (3)
  • src/axolotl/loaders/patch_manager.py
  • src/axolotl/monkeypatch/moe_quant.py
  • tests/utils/schemas/validation/test_moe_quant.py

Comment on lines +279 to +281
finally:
    BaseTuner._inject_parameters = original
    patch_peft_target_parameters_matching._axolotl_patched = False

Contributor

⚠️ Potential issue | 🟡 Minor

Fix mypy type error for dynamic function attribute.

The pipeline is failing due to mypy not recognizing the dynamically-set _axolotl_patched attribute on the function. This pattern is used in both line 156 and line 281.

🔧 Proposed fix using type: ignore comment
                 finally:
                     BaseTuner._inject_parameters = original
-                    patch_peft_target_parameters_matching._axolotl_patched = False
+                    patch_peft_target_parameters_matching._axolotl_patched = False  # type: ignore[attr-defined]

Apply the same fix at line 156:

         finally:
             BaseTuner._inject_parameters = original
-            patch_peft_target_parameters_matching._axolotl_patched = False
+            patch_peft_target_parameters_matching._axolotl_patched = False  # type: ignore[attr-defined]

And at line 423:

         finally:
             BaseTuner._inject_parameters = original_inject
-            patch_peft_target_parameters_matching._axolotl_patched = False
+            patch_peft_target_parameters_matching._axolotl_patched = False  # type: ignore[attr-defined]
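For context, the idempotency-flag pattern that triggers this mypy error can be sketched as follows. `FakeTuner` and `patch_inject` are hypothetical stand-ins, not the actual axolotl or PEFT code; the sketch only shows why a dynamically set function attribute needs the `attr-defined` ignore and how a test can restore state in `finally`.

```python
class FakeTuner:
    """Hypothetical stand-in for the class being monkeypatched."""

    def inject(self):
        return "original"


def patch_inject() -> None:
    """Replace FakeTuner.inject once; later calls are no-ops."""
    if getattr(patch_inject, "_patched", False):
        return

    original = FakeTuner.inject

    def patched(self):
        return "patched:" + original(self)

    FakeTuner.inject = patched  # type: ignore[method-assign]
    # Function objects accept arbitrary attributes at runtime, but mypy has
    # no declaration for this one, hence the ignore.
    patch_inject._patched = True  # type: ignore[attr-defined]


saved_inject = FakeTuner.inject
try:
    patch_inject()
    patch_inject()  # second call must not wrap the method again
    result = FakeTuner().inject()
finally:
    # Restore global state so other tests see the unpatched class.
    FakeTuner.inject = saved_inject  # type: ignore[method-assign]
    patch_inject._patched = False  # type: ignore[attr-defined]

restored = FakeTuner().inject()
```

An alternative that avoids the ignore comments would be a module-level boolean flag, at the cost of a `global` statement in the patch function.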
🧰 Tools
🪛 GitHub Actions: lint

[error] 281-281: mypy error: 'Callable[[], Any]' has no attribute '_axolotl_patched' (attr-defined).


@codecov

codecov Bot commented Mar 10, 2026


Codecov Report

❌ Patch coverage is 75.00000% with 12 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/axolotl/monkeypatch/moe_quant.py 79.06% 9 Missing ⚠️
src/axolotl/loaders/patch_manager.py 40.00% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

replace_parameter_8bit(mod, pname)
_moe_load_state["count"] += 1

# Release the bf16 tensor so CUDA memory is freed immediately.

Collaborator

Let's keep this comment. It's good to keep this note as it's the change that reduces vram cost in case we refactor in future.

Comment thread src/axolotl/loaders/patch_manager.py Outdated
Comment on lines +413 to +418
"""Patch transformers weight loading to quantize MoE expert params on-the-fly.

Also patches PEFT's _inject_parameters whenever lora_target_parameters is set
(even without quantize_moe_experts) to ensure consistent ParamWrapper nesting
order between training and merge, preventing adapter key mismatches.
"""

Collaborator

Let's simplify the comments here


@torch.no_grad()
def forward(self, quantized_param: torch.Tensor) -> torch.Tensor:
    # Flatten 3D+ to 2D for BnB's dequant, then reshape back.

Collaborator

Let's keep this. Maybe use it as the function docstring instead of a comment.

module, param_name, Bnb8bitParametrization(row_stats), unsafe=True
)

# Cache dequantized values during forward to avoid redundant dequantization.

Collaborator

Keep this

Comment thread src/axolotl/monkeypatch/moe_quant.py Outdated
# Sequential loading ensures only ONE bf16 expert tensor is on-GPU at a time.
# Force sequential tensor loading so we can quantize-and-free one expert at a time.
# Without this, transformers pre-fetches all bf16 expert tensors to GPU simultaneously.
os.environ["HF_DEACTIVATE_ASYNC_LOAD"] = "1"

Collaborator

Unrelated to this PR, but a user reported that this is also useful to have enabled for QLoRA in general.

Comment thread src/axolotl/monkeypatch/moe_quant.py Outdated
Comment on lines +148 to +157
"""Fix PEFT's _inject_parameters for suffix matching and portable adapter ordering.

1. Expands short suffix targets (e.g. "mlp.experts.gate_up_proj") to full module
paths so the parametrized branch can match them.

2. Makes the parametrized branch iterate module.parametrizations in insertion order
instead of PEFT's sorted(target_names), matching the standard branch. This ensures
adapters saved during training load correctly with vanilla PEFT, vLLM, and other
tools without requiring this patch.
"""

Collaborator

Simplify
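To illustrate the two behaviors the docstring above describes, here is a minimal sketch under assumptions. `expand_suffix_targets` is a hypothetical helper, not the patched PEFT function, and the `parametrizations` dict is a plain-dict stand-in for `module.parametrizations`.

```python
def expand_suffix_targets(full_names, suffix_targets):
    """Return every full dotted path that equals or ends with a suffix target."""
    expanded = []
    for name in full_names:
        for target in suffix_targets:
            # Match on a dot boundary so "up_proj" does not match "gate_up_proj".
            if name == target or name.endswith("." + target):
                expanded.append(name)
                break
    return expanded


names = [
    "model.layers.1.mlp.experts.gate_up_proj",
    "model.layers.1.mlp.experts.down_proj",
    "model.layers.1.self_attn.q_proj",
]
targets = ["mlp.experts.gate_up_proj", "mlp.experts.down_proj"]
full_paths = expand_suffix_targets(names, targets)

# Point 2: iterating a dict follows insertion order, which can differ from
# the alphabetical order produced by sorted().
parametrizations = {"gate_up_proj": object(), "down_proj": object()}
insertion_order = list(parametrizations)
sorted_order = sorted(parametrizations)
```

Here `insertion_order` and `sorted_order` disagree, which is the kind of divergence that changes wrapper nesting between two code paths.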

from peft.utils.integrations import init_empty_weights
from peft.utils.other import _get_submodules

def _patched_inject_parameters(

Collaborator

Could you manually review the changes to this function in case they introduce edge-case issues?

I generated the code I provided for this without verifying it.

The concept is: copy the upstream PEFT function and just remove the sorted path, reusing the target_modules insertion-order flow.

@NanoCode012 NanoCode012 left a comment

Collaborator

Could we also add an e2e test that trains an adapter (a few steps), then attempts to merge and ensures it doesn't fail?

@zerofata


Just tried this and still got an error.

root@7e53ee18d5fa:/workspace/axolotl# python3 -m axolotl.cli.merge_lora sft-writing.yml \
    --lora_model_dir="./GLM-Air-v4-SFT-1-writing" \
    --gpu_memory_limit=0
[2026-03-10 21:39:56,579] [WARNING] [torchao] Skipping import of cpp extensions due to incompatible torch version 2.9.0+cu126 for torchao version 0.16.0             Please see https://github.com/pytorch/ao/issues/2919 for more info
[2026-03-10 21:39:58,983] [INFO] [axolotl.integrations.base] Attempting to load plugin: axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
[2026-03-10 21:39:59,972] [INFO] [axolotl.integrations.base] Plugin loaded successfully: axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
[2026-03-10 21:40:00,023] [INFO] [axolotl.utils.schemas.validation] explicitly setting `eval_sample_packing` to match `sample_packing`
[2026-03-10 21:40:00,023] [WARNING] [axolotl.utils.schemas.validation] sample_packing without flash, sdp, xformers, sage, or flex attention does not handle cross sample decontamination.
[2026-03-10 21:40:00,023] [INFO] [axolotl.utils.schemas.validation] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing
[2026-03-10 21:40:00,167] [INFO] [axolotl.cli.config] config:
{
  "activation_offloading": false,
  "adapter": "qlora",
  "axolotl_config_path": "sft-writing.yml",
  "base_model": "ApocalypseParty/GLM-Air-v4-SFT-1-merged",
  "base_model_config": "ApocalypseParty/GLM-Air-v4-SFT-1-merged",
  "batch_size": 8,
  "bf16": true,
  "capabilities": {
    "bf16": true,
    "compute_capability": "sm_90",
    "fp8": true,
    "n_gpu": 1,
    "n_node": 1
  },
  "chat_template": "jinja",
  "chat_template_jinja": "./glm_air.jinja",
  "context_parallel_size": 1,
  "cut_cross_entropy": true,
  "dataloader_num_workers": 1,
  "dataloader_pin_memory": true,
  "dataloader_prefetch_factor": 256,
  "dataset_num_proc": 48,
  "dataset_prepared_path": "last_run_prepared",
  "datasets": [
    {
      "chat_template": "tokenizer_default",
      "message_property_mappings": {
        "content": "content",
        "role": "role"
      },
      "path": "./data/dataset_writing.jsonl",
      "trust_remote_code": false,
      "type": "chat_template"
    }
  ],
  "ddp": false,
  "device": "cuda:0",
  "device_map": "auto",
  "dion_rank_fraction": 1.0,
  "dion_rank_multiple_of": 1,
  "eaft_alpha": 1.0,
  "eaft_k": 20,
  "env_capabilities": {
    "torch_version": "2.9.0"
  },
  "eot_tokens": [
    "<|user|>",
    "<|endoftext|>"
  ],
  "eval_batch_size": 2,
  "eval_causal_lm_metrics": [
    "sacrebleu",
    "comet",
    "ter",
    "chrf"
  ],
  "eval_max_new_tokens": 128,
  "eval_sample_packing": true,
  "eval_table_size": 0,
  "experimental_skip_move_to_device": true,
  "flash_attention": false,
  "fp16": false,
  "generate_samples": false,
  "generation_do_sample": true,
  "generation_max_new_tokens": 50,
  "generation_prompt_ratio": 0.5,
  "generation_temperature": 0.7,
  "gpu_memory_limit": 0,
  "gradient_accumulation_steps": 4,
  "gradient_checkpointing": false,
  "include_tkps": true,
  "learning_rate": 9e-06,
  "lisa_layers_attribute": "model.layers",
  "load_best_model_at_end": false,
  "load_in_4bit": false,
  "load_in_8bit": false,
  "local_rank": 0,
  "logging_steps": 1,
  "lora_alpha": 32,
  "lora_dropout": 0.0,
  "lora_mlp_kernel": false,
  "lora_model_dir": "./GLM-Air-v4-SFT-1-writing",
  "lora_o_kernel": false,
  "lora_qkv_kernel": false,
  "lora_r": 16,
  "lora_target_modules": [
    "q_proj",
    "v_proj",
    "k_proj",
    "o_proj"
  ],
  "lora_target_parameters": [
    "mlp.experts.gate_up_proj",
    "mlp.experts.down_proj"
  ],
  "loraplus_lr_embedding": 1e-06,
  "lr_scheduler": "cosine",
  "mean_resizing_embeddings": false,
  "merge_lora": true,
  "micro_batch_size": 2,
  "model_config_type": "glm4_moe",
  "num_epochs": 8.0,
  "num_generation_samples": 3,
  "optimizer": "adamw_torch_8bit",
  "otel_metrics_host": "localhost",
  "otel_metrics_port": 8000,
  "output_dir": "./GLM-Air-v4-SFT-1-writing",
  "pad_to_sequence_len": true,
  "plugins": [
    "axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin"
  ],
  "pretrain_multipack_attn": true,
  "profiler_steps_start": 0,
  "qlora_sharded_model_loading": false,
  "quantize_moe_experts": false,
  "ray_num_workers": 1,
  "resources_per_worker": {
    "GPU": 1
  },
  "sample_packing": true,
  "sample_packing_bin_size": 200,
  "sample_packing_group_size": 100000,
  "save_only_model": false,
  "save_safetensors": true,
  "save_steps": 0.125,
  "saves_per_epoch": 1,
  "sequence_len": 4096,
  "shuffle_before_merging_datasets": false,
  "shuffle_merged_datasets": true,
  "skip_prepare_dataset": false,
  "streaming_multipack_buffer_size": 10000,
  "strict": false,
  "tensor_parallel_size": 1,
  "tf32": false,
  "tiled_mlp_use_original_mlp": true,
  "tokenizer_config": "ApocalypseParty/GLM-Air-v4-SFT-1-merged",
  "tokenizer_save_jinja_files": true,
  "torch_dtype": "torch.bfloat16",
  "train_on_inputs": false,
  "trl": {
    "log_completions": false,
    "mask_truncated_completions": false,
    "ref_model_mixup_alpha": 0.9,
    "ref_model_sync_steps": 64,
    "scale_rewards": true,
    "sync_ref_model": false,
    "use_vllm": false,
    "vllm_server_host": "0.0.0.0",
    "vllm_server_port": 8000
  },
  "use_otel_metrics": false,
  "use_ray": false,
  "use_wandb": true,
  "val_set_size": 0.0,
  "vllm": {
    "device": "auto",
    "dtype": "auto",
    "gpu_memory_utilization": 0.9,
    "host": "0.0.0.0",
    "port": 8000
  },
  "wandb_name": "GLM-Air-v4-SFT-1-writing",
  "wandb_project": "GLM-Air-v4-SFT",
  "warmup_ratio": 0.1,
  "weight_decay": 0.0,
  "world_size": 1
}
[2026-03-10 21:40:00,169] [INFO] [axolotl.cli.utils.load] loading tokenizer... ApocalypseParty/GLM-Air-v4-SFT-1-merged
[2026-03-10 21:40:02,057] [INFO] [axolotl.cli.utils.load] loading model...
[2026-03-10 21:40:02,109] [INFO] [axolotl.loaders.patch_manager] Applying multipack dataloader patch for sample packing...
[2026-03-10 21:40:02,119] [WARNING] [py.warnings] /usr/local/lib/python3.11/dist-packages/torch/__init__.py:1551: UserWarning: Please use the new API settings to control TF32 behavior, such as torch.backends.cudnn.conv.fp32_precision = 'tf32' or torch.backends.cuda.matmul.fp32_precision = 'ieee'. Old settings, e.g, torch.backends.cuda.matmul.allow_tf32 = True, torch.backends.cudnn.allow_tf32 = True, allowTF32CuDNN() and allowTF32CuBLAS() will be deprecated after Pytorch 2.9. Please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:80.)
  return _C._get_float32_matmul_precision()

[2026-03-10 21:40:02,129] [INFO] [axolotl.integrations.cut_cross_entropy] Applying Cut Cross Entropy to model type: glm4_moe
[2026-03-10 21:40:02,137] [INFO] [axolotl.monkeypatch.moe_quant] Patched PEFT _inject_parameters for parametrized module suffix matching
Loading weights: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 735/735 [00:32<00:00, 22.97it/s]
[2026-03-10 21:40:39,021] [INFO] [axolotl.loaders.model] Converting modules to torch.bfloat16
[2026-03-10 21:40:39,888] [WARNING] [py.warnings] /usr/local/lib/python3.11/dist-packages/peft/tuners/tuners_utils.py:212: UserWarning: Unsupported layer type '<class 'transformers.models.glm4_moe.modeling_glm4_moe.Glm4MoeNaiveMoe'>' encountered, proceed at your own risk.
  warnings.warn(f"Unsupported layer type '{type(module)}' encountered, proceed at your own risk.", UserWarning)

[2026-03-10 21:40:53,100] [ERROR] [axolotl.telemetry.errors] Error captured in telemetry. Run ID: 509605c1-bbd5-44a7-a7d7-d5b396bb4d7c
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspace/axolotl/src/axolotl/cli/merge_lora.py", line 94, in <module>
    fire.Fire(do_cli)
  File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/cli/merge_lora.py", line 90, in do_cli
    do_merge_lora(cfg=parsed_cfg)
  File "/workspace/axolotl/src/axolotl/telemetry/errors.py", line 127, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/cli/merge_lora.py", line 26, in do_merge_lora
    model, tokenizer, processor = load_model_and_tokenizer(cfg=cfg)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/cli/utils/load.py", line 45, in load_model_and_tokenizer
    model, _ = model_loader.load()
               ^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/telemetry/errors.py", line 127, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/loaders/model.py", line 186, in load
    lora_config = self._load_adapters()
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/loaders/model.py", line 396, in _load_adapters
    self.model, lora_config = load_adapter(
                              ^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/telemetry/errors.py", line 127, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/loaders/adapter.py", line 193, in load_adapter
    peft_model, lora_config = load_lora(model, cfg, inference=inference)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/loaders/adapter.py", line 154, in load_lora
    model = PeftModel.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/peft/peft_model.py", line 568, in from_pretrained
    load_result = model.load_adapter(
                  ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/peft/peft_model.py", line 1368, in load_adapter
    load_result = set_peft_model_state_dict(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/peft/utils/save_and_load.py", line 565, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 2629, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
        size mismatch for base_model.model.model.layers.1.mlp.experts.base_layer.lora_A.default.weight: copying a param with shape torch.Size([2048, 4096]) from checkpoint, the shape in current model is torch.Size([2048, 2816]).
        size mismatch for base_model.model.model.layers.1.mlp.experts.base_layer.lora_B.default.weight: copying a param with shape torch.Size([1408, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 2048]).
        size mismatch for base_model.model.model.layers.1.mlp.experts.lora_A.default.weight: copying a param with shape torch.Size([2048, 2816]) from checkpoint, the shape in current model is torch.Size([2048, 4096]).
        size mismatch for base_model.model.model.layers.1.mlp.experts.lora_B.default.weight: copying a param with shape torch.Size([4096, 2048]) from checkpoint, the shape in current model is torch.Size([1408, 2048]).
        size mismatch for base_model.model.model.layers.2.mlp.experts.base_layer.lora_A.default.weight: copying a param with shape torch.Size([2048, 4096]) from checkpoint, the shape in current model is torch.Size([2048, 2816]).
        size mismatch for base_model.model.model.layers.2.mlp.experts.base_layer.lora_B.default.weight: copying a param with shape torch.Size([1408, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 2048]).
        size mismatch for base_model.model.model.layers.2.mlp.experts.lora_A.default.weight: copying a param with shape torch.Size([2048, 2816]) from checkpoint, the shape in current model is torch.Size([2048, 4096]).
        size mismatch for base_model.model.model.layers.2.mlp.experts.lora_B.default.weight: copying a param with shape torch.Size([4096, 2048]) from checkpoint, the shape in current model is torch.Size([1408, 2048]).
        size mismatch for base_model.model.model.layers.3.mlp.experts.base_layer.lora_A.default.weight: copying a param with shape torch.Size([2048, 4096]) from checkpoint, the shape in current model is torch.Size([2048, 2816]).
        size mismatch for base_model.model.model.layers.3.mlp.experts.base_layer.lora_B.default.weight: copying a param with shape torch.Size([1408, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 2048]).
        size mismatch for base_model.model.model.layers.3.mlp.experts.lora_A.default.weight: copying a param with shape torch.Size([2048, 2816]) from checkpoint, the shape in current model is torch.Size([2048, 4096]).
        size mismatch for base_model.model.model.layers.3.mlp.experts.lora_B.default.weight: copying a param with shape torch.Size([4096, 2048]) from checkpoint, the shape in current model is torch.Size([1408, 2048]).
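One way to read this error: the same two adapter keys (`...experts.lora_A` and `...experts.base_layer.lora_A`) exist in both the checkpoint and the current model, but with swapped shapes, which is what would happen if the two target parameters were wrapped in a different order at training and merge time. The sketch below models that under an assumption inferred from the key names in the traceback, not from confirmed PEFT internals: each later wrapper encloses the earlier one, so the first-wrapped parameter ends up under `base_layer`. `adapter_key_prefixes` is a hypothetical helper.

```python
def adapter_key_prefixes(wrap_order, module_path="mlp.experts"):
    """Return {parameter name: adapter key prefix} for a given wrapping order."""
    depth = len(wrap_order)
    prefixes = {}
    for index, param_name in enumerate(wrap_order):
        # The first wrapper applied is enclosed by every later one, so it
        # sits under (depth - 1 - index) levels of "base_layer".
        levels = depth - 1 - index
        prefixes[param_name] = module_path + ".base_layer" * levels
    return prefixes


# Hypothetical orders: which path used which order is not stated in the thread.
order_a = adapter_key_prefixes(["gate_up_proj", "down_proj"])
order_b = adapter_key_prefixes(["down_proj", "gate_up_proj"])
```

Under `order_a` the key prefix `mlp.experts` holds the `down_proj` adapter, while under `order_b` the same prefix holds the `gate_up_proj` adapter, so loading one into the other produces shape mismatches on every affected layer.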

@ved1beta

Member Author

You tried on the latest commit, right? I remember it working fine on my end 🤔

@zerofata


Was using the below repo / branch.

git clone https://github.com/ved1beta/axolotl
cd axolotl
git checkout moe-merge-patch

@ved1beta

Member Author

dw, looking into it

@ved1beta

Member Author

Some uncommitted changes, like always 🤕. It works now, thanks for reporting @zerofata.
Tested with 4.7 flash and 4.5 air + e2e.

@NanoCode012 NanoCode012 added the hold don't merge this yet label Mar 13, 2026
@NanoCode012 NanoCode012 removed the hold don't merge this yet label Mar 13, 2026
@winglian winglian merged commit a806704 into axolotl-ai-cloud:main Mar 16, 2026
18 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Mar 18, 2026

4 participants