fix: refactor kernels patch to drop routing and inject into Expert by NanoCode012 · Pull Request #3651 · axolotl-ai-cloud/axolotl

NanoCode012 · 2026-05-13T08:27:50Z

registry

Description

~~Waiting on #3632 to get in first to refactor how it handles experts there.~~

Cleans up our kernels implementation to reduce our impact, let upstream handle the routing. We just apply to the MoE block now. This increases our support surface while reducing the burden of rewriting routing for new archs.

Motivation and Context

How has this been tested?

AI Usage Disclaimer

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

coderabbitai · 2026-05-13T08:27:56Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 61b8c73b-2a81-4b1d-a72a-6def18eb8f25

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR refactors the SonicMoE and ScatterMoE expert kernel integration from model-specific monkey-patching to Transformers' ExpertsInterface dispatch registry. Configuration validators auto-select kernels based on use_scattermoe/use_sonicmoe flags. A new LoRA-aware SonicMoE expert forward implementation validates config, resolves PEFT-wrapped weights, and materializes effective weights. The plugin simplifies registration by directly calling register_*_experts() functions. Old routing, weight-conversion, and block-level MoE modules are removed entirely.

Changes

MoE Expert Kernel Registration and LoRA Support

Layer / File(s)	Summary
Documentation and conceptual model: patching → registration `docs/optimizations.qmd`, `src/axolotl/integrations/kernels/README.md`, `src/axolotl/integrations/kernels/constants.py`	Updated expert-kernels and parallelism documentation; removed `SPARSE_MOE_BLOCK` mapping and block-level MoE infrastructure, keeping only experts-only model support and conceptual shift to Transformers `ALL_EXPERTS_FUNCTIONS` dispatch.
Configuration validation and expert-implementation auto-selection `src/axolotl/integrations/kernels/args.py`	New `check_sonicmoe_ep_unsupported` validator rejects SonicMoE with expert parallelism. Updated `check_experts_implementation` to auto-select kernel from `use_scattermoe`/`use_sonicmoe` flags, handle mismatches with warnings, and validate against unified `_VALID_EXPERTS_IMPLS` set.
ScatterMoE idempotent registration to Transformers `src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py`	Refactors `register_scattermoe_experts()` to be idempotent via module flag; registers `"scattermoe"` into `ALL_EXPERTS_FUNCTIONS`; removes legacy `patch_gemma4_scattermoe` monkeypatch helper and Pre-TrainedModel patching.
SonicMoE LoRA-aware expert forward implementation `src/axolotl/integrations/kernels/libs/sonicmoe/experts.py`	New `sonicmoe_experts_forward_with_lora` validates SonicMoE gating/CUDA requirements; resolves expert weights and optional LoRA from PEFT wrappers (module-level and per-parameter); materializes effective weights via `MoELoRAMaterialize`; permutes layouts and dispatches to upstream `_sonicmoe_wrapper`. Includes helpers for PEFT ParamWrapper unwrapping and LoRA tuple extraction.
SonicMoE package reorganization: exports and LoRA documentation `src/axolotl/integrations/kernels/libs/sonicmoe/__init__.py`, `src/axolotl/integrations/kernels/libs/sonicmoe/lora.py`	Exports `register_sonicmoe_experts` and `sonicmoe_experts_forward_with_lora` instead of `patch_sonicmoe`. Removes `unwrap_gate_lora` helper; simplifies `MoELoRAMaterialize` docstring. Deletes `routing.py`, `weight_converter.py`, `patch.py`, and `gemma4_experts.py` (old patching/routing infrastructure).
KernelsPlugin pre_model_load: direct expert registration and experts_implementation setup `src/axolotl/integrations/kernels/plugin.py`	Simplifies pre_model_load to compute `ep_active`, directly call `register_scattermoe_experts()` / `register_sonicmoe_experts()`, validate SonicMoE GPU compatibility, and set `cfg.experts_implementation` only when EP inactive. Removes `moe_model_type` branching, experts-only model handling, and model-specific patching. Deletes `_kernelize_model` and `_register_experts_interface` helpers.
Expert-parallel integration: import path and SonicMoE disabling `src/axolotl/integrations/expert_parallel/experts_fn.py`	Updates `scattermoe` local kernel import to `scattermoe_lora.experts` module. Disables `sonicmoe` by raising `NotImplementedError` in expert-parallel contexts.
End-to-end SonicMoE test refactoring: model construction and correctness validation `tests/e2e/integrations/test_sonicmoe.py`, `tests/e2e/integrations/test_sonicmoe_lora.py`	Introduces `_is_hopper_or_newer()` compute-capability gating and refactors model construction via `_build_model()` / `_build_sonic_model()` helpers that call `register_sonicmoe_experts()` and set `config._experts_implementation`. Removes manual patching/interleaving/teardown. Refactors forward correctness, gradient correctness, training convergence, and adapter-update tests to use centralized model builders and LoRA helpers.
Unit test reorganization: registration, LoRA validation, and removal of old routing/converter tests `tests/integrations/test_sonicmoe.py`, `tests/integrations/test_gemma4_moe.py`, `tests/integrations/test_sonicmoe_lora.py`, removed: `tests/integrations/test_routing_parity.py`, `tests/integrations/test_sonicmoe_gradients.py`	Refocuses unit tests on `KernelsArgs.check_experts_implementation` auto-selection, `register_sonicmoe_experts` idempotent registration to `ALL_EXPERTS_FUNCTIONS`, and LoRA materialization (shape/identity, scaling, PEFT einsum equivalence, gradient flow). Removes routing parity, weight-converter, and gradient-check test suites. Updates Gemma4 tests to validate experts-only model classification instead of routing functions. Removes `unwrap_gate_lora` coverage and mock LoRA-gate builder.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

axolotl-ai-cloud/axolotl#3474: Upstream ScatterMoE/LoRA expert kernel registration and dispatch work that this PR integrates and extends.
axolotl-ai-cloud/axolotl#3377: Earlier kernels/experts_implementation infrastructure that this PR builds on to add unified ScatterMoE/SonicMoE dispatch.
axolotl-ai-cloud/axolotl#3408: Overlapping documentation updates to README.md clarifying expert kernel configuration and selection.

Suggested reviewers

winglian

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 38.71% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'fix: refactor kernels patch to drop routing and inject into Expert' directly and clearly describes the main change: a refactoring of the kernels patch system that removes routing logic and integrates functionality into the Expert interface.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/refactor-kernels

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-22T08:24:09Z

📖 Documentation Preview: https://6a15dfdeb7265aa320961597--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit 1f3c5d2

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py (1)
143-146: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle one-sided PEFT wrapping here.

This branch runs when either projection is wrapped, but _unwrap_experts_lora() only returns LoRA tensors when gate_up_proj is the wrapper. A config that targets only down_proj will silently drop its LoRA delta from the ScatterMoE path.
Suggested fix
-    if not isinstance(getattr(experts, "gate_up_proj", None), ParamWrapper):
+    if not isinstance(getattr(experts, "gate_up_proj", None), ParamWrapper) and not isinstance(
+        getattr(experts, "down_proj", None), ParamWrapper
+    ):
         return experts, None, None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py` around
lines 143 - 146, The current branch detects PEFT via _has_peft_wrapper(self) but
always expects _unwrap_experts_lora(self) to return a gate_up_proj LoRA tensor,
which drops LoRA deltas when only down_proj is wrapped; update the logic in the
block that sets gup_lora and down_lora to handle one-sided PEFT wrappers by
calling _unwrap_experts_lora(self) and then checking which return positions are
None, assigning whichever of gate_up_proj (gup_lora) or down_proj (down_lora) is
present instead of assuming gate_up_proj is wrapped; ensure the ScatterMoE path
uses the non-None tensor(s) so a config that targets only down_proj is
preserved.
src/axolotl/integrations/expert_parallel/experts_fn.py (1)
69-77: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't register deep_ep_sonicmoe while it always raises.

_sonicmoe_local() is now an unconditional NotImplementedError, but the registry and name mapping still advertise deep_ep_sonicmoe as a selectable experts implementation. That turns an unsupported mode into a guaranteed runtime failure instead of rejecting it during setup.
Suggested fix
 _LOCAL_KERNELS = {
     "eager": _eager_local,
     "grouped_mm": _grouped_mm_local,
     "scattermoe": _scattermoe_local,
-    "sonicmoe": _sonicmoe_local,
 }
@@
 REGISTRY = {
     "deep_ep": deep_ep_experts_forward,
     "deep_ep_grouped_mm": deep_ep_grouped_mm_experts_forward,
     "deep_ep_scattermoe": deep_ep_scattermoe_experts_forward,
-    "deep_ep_sonicmoe": deep_ep_sonicmoe_experts_forward,
 }
@@
     return {
         "eager": "deep_ep",
         "grouped_mm": "deep_ep_grouped_mm",
         "scattermoe": "deep_ep_scattermoe",
-        "sonicmoe": "deep_ep_sonicmoe",
     }[kernel]
Also applies to: 228-238, 264-271
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/integrations/expert_parallel/experts_fn.py` around lines 69 - 77,
The registry advertises a selectable implementation
("sonicmoe"/"deep_ep_sonicmoe") while _sonicmoe_local() always raises, so remove
or conditionally register that entry to avoid runtime failures: delete the
"sonicmoe" / "deep_ep_sonicmoe" entries from _LOCAL_KERNELS and any
corresponding mapping or _GLOBAL_KERNELS/name mapping that references
deep_ep_sonicmoe (or only register it behind a feature flag), and ensure any
factory/path that selects by name returns a clear setup-time error if the mode
is requested but not implemented; use the function name _sonicmoe_local to
locate the implementation and update all registry dictionaries that reference
it.
tests/e2e/integrations/test_sonicmoe_lora.py (1)
55-85: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Keep the LoRA E2E path config-driven as well.

This helper forces SonicMoE on via code instead of the supported YAML/config path, so the LoRA suite won't catch breakage in the real feature-toggle flow. For an end-to-end test, I'd route activation through the same config surface users rely on instead of mutating _experts_implementation directly.

As per coding guidelines, "Config-driven architecture: all training features are toggled via YAML configuration, not code changes".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/integrations/test_sonicmoe_lora.py` around lines 55 - 85, The test
helper hardcodes SonicMoE by setting config._experts_implementation and calling
register_sonicmoe_experts inside _create_tiny_qwen3_config/_build_sonic_model;
change it to drive expert activation from the same config surface the app uses
(e.g., accept a test config parameter or read the YAML-driven flag) and only set
config._experts_implementation/register_sonicmoe_experts when that config flag
indicates sonicmoe is enabled. Update _create_tiny_qwen3_config to derive
_experts_implementation from the provided test configuration and update
_build_sonic_model to call register_sonicmoe_experts conditionally based on that
same config value so the E2E path matches the real feature-toggle flow.
tests/e2e/integrations/test_sonicmoe.py (1)
44-75: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Exercise the public YAML-driven toggle path here.

These helpers hardcode config._experts_implementation and manual expert registration, so this E2E suite no longer covers the supported config-driven flow that maps user YAML to kernel selection. A regression in validator/plugin wiring could still pass here while real training breaks.

As per coding guidelines, "Config-driven architecture: all training features are toggled via YAML configuration, not code changes".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/integrations/test_sonicmoe.py` around lines 44 - 75, The test
helpers currently bypass the public YAML-driven toggle by calling
register_sonicmoe_experts() and writing to the private
config._experts_implementation; change them to exercise the public config path
instead: stop calling register_sonicmoe_experts() and stop setting
config._experts_implementation, and instead set the public config key (e.g.,
config.experts_implementation) or load a tiny YAML snippet via the project's
config loader so the normal YAML->plugin wiring runs and selects the sonicmoe
kernel; keep using _create_tiny_qwen3_config and _build_model but ensure they
drive kernel selection through the public config/YAML path rather than manual
registration or private attributes.

🧹 Nitpick comments (2)

src/axolotl/integrations/kernels/constants.py (1)
1-2: ⚡ Quick win

Trim comments/docstrings to short WHY-focused notes only.

Line 1, Line 6, and Line 13 currently describe WHAT the code does; in this path, comments should be one short line and only capture non-obvious rationale.
♻️ Proposed cleanup
-"""Diagnostic helpers for MoE kernel integrations (kernel dispatch itself
-is architecture-agnostic via the ExpertsInterface)."""
+"""Helpers for experts-only MoE class resolution."""
@@
-# Models where MoE is embedded in the decoder layer (no separate SparseMoeBlock).
+# Keep explicit map for experts-only model layouts.
@@
-    """Resolve the Experts class for a known model type, or ``None``."""
+    """Resolve experts class; return None when model type is unmapped."""
As per coding guidelines src/axolotl/**/*.py: “Only add comments when explaining the WHY behind non-obvious logic… Do not comment on WHAT code does… Comments should be a maximum of one short line.”

Also applies to: 6-6, 13-13
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/integrations/kernels/constants.py` around lines 1 - 2, The
module-level docstring and inline comments are describing WHAT the code does;
replace the multi-line docstring and the comments at the other noted locations
with a single short WHY-focused line each (e.g., one-line rationale for the MoE
kernel diagnostic helpers) so they explain non-obvious rationale only and not
implementation details; update the top-level docstring and the comments around
any diagnostic helpers/constants in this file to be a concise one-line WHY note.
src/axolotl/integrations/kernels/args.py (1)
8-14: ⚡ Quick win

Reduce inline comments to short WHY-only notes.

Line 8-14 and Line 78-80 are explanatory WHAT-comments and exceed the one-line guidance for this path.

As per coding guidelines src/axolotl/**/*.py: “Only add comments when explaining the WHY behind non-obvious logic… Do not comment on WHAT code does… Comments should be a maximum of one short line.”

Also applies to: 78-80
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/integrations/kernels/args.py` around lines 8 - 14, The block
listing valid experts_implementation values is a WHAT-style multi-line comment;
replace it with a single short WHY-focused comment explaining why callers care
about valid expert implementations and move the detailed list into the module
docstring or README; do the same for the later multi-line WHAT comment around
lines 78-80 (locate the comment near any references to experts_implementation or
related kernel selection logic such as the variable/function that
reads/validates experts_implementation) — keep each in-file comment to one short
line that explains the rationale only.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/axolotl/integrations/kernels/args.py`:
- Line 57: The before-validator currently evaluates raw dict values directly
(e.g., data.get("use_sonicmoe") and (data.get("expert_parallel_size") or 1) > 1)
which mis-handles string inputs like "false" or "2" and can raise TypeError;
update the before-validator to explicitly coerce/parse these fields first (e.g.,
parse use_sonicmoe into a boolean using a safe string-to-bool helper and coerce
expert_parallel_size to int with a fallback) and then perform the comparison,
adjusting any helper or validator functions used by the validator; also replace
the multi-line comment blocks that document option values with short single-line
comments (split or rephrase the content around the validator and the option
docs) to comply with the one-line comment guideline.

In `@src/axolotl/integrations/kernels/plugin.py`:
- Around line 82-83: The branch that enables SonicMoE (cfg.use_sonicmoe) must
fail immediately when the host lacks GPU support: call
_check_sonicmoe_gpu_compat() and if it reports incompatibility (or returns a
falsy value) raise an explicit exception (e.g., RuntimeError/ValueError) or
return an error so plugin setup aborts rather than continuing; alternatively
modify _check_sonicmoe_gpu_compat() to raise on CPU-only hosts and ensure the
cfg.use_sonicmoe path relies on that behavior so the misconfiguration is
detected early instead of failing later in sonicmoe_experts_forward_with_lora.
- Around line 73-92: pre_model_load currently only registers experts if
cfg.use_scattermoe or cfg.use_sonicmoe are true, which breaks when the YAML sets
cfg.experts_implementation directly; update pre_model_load to also inspect
cfg.experts_implementation (values "scattermoe" and "sonicmoe") and call
register_scattermoe_experts() or register_sonicmoe_experts() as appropriate when
the corresponding boolean is false, preserve the existing ep_active check and
assignment to cfg.experts_implementation, and keep LOG.info messages unchanged
so the registry is initialized whenever either the legacy booleans or the
config-driven experts_implementation indicates scattermoe/sonicmoe.

---

Outside diff comments:
In `@src/axolotl/integrations/expert_parallel/experts_fn.py`:
- Around line 69-77: The registry advertises a selectable implementation
("sonicmoe"/"deep_ep_sonicmoe") while _sonicmoe_local() always raises, so remove
or conditionally register that entry to avoid runtime failures: delete the
"sonicmoe" / "deep_ep_sonicmoe" entries from _LOCAL_KERNELS and any
corresponding mapping or _GLOBAL_KERNELS/name mapping that references
deep_ep_sonicmoe (or only register it behind a feature flag), and ensure any
factory/path that selects by name returns a clear setup-time error if the mode
is requested but not implemented; use the function name _sonicmoe_local to
locate the implementation and update all registry dictionaries that reference
it.

In `@src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py`:
- Around line 143-146: The current branch detects PEFT via
_has_peft_wrapper(self) but always expects _unwrap_experts_lora(self) to return
a gate_up_proj LoRA tensor, which drops LoRA deltas when only down_proj is
wrapped; update the logic in the block that sets gup_lora and down_lora to
handle one-sided PEFT wrappers by calling _unwrap_experts_lora(self) and then
checking which return positions are None, assigning whichever of gate_up_proj
(gup_lora) or down_proj (down_lora) is present instead of assuming gate_up_proj
is wrapped; ensure the ScatterMoE path uses the non-None tensor(s) so a config
that targets only down_proj is preserved.

In `@tests/e2e/integrations/test_sonicmoe_lora.py`:
- Around line 55-85: The test helper hardcodes SonicMoE by setting
config._experts_implementation and calling register_sonicmoe_experts inside
_create_tiny_qwen3_config/_build_sonic_model; change it to drive expert
activation from the same config surface the app uses (e.g., accept a test config
parameter or read the YAML-driven flag) and only set
config._experts_implementation/register_sonicmoe_experts when that config flag
indicates sonicmoe is enabled. Update _create_tiny_qwen3_config to derive
_experts_implementation from the provided test configuration and update
_build_sonic_model to call register_sonicmoe_experts conditionally based on that
same config value so the E2E path matches the real feature-toggle flow.

In `@tests/e2e/integrations/test_sonicmoe.py`:
- Around line 44-75: The test helpers currently bypass the public YAML-driven
toggle by calling register_sonicmoe_experts() and writing to the private
config._experts_implementation; change them to exercise the public config path
instead: stop calling register_sonicmoe_experts() and stop setting
config._experts_implementation, and instead set the public config key (e.g.,
config.experts_implementation) or load a tiny YAML snippet via the project's
config loader so the normal YAML->plugin wiring runs and selects the sonicmoe
kernel; keep using _create_tiny_qwen3_config and _build_model but ensure they
drive kernel selection through the public config/YAML path rather than manual
registration or private attributes.

---

Nitpick comments:
In `@src/axolotl/integrations/kernels/args.py`:
- Around line 8-14: The block listing valid experts_implementation values is a
WHAT-style multi-line comment; replace it with a single short WHY-focused
comment explaining why callers care about valid expert implementations and move
the detailed list into the module docstring or README; do the same for the later
multi-line WHAT comment around lines 78-80 (locate the comment near any
references to experts_implementation or related kernel selection logic such as
the variable/function that reads/validates experts_implementation) — keep each
in-file comment to one short line that explains the rationale only.

In `@src/axolotl/integrations/kernels/constants.py`:
- Around line 1-2: The module-level docstring and inline comments are describing
WHAT the code does; replace the multi-line docstring and the comments at the
other noted locations with a single short WHY-focused line each (e.g., one-line
rationale for the MoE kernel diagnostic helpers) so they explain non-obvious
rationale only and not implementation details; update the top-level docstring
and the comments around any diagnostic helpers/constants in this file to be a
concise one-line WHY note.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8ca3e483-6b63-4a23-b116-761fa267edea

📥 Commits

Reviewing files that changed from the base of the PR and between cc25d3e and ca4ca95.

📒 Files selected for processing (21)

docs/optimizations.qmd
src/axolotl/integrations/expert_parallel/experts_fn.py
src/axolotl/integrations/kernels/README.md
src/axolotl/integrations/kernels/args.py
src/axolotl/integrations/kernels/constants.py
src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py
src/axolotl/integrations/kernels/libs/sonicmoe/__init__.py
src/axolotl/integrations/kernels/libs/sonicmoe/experts.py
src/axolotl/integrations/kernels/libs/sonicmoe/gemma4_experts.py
src/axolotl/integrations/kernels/libs/sonicmoe/lora.py
src/axolotl/integrations/kernels/libs/sonicmoe/patch.py
src/axolotl/integrations/kernels/libs/sonicmoe/routing.py
src/axolotl/integrations/kernels/libs/sonicmoe/weight_converter.py
src/axolotl/integrations/kernels/plugin.py
tests/e2e/integrations/test_sonicmoe.py
tests/e2e/integrations/test_sonicmoe_lora.py
tests/integrations/test_gemma4_moe.py
tests/integrations/test_routing_parity.py
tests/integrations/test_sonicmoe.py
tests/integrations/test_sonicmoe_gradients.py
tests/integrations/test_sonicmoe_lora.py

💤 Files with no reviewable changes (7)

tests/integrations/test_routing_parity.py
src/axolotl/integrations/kernels/libs/sonicmoe/weight_converter.py
src/axolotl/integrations/kernels/libs/sonicmoe/patch.py
src/axolotl/integrations/kernels/libs/sonicmoe/gemma4_experts.py
src/axolotl/integrations/kernels/libs/sonicmoe/routing.py
tests/integrations/test_sonicmoe_gradients.py
tests/integrations/test_sonicmoe_lora.py

coderabbitai · 2026-05-22T08:26:49Z

+    @classmethod
+    def check_sonicmoe_ep_unsupported(cls, data):
+        """SonicMoE + EP is not yet implemented (EP `_sonicmoe_local` raises)."""
+        if data.get("use_sonicmoe") and (data.get("expert_parallel_size") or 1) > 1:


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n 'bool\(data\.get\("use_scattermoe"\)\)|bool\(data\.get\("use_sonicmoe"\)\)|\(data\.get\("expert_parallel_size"\) or 1\) > 1' src/axolotl/integrations/kernels/args.py python - <<'PY' print("bool('false') =", bool("false")) print("bool('0') =", bool("0")) for v in ["2", 2, None]: try: print(f"{v!r}: {(v or 1) > 1}") except Exception as e: print(f"{v!r}: {type(e).__name__} -> {e}") PY

Repository: axolotl-ai-cloud/axolotl

Length of output: 405

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="src/axolotl/integrations/kernels/args.py" # Show the relevant surrounding code and any long comment blocks near the referenced line ranges python - <<'PY' import itertools, pathlib path = pathlib.Path("src/axolotl/integrations/kernels/args.py") lines = path.read_text().splitlines() for start, end in [(1, 35), (45, 90), (70, 95), (75, 110)]: print(f"\n--- {start}-{end} ---") for i in range(start, min(end, len(lines)) + 1): print(f"{i:4d}: {lines[i-1]}") PY

Repository: axolotl-ai-cloud/axolotl

Length of output: 6985

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n 'use_(sonicmoe|scattermoe)\s*:\s*' . -g'*.yml' -g'*.yaml' -g'*.json' || true rg -n 'expert_parallel_size\s*:\s*' . -g'*.yml' -g'*.yaml' || true rg -n 'model_validator\(mode="before"\)|mode="before"' src/axolotl/integrations/kernels/args.py src/axolotl -S || true rg -n 'str\(.+\)\.lower\(\).*(true|false)|parse.*bool|coerce.*bool|in \("true"' src/axolotl || true

Repository: axolotl-ai-cloud/axolotl

Length of output: 10349

Avoid pre-coercion truthiness/type issues in before-validators

src/axolotl/integrations/kernels/args.py line 57 and lines 69-70 can mis-handle raw string inputs: bool("false") evaluates to True, and ("2" or 1) > 1 raises TypeError (str vs int) before coercion.

Multi-line comment blocks at lines 8-14 and 78-80 violate the “comments should be max one short line” guideline (they document option values / behavior rather than “WHY”).

🐛 Proposed fix

@@ def check_sonicmoe_ep_unsupported(cls, data): """SonicMoE + EP is not yet implemented (EP `_sonicmoe_local` raises).""" - if data.get("use_sonicmoe") and (data.get("expert_parallel_size") or 1) > 1: + ep_size_raw = data.get("expert_parallel_size") + try: + ep_size = int(ep_size_raw) if ep_size_raw is not None else 1 + except (TypeError, ValueError): + ep_size = 1 + if data.get("use_sonicmoe") is True and ep_size > 1: raise ValueError( "use_sonicmoe=true is not supported with expert_parallel_size > 1. " "Use use_scattermoe=true under EP, or set expert_parallel_size=1." ) return data @@ - use_scattermoe = bool(data.get("use_scattermoe")) - use_sonicmoe = bool(data.get("use_sonicmoe")) + use_scattermoe = data.get("use_scattermoe") is True + use_sonicmoe = data.get("use_sonicmoe") is True

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/axolotl/integrations/kernels/args.py` at line 57, The before-validator currently evaluates raw dict values directly (e.g., data.get("use_sonicmoe") and (data.get("expert_parallel_size") or 1) > 1) which mis-handles string inputs like "false" or "2" and can raise TypeError; update the before-validator to explicitly coerce/parse these fields first (e.g., parse use_sonicmoe into a boolean using a safe string-to-bool helper and coerce expert_parallel_size to int with a fallback) and then perform the comparison, adjusting any helper or validator functions used by the validator; also replace the multi-line comment blocks that document option values with short single-line comments (split or rephrase the content around the validator and the option docs) to comply with the one-line comment guideline.

coderabbitai · 2026-05-22T08:26:49Z

        if cfg.use_scattermoe:
-            self._register_kernels()
-            if is_experts_only_model(moe_model_type):
-                # Models like Gemma4 where MoE is embedded in the decoder layer
-                # — register ScatterMoE in the ExpertsInterface so that
-                # @use_experts_implementation dispatches to it.
-                self._register_experts_interface()
-                if not ep_active:
-                    cfg.experts_implementation = "scattermoe"
-            elif ep_active:
-                LOG.info(
-                    "expert_parallel_size > 1: skipping SparseMoeBlock-level "
-                    "ScatterMoE patch; the deep_ep_scattermoe registered "
-                    "function handles the kernel under EP."
-                )
-            else:
-                self._kernelize_model(moe_model_type)
-        elif cfg.use_sonicmoe:
-            if not importlib.util.find_spec("sonicmoe"):
-                raise RuntimeError(
-                    "SonicMoE is not installed. See installation instructions at "
-                    "https://github.com/axolotl-ai-cloud/axolotl/blob/main/src/axolotl/integrations/kernels/README.md#sonicmoe-installation"
-                )
+            from axolotl.integrations.kernels.libs.scattermoe_lora.experts import (
+                register_scattermoe_experts,
+            )

+            register_scattermoe_experts()
+            if not ep_active:
+                cfg.experts_implementation = "scattermoe"
+            LOG.info("Registered 'scattermoe' in transformers ExpertsInterface")
+        elif cfg.use_sonicmoe:
            _check_sonicmoe_gpu_compat()

-            if is_experts_only_model(moe_model_type):
-                from axolotl.integrations.kernels.libs.sonicmoe.gemma4_experts import (
-                    patch_gemma4_sonicmoe,
-                )
-
-                LOG.info(
-                    f"Applying SonicMoE experts-level patch for model type: {moe_model_type}"
-                )
-                patch_gemma4_sonicmoe()
-            # TODO(EP+SonicMoE): grad norms explode during training. Re-enable
-            # once the root cause is identified. Same shape as the ScatterMoE
-            # branch above, but SonicMoE additionally needs the gate_up_proj
-            # interleave converter since its w1 layout is [g0, u0, g1, u1, ...]
-            # while the checkpoint stores it concatenated [gate..., up...].
-            #
-            # elif ep_active:
-            #     from axolotl.integrations.kernels.libs.sonicmoe.weight_converter import (
-            #         register_sonicmoe_weight_converter,
-            #     )
-            #
-            #     LOG.info(
-            #         "expert_parallel_size > 1: skipping SparseMoeBlock-level "
-            #         "SonicMoE patch; the deep_ep_sonicmoe registered function "
-            #         "handles the kernel under EP. Registering gate_up_proj "
-            #         "interleave converter."
-            #     )
-            #     register_sonicmoe_weight_converter(moe_model_type)
-            else:
-                from axolotl.integrations.kernels.libs.sonicmoe import patch_sonicmoe
-
-                LOG.info(f"Applying SonicMoE patches for model type: {moe_model_type}")
-                patch_sonicmoe(
-                    moe_model_type,
-                    torch_compile=bool(getattr(cfg, "torch_compile", False)),
-                    base_model_type=cfg.model_config_type,
-                )
-
-    def _register_kernels(self):
-        from kernels import (
-            LocalLayerRepository,
-            Mode,
-            register_kernel_mapping,
-        )
+            from axolotl.integrations.kernels.libs.sonicmoe.experts import (
+                register_sonicmoe_experts,
+            )

-        plugin_root = Path(__file__).parent
-        register_kernel_mapping(
-            {
-                "HFScatterMoEParallelExperts": {
-                    "cuda": {
-                        Mode.TRAINING: LocalLayerRepository(
-                            repo_path=plugin_root / "libs" / "scattermoe_lora",
-                            package_name="scattermoe_lora",
-                            layer_name="HFScatterMoEGatedMLP",
-                        ),
-                        Mode.INFERENCE: LocalLayerRepository(
-                            repo_path=plugin_root / "libs" / "scattermoe_lora",
-                            package_name="scattermoe_lora",
-                            layer_name="HFScatterMoEGatedMLP",
-                        ),
-                    },
-                }
-            }
-        )
+            register_sonicmoe_experts()
+            if not ep_active:
+                cfg.experts_implementation = "sonicmoe"
+            LOG.info("Registered 'sonicmoe' in transformers ExpertsInterface")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Honor cfg.experts_implementation when registering kernels.

pre_model_load() only installs the handler when the legacy use_scattermoe / use_sonicmoe booleans are set. In this stack the validator layer also accepts YAML that selects experts_implementation directly, so experts_implementation: scattermoe or sonicmoe without the matching boolean leaves the registry uninitialized and fails later at lookup time.

Suggested fix

+ requested_impl = getattr(cfg, "experts_implementation", None) - if cfg.use_scattermoe: + if cfg.use_scattermoe or requested_impl == "scattermoe": from axolotl.integrations.kernels.libs.scattermoe_lora.experts import ( register_scattermoe_experts, ) register_scattermoe_experts() if not ep_active: cfg.experts_implementation = "scattermoe" LOG.info("Registered 'scattermoe' in transformers ExpertsInterface") - elif cfg.use_sonicmoe: + elif cfg.use_sonicmoe or requested_impl == "sonicmoe": _check_sonicmoe_gpu_compat()

As per coding guidelines, **/*.py: Config-driven architecture: all training features are toggled via YAML configuration, not code changes.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if cfg.use_scattermoe:

self._register_kernels()

if is_experts_only_model(moe_model_type):

# Models like Gemma4 where MoE is embedded in the decoder layer

# — register ScatterMoE in the ExpertsInterface so that

# @use_experts_implementation dispatches to it.

self._register_experts_interface()

if not ep_active:

cfg.experts_implementation = "scattermoe"

elif ep_active:

LOG.info(

"expert_parallel_size > 1: skipping SparseMoeBlock-level "

"ScatterMoE patch; the deep_ep_scattermoe registered "

"function handles the kernel under EP."

)

else:

self._kernelize_model(moe_model_type)

elif cfg.use_sonicmoe:

if not importlib.util.find_spec("sonicmoe"):

raise RuntimeError(

"SonicMoE is not installed. See installation instructions at "

"https://github.com/axolotl-ai-cloud/axolotl/blob/main/src/axolotl/integrations/kernels/README.md#sonicmoe-installation"

)

from axolotl.integrations.kernels.libs.scattermoe_lora.experts import (

register_scattermoe_experts,

)

register_scattermoe_experts()

if not ep_active:

cfg.experts_implementation = "scattermoe"

LOG.info("Registered 'scattermoe' in transformers ExpertsInterface")

elif cfg.use_sonicmoe:

_check_sonicmoe_gpu_compat()

if is_experts_only_model(moe_model_type):

from axolotl.integrations.kernels.libs.sonicmoe.gemma4_experts import (

patch_gemma4_sonicmoe,

)

LOG.info(

f"Applying SonicMoE experts-level patch for model type: {moe_model_type}"

)

patch_gemma4_sonicmoe()

# TODO(EP+SonicMoE): grad norms explode during training. Re-enable

# once the root cause is identified. Same shape as the ScatterMoE

# branch above, but SonicMoE additionally needs the gate_up_proj

# interleave converter since its w1 layout is [g0, u0, g1, u1, ...]

# while the checkpoint stores it concatenated [gate..., up...].

#

# elif ep_active:

# from axolotl.integrations.kernels.libs.sonicmoe.weight_converter import (

# register_sonicmoe_weight_converter,

# )

#

# LOG.info(

# "expert_parallel_size > 1: skipping SparseMoeBlock-level "

# "SonicMoE patch; the deep_ep_sonicmoe registered function "

# "handles the kernel under EP. Registering gate_up_proj "

# "interleave converter."

# )

# register_sonicmoe_weight_converter(moe_model_type)

else:

from axolotl.integrations.kernels.libs.sonicmoe import patch_sonicmoe

LOG.info(f"Applying SonicMoE patches for model type: {moe_model_type}")

patch_sonicmoe(

moe_model_type,

torch_compile=bool(getattr(cfg, "torch_compile", False)),

base_model_type=cfg.model_config_type,

)

def _register_kernels(self):

from kernels import (

LocalLayerRepository,

Mode,

register_kernel_mapping,

)

from axolotl.integrations.kernels.libs.sonicmoe.experts import (

register_sonicmoe_experts,

)

plugin_root = Path(__file__).parent

register_kernel_mapping(

{

"HFScatterMoEParallelExperts": {

"cuda": {

Mode.TRAINING: LocalLayerRepository(

repo_path=plugin_root / "libs" / "scattermoe_lora",

package_name="scattermoe_lora",

layer_name="HFScatterMoEGatedMLP",

),

Mode.INFERENCE: LocalLayerRepository(

repo_path=plugin_root / "libs" / "scattermoe_lora",

package_name="scattermoe_lora",

layer_name="HFScatterMoEGatedMLP",

),

},

}

}

)

register_sonicmoe_experts()

if not ep_active:

cfg.experts_implementation = "sonicmoe"

LOG.info("Registered 'sonicmoe' in transformers ExpertsInterface")

requested_impl = getattr(cfg, "experts_implementation", None)

if cfg.use_scattermoe or requested_impl == "scattermoe":

from axolotl.integrations.kernels.libs.scattermoe_lora.experts import (

register_scattermoe_experts,

)

register_scattermoe_experts()

if not ep_active:

cfg.experts_implementation = "scattermoe"

LOG.info("Registered 'scattermoe' in transformers ExpertsInterface")

elif cfg.use_sonicmoe or requested_impl == "sonicmoe":

_check_sonicmoe_gpu_compat()

from axolotl.integrations.kernels.libs.sonicmoe.experts import (

register_sonicmoe_experts,

)

register_sonicmoe_experts()

if not ep_active:

cfg.experts_implementation = "sonicmoe"

LOG.info("Registered 'sonicmoe' in transformers ExpertsInterface")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/axolotl/integrations/kernels/plugin.py` around lines 73 - 92, pre_model_load currently only registers experts if cfg.use_scattermoe or cfg.use_sonicmoe are true, which breaks when the YAML sets cfg.experts_implementation directly; update pre_model_load to also inspect cfg.experts_implementation (values "scattermoe" and "sonicmoe") and call register_scattermoe_experts() or register_sonicmoe_experts() as appropriate when the corresponding boolean is false, preserve the existing ep_active check and assignment to cfg.experts_implementation, and keep LOG.info messages unchanged so the registry is initialized whenever either the legacy booleans or the config-driven experts_implementation indicates scattermoe/sonicmoe.

coderabbitai · 2026-05-22T08:26:49Z

+        elif cfg.use_sonicmoe:
            _check_sonicmoe_gpu_compat()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast for SonicMoE on CPU-only hosts.

This branch treats _check_sonicmoe_gpu_compat() as the capability gate, but that helper currently returns when CUDA is unavailable. use_sonicmoe therefore survives plugin setup and only dies much later in sonicmoe_experts_forward_with_lora, which makes the unsupported config harder to diagnose.

Suggested fix

- if not torch.cuda.is_available(): - return + if not torch.cuda.is_available(): + raise RuntimeError("SonicMoE requires a CUDA device.")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/axolotl/integrations/kernels/plugin.py` around lines 82 - 83, The branch that enables SonicMoE (cfg.use_sonicmoe) must fail immediately when the host lacks GPU support: call _check_sonicmoe_gpu_compat() and if it reports incompatibility (or returns a falsy value) raise an explicit exception (e.g., RuntimeError/ValueError) or return an error so plugin setup aborts rather than continuing; alternatively modify _check_sonicmoe_gpu_compat() to raise on CPU-only hosts and ensure the cfg.use_sonicmoe path relies on that behavior so the misconfiguration is detected early instead of failing later in sonicmoe_experts_forward_with_lora.

codecov · 2026-05-26T09:10:47Z

Codecov Report

❌ Patch coverage is 42.85714% with 64 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...lotl/integrations/kernels/libs/sonicmoe/experts.py	24.59%	46 Missing ⚠️
src/axolotl/integrations/kernels/plugin.py	0.00%	11 Missing ⚠️
src/axolotl/loaders/model.py	37.50%	5 Missing ⚠️
...axolotl/integrations/expert_parallel/experts_fn.py	0.00%	1 Missing ⚠️
src/axolotl/integrations/kernels/args.py	95.65%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

registry

NanoCode012 force-pushed the fix/refactor-kernels branch from 0616ed2 to ca4ca95 Compare May 22, 2026 08:16

NanoCode012 marked this pull request as ready for review May 22, 2026 08:17

coderabbitai Bot reviewed May 22, 2026

View reviewed changes

NanoCode012 force-pushed the fix/refactor-kernels branch from ca4ca95 to 8e3ed29 Compare May 26, 2026 08:58

winglian added the scheduled_release This PR is slated for the upcoming release label May 26, 2026

NanoCode012 added 4 commits May 26, 2026 16:41

fix: refactor kernels patch to drop routing and inject into Expert

0a15e1d

registry

chore: add to optim doc

d2e42fd

feat: update sonicmoe version

a2815b9

chore: cleanup with DEEPEP and kernels compat

8ee58b1

winglian force-pushed the fix/refactor-kernels branch from 8e3ed29 to 8ee58b1 Compare May 26, 2026 16:41

gate/guard model expert setup

1f3c5d2

winglian merged commit 3f478db into main May 28, 2026
31 of 32 checks passed

winglian deleted the fix/refactor-kernels branch May 28, 2026 03:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: refactor kernels patch to drop routing and inject into Expert#3651

fix: refactor kernels patch to drop routing and inject into Expert#3651
winglian merged 5 commits into
mainfrom
fix/refactor-kernels

NanoCode012 commented May 13, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 13, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 22, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 22, 2026

Uh oh!

coderabbitai Bot May 22, 2026

Uh oh!

coderabbitai Bot May 22, 2026

Uh oh!

codecov Bot commented May 26, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

NanoCode012 commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

How has this been tested?

AI Usage Disclaimer

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Uh oh!

coderabbitai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

NanoCode012 commented May 13, 2026 •

edited

Loading

coderabbitai Bot commented May 13, 2026 •

edited

Loading

github-actions Bot commented May 22, 2026 •

edited

Loading

codecov Bot commented May 26, 2026 •

edited

Loading