Skip to content

fix: refactor kernels patch to drop routing and inject into Expert#3651

Merged
winglian merged 5 commits into
mainfrom
fix/refactor-kernels
May 28, 2026
Merged

fix: refactor kernels patch to drop routing and inject into Expert#3651
winglian merged 5 commits into
mainfrom
fix/refactor-kernels

Conversation

@NanoCode012
Copy link
Copy Markdown
Collaborator

@NanoCode012 NanoCode012 commented May 13, 2026

registry

Description

Waiting on #3632 to get in first to refactor how it handles experts there.

Cleans up our kernels implementation to reduce our impact, let upstream handle the routing. We just apply to the MoE block now. This increases our support surface while reducing the burden of rewriting routing for new archs.

Motivation and Context

How has this been tested?

AI Usage Disclaimer

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 13, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 61b8c73b-2a81-4b1d-a72a-6def18eb8f25

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR refactors the SonicMoE and ScatterMoE expert kernel integration from model-specific monkey-patching to Transformers' ExpertsInterface dispatch registry. Configuration validators auto-select kernels based on use_scattermoe/use_sonicmoe flags. A new LoRA-aware SonicMoE expert forward implementation validates config, resolves PEFT-wrapped weights, and materializes effective weights. The plugin simplifies registration by directly calling register_*_experts() functions. Old routing, weight-conversion, and block-level MoE modules are removed entirely.

Changes

MoE Expert Kernel Registration and LoRA Support

Layer / File(s) Summary
Documentation and conceptual model: patching → registration
docs/optimizations.qmd, src/axolotl/integrations/kernels/README.md, src/axolotl/integrations/kernels/constants.py
Updated expert-kernels and parallelism documentation; removed SPARSE_MOE_BLOCK mapping and block-level MoE infrastructure, keeping only experts-only model support and conceptual shift to Transformers ALL_EXPERTS_FUNCTIONS dispatch.
Configuration validation and expert-implementation auto-selection
src/axolotl/integrations/kernels/args.py
New check_sonicmoe_ep_unsupported validator rejects SonicMoE with expert parallelism. Updated check_experts_implementation to auto-select kernel from use_scattermoe/use_sonicmoe flags, handle mismatches with warnings, and validate against unified _VALID_EXPERTS_IMPLS set.
ScatterMoE idempotent registration to Transformers
src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py
Refactors register_scattermoe_experts() to be idempotent via module flag; registers "scattermoe" into ALL_EXPERTS_FUNCTIONS; removes legacy patch_gemma4_scattermoe monkeypatch helper and Pre-TrainedModel patching.
SonicMoE LoRA-aware expert forward implementation
src/axolotl/integrations/kernels/libs/sonicmoe/experts.py
New sonicmoe_experts_forward_with_lora validates SonicMoE gating/CUDA requirements; resolves expert weights and optional LoRA from PEFT wrappers (module-level and per-parameter); materializes effective weights via MoELoRAMaterialize; permutes layouts and dispatches to upstream _sonicmoe_wrapper. Includes helpers for PEFT ParamWrapper unwrapping and LoRA tuple extraction.
SonicMoE package reorganization: exports and LoRA documentation
src/axolotl/integrations/kernels/libs/sonicmoe/__init__.py, src/axolotl/integrations/kernels/libs/sonicmoe/lora.py
Exports register_sonicmoe_experts and sonicmoe_experts_forward_with_lora instead of patch_sonicmoe. Removes unwrap_gate_lora helper; simplifies MoELoRAMaterialize docstring. Deletes routing.py, weight_converter.py, patch.py, and gemma4_experts.py (old patching/routing infrastructure).
KernelsPlugin pre_model_load: direct expert registration and experts_implementation setup
src/axolotl/integrations/kernels/plugin.py
Simplifies pre_model_load to compute ep_active, directly call register_scattermoe_experts() / register_sonicmoe_experts(), validate SonicMoE GPU compatibility, and set cfg.experts_implementation only when EP inactive. Removes moe_model_type branching, experts-only model handling, and model-specific patching. Deletes _kernelize_model and _register_experts_interface helpers.
Expert-parallel integration: import path and SonicMoE disabling
src/axolotl/integrations/expert_parallel/experts_fn.py
Updates scattermoe local kernel import to scattermoe_lora.experts module. Disables sonicmoe by raising NotImplementedError in expert-parallel contexts.
End-to-end SonicMoE test refactoring: model construction and correctness validation
tests/e2e/integrations/test_sonicmoe.py, tests/e2e/integrations/test_sonicmoe_lora.py
Introduces _is_hopper_or_newer() compute-capability gating and refactors model construction via _build_model() / _build_sonic_model() helpers that call register_sonicmoe_experts() and set config._experts_implementation. Removes manual patching/interleaving/teardown. Refactors forward correctness, gradient correctness, training convergence, and adapter-update tests to use centralized model builders and LoRA helpers.
Unit test reorganization: registration, LoRA validation, and removal of old routing/converter tests
tests/integrations/test_sonicmoe.py, tests/integrations/test_gemma4_moe.py, tests/integrations/test_sonicmoe_lora.py, removed: tests/integrations/test_routing_parity.py, tests/integrations/test_sonicmoe_gradients.py
Refocuses unit tests on KernelsArgs.check_experts_implementation auto-selection, register_sonicmoe_experts idempotent registration to ALL_EXPERTS_FUNCTIONS, and LoRA materialization (shape/identity, scaling, PEFT einsum equivalence, gradient flow). Removes routing parity, weight-converter, and gradient-check test suites. Updates Gemma4 tests to validate experts-only model classification instead of routing functions. Removes unwrap_gate_lora coverage and mock LoRA-gate builder.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • winglian
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: refactor kernels patch to drop routing and inject into Expert' directly and clearly describes the main change: a refactoring of the kernels patch system that removes routing logic and integrates functionality into the Expert interface.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/refactor-kernels

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@NanoCode012 NanoCode012 force-pushed the fix/refactor-kernels branch from 0616ed2 to ca4ca95 Compare May 22, 2026 08:16
@NanoCode012 NanoCode012 marked this pull request as ready for review May 22, 2026 08:17
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 22, 2026

📖 Documentation Preview: https://6a15dfdeb7265aa320961597--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit 1f3c5d2

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py (1)

143-146: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle one-sided PEFT wrapping here.

This branch runs when either projection is wrapped, but _unwrap_experts_lora() only returns LoRA tensors when gate_up_proj is the wrapper. A config that targets only down_proj will silently drop its LoRA delta from the ScatterMoE path.

Suggested fix
-    if not isinstance(getattr(experts, "gate_up_proj", None), ParamWrapper):
+    if not isinstance(getattr(experts, "gate_up_proj", None), ParamWrapper) and not isinstance(
+        getattr(experts, "down_proj", None), ParamWrapper
+    ):
         return experts, None, None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py` around
lines 143 - 146, The current branch detects PEFT via _has_peft_wrapper(self) but
always expects _unwrap_experts_lora(self) to return a gate_up_proj LoRA tensor,
which drops LoRA deltas when only down_proj is wrapped; update the logic in the
block that sets gup_lora and down_lora to handle one-sided PEFT wrappers by
calling _unwrap_experts_lora(self) and then checking which return positions are
None, assigning whichever of gate_up_proj (gup_lora) or down_proj (down_lora) is
present instead of assuming gate_up_proj is wrapped; ensure the ScatterMoE path
uses the non-None tensor(s) so a config that targets only down_proj is
preserved.
src/axolotl/integrations/expert_parallel/experts_fn.py (1)

69-77: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't register deep_ep_sonicmoe while it always raises.

_sonicmoe_local() is now an unconditional NotImplementedError, but the registry and name mapping still advertise deep_ep_sonicmoe as a selectable experts implementation. That turns an unsupported mode into a guaranteed runtime failure instead of rejecting it during setup.

Suggested fix
 _LOCAL_KERNELS = {
     "eager": _eager_local,
     "grouped_mm": _grouped_mm_local,
     "scattermoe": _scattermoe_local,
-    "sonicmoe": _sonicmoe_local,
 }
@@
 REGISTRY = {
     "deep_ep": deep_ep_experts_forward,
     "deep_ep_grouped_mm": deep_ep_grouped_mm_experts_forward,
     "deep_ep_scattermoe": deep_ep_scattermoe_experts_forward,
-    "deep_ep_sonicmoe": deep_ep_sonicmoe_experts_forward,
 }
@@
     return {
         "eager": "deep_ep",
         "grouped_mm": "deep_ep_grouped_mm",
         "scattermoe": "deep_ep_scattermoe",
-        "sonicmoe": "deep_ep_sonicmoe",
     }[kernel]

Also applies to: 228-238, 264-271

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/integrations/expert_parallel/experts_fn.py` around lines 69 - 77,
The registry advertises a selectable implementation
("sonicmoe"/"deep_ep_sonicmoe") while _sonicmoe_local() always raises, so remove
or conditionally register that entry to avoid runtime failures: delete the
"sonicmoe" / "deep_ep_sonicmoe" entries from _LOCAL_KERNELS and any
corresponding mapping or _GLOBAL_KERNELS/name mapping that references
deep_ep_sonicmoe (or only register it behind a feature flag), and ensure any
factory/path that selects by name returns a clear setup-time error if the mode
is requested but not implemented; use the function name _sonicmoe_local to
locate the implementation and update all registry dictionaries that reference
it.
tests/e2e/integrations/test_sonicmoe_lora.py (1)

55-85: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Keep the LoRA E2E path config-driven as well.

This helper forces SonicMoE on via code instead of the supported YAML/config path, so the LoRA suite won't catch breakage in the real feature-toggle flow. For an end-to-end test, I'd route activation through the same config surface users rely on instead of mutating _experts_implementation directly.

As per coding guidelines, "Config-driven architecture: all training features are toggled via YAML configuration, not code changes".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/integrations/test_sonicmoe_lora.py` around lines 55 - 85, The test
helper hardcodes SonicMoE by setting config._experts_implementation and calling
register_sonicmoe_experts inside _create_tiny_qwen3_config/_build_sonic_model;
change it to drive expert activation from the same config surface the app uses
(e.g., accept a test config parameter or read the YAML-driven flag) and only set
config._experts_implementation/register_sonicmoe_experts when that config flag
indicates sonicmoe is enabled. Update _create_tiny_qwen3_config to derive
_experts_implementation from the provided test configuration and update
_build_sonic_model to call register_sonicmoe_experts conditionally based on that
same config value so the E2E path matches the real feature-toggle flow.
tests/e2e/integrations/test_sonicmoe.py (1)

44-75: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Exercise the public YAML-driven toggle path here.

These helpers hardcode config._experts_implementation and manual expert registration, so this E2E suite no longer covers the supported config-driven flow that maps user YAML to kernel selection. A regression in validator/plugin wiring could still pass here while real training breaks.

As per coding guidelines, "Config-driven architecture: all training features are toggled via YAML configuration, not code changes".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/integrations/test_sonicmoe.py` around lines 44 - 75, The test
helpers currently bypass the public YAML-driven toggle by calling
register_sonicmoe_experts() and writing to the private
config._experts_implementation; change them to exercise the public config path
instead: stop calling register_sonicmoe_experts() and stop setting
config._experts_implementation, and instead set the public config key (e.g.,
config.experts_implementation) or load a tiny YAML snippet via the project's
config loader so the normal YAML->plugin wiring runs and selects the sonicmoe
kernel; keep using _create_tiny_qwen3_config and _build_model but ensure they
drive kernel selection through the public config/YAML path rather than manual
registration or private attributes.
🧹 Nitpick comments (2)
src/axolotl/integrations/kernels/constants.py (1)

1-2: ⚡ Quick win

Trim comments/docstrings to short WHY-focused notes only.

Line 1, Line 6, and Line 13 currently describe WHAT the code does; in this path, comments should be one short line and only capture non-obvious rationale.

♻️ Proposed cleanup
-"""Diagnostic helpers for MoE kernel integrations (kernel dispatch itself
-is architecture-agnostic via the ExpertsInterface)."""
+"""Helpers for experts-only MoE class resolution."""
@@
-# Models where MoE is embedded in the decoder layer (no separate SparseMoeBlock).
+# Keep explicit map for experts-only model layouts.
@@
-    """Resolve the Experts class for a known model type, or ``None``."""
+    """Resolve experts class; return None when model type is unmapped."""

As per coding guidelines src/axolotl/**/*.py: “Only add comments when explaining the WHY behind non-obvious logic… Do not comment on WHAT code does… Comments should be a maximum of one short line.”

Also applies to: 6-6, 13-13

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/integrations/kernels/constants.py` around lines 1 - 2, The
module-level docstring and inline comments are describing WHAT the code does;
replace the multi-line docstring and the comments at the other noted locations
with a single short WHY-focused line each (e.g., one-line rationale for the MoE
kernel diagnostic helpers) so they explain non-obvious rationale only and not
implementation details; update the top-level docstring and the comments around
any diagnostic helpers/constants in this file to be a concise one-line WHY note.
src/axolotl/integrations/kernels/args.py (1)

8-14: ⚡ Quick win

Reduce inline comments to short WHY-only notes.

Line 8-14 and Line 78-80 are explanatory WHAT-comments and exceed the one-line guidance for this path.

As per coding guidelines src/axolotl/**/*.py: “Only add comments when explaining the WHY behind non-obvious logic… Do not comment on WHAT code does… Comments should be a maximum of one short line.”

Also applies to: 78-80

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/integrations/kernels/args.py` around lines 8 - 14, The block
listing valid experts_implementation values is a WHAT-style multi-line comment;
replace it with a single short WHY-focused comment explaining why callers care
about valid expert implementations and move the detailed list into the module
docstring or README; do the same for the later multi-line WHAT comment around
lines 78-80 (locate the comment near any references to experts_implementation or
related kernel selection logic such as the variable/function that
reads/validates experts_implementation) — keep each in-file comment to one short
line that explains the rationale only.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/axolotl/integrations/kernels/args.py`:
- Line 57: The before-validator currently evaluates raw dict values directly
(e.g., data.get("use_sonicmoe") and (data.get("expert_parallel_size") or 1) > 1)
which mis-handles string inputs like "false" or "2" and can raise TypeError;
update the before-validator to explicitly coerce/parse these fields first (e.g.,
parse use_sonicmoe into a boolean using a safe string-to-bool helper and coerce
expert_parallel_size to int with a fallback) and then perform the comparison,
adjusting any helper or validator functions used by the validator; also replace
the multi-line comment blocks that document option values with short single-line
comments (split or rephrase the content around the validator and the option
docs) to comply with the one-line comment guideline.

In `@src/axolotl/integrations/kernels/plugin.py`:
- Around line 82-83: The branch that enables SonicMoE (cfg.use_sonicmoe) must
fail immediately when the host lacks GPU support: call
_check_sonicmoe_gpu_compat() and if it reports incompatibility (or returns a
falsy value) raise an explicit exception (e.g., RuntimeError/ValueError) or
return an error so plugin setup aborts rather than continuing; alternatively
modify _check_sonicmoe_gpu_compat() to raise on CPU-only hosts and ensure the
cfg.use_sonicmoe path relies on that behavior so the misconfiguration is
detected early instead of failing later in sonicmoe_experts_forward_with_lora.
- Around line 73-92: pre_model_load currently only registers experts if
cfg.use_scattermoe or cfg.use_sonicmoe are true, which breaks when the YAML sets
cfg.experts_implementation directly; update pre_model_load to also inspect
cfg.experts_implementation (values "scattermoe" and "sonicmoe") and call
register_scattermoe_experts() or register_sonicmoe_experts() as appropriate when
the corresponding boolean is false, preserve the existing ep_active check and
assignment to cfg.experts_implementation, and keep LOG.info messages unchanged
so the registry is initialized whenever either the legacy booleans or the
config-driven experts_implementation indicates scattermoe/sonicmoe.

---

Outside diff comments:
In `@src/axolotl/integrations/expert_parallel/experts_fn.py`:
- Around line 69-77: The registry advertises a selectable implementation
("sonicmoe"/"deep_ep_sonicmoe") while _sonicmoe_local() always raises, so remove
or conditionally register that entry to avoid runtime failures: delete the
"sonicmoe" / "deep_ep_sonicmoe" entries from _LOCAL_KERNELS and any
corresponding mapping or _GLOBAL_KERNELS/name mapping that references
deep_ep_sonicmoe (or only register it behind a feature flag), and ensure any
factory/path that selects by name returns a clear setup-time error if the mode
is requested but not implemented; use the function name _sonicmoe_local to
locate the implementation and update all registry dictionaries that reference
it.

In `@src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py`:
- Around line 143-146: The current branch detects PEFT via
_has_peft_wrapper(self) but always expects _unwrap_experts_lora(self) to return
a gate_up_proj LoRA tensor, which drops LoRA deltas when only down_proj is
wrapped; update the logic in the block that sets gup_lora and down_lora to
handle one-sided PEFT wrappers by calling _unwrap_experts_lora(self) and then
checking which return positions are None, assigning whichever of gate_up_proj
(gup_lora) or down_proj (down_lora) is present instead of assuming gate_up_proj
is wrapped; ensure the ScatterMoE path uses the non-None tensor(s) so a config
that targets only down_proj is preserved.

In `@tests/e2e/integrations/test_sonicmoe_lora.py`:
- Around line 55-85: The test helper hardcodes SonicMoE by setting
config._experts_implementation and calling register_sonicmoe_experts inside
_create_tiny_qwen3_config/_build_sonic_model; change it to drive expert
activation from the same config surface the app uses (e.g., accept a test config
parameter or read the YAML-driven flag) and only set
config._experts_implementation/register_sonicmoe_experts when that config flag
indicates sonicmoe is enabled. Update _create_tiny_qwen3_config to derive
_experts_implementation from the provided test configuration and update
_build_sonic_model to call register_sonicmoe_experts conditionally based on that
same config value so the E2E path matches the real feature-toggle flow.

In `@tests/e2e/integrations/test_sonicmoe.py`:
- Around line 44-75: The test helpers currently bypass the public YAML-driven
toggle by calling register_sonicmoe_experts() and writing to the private
config._experts_implementation; change them to exercise the public config path
instead: stop calling register_sonicmoe_experts() and stop setting
config._experts_implementation, and instead set the public config key (e.g.,
config.experts_implementation) or load a tiny YAML snippet via the project's
config loader so the normal YAML->plugin wiring runs and selects the sonicmoe
kernel; keep using _create_tiny_qwen3_config and _build_model but ensure they
drive kernel selection through the public config/YAML path rather than manual
registration or private attributes.

---

Nitpick comments:
In `@src/axolotl/integrations/kernels/args.py`:
- Around line 8-14: The block listing valid experts_implementation values is a
WHAT-style multi-line comment; replace it with a single short WHY-focused
comment explaining why callers care about valid expert implementations and move
the detailed list into the module docstring or README; do the same for the later
multi-line WHAT comment around lines 78-80 (locate the comment near any
references to experts_implementation or related kernel selection logic such as
the variable/function that reads/validates experts_implementation) — keep each
in-file comment to one short line that explains the rationale only.

In `@src/axolotl/integrations/kernels/constants.py`:
- Around line 1-2: The module-level docstring and inline comments are describing
WHAT the code does; replace the multi-line docstring and the comments at the
other noted locations with a single short WHY-focused line each (e.g., one-line
rationale for the MoE kernel diagnostic helpers) so they explain non-obvious
rationale only and not implementation details; update the top-level docstring
and the comments around any diagnostic helpers/constants in this file to be a
concise one-line WHY note.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8ca3e483-6b63-4a23-b116-761fa267edea

📥 Commits

Reviewing files that changed from the base of the PR and between cc25d3e and ca4ca95.

📒 Files selected for processing (21)
  • docs/optimizations.qmd
  • src/axolotl/integrations/expert_parallel/experts_fn.py
  • src/axolotl/integrations/kernels/README.md
  • src/axolotl/integrations/kernels/args.py
  • src/axolotl/integrations/kernels/constants.py
  • src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py
  • src/axolotl/integrations/kernels/libs/sonicmoe/__init__.py
  • src/axolotl/integrations/kernels/libs/sonicmoe/experts.py
  • src/axolotl/integrations/kernels/libs/sonicmoe/gemma4_experts.py
  • src/axolotl/integrations/kernels/libs/sonicmoe/lora.py
  • src/axolotl/integrations/kernels/libs/sonicmoe/patch.py
  • src/axolotl/integrations/kernels/libs/sonicmoe/routing.py
  • src/axolotl/integrations/kernels/libs/sonicmoe/weight_converter.py
  • src/axolotl/integrations/kernels/plugin.py
  • tests/e2e/integrations/test_sonicmoe.py
  • tests/e2e/integrations/test_sonicmoe_lora.py
  • tests/integrations/test_gemma4_moe.py
  • tests/integrations/test_routing_parity.py
  • tests/integrations/test_sonicmoe.py
  • tests/integrations/test_sonicmoe_gradients.py
  • tests/integrations/test_sonicmoe_lora.py
💤 Files with no reviewable changes (7)
  • tests/integrations/test_routing_parity.py
  • src/axolotl/integrations/kernels/libs/sonicmoe/weight_converter.py
  • src/axolotl/integrations/kernels/libs/sonicmoe/patch.py
  • src/axolotl/integrations/kernels/libs/sonicmoe/gemma4_experts.py
  • src/axolotl/integrations/kernels/libs/sonicmoe/routing.py
  • tests/integrations/test_sonicmoe_gradients.py
  • tests/integrations/test_sonicmoe_lora.py

@classmethod
def check_sonicmoe_ep_unsupported(cls, data):
"""SonicMoE + EP is not yet implemented (EP `_sonicmoe_local` raises)."""
if data.get("use_sonicmoe") and (data.get("expert_parallel_size") or 1) > 1:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n 'bool\(data\.get\("use_scattermoe"\)\)|bool\(data\.get\("use_sonicmoe"\)\)|\(data\.get\("expert_parallel_size"\) or 1\) > 1' src/axolotl/integrations/kernels/args.py

python - <<'PY'
print("bool('false') =", bool("false"))
print("bool('0') =", bool("0"))
for v in ["2", 2, None]:
    try:
        print(f"{v!r}: {(v or 1) > 1}")
    except Exception as e:
        print(f"{v!r}: {type(e).__name__} -> {e}")
PY

Repository: axolotl-ai-cloud/axolotl

Length of output: 405


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/axolotl/integrations/kernels/args.py"

# Show the relevant surrounding code and any long comment blocks near the referenced line ranges
python - <<'PY'
import itertools, pathlib
path = pathlib.Path("src/axolotl/integrations/kernels/args.py")
lines = path.read_text().splitlines()
for start, end in [(1, 35), (45, 90), (70, 95), (75, 110)]:
    print(f"\n--- {start}-{end} ---")
    for i in range(start, min(end, len(lines)) + 1):
        print(f"{i:4d}: {lines[i-1]}")
PY

Repository: axolotl-ai-cloud/axolotl

Length of output: 6985


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n 'use_(sonicmoe|scattermoe)\s*:\s*' . -g'*.yml' -g'*.yaml' -g'*.json' || true
rg -n 'expert_parallel_size\s*:\s*' . -g'*.yml' -g'*.yaml' || true
rg -n 'model_validator\(mode="before"\)|mode="before"' src/axolotl/integrations/kernels/args.py src/axolotl -S || true
rg -n 'str\(.+\)\.lower\(\).*(true|false)|parse.*bool|coerce.*bool|in \("true"' src/axolotl || true

Repository: axolotl-ai-cloud/axolotl

Length of output: 10349


Avoid pre-coercion truthiness/type issues in before-validators

  • src/axolotl/integrations/kernels/args.py line 57 and lines 69-70 can mis-handle raw string inputs: bool("false") evaluates to True, and ("2" or 1) > 1 raises TypeError (str vs int) before coercion.
  • Multi-line comment blocks at lines 8-14 and 78-80 violate the “comments should be max one short line” guideline (they document option values / behavior rather than “WHY”).
🐛 Proposed fix
@@
     def check_sonicmoe_ep_unsupported(cls, data):
         """SonicMoE + EP is not yet implemented (EP `_sonicmoe_local` raises)."""
-        if data.get("use_sonicmoe") and (data.get("expert_parallel_size") or 1) > 1:
+        ep_size_raw = data.get("expert_parallel_size")
+        try:
+            ep_size = int(ep_size_raw) if ep_size_raw is not None else 1
+        except (TypeError, ValueError):
+            ep_size = 1
+        if data.get("use_sonicmoe") is True and ep_size > 1:
             raise ValueError(
                 "use_sonicmoe=true is not supported with expert_parallel_size > 1. "
                 "Use use_scattermoe=true under EP, or set expert_parallel_size=1."
             )
         return data
@@
-        use_scattermoe = bool(data.get("use_scattermoe"))
-        use_sonicmoe = bool(data.get("use_sonicmoe"))
+        use_scattermoe = data.get("use_scattermoe") is True
+        use_sonicmoe = data.get("use_sonicmoe") is True
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/integrations/kernels/args.py` at line 57, The before-validator
currently evaluates raw dict values directly (e.g., data.get("use_sonicmoe") and
(data.get("expert_parallel_size") or 1) > 1) which mis-handles string inputs
like "false" or "2" and can raise TypeError; update the before-validator to
explicitly coerce/parse these fields first (e.g., parse use_sonicmoe into a
boolean using a safe string-to-bool helper and coerce expert_parallel_size to
int with a fallback) and then perform the comparison, adjusting any helper or
validator functions used by the validator; also replace the multi-line comment
blocks that document option values with short single-line comments (split or
rephrase the content around the validator and the option docs) to comply with
the one-line comment guideline.

Comment on lines 73 to +92
if cfg.use_scattermoe:
self._register_kernels()
if is_experts_only_model(moe_model_type):
# Models like Gemma4 where MoE is embedded in the decoder layer
# — register ScatterMoE in the ExpertsInterface so that
# @use_experts_implementation dispatches to it.
self._register_experts_interface()
if not ep_active:
cfg.experts_implementation = "scattermoe"
elif ep_active:
LOG.info(
"expert_parallel_size > 1: skipping SparseMoeBlock-level "
"ScatterMoE patch; the deep_ep_scattermoe registered "
"function handles the kernel under EP."
)
else:
self._kernelize_model(moe_model_type)
elif cfg.use_sonicmoe:
if not importlib.util.find_spec("sonicmoe"):
raise RuntimeError(
"SonicMoE is not installed. See installation instructions at "
"https://github.com/axolotl-ai-cloud/axolotl/blob/main/src/axolotl/integrations/kernels/README.md#sonicmoe-installation"
)
from axolotl.integrations.kernels.libs.scattermoe_lora.experts import (
register_scattermoe_experts,
)

register_scattermoe_experts()
if not ep_active:
cfg.experts_implementation = "scattermoe"
LOG.info("Registered 'scattermoe' in transformers ExpertsInterface")
elif cfg.use_sonicmoe:
_check_sonicmoe_gpu_compat()

if is_experts_only_model(moe_model_type):
from axolotl.integrations.kernels.libs.sonicmoe.gemma4_experts import (
patch_gemma4_sonicmoe,
)

LOG.info(
f"Applying SonicMoE experts-level patch for model type: {moe_model_type}"
)
patch_gemma4_sonicmoe()
# TODO(EP+SonicMoE): grad norms explode during training. Re-enable
# once the root cause is identified. Same shape as the ScatterMoE
# branch above, but SonicMoE additionally needs the gate_up_proj
# interleave converter since its w1 layout is [g0, u0, g1, u1, ...]
# while the checkpoint stores it concatenated [gate..., up...].
#
# elif ep_active:
# from axolotl.integrations.kernels.libs.sonicmoe.weight_converter import (
# register_sonicmoe_weight_converter,
# )
#
# LOG.info(
# "expert_parallel_size > 1: skipping SparseMoeBlock-level "
# "SonicMoE patch; the deep_ep_sonicmoe registered function "
# "handles the kernel under EP. Registering gate_up_proj "
# "interleave converter."
# )
# register_sonicmoe_weight_converter(moe_model_type)
else:
from axolotl.integrations.kernels.libs.sonicmoe import patch_sonicmoe

LOG.info(f"Applying SonicMoE patches for model type: {moe_model_type}")
patch_sonicmoe(
moe_model_type,
torch_compile=bool(getattr(cfg, "torch_compile", False)),
base_model_type=cfg.model_config_type,
)

def _register_kernels(self):
from kernels import (
LocalLayerRepository,
Mode,
register_kernel_mapping,
)
from axolotl.integrations.kernels.libs.sonicmoe.experts import (
register_sonicmoe_experts,
)

plugin_root = Path(__file__).parent
register_kernel_mapping(
{
"HFScatterMoEParallelExperts": {
"cuda": {
Mode.TRAINING: LocalLayerRepository(
repo_path=plugin_root / "libs" / "scattermoe_lora",
package_name="scattermoe_lora",
layer_name="HFScatterMoEGatedMLP",
),
Mode.INFERENCE: LocalLayerRepository(
repo_path=plugin_root / "libs" / "scattermoe_lora",
package_name="scattermoe_lora",
layer_name="HFScatterMoEGatedMLP",
),
},
}
}
)
register_sonicmoe_experts()
if not ep_active:
cfg.experts_implementation = "sonicmoe"
LOG.info("Registered 'sonicmoe' in transformers ExpertsInterface")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Honor cfg.experts_implementation when registering kernels.

pre_model_load() only installs the handler when the legacy use_scattermoe / use_sonicmoe booleans are set. In this stack the validator layer also accepts YAML that selects experts_implementation directly, so experts_implementation: scattermoe or sonicmoe without the matching boolean leaves the registry uninitialized and fails later at lookup time.

Suggested fix
+        requested_impl = getattr(cfg, "experts_implementation", None)
-        if cfg.use_scattermoe:
+        if cfg.use_scattermoe or requested_impl == "scattermoe":
             from axolotl.integrations.kernels.libs.scattermoe_lora.experts import (
                 register_scattermoe_experts,
             )

             register_scattermoe_experts()
             if not ep_active:
                 cfg.experts_implementation = "scattermoe"
             LOG.info("Registered 'scattermoe' in transformers ExpertsInterface")
-        elif cfg.use_sonicmoe:
+        elif cfg.use_sonicmoe or requested_impl == "sonicmoe":
             _check_sonicmoe_gpu_compat()

As per coding guidelines, **/*.py: Config-driven architecture: all training features are toggled via YAML configuration, not code changes.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if cfg.use_scattermoe:
self._register_kernels()
if is_experts_only_model(moe_model_type):
# Models like Gemma4 where MoE is embedded in the decoder layer
# — register ScatterMoE in the ExpertsInterface so that
# @use_experts_implementation dispatches to it.
self._register_experts_interface()
if not ep_active:
cfg.experts_implementation = "scattermoe"
elif ep_active:
LOG.info(
"expert_parallel_size > 1: skipping SparseMoeBlock-level "
"ScatterMoE patch; the deep_ep_scattermoe registered "
"function handles the kernel under EP."
)
else:
self._kernelize_model(moe_model_type)
elif cfg.use_sonicmoe:
if not importlib.util.find_spec("sonicmoe"):
raise RuntimeError(
"SonicMoE is not installed. See installation instructions at "
"https://github.com/axolotl-ai-cloud/axolotl/blob/main/src/axolotl/integrations/kernels/README.md#sonicmoe-installation"
)
from axolotl.integrations.kernels.libs.scattermoe_lora.experts import (
register_scattermoe_experts,
)
register_scattermoe_experts()
if not ep_active:
cfg.experts_implementation = "scattermoe"
LOG.info("Registered 'scattermoe' in transformers ExpertsInterface")
elif cfg.use_sonicmoe:
_check_sonicmoe_gpu_compat()
if is_experts_only_model(moe_model_type):
from axolotl.integrations.kernels.libs.sonicmoe.gemma4_experts import (
patch_gemma4_sonicmoe,
)
LOG.info(
f"Applying SonicMoE experts-level patch for model type: {moe_model_type}"
)
patch_gemma4_sonicmoe()
# TODO(EP+SonicMoE): grad norms explode during training. Re-enable
# once the root cause is identified. Same shape as the ScatterMoE
# branch above, but SonicMoE additionally needs the gate_up_proj
# interleave converter since its w1 layout is [g0, u0, g1, u1, ...]
# while the checkpoint stores it concatenated [gate..., up...].
#
# elif ep_active:
# from axolotl.integrations.kernels.libs.sonicmoe.weight_converter import (
# register_sonicmoe_weight_converter,
# )
#
# LOG.info(
# "expert_parallel_size > 1: skipping SparseMoeBlock-level "
# "SonicMoE patch; the deep_ep_sonicmoe registered function "
# "handles the kernel under EP. Registering gate_up_proj "
# "interleave converter."
# )
# register_sonicmoe_weight_converter(moe_model_type)
else:
from axolotl.integrations.kernels.libs.sonicmoe import patch_sonicmoe
LOG.info(f"Applying SonicMoE patches for model type: {moe_model_type}")
patch_sonicmoe(
moe_model_type,
torch_compile=bool(getattr(cfg, "torch_compile", False)),
base_model_type=cfg.model_config_type,
)
def _register_kernels(self):
from kernels import (
LocalLayerRepository,
Mode,
register_kernel_mapping,
)
from axolotl.integrations.kernels.libs.sonicmoe.experts import (
register_sonicmoe_experts,
)
plugin_root = Path(__file__).parent
register_kernel_mapping(
{
"HFScatterMoEParallelExperts": {
"cuda": {
Mode.TRAINING: LocalLayerRepository(
repo_path=plugin_root / "libs" / "scattermoe_lora",
package_name="scattermoe_lora",
layer_name="HFScatterMoEGatedMLP",
),
Mode.INFERENCE: LocalLayerRepository(
repo_path=plugin_root / "libs" / "scattermoe_lora",
package_name="scattermoe_lora",
layer_name="HFScatterMoEGatedMLP",
),
},
}
}
)
register_sonicmoe_experts()
if not ep_active:
cfg.experts_implementation = "sonicmoe"
LOG.info("Registered 'sonicmoe' in transformers ExpertsInterface")
requested_impl = getattr(cfg, "experts_implementation", None)
if cfg.use_scattermoe or requested_impl == "scattermoe":
from axolotl.integrations.kernels.libs.scattermoe_lora.experts import (
register_scattermoe_experts,
)
register_scattermoe_experts()
if not ep_active:
cfg.experts_implementation = "scattermoe"
LOG.info("Registered 'scattermoe' in transformers ExpertsInterface")
elif cfg.use_sonicmoe or requested_impl == "sonicmoe":
_check_sonicmoe_gpu_compat()
from axolotl.integrations.kernels.libs.sonicmoe.experts import (
register_sonicmoe_experts,
)
register_sonicmoe_experts()
if not ep_active:
cfg.experts_implementation = "sonicmoe"
LOG.info("Registered 'sonicmoe' in transformers ExpertsInterface")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/integrations/kernels/plugin.py` around lines 73 - 92,
pre_model_load currently only registers experts if cfg.use_scattermoe or
cfg.use_sonicmoe are true, which breaks when the YAML sets
cfg.experts_implementation directly; update pre_model_load to also inspect
cfg.experts_implementation (values "scattermoe" and "sonicmoe") and call
register_scattermoe_experts() or register_sonicmoe_experts() as appropriate when
the corresponding boolean is false, preserve the existing ep_active check and
assignment to cfg.experts_implementation, and keep LOG.info messages unchanged
so the registry is initialized whenever either the legacy booleans or the
config-driven experts_implementation indicates scattermoe/sonicmoe.

Comment on lines +82 to 83
elif cfg.use_sonicmoe:
_check_sonicmoe_gpu_compat()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast for SonicMoE on CPU-only hosts.

This branch treats _check_sonicmoe_gpu_compat() as the capability gate, but that helper currently returns when CUDA is unavailable. use_sonicmoe therefore survives plugin setup and only dies much later in sonicmoe_experts_forward_with_lora, which makes the unsupported config harder to diagnose.

Suggested fix
-    if not torch.cuda.is_available():
-        return
+    if not torch.cuda.is_available():
+        raise RuntimeError("SonicMoE requires a CUDA device.")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/integrations/kernels/plugin.py` around lines 82 - 83, The branch
that enables SonicMoE (cfg.use_sonicmoe) must fail immediately when the host
lacks GPU support: call _check_sonicmoe_gpu_compat() and if it reports
incompatibility (or returns a falsy value) raise an explicit exception (e.g.,
RuntimeError/ValueError) or return an error so plugin setup aborts rather than
continuing; alternatively modify _check_sonicmoe_gpu_compat() to raise on
CPU-only hosts and ensure the cfg.use_sonicmoe path relies on that behavior so
the misconfiguration is detected early instead of failing later in
sonicmoe_experts_forward_with_lora.

@NanoCode012 NanoCode012 force-pushed the fix/refactor-kernels branch from ca4ca95 to 8e3ed29 Compare May 26, 2026 08:58
@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

@winglian winglian added the scheduled_release This PR is slated for the upcoming release label May 26, 2026
@winglian winglian force-pushed the fix/refactor-kernels branch from 8e3ed29 to 8ee58b1 Compare May 26, 2026 16:41
@winglian winglian merged commit 3f478db into main May 28, 2026
31 of 32 checks passed
@winglian winglian deleted the fix/refactor-kernels branch May 28, 2026 03:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

scheduled_release This PR is slated for the upcoming release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants