fix: refactor kernels patch to drop routing and inject into Expert#3651
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR refactors the SonicMoE and ScatterMoE expert kernel integration from model-specific monkey-patching to Transformers' ChangesMoE Expert Kernel Registration and LoRA Support
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
0616ed2 to
ca4ca95
Compare
|
📖 Documentation Preview: https://6a15dfdeb7265aa320961597--resonant-treacle-0fd729.netlify.app Deployed on Netlify from commit 1f3c5d2 |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py (1)
143-146:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHandle one-sided PEFT wrapping here.
This branch runs when either projection is wrapped, but
_unwrap_experts_lora()only returns LoRA tensors whengate_up_projis the wrapper. A config that targets onlydown_projwill silently drop its LoRA delta from the ScatterMoE path.Suggested fix
- if not isinstance(getattr(experts, "gate_up_proj", None), ParamWrapper): + if not isinstance(getattr(experts, "gate_up_proj", None), ParamWrapper) and not isinstance( + getattr(experts, "down_proj", None), ParamWrapper + ): return experts, None, None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py` around lines 143 - 146, The current branch detects PEFT via _has_peft_wrapper(self) but always expects _unwrap_experts_lora(self) to return a gate_up_proj LoRA tensor, which drops LoRA deltas when only down_proj is wrapped; update the logic in the block that sets gup_lora and down_lora to handle one-sided PEFT wrappers by calling _unwrap_experts_lora(self) and then checking which return positions are None, assigning whichever of gate_up_proj (gup_lora) or down_proj (down_lora) is present instead of assuming gate_up_proj is wrapped; ensure the ScatterMoE path uses the non-None tensor(s) so a config that targets only down_proj is preserved.src/axolotl/integrations/expert_parallel/experts_fn.py (1)
69-77:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't register
deep_ep_sonicmoewhile it always raises.
_sonicmoe_local()is now an unconditionalNotImplementedError, but the registry and name mapping still advertisedeep_ep_sonicmoeas a selectable experts implementation. That turns an unsupported mode into a guaranteed runtime failure instead of rejecting it during setup.Suggested fix
_LOCAL_KERNELS = { "eager": _eager_local, "grouped_mm": _grouped_mm_local, "scattermoe": _scattermoe_local, - "sonicmoe": _sonicmoe_local, } @@ REGISTRY = { "deep_ep": deep_ep_experts_forward, "deep_ep_grouped_mm": deep_ep_grouped_mm_experts_forward, "deep_ep_scattermoe": deep_ep_scattermoe_experts_forward, - "deep_ep_sonicmoe": deep_ep_sonicmoe_experts_forward, } @@ return { "eager": "deep_ep", "grouped_mm": "deep_ep_grouped_mm", "scattermoe": "deep_ep_scattermoe", - "sonicmoe": "deep_ep_sonicmoe", }[kernel]Also applies to: 228-238, 264-271
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/axolotl/integrations/expert_parallel/experts_fn.py` around lines 69 - 77, The registry advertises a selectable implementation ("sonicmoe"/"deep_ep_sonicmoe") while _sonicmoe_local() always raises, so remove or conditionally register that entry to avoid runtime failures: delete the "sonicmoe" / "deep_ep_sonicmoe" entries from _LOCAL_KERNELS and any corresponding mapping or _GLOBAL_KERNELS/name mapping that references deep_ep_sonicmoe (or only register it behind a feature flag), and ensure any factory/path that selects by name returns a clear setup-time error if the mode is requested but not implemented; use the function name _sonicmoe_local to locate the implementation and update all registry dictionaries that reference it.tests/e2e/integrations/test_sonicmoe_lora.py (1)
55-85: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftKeep the LoRA E2E path config-driven as well.
This helper forces SonicMoE on via code instead of the supported YAML/config path, so the LoRA suite won't catch breakage in the real feature-toggle flow. For an end-to-end test, I'd route activation through the same config surface users rely on instead of mutating
_experts_implementationdirectly.As per coding guidelines, "Config-driven architecture: all training features are toggled via YAML configuration, not code changes".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/integrations/test_sonicmoe_lora.py` around lines 55 - 85, The test helper hardcodes SonicMoE by setting config._experts_implementation and calling register_sonicmoe_experts inside _create_tiny_qwen3_config/_build_sonic_model; change it to drive expert activation from the same config surface the app uses (e.g., accept a test config parameter or read the YAML-driven flag) and only set config._experts_implementation/register_sonicmoe_experts when that config flag indicates sonicmoe is enabled. Update _create_tiny_qwen3_config to derive _experts_implementation from the provided test configuration and update _build_sonic_model to call register_sonicmoe_experts conditionally based on that same config value so the E2E path matches the real feature-toggle flow.tests/e2e/integrations/test_sonicmoe.py (1)
44-75: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftExercise the public YAML-driven toggle path here.
These helpers hardcode
config._experts_implementationand manual expert registration, so this E2E suite no longer covers the supported config-driven flow that maps user YAML to kernel selection. A regression in validator/plugin wiring could still pass here while real training breaks.As per coding guidelines, "Config-driven architecture: all training features are toggled via YAML configuration, not code changes".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/integrations/test_sonicmoe.py` around lines 44 - 75, The test helpers currently bypass the public YAML-driven toggle by calling register_sonicmoe_experts() and writing to the private config._experts_implementation; change them to exercise the public config path instead: stop calling register_sonicmoe_experts() and stop setting config._experts_implementation, and instead set the public config key (e.g., config.experts_implementation) or load a tiny YAML snippet via the project's config loader so the normal YAML->plugin wiring runs and selects the sonicmoe kernel; keep using _create_tiny_qwen3_config and _build_model but ensure they drive kernel selection through the public config/YAML path rather than manual registration or private attributes.
🧹 Nitpick comments (2)
src/axolotl/integrations/kernels/constants.py (1)
1-2: ⚡ Quick winTrim comments/docstrings to short WHY-focused notes only.
Line 1, Line 6, and Line 13 currently describe WHAT the code does; in this path, comments should be one short line and only capture non-obvious rationale.
♻️ Proposed cleanup
-"""Diagnostic helpers for MoE kernel integrations (kernel dispatch itself -is architecture-agnostic via the ExpertsInterface).""" +"""Helpers for experts-only MoE class resolution.""" @@ -# Models where MoE is embedded in the decoder layer (no separate SparseMoeBlock). +# Keep explicit map for experts-only model layouts. @@ - """Resolve the Experts class for a known model type, or ``None``.""" + """Resolve experts class; return None when model type is unmapped."""As per coding guidelines
src/axolotl/**/*.py: “Only add comments when explaining the WHY behind non-obvious logic… Do not comment on WHAT code does… Comments should be a maximum of one short line.”Also applies to: 6-6, 13-13
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/axolotl/integrations/kernels/constants.py` around lines 1 - 2, The module-level docstring and inline comments are describing WHAT the code does; replace the multi-line docstring and the comments at the other noted locations with a single short WHY-focused line each (e.g., one-line rationale for the MoE kernel diagnostic helpers) so they explain non-obvious rationale only and not implementation details; update the top-level docstring and the comments around any diagnostic helpers/constants in this file to be a concise one-line WHY note.src/axolotl/integrations/kernels/args.py (1)
8-14: ⚡ Quick winReduce inline comments to short WHY-only notes.
Line 8-14 and Line 78-80 are explanatory WHAT-comments and exceed the one-line guidance for this path.
As per coding guidelines
src/axolotl/**/*.py: “Only add comments when explaining the WHY behind non-obvious logic… Do not comment on WHAT code does… Comments should be a maximum of one short line.”Also applies to: 78-80
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/axolotl/integrations/kernels/args.py` around lines 8 - 14, The block listing valid experts_implementation values is a WHAT-style multi-line comment; replace it with a single short WHY-focused comment explaining why callers care about valid expert implementations and move the detailed list into the module docstring or README; do the same for the later multi-line WHAT comment around lines 78-80 (locate the comment near any references to experts_implementation or related kernel selection logic such as the variable/function that reads/validates experts_implementation) — keep each in-file comment to one short line that explains the rationale only.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/axolotl/integrations/kernels/args.py`:
- Line 57: The before-validator currently evaluates raw dict values directly
(e.g., data.get("use_sonicmoe") and (data.get("expert_parallel_size") or 1) > 1)
which mis-handles string inputs like "false" or "2" and can raise TypeError;
update the before-validator to explicitly coerce/parse these fields first (e.g.,
parse use_sonicmoe into a boolean using a safe string-to-bool helper and coerce
expert_parallel_size to int with a fallback) and then perform the comparison,
adjusting any helper or validator functions used by the validator; also replace
the multi-line comment blocks that document option values with short single-line
comments (split or rephrase the content around the validator and the option
docs) to comply with the one-line comment guideline.
In `@src/axolotl/integrations/kernels/plugin.py`:
- Around line 82-83: The branch that enables SonicMoE (cfg.use_sonicmoe) must
fail immediately when the host lacks GPU support: call
_check_sonicmoe_gpu_compat() and if it reports incompatibility (or returns a
falsy value) raise an explicit exception (e.g., RuntimeError/ValueError) or
return an error so plugin setup aborts rather than continuing; alternatively
modify _check_sonicmoe_gpu_compat() to raise on CPU-only hosts and ensure the
cfg.use_sonicmoe path relies on that behavior so the misconfiguration is
detected early instead of failing later in sonicmoe_experts_forward_with_lora.
- Around line 73-92: pre_model_load currently only registers experts if
cfg.use_scattermoe or cfg.use_sonicmoe are true, which breaks when the YAML sets
cfg.experts_implementation directly; update pre_model_load to also inspect
cfg.experts_implementation (values "scattermoe" and "sonicmoe") and call
register_scattermoe_experts() or register_sonicmoe_experts() as appropriate when
the corresponding boolean is false, preserve the existing ep_active check and
assignment to cfg.experts_implementation, and keep LOG.info messages unchanged
so the registry is initialized whenever either the legacy booleans or the
config-driven experts_implementation indicates scattermoe/sonicmoe.
---
Outside diff comments:
In `@src/axolotl/integrations/expert_parallel/experts_fn.py`:
- Around line 69-77: The registry advertises a selectable implementation
("sonicmoe"/"deep_ep_sonicmoe") while _sonicmoe_local() always raises, so remove
or conditionally register that entry to avoid runtime failures: delete the
"sonicmoe" / "deep_ep_sonicmoe" entries from _LOCAL_KERNELS and any
corresponding mapping or _GLOBAL_KERNELS/name mapping that references
deep_ep_sonicmoe (or only register it behind a feature flag), and ensure any
factory/path that selects by name returns a clear setup-time error if the mode
is requested but not implemented; use the function name _sonicmoe_local to
locate the implementation and update all registry dictionaries that reference
it.
In `@src/axolotl/integrations/kernels/libs/scattermoe_lora/experts.py`:
- Around line 143-146: The current branch detects PEFT via
_has_peft_wrapper(self) but always expects _unwrap_experts_lora(self) to return
a gate_up_proj LoRA tensor, which drops LoRA deltas when only down_proj is
wrapped; update the logic in the block that sets gup_lora and down_lora to
handle one-sided PEFT wrappers by calling _unwrap_experts_lora(self) and then
checking which return positions are None, assigning whichever of gate_up_proj
(gup_lora) or down_proj (down_lora) is present instead of assuming gate_up_proj
is wrapped; ensure the ScatterMoE path uses the non-None tensor(s) so a config
that targets only down_proj is preserved.
In `@tests/e2e/integrations/test_sonicmoe_lora.py`:
- Around line 55-85: The test helper hardcodes SonicMoE by setting
config._experts_implementation and calling register_sonicmoe_experts inside
_create_tiny_qwen3_config/_build_sonic_model; change it to drive expert
activation from the same config surface the app uses (e.g., accept a test config
parameter or read the YAML-driven flag) and only set
config._experts_implementation/register_sonicmoe_experts when that config flag
indicates sonicmoe is enabled. Update _create_tiny_qwen3_config to derive
_experts_implementation from the provided test configuration and update
_build_sonic_model to call register_sonicmoe_experts conditionally based on that
same config value so the E2E path matches the real feature-toggle flow.
In `@tests/e2e/integrations/test_sonicmoe.py`:
- Around line 44-75: The test helpers currently bypass the public YAML-driven
toggle by calling register_sonicmoe_experts() and writing to the private
config._experts_implementation; change them to exercise the public config path
instead: stop calling register_sonicmoe_experts() and stop setting
config._experts_implementation, and instead set the public config key (e.g.,
config.experts_implementation) or load a tiny YAML snippet via the project's
config loader so the normal YAML->plugin wiring runs and selects the sonicmoe
kernel; keep using _create_tiny_qwen3_config and _build_model but ensure they
drive kernel selection through the public config/YAML path rather than manual
registration or private attributes.
---
Nitpick comments:
In `@src/axolotl/integrations/kernels/args.py`:
- Around line 8-14: The block listing valid experts_implementation values is a
WHAT-style multi-line comment; replace it with a single short WHY-focused
comment explaining why callers care about valid expert implementations and move
the detailed list into the module docstring or README; do the same for the later
multi-line WHAT comment around lines 78-80 (locate the comment near any
references to experts_implementation or related kernel selection logic such as
the variable/function that reads/validates experts_implementation) — keep each
in-file comment to one short line that explains the rationale only.
In `@src/axolotl/integrations/kernels/constants.py`:
- Around line 1-2: The module-level docstring and inline comments are describing
WHAT the code does; replace the multi-line docstring and the comments at the
other noted locations with a single short WHY-focused line each (e.g., one-line
rationale for the MoE kernel diagnostic helpers) so they explain non-obvious
rationale only and not implementation details; update the top-level docstring
and the comments around any diagnostic helpers/constants in this file to be a
concise one-line WHY note.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8ca3e483-6b63-4a23-b116-761fa267edea
📒 Files selected for processing (21)
docs/optimizations.qmdsrc/axolotl/integrations/expert_parallel/experts_fn.pysrc/axolotl/integrations/kernels/README.mdsrc/axolotl/integrations/kernels/args.pysrc/axolotl/integrations/kernels/constants.pysrc/axolotl/integrations/kernels/libs/scattermoe_lora/experts.pysrc/axolotl/integrations/kernels/libs/sonicmoe/__init__.pysrc/axolotl/integrations/kernels/libs/sonicmoe/experts.pysrc/axolotl/integrations/kernels/libs/sonicmoe/gemma4_experts.pysrc/axolotl/integrations/kernels/libs/sonicmoe/lora.pysrc/axolotl/integrations/kernels/libs/sonicmoe/patch.pysrc/axolotl/integrations/kernels/libs/sonicmoe/routing.pysrc/axolotl/integrations/kernels/libs/sonicmoe/weight_converter.pysrc/axolotl/integrations/kernels/plugin.pytests/e2e/integrations/test_sonicmoe.pytests/e2e/integrations/test_sonicmoe_lora.pytests/integrations/test_gemma4_moe.pytests/integrations/test_routing_parity.pytests/integrations/test_sonicmoe.pytests/integrations/test_sonicmoe_gradients.pytests/integrations/test_sonicmoe_lora.py
💤 Files with no reviewable changes (7)
- tests/integrations/test_routing_parity.py
- src/axolotl/integrations/kernels/libs/sonicmoe/weight_converter.py
- src/axolotl/integrations/kernels/libs/sonicmoe/patch.py
- src/axolotl/integrations/kernels/libs/sonicmoe/gemma4_experts.py
- src/axolotl/integrations/kernels/libs/sonicmoe/routing.py
- tests/integrations/test_sonicmoe_gradients.py
- tests/integrations/test_sonicmoe_lora.py
| @classmethod | ||
| def check_sonicmoe_ep_unsupported(cls, data): | ||
| """SonicMoE + EP is not yet implemented (EP `_sonicmoe_local` raises).""" | ||
| if data.get("use_sonicmoe") and (data.get("expert_parallel_size") or 1) > 1: |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n 'bool\(data\.get\("use_scattermoe"\)\)|bool\(data\.get\("use_sonicmoe"\)\)|\(data\.get\("expert_parallel_size"\) or 1\) > 1' src/axolotl/integrations/kernels/args.py
python - <<'PY'
print("bool('false') =", bool("false"))
print("bool('0') =", bool("0"))
for v in ["2", 2, None]:
try:
print(f"{v!r}: {(v or 1) > 1}")
except Exception as e:
print(f"{v!r}: {type(e).__name__} -> {e}")
PYRepository: axolotl-ai-cloud/axolotl
Length of output: 405
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/axolotl/integrations/kernels/args.py"
# Show the relevant surrounding code and any long comment blocks near the referenced line ranges
python - <<'PY'
import itertools, pathlib
path = pathlib.Path("src/axolotl/integrations/kernels/args.py")
lines = path.read_text().splitlines()
for start, end in [(1, 35), (45, 90), (70, 95), (75, 110)]:
print(f"\n--- {start}-{end} ---")
for i in range(start, min(end, len(lines)) + 1):
print(f"{i:4d}: {lines[i-1]}")
PYRepository: axolotl-ai-cloud/axolotl
Length of output: 6985
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n 'use_(sonicmoe|scattermoe)\s*:\s*' . -g'*.yml' -g'*.yaml' -g'*.json' || true
rg -n 'expert_parallel_size\s*:\s*' . -g'*.yml' -g'*.yaml' || true
rg -n 'model_validator\(mode="before"\)|mode="before"' src/axolotl/integrations/kernels/args.py src/axolotl -S || true
rg -n 'str\(.+\)\.lower\(\).*(true|false)|parse.*bool|coerce.*bool|in \("true"' src/axolotl || trueRepository: axolotl-ai-cloud/axolotl
Length of output: 10349
Avoid pre-coercion truthiness/type issues in before-validators
src/axolotl/integrations/kernels/args.pyline 57 and lines 69-70 can mis-handle raw string inputs:bool("false")evaluates toTrue, and("2" or 1) > 1raisesTypeError(str vs int) before coercion.- Multi-line comment blocks at lines 8-14 and 78-80 violate the “comments should be max one short line” guideline (they document option values / behavior rather than “WHY”).
🐛 Proposed fix
@@
def check_sonicmoe_ep_unsupported(cls, data):
"""SonicMoE + EP is not yet implemented (EP `_sonicmoe_local` raises)."""
- if data.get("use_sonicmoe") and (data.get("expert_parallel_size") or 1) > 1:
+ ep_size_raw = data.get("expert_parallel_size")
+ try:
+ ep_size = int(ep_size_raw) if ep_size_raw is not None else 1
+ except (TypeError, ValueError):
+ ep_size = 1
+ if data.get("use_sonicmoe") is True and ep_size > 1:
raise ValueError(
"use_sonicmoe=true is not supported with expert_parallel_size > 1. "
"Use use_scattermoe=true under EP, or set expert_parallel_size=1."
)
return data
@@
- use_scattermoe = bool(data.get("use_scattermoe"))
- use_sonicmoe = bool(data.get("use_sonicmoe"))
+ use_scattermoe = data.get("use_scattermoe") is True
+ use_sonicmoe = data.get("use_sonicmoe") is True🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/axolotl/integrations/kernels/args.py` at line 57, The before-validator
currently evaluates raw dict values directly (e.g., data.get("use_sonicmoe") and
(data.get("expert_parallel_size") or 1) > 1) which mis-handles string inputs
like "false" or "2" and can raise TypeError; update the before-validator to
explicitly coerce/parse these fields first (e.g., parse use_sonicmoe into a
boolean using a safe string-to-bool helper and coerce expert_parallel_size to
int with a fallback) and then perform the comparison, adjusting any helper or
validator functions used by the validator; also replace the multi-line comment
blocks that document option values with short single-line comments (split or
rephrase the content around the validator and the option docs) to comply with
the one-line comment guideline.
| if cfg.use_scattermoe: | ||
| self._register_kernels() | ||
| if is_experts_only_model(moe_model_type): | ||
| # Models like Gemma4 where MoE is embedded in the decoder layer | ||
| # — register ScatterMoE in the ExpertsInterface so that | ||
| # @use_experts_implementation dispatches to it. | ||
| self._register_experts_interface() | ||
| if not ep_active: | ||
| cfg.experts_implementation = "scattermoe" | ||
| elif ep_active: | ||
| LOG.info( | ||
| "expert_parallel_size > 1: skipping SparseMoeBlock-level " | ||
| "ScatterMoE patch; the deep_ep_scattermoe registered " | ||
| "function handles the kernel under EP." | ||
| ) | ||
| else: | ||
| self._kernelize_model(moe_model_type) | ||
| elif cfg.use_sonicmoe: | ||
| if not importlib.util.find_spec("sonicmoe"): | ||
| raise RuntimeError( | ||
| "SonicMoE is not installed. See installation instructions at " | ||
| "https://github.com/axolotl-ai-cloud/axolotl/blob/main/src/axolotl/integrations/kernels/README.md#sonicmoe-installation" | ||
| ) | ||
| from axolotl.integrations.kernels.libs.scattermoe_lora.experts import ( | ||
| register_scattermoe_experts, | ||
| ) | ||
|
|
||
| register_scattermoe_experts() | ||
| if not ep_active: | ||
| cfg.experts_implementation = "scattermoe" | ||
| LOG.info("Registered 'scattermoe' in transformers ExpertsInterface") | ||
| elif cfg.use_sonicmoe: | ||
| _check_sonicmoe_gpu_compat() | ||
|
|
||
| if is_experts_only_model(moe_model_type): | ||
| from axolotl.integrations.kernels.libs.sonicmoe.gemma4_experts import ( | ||
| patch_gemma4_sonicmoe, | ||
| ) | ||
|
|
||
| LOG.info( | ||
| f"Applying SonicMoE experts-level patch for model type: {moe_model_type}" | ||
| ) | ||
| patch_gemma4_sonicmoe() | ||
| # TODO(EP+SonicMoE): grad norms explode during training. Re-enable | ||
| # once the root cause is identified. Same shape as the ScatterMoE | ||
| # branch above, but SonicMoE additionally needs the gate_up_proj | ||
| # interleave converter since its w1 layout is [g0, u0, g1, u1, ...] | ||
| # while the checkpoint stores it concatenated [gate..., up...]. | ||
| # | ||
| # elif ep_active: | ||
| # from axolotl.integrations.kernels.libs.sonicmoe.weight_converter import ( | ||
| # register_sonicmoe_weight_converter, | ||
| # ) | ||
| # | ||
| # LOG.info( | ||
| # "expert_parallel_size > 1: skipping SparseMoeBlock-level " | ||
| # "SonicMoE patch; the deep_ep_sonicmoe registered function " | ||
| # "handles the kernel under EP. Registering gate_up_proj " | ||
| # "interleave converter." | ||
| # ) | ||
| # register_sonicmoe_weight_converter(moe_model_type) | ||
| else: | ||
| from axolotl.integrations.kernels.libs.sonicmoe import patch_sonicmoe | ||
|
|
||
| LOG.info(f"Applying SonicMoE patches for model type: {moe_model_type}") | ||
| patch_sonicmoe( | ||
| moe_model_type, | ||
| torch_compile=bool(getattr(cfg, "torch_compile", False)), | ||
| base_model_type=cfg.model_config_type, | ||
| ) | ||
|
|
||
| def _register_kernels(self): | ||
| from kernels import ( | ||
| LocalLayerRepository, | ||
| Mode, | ||
| register_kernel_mapping, | ||
| ) | ||
| from axolotl.integrations.kernels.libs.sonicmoe.experts import ( | ||
| register_sonicmoe_experts, | ||
| ) | ||
|
|
||
| plugin_root = Path(__file__).parent | ||
| register_kernel_mapping( | ||
| { | ||
| "HFScatterMoEParallelExperts": { | ||
| "cuda": { | ||
| Mode.TRAINING: LocalLayerRepository( | ||
| repo_path=plugin_root / "libs" / "scattermoe_lora", | ||
| package_name="scattermoe_lora", | ||
| layer_name="HFScatterMoEGatedMLP", | ||
| ), | ||
| Mode.INFERENCE: LocalLayerRepository( | ||
| repo_path=plugin_root / "libs" / "scattermoe_lora", | ||
| package_name="scattermoe_lora", | ||
| layer_name="HFScatterMoEGatedMLP", | ||
| ), | ||
| }, | ||
| } | ||
| } | ||
| ) | ||
| register_sonicmoe_experts() | ||
| if not ep_active: | ||
| cfg.experts_implementation = "sonicmoe" | ||
| LOG.info("Registered 'sonicmoe' in transformers ExpertsInterface") |
There was a problem hiding this comment.
Honor cfg.experts_implementation when registering kernels.
pre_model_load() only installs the handler when the legacy use_scattermoe / use_sonicmoe booleans are set. In this stack the validator layer also accepts YAML that selects experts_implementation directly, so experts_implementation: scattermoe or sonicmoe without the matching boolean leaves the registry uninitialized and fails later at lookup time.
Suggested fix
+ requested_impl = getattr(cfg, "experts_implementation", None)
- if cfg.use_scattermoe:
+ if cfg.use_scattermoe or requested_impl == "scattermoe":
from axolotl.integrations.kernels.libs.scattermoe_lora.experts import (
register_scattermoe_experts,
)
register_scattermoe_experts()
if not ep_active:
cfg.experts_implementation = "scattermoe"
LOG.info("Registered 'scattermoe' in transformers ExpertsInterface")
- elif cfg.use_sonicmoe:
+ elif cfg.use_sonicmoe or requested_impl == "sonicmoe":
_check_sonicmoe_gpu_compat()As per coding guidelines, **/*.py: Config-driven architecture: all training features are toggled via YAML configuration, not code changes.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if cfg.use_scattermoe: | |
| self._register_kernels() | |
| if is_experts_only_model(moe_model_type): | |
| # Models like Gemma4 where MoE is embedded in the decoder layer | |
| # — register ScatterMoE in the ExpertsInterface so that | |
| # @use_experts_implementation dispatches to it. | |
| self._register_experts_interface() | |
| if not ep_active: | |
| cfg.experts_implementation = "scattermoe" | |
| elif ep_active: | |
| LOG.info( | |
| "expert_parallel_size > 1: skipping SparseMoeBlock-level " | |
| "ScatterMoE patch; the deep_ep_scattermoe registered " | |
| "function handles the kernel under EP." | |
| ) | |
| else: | |
| self._kernelize_model(moe_model_type) | |
| elif cfg.use_sonicmoe: | |
| if not importlib.util.find_spec("sonicmoe"): | |
| raise RuntimeError( | |
| "SonicMoE is not installed. See installation instructions at " | |
| "https://github.com/axolotl-ai-cloud/axolotl/blob/main/src/axolotl/integrations/kernels/README.md#sonicmoe-installation" | |
| ) | |
| from axolotl.integrations.kernels.libs.scattermoe_lora.experts import ( | |
| register_scattermoe_experts, | |
| ) | |
| register_scattermoe_experts() | |
| if not ep_active: | |
| cfg.experts_implementation = "scattermoe" | |
| LOG.info("Registered 'scattermoe' in transformers ExpertsInterface") | |
| elif cfg.use_sonicmoe: | |
| _check_sonicmoe_gpu_compat() | |
| if is_experts_only_model(moe_model_type): | |
| from axolotl.integrations.kernels.libs.sonicmoe.gemma4_experts import ( | |
| patch_gemma4_sonicmoe, | |
| ) | |
| LOG.info( | |
| f"Applying SonicMoE experts-level patch for model type: {moe_model_type}" | |
| ) | |
| patch_gemma4_sonicmoe() | |
| # TODO(EP+SonicMoE): grad norms explode during training. Re-enable | |
| # once the root cause is identified. Same shape as the ScatterMoE | |
| # branch above, but SonicMoE additionally needs the gate_up_proj | |
| # interleave converter since its w1 layout is [g0, u0, g1, u1, ...] | |
| # while the checkpoint stores it concatenated [gate..., up...]. | |
| # | |
| # elif ep_active: | |
| # from axolotl.integrations.kernels.libs.sonicmoe.weight_converter import ( | |
| # register_sonicmoe_weight_converter, | |
| # ) | |
| # | |
| # LOG.info( | |
| # "expert_parallel_size > 1: skipping SparseMoeBlock-level " | |
| # "SonicMoE patch; the deep_ep_sonicmoe registered function " | |
| # "handles the kernel under EP. Registering gate_up_proj " | |
| # "interleave converter." | |
| # ) | |
| # register_sonicmoe_weight_converter(moe_model_type) | |
| else: | |
| from axolotl.integrations.kernels.libs.sonicmoe import patch_sonicmoe | |
| LOG.info(f"Applying SonicMoE patches for model type: {moe_model_type}") | |
| patch_sonicmoe( | |
| moe_model_type, | |
| torch_compile=bool(getattr(cfg, "torch_compile", False)), | |
| base_model_type=cfg.model_config_type, | |
| ) | |
| def _register_kernels(self): | |
| from kernels import ( | |
| LocalLayerRepository, | |
| Mode, | |
| register_kernel_mapping, | |
| ) | |
| from axolotl.integrations.kernels.libs.sonicmoe.experts import ( | |
| register_sonicmoe_experts, | |
| ) | |
| plugin_root = Path(__file__).parent | |
| register_kernel_mapping( | |
| { | |
| "HFScatterMoEParallelExperts": { | |
| "cuda": { | |
| Mode.TRAINING: LocalLayerRepository( | |
| repo_path=plugin_root / "libs" / "scattermoe_lora", | |
| package_name="scattermoe_lora", | |
| layer_name="HFScatterMoEGatedMLP", | |
| ), | |
| Mode.INFERENCE: LocalLayerRepository( | |
| repo_path=plugin_root / "libs" / "scattermoe_lora", | |
| package_name="scattermoe_lora", | |
| layer_name="HFScatterMoEGatedMLP", | |
| ), | |
| }, | |
| } | |
| } | |
| ) | |
| register_sonicmoe_experts() | |
| if not ep_active: | |
| cfg.experts_implementation = "sonicmoe" | |
| LOG.info("Registered 'sonicmoe' in transformers ExpertsInterface") | |
| requested_impl = getattr(cfg, "experts_implementation", None) | |
| if cfg.use_scattermoe or requested_impl == "scattermoe": | |
| from axolotl.integrations.kernels.libs.scattermoe_lora.experts import ( | |
| register_scattermoe_experts, | |
| ) | |
| register_scattermoe_experts() | |
| if not ep_active: | |
| cfg.experts_implementation = "scattermoe" | |
| LOG.info("Registered 'scattermoe' in transformers ExpertsInterface") | |
| elif cfg.use_sonicmoe or requested_impl == "sonicmoe": | |
| _check_sonicmoe_gpu_compat() | |
| from axolotl.integrations.kernels.libs.sonicmoe.experts import ( | |
| register_sonicmoe_experts, | |
| ) | |
| register_sonicmoe_experts() | |
| if not ep_active: | |
| cfg.experts_implementation = "sonicmoe" | |
| LOG.info("Registered 'sonicmoe' in transformers ExpertsInterface") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/axolotl/integrations/kernels/plugin.py` around lines 73 - 92,
pre_model_load currently only registers experts if cfg.use_scattermoe or
cfg.use_sonicmoe are true, which breaks when the YAML sets
cfg.experts_implementation directly; update pre_model_load to also inspect
cfg.experts_implementation (values "scattermoe" and "sonicmoe") and call
register_scattermoe_experts() or register_sonicmoe_experts() as appropriate when
the corresponding boolean is false, preserve the existing ep_active check and
assignment to cfg.experts_implementation, and keep LOG.info messages unchanged
so the registry is initialized whenever either the legacy booleans or the
config-driven experts_implementation indicates scattermoe/sonicmoe.
| elif cfg.use_sonicmoe: | ||
| _check_sonicmoe_gpu_compat() |
There was a problem hiding this comment.
Fail fast for SonicMoE on CPU-only hosts.
This branch treats _check_sonicmoe_gpu_compat() as the capability gate, but that helper currently returns when CUDA is unavailable. use_sonicmoe therefore survives plugin setup and only dies much later in sonicmoe_experts_forward_with_lora, which makes the unsupported config harder to diagnose.
Suggested fix
- if not torch.cuda.is_available():
- return
+ if not torch.cuda.is_available():
+ raise RuntimeError("SonicMoE requires a CUDA device.")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/axolotl/integrations/kernels/plugin.py` around lines 82 - 83, The branch
that enables SonicMoE (cfg.use_sonicmoe) must fail immediately when the host
lacks GPU support: call _check_sonicmoe_gpu_compat() and if it reports
incompatibility (or returns a falsy value) raise an explicit exception (e.g.,
RuntimeError/ValueError) or return an error so plugin setup aborts rather than
continuing; alternatively modify _check_sonicmoe_gpu_compat() to raise on
CPU-only hosts and ensure the cfg.use_sonicmoe path relies on that behavior so
the misconfiguration is detected early instead of failing later in
sonicmoe_experts_forward_with_lora.
ca4ca95 to
8e3ed29
Compare
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
8e3ed29 to
8ee58b1
Compare
registry
Description
Waiting on #3632 to get in first to refactor how it handles experts there.Cleans up our kernels implementation to reduce our impact, let upstream handle the routing. We just apply to the MoE block now. This increases our support surface while reducing the burden of rewriting routing for new archs.
Motivation and Context
How has this been tested?
AI Usage Disclaimer
Screenshots (if appropriate)
Types of changes
Social Handles (Optional)