tests: skip MoE LoRA extractor coverage when discovery finds zero classes by danielhanchen · Pull Request #628 · unslothai/unsloth-zoo

danielhanchen · 2026-05-07T06:18:22Z

Summary

test_every_patched_moe_experts_class_has_lora_extractor (PR #624) is
designed to prevent a regression where a class flagged
_unsloth_already_patched=True is missing _unsloth_lora_extractor_fn.
Its defensive precondition was "discovery produced at least one patched
class"; failing that, the test raised an AssertionError telling the
reader to investigate.

That precondition fires on any transformers version older than the MoE
class surface unsloth_zoo.temporary_patches targets. With
transformers==4.57.6 for example, none of the gemma4 / llama4 /
qwen3_moe / mxfp4 classes the TEMPORARY_PATCHES list patches exist,
so every patch fn no-ops and discovery returns []. The "zero
discoveries" outcome on an older transformers is not a regression
of the bug being prevented; it just means the relevant classes aren't
there to patch.

Replace assert patched, ... with pytest.skip(...) and a comment
explaining the two indistinguishable cases. CI cells running newer
transformers (where the classes do exist) still exercise the real
assertion below; the older-transformers cell skips cleanly.

Test plan

Locally: with the workspace's pyproject-pinned transformers (5.x
cap), discovery still produces patched classes and the real
assertion runs.
CI on CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests unsloth#5312 cell transformers==4.57.6 + trl<1
should drop the failure on this test on the next run.

Why this isn't `pytest.mark.skipif(transformers_version < X.Y)`

A version-string skipif would have to encode every minimum version
across the TEMPORARY_PATCHES list (gemma3, gemma4, llama4, qwen3_moe,
mxfp4 each landed in a different transformers release) AND track
upstream renames. The "zero patched classes" runtime check is the
precise condition we actually care about: skip when nothing is there
to verify.

Surfaced by the consolidated CPU-CI matrix on unslothai/unsloth#5312.

…sses test_every_patched_moe_experts_class_has_lora_extractor asserts that every class flagged `_unsloth_already_patched=True` whose source defines 3D `gate_up_proj`/`down_proj` parameters also has `_unsloth_lora_extractor_fn` registered. The defensive precondition was "discovery produced at least one patched class"; failing that, the test raised AssertionError("Discovery produced zero patched MoE classes ... Investigate."). That precondition fires on any transformers version older than the MoE surface unsloth_zoo patches. Specifically, with transformers==4.57.6 none of the gemma4 / llama4 / qwen3_moe / mxfp4 classes the TEMPORARY_PATCHES list targets exist, so every patch fn no-ops and discovery returns []. The "zero discoveries" outcome on an older transformers is not a regression of the bug being prevented (a class patched-but-missing-extractor); it just means the relevant classes aren't there to patch. Switch the precondition from `assert patched, ...` to `pytest.skip(...)` with a comment explaining the two indistinguishable cases. CI cells running newer transformers (where classes do exist) will still exercise the real assertion below; the older-transformers cell skips cleanly. Surfaced by the consolidated CPU-CI matrix on unslothai/unsloth#5312 in the cell pinned to `transformers==4.57.6`.

… job-level continue-on-error Two cleanups derived from review of the matrix output: 1. Skip false-positive zero-arg patches in the runtime ledger. Three patches have all-defaulted signatures but require either runtime args or real CUDA, so calling them in isolation produces a meaningless failure: - patch_linear_scaling: defaults are None placeholders; body starts with `assert rope_module is not None` etc. - patch_llama_rope_scaling: same shape. - patch_unsloth_smart_gradient_checkpointing: legitimately allocates CUDA tensors via aten::empty.memory_format inside initialize_unsloth_gradient_checkpointing(); the torch.cuda.* Python spoof can't intercept that at the dispatcher level. Add NEEDS_PRECONDITION = {...} to the shim and skip those by name. Symbol presence is still verified via REQUIRED. 2. Drop the job-level `continue-on-error: true`. Previously the cell reported SUCCESS even when steps failed, which made the PR check UI lie. Real failures now turn the cell red. Per-step `continue-on-error: true` stays so a single failed step does not cascade and skip the rest of the ledger. Three other failures the matrix surfaced are addressed by separate PRs to source: - #5319 (patch_fast_lora missing import, patch_sft_trainer_tokenizer Union NameError, openenv OSError) - unslothai/unsloth-zoo#628 (skip MoE coverage on older transformers)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d9f9c68598

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-07T06:20:13Z

+        # surface as a red regression here. CI cells running the
+        # transformers version the PR actually pins (and any newer one)
+        # will discover patched classes and run the real assertion.
+        pytest.skip(


Preserve failure on discovery-regression cases

On a supported/newer transformers install that does contain the MoE classes, if the patch marker convention changes (or all relevant patch functions fail before recording the _original_*_forward marker), _discover_patched_moe_classes() still returns [] and this branch reports the test as skipped. That is the same discovery-regression case the previous assertion caught, so this can make the extractor-registration coverage silently disappear from the CI matrix instead of failing when coverage is no longer being exercised.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request modifies tests/test_moe_lora_extractor_coverage.py to replace a hard assertion with a pytest.skip call when no patched MoE classes are discovered. This change prevents test failures on older versions of the transformers library that do not include the targeted MoE classes, while still allowing the test to run on newer versions. I have no feedback to provide.

gemini-code-assist

Code Review

This pull request modifies the MoE LoRA extractor coverage test to skip instead of fail when no patched classes are discovered, which prevents false negatives in CI environments running older versions of the 'transformers' library. The review feedback suggests updating the terminology in the skip message and comments to accurately reflect the patching marker convention used in the discovery logic.

gemini-code-assist · 2026-05-07T06:21:04Z

+    if not patched:
+        # Two situations produce zero discoveries:
+        #   1. The transformers version installed predates the MoE class
+        #      surface unsloth_zoo patches (e.g. transformers 4.57.6 ships
+        #      none of the gemma4 / llama4 / qwen3_moe / mxfp4 classes the
+        #      TEMPORARY_PATCHES list targets, so every patch fn no-ops).
+        #   2. The `_unsloth_already_patched` marker convention drifted
+        #      (the regression this test guards against).
+        # We can't tell (1) and (2) apart by discovery alone. Skip rather
+        # than fail: a CI cell pinned to an older transformers should not
+        # surface as a red regression here. CI cells running the
+        # transformers version the PR actually pins (and any newer one)
+        # will discover patched classes and run the real assertion.
+        pytest.skip(
+            "No patched MoE classes discovered on this transformers version. "
+            "Either transformers is older than the MoE class surface "
+            "unsloth_zoo patches, or the _unsloth_already_patched marker "
+            "convention changed. The latter is a real regression but "
+            "indistinguishable from the former here; skip and let cells "
+            "running newer transformers exercise the real assertion."
+        )


The comment and skip message refer to the _unsloth_already_patched marker convention. However, the discovery logic in _discover_patched_moe_classes (specifically _has_unsloth_patched_forward) actually looks for the _original_..._forward attribute. Using the correct terminology ensures that if the patching mechanism changes, the test's failure/skip messages remain accurate and helpful for debugging.

if not patched: # Two situations produce zero discoveries: # 1. The transformers version installed predates the MoE class # surface unsloth_zoo patches (e.g. transformers 4.57.6 ships # none of the gemma4 / llama4 / qwen3_moe / mxfp4 classes the # TEMPORARY_PATCHES list targets, so every patch fn no-ops). # 2. The patching marker convention (e.g. _original_..._forward) # drifted (the regression this test guards against). # We can't tell (1) and (2) apart by discovery alone. Skip rather # than fail: a CI cell pinned to an older transformers should not # surface as a red regression here. CI cells running the # transformers version the PR actually pins (and any newer one) # will discover patched classes and run the real assertion. pytest.skip( "No patched MoE classes discovered on this transformers version. " "Either transformers is older than the MoE class surface " "unsloth_zoo patches, or the patching marker convention " "changed. The latter is a real regression but " "indistinguishable from the former here; skip and let cells " "running newer transformers exercise the real assertion." )

gemini-code-assist

Code Review

This pull request modifies the MoE LoRA extractor coverage test to skip execution when no patched MoE classes are discovered, preventing failures in CI environments with older transformers versions. Feedback suggests that this unconditional skip could lead to silent failures if the discovery mechanism itself breaks. It is recommended to implement a check for a known MoE module to distinguish between an unsupported environment and a regression in the discovery logic.

gemini-code-assist · 2026-05-07T06:22:19Z

+    if not patched:
+        # Two situations produce zero discoveries:
+        #   1. The transformers version installed predates the MoE class
+        #      surface unsloth_zoo patches (e.g. transformers 4.57.6 ships
+        #      none of the gemma4 / llama4 / qwen3_moe / mxfp4 classes the
+        #      TEMPORARY_PATCHES list targets, so every patch fn no-ops).
+        #   2. The `_unsloth_already_patched` marker convention drifted
+        #      (the regression this test guards against).
+        # We can't tell (1) and (2) apart by discovery alone. Skip rather
+        # than fail: a CI cell pinned to an older transformers should not
+        # surface as a red regression here. CI cells running the
+        # transformers version the PR actually pins (and any newer one)
+        # will discover patched classes and run the real assertion.
+        pytest.skip(
+            "No patched MoE classes discovered on this transformers version. "
+            "Either transformers is older than the MoE class surface "
+            "unsloth_zoo patches, or the _unsloth_already_patched marker "
+            "convention changed. The latter is a real regression but "
+            "indistinguishable from the former here; skip and let cells "
+            "running newer transformers exercise the real assertion."
+        )


Replacing the assertion with an unconditional skip when no patched classes are found introduces a risk of silent test failure. If the discovery mechanism (e.g., the _original_ attribute naming convention) breaks, the test will skip on all platforms, including those where MoE classes exist, thus failing to detect the regression it was designed to prevent.

To distinguish between an old transformers version (where skipping is appropriate) and a broken discovery mechanism (where the test should fail), you can check for the existence of a known MoE module. If the module exists but discovery found nothing, it's a real failure.

if not patched: # To distinguish between "old transformers" (skip) and "broken discovery" (fail), # check if a known MoE module exists. If it does, discovery should have found it. try: importlib.import_module("transformers.models.gemma4.modeling_gemma4") assert patched, ( "Discovery produced zero patched MoE classes, but Gemma-4 was found. " "This indicates the discovery logic or marker convention is broken." ) except ImportError: pass pytest.skip( "No patched MoE classes discovered. This is expected on older " "transformers versions where the target MoE classes do not exist." )

Per codex/gemini review on PR #628: replacing the original assert with an unconditional skip created a silent-failure path -- a real discovery regression on a transformers that DOES ship the grouped-MoE surface would now skip rather than fail, defeating the test. Disambiguate the two zero-discovery causes by walking transformers once and asking whether ANY class anywhere in transformers.models.* declares `gate_up_proj` + `down_proj` as 3D `nn.Parameter`s -- the exact filter `_looks_like_grouped_moe_experts` runs against discovered classes. If at least one such class exists AND discovery returned zero, that is a real regression: fail loud. If no such class exists, the patch surface this test targets is genuinely absent on this transformers version: skip. This avoids hard-coding model names that change as transformers grows new MoE families. Also update terminology in the comment + skip / fail messages: the load-bearing marker `_has_unsloth_patched_forward` reads is `_original_<modeling_tail>_<ClassName>_forward`, not the per-family `_unsloth_already_patched` flag.

danielhanchen · 2026-05-07T07:10:45Z

Pushed 4c464c8 addressing the codex P2 / gemini medium feedback.

The previous unconditional skip created a silent-failure path: a real discovery regression on a transformers that DOES ship the grouped-MoE surface would have skipped rather than failed, which is exactly the regression this test was written to catch.

The new logic walks transformers.models.* once and asks whether ANY class declares gate_up_proj + down_proj as 3D nn.Parameters — the exact filter _looks_like_grouped_moe_experts runs against discovered classes. If at least one such class exists AND discovery returned zero, that's a real regression: fail loud with a message naming up to 5 example classes. If no such class exists anywhere in transformers, the patch surface this test targets is genuinely absent on this transformers version: skip.

This is the structural canary the gemini reviewer suggested, generalized so it doesn't hard-code model names that change as transformers grows new MoE families.

Also corrected the comment + messages to refer to the actual marker convention the discovery logic reads (_original_<modeling_tail>_<ClassName>_forward) rather than the per-family _unsloth_already_patched flag.

Verified locally on transformers 4.57.6: discovery finds Qwen3VLMoeTextExperts (which shipped grouped-MoE 3D params before the rest of the family) and the real assertion runs.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4c464c8bba

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-07T07:14:18Z

+        importable = [
+            f"{cls.__module__}.{cls.__name__}"
+            for modeling in _iter_modeling_modules()
+            for _, cls in inspect.getmembers(modeling, inspect.isclass)
+            if isinstance(cls, type)
+            and getattr(cls, "__module__", None) == modeling.__name__
+            and _looks_like_grouped_moe_experts(cls)
+        ]


Restrict the canary to patch targets

In older transformers cells where none of the current TEMPORARY_PATCHES targets import, this canary still treats any grouped-MoE-looking class as proof that the patch surface exists. For example, transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeExperts has the same gate_up_proj/down_proj 3D nn.Parameter declarations in 4.57.x, but this repo has no qwen2_moe temporary patch target, so patched remains empty and this branch raises instead of performing the intended skip. Please limit the canary to classes/modules actually targeted by TEMPORARY_PATCHES (or otherwise distinguish untargeted MoE families) so old-transformers CI cells do not fail falsely.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed May 7, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 7, 2026

View reviewed changes

danielhanchen merged commit 7502595 into main May 7, 2026
3 checks passed

danielhanchen deleted the tests/skipif-moe-coverage-old-transformers branch May 7, 2026 07:12

chatgpt-codex-connector Bot reviewed May 7, 2026

View reviewed changes

danielhanchen mentioned this pull request May 7, 2026

tests: pivot MoE-coverage canary to _unsloth_already_patched marker #630

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests: skip MoE LoRA extractor coverage when discovery finds zero classes#628

tests: skip MoE LoRA extractor coverage when discovery finds zero classes#628
danielhanchen merged 2 commits into
mainfrom
tests/skipif-moe-coverage-old-transformers

danielhanchen commented May 7, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 7, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 7, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 7, 2026

Uh oh!

danielhanchen commented May 7, 2026

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

danielhanchen commented May 7, 2026

Summary

Test plan

Why this isn't pytest.mark.skipif(transformers_version < X.Y)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented May 7, 2026

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Why this isn't `pytest.mark.skipif(transformers_version < X.Y)`