Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 56 additions & 5 deletions tests/test_moe_lora_extractor_coverage.py
Original file line number Diff line number Diff line change
Expand Up @@ -302,11 +302,62 @@ def test_every_patched_moe_experts_class_has_lora_extractor():
_apply_all_temporary_patches()

patched = _discover_patched_moe_classes()
assert patched, (
"Discovery produced zero patched MoE classes. Either no MoE "
"transformers module imported on this transformers version, or the "
"_unsloth_already_patched marker convention changed. Investigate."
)
if not patched:
# Disambiguate two zero-discovery causes:
# (a) the installed transformers predates every MoE class surface
# unsloth_zoo targets (TEMPORARY_PATCHES entries no-op silently
# when their target submodule is absent) -- legitimate skip; or
# (b) the `_original_<modeling_tail>_<ClassName>_forward` marker
# convention used by `temporary_patches.utils.patch_function`
# (and read by `_has_unsloth_patched_forward` above) drifted --
# a real regression that the test must surface.
# Probe a stable canary: if ANY of the known MoE classes the patch
# functions target is importable on this transformers, then the
# patch surface exists and (a) is ruled out -- a zero discovery
# under that condition is (b), and we fail loudly. Otherwise we
# genuinely have no patch surface to test and skip.
# Structural canary: walk every transformers.models.*.modeling_* module
# and ask "does ANY class declare `gate_up_proj` + `down_proj` as 3D
# `nn.Parameter`?" That is precisely the surface
# `_looks_like_grouped_moe_experts` filters discovery on. If at least
# one such class exists anywhere in transformers, the surface this
# test guards IS present on this version, and discovery returning 0
# is a real regression (either `_has_unsloth_patched_forward`'s marker
# convention drifted or every TEMPORARY_PATCHES entry that targets
# such a class silently failed before reaching `patch_function`).
# Scanning the live tree avoids hard-coding model names that change
# as transformers grows new MoE families.
importable = [
f"{cls.__module__}.{cls.__name__}"
for modeling in _iter_modeling_modules()
for _, cls in inspect.getmembers(modeling, inspect.isclass)
if isinstance(cls, type)
and getattr(cls, "__module__", None) == modeling.__name__
and _looks_like_grouped_moe_experts(cls)
]
Comment on lines +330 to +337
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict the canary to patch targets

In older transformers cells where none of the current TEMPORARY_PATCHES targets import, this canary still treats any grouped-MoE-looking class as proof that the patch surface exists. For example, transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeExperts has the same gate_up_proj/down_proj 3D nn.Parameter declarations in 4.57.x, but this repo has no qwen2_moe temporary patch target, so patched remains empty and this branch raises instead of performing the intended skip. Please limit the canary to classes/modules actually targeted by TEMPORARY_PATCHES (or otherwise distinguish untargeted MoE families) so old-transformers CI cells do not fail falsely.

Useful? React with 👍 / 👎.

if importable:
raise AssertionError(
"Discovery produced zero patched MoE classes, but transformers "
f"ships {len(importable)} class(es) whose `__init__` declares "
"`gate_up_proj` + `down_proj` as 3D `nn.Parameter`s -- the "
"exact surface this test targets. Examples: "
f"{importable[:5]}. That means TEMPORARY_PATCHES entries ran "
"without recording the `_original_<modeling_tail>_<ClassName>_"
"forward` marker that `_has_unsloth_patched_forward` reads. "
"Either `patch_function` no longer stores the original under "
"that attribute name, or the patch fn for every grouped-MoE "
"family silently failed before reaching `patch_function`. "
"This is the regression the test guards against -- do not "
"convert this back into a skip without first re-establishing "
"the marker convention."
)
pytest.skip(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve failure on discovery-regression cases

On a supported/newer transformers install that does contain the MoE classes, if the patch marker convention changes (or all relevant patch functions fail before recording the _original_*_forward marker), _discover_patched_moe_classes() still returns [] and this branch reports the test as skipped. That is the same discovery-regression case the previous assertion caught, so this can make the extractor-registration coverage silently disappear from the CI matrix instead of failing when coverage is no longer being exercised.

Useful? React with 👍 / 👎.

"No patched MoE classes discovered AND no class anywhere in "
"transformers.models.* declares `gate_up_proj` + `down_proj` as "
"3D `nn.Parameter`s. The grouped-MoE patch surface this test "
"guards is absent on this transformers version; cells running a "
"newer transformers exercise the real assertion."
)
Comment on lines +305 to +360
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment and skip message refer to the _unsloth_already_patched marker convention. However, the discovery logic in _discover_patched_moe_classes (specifically _has_unsloth_patched_forward) actually looks for the _original_..._forward attribute. Using the correct terminology ensures that if the patching mechanism changes, the test's failure/skip messages remain accurate and helpful for debugging.

    if not patched:
        # Two situations produce zero discoveries:
        #   1. The transformers version installed predates the MoE class
        #      surface unsloth_zoo patches (e.g. transformers 4.57.6 ships
        #      none of the gemma4 / llama4 / qwen3_moe / mxfp4 classes the
        #      TEMPORARY_PATCHES list targets, so every patch fn no-ops).
        #   2. The patching marker convention (e.g. _original_..._forward)
        #      drifted (the regression this test guards against).
        # We can't tell (1) and (2) apart by discovery alone. Skip rather
        # than fail: a CI cell pinned to an older transformers should not
        # surface as a red regression here. CI cells running the
        # transformers version the PR actually pins (and any newer one)
        # will discover patched classes and run the real assertion.
        pytest.skip(
            "No patched MoE classes discovered on this transformers version. "
            "Either transformers is older than the MoE class surface "
            "unsloth_zoo patches, or the patching marker convention "
            "changed. The latter is a real regression but "
            "indistinguishable from the former here; skip and let cells "
            "running newer transformers exercise the real assertion."
        )

Comment on lines +305 to +360
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Replacing the assertion with an unconditional skip when no patched classes are found introduces a risk of silent test failure. If the discovery mechanism (e.g., the _original_ attribute naming convention) breaks, the test will skip on all platforms, including those where MoE classes exist, thus failing to detect the regression it was designed to prevent.

To distinguish between an old transformers version (where skipping is appropriate) and a broken discovery mechanism (where the test should fail), you can check for the existence of a known MoE module. If the module exists but discovery found nothing, it's a real failure.

    if not patched:
        # To distinguish between "old transformers" (skip) and "broken discovery" (fail),
        # check if a known MoE module exists. If it does, discovery should have found it.
        try:
            importlib.import_module("transformers.models.gemma4.modeling_gemma4")
            assert patched, (
                "Discovery produced zero patched MoE classes, but Gemma-4 was found. "
                "This indicates the discovery logic or marker convention is broken."
            )
        except ImportError:
            pass

        pytest.skip(
            "No patched MoE classes discovered. This is expected on older "
            "transformers versions where the target MoE classes do not exist."
        )


# Hard contract: extractor must be registered on every patched class.
missing = [
Expand Down