-
Notifications
You must be signed in to change notification settings - Fork 265
tests: skip MoE LoRA extractor coverage when discovery finds zero classes #628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -302,11 +302,62 @@ def test_every_patched_moe_experts_class_has_lora_extractor(): | |
| _apply_all_temporary_patches() | ||
|
|
||
| patched = _discover_patched_moe_classes() | ||
| assert patched, ( | ||
| "Discovery produced zero patched MoE classes. Either no MoE " | ||
| "transformers module imported on this transformers version, or the " | ||
| "_unsloth_already_patched marker convention changed. Investigate." | ||
| ) | ||
| if not patched: | ||
| # Disambiguate two zero-discovery causes: | ||
| # (a) the installed transformers predates every MoE class surface | ||
| # unsloth_zoo targets (TEMPORARY_PATCHES entries no-op silently | ||
| # when their target submodule is absent) -- legitimate skip; or | ||
| # (b) the `_original_<modeling_tail>_<ClassName>_forward` marker | ||
| # convention used by `temporary_patches.utils.patch_function` | ||
| # (and read by `_has_unsloth_patched_forward` above) drifted -- | ||
| # a real regression that the test must surface. | ||
| # Probe a stable canary: if ANY of the known MoE classes the patch | ||
| # functions target is importable on this transformers, then the | ||
| # patch surface exists and (a) is ruled out -- a zero discovery | ||
| # under that condition is (b), and we fail loudly. Otherwise we | ||
| # genuinely have no patch surface to test and skip. | ||
| # Structural canary: walk every transformers.models.*.modeling_* module | ||
| # and ask "does ANY class declare `gate_up_proj` + `down_proj` as 3D | ||
| # `nn.Parameter`?" That is precisely the surface | ||
| # `_looks_like_grouped_moe_experts` filters discovery on. If at least | ||
| # one such class exists anywhere in transformers, the surface this | ||
| # test guards IS present on this version, and discovery returning 0 | ||
| # is a real regression (either `_has_unsloth_patched_forward`'s marker | ||
| # convention drifted or every TEMPORARY_PATCHES entry that targets | ||
| # such a class silently failed before reaching `patch_function`). | ||
| # Scanning the live tree avoids hard-coding model names that change | ||
| # as transformers grows new MoE families. | ||
| importable = [ | ||
| f"{cls.__module__}.{cls.__name__}" | ||
| for modeling in _iter_modeling_modules() | ||
| for _, cls in inspect.getmembers(modeling, inspect.isclass) | ||
| if isinstance(cls, type) | ||
| and getattr(cls, "__module__", None) == modeling.__name__ | ||
| and _looks_like_grouped_moe_experts(cls) | ||
| ] | ||
| if importable: | ||
| raise AssertionError( | ||
| "Discovery produced zero patched MoE classes, but transformers " | ||
| f"ships {len(importable)} class(es) whose `__init__` declares " | ||
| "`gate_up_proj` + `down_proj` as 3D `nn.Parameter`s -- the " | ||
| "exact surface this test targets. Examples: " | ||
| f"{importable[:5]}. That means TEMPORARY_PATCHES entries ran " | ||
| "without recording the `_original_<modeling_tail>_<ClassName>_" | ||
| "forward` marker that `_has_unsloth_patched_forward` reads. " | ||
| "Either `patch_function` no longer stores the original under " | ||
| "that attribute name, or the patch fn for every grouped-MoE " | ||
| "family silently failed before reaching `patch_function`. " | ||
| "This is the regression the test guards against -- do not " | ||
| "convert this back into a skip without first re-establishing " | ||
| "the marker convention." | ||
| ) | ||
| pytest.skip( | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
On a supported/newer transformers install that does contain the MoE classes, if the patch marker convention changes (or all relevant patch functions fail before recording the Useful? React with 👍 / 👎. |
||
| "No patched MoE classes discovered AND no class anywhere in " | ||
| "transformers.models.* declares `gate_up_proj` + `down_proj` as " | ||
| "3D `nn.Parameter`s. The grouped-MoE patch surface this test " | ||
| "guards is absent on this transformers version; cells running a " | ||
| "newer transformers exercise the real assertion." | ||
| ) | ||
|
Comment on lines
+305
to
+360
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The comment and skip message refer to the if not patched:
# Two situations produce zero discoveries:
# 1. The transformers version installed predates the MoE class
# surface unsloth_zoo patches (e.g. transformers 4.57.6 ships
# none of the gemma4 / llama4 / qwen3_moe / mxfp4 classes the
# TEMPORARY_PATCHES list targets, so every patch fn no-ops).
# 2. The patching marker convention (e.g. _original_..._forward)
# drifted (the regression this test guards against).
# We can't tell (1) and (2) apart by discovery alone. Skip rather
# than fail: a CI cell pinned to an older transformers should not
# surface as a red regression here. CI cells running the
# transformers version the PR actually pins (and any newer one)
# will discover patched classes and run the real assertion.
pytest.skip(
"No patched MoE classes discovered on this transformers version. "
"Either transformers is older than the MoE class surface "
"unsloth_zoo patches, or the patching marker convention "
"changed. The latter is a real regression but "
"indistinguishable from the former here; skip and let cells "
"running newer transformers exercise the real assertion."
)
Comment on lines
+305
to
+360
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Replacing the assertion with an unconditional skip when no patched classes are found introduces a risk of silent test failure. If the discovery mechanism (e.g., the To distinguish between an old if not patched:
# To distinguish between "old transformers" (skip) and "broken discovery" (fail),
# check if a known MoE module exists. If it does, discovery should have found it.
try:
importlib.import_module("transformers.models.gemma4.modeling_gemma4")
assert patched, (
"Discovery produced zero patched MoE classes, but Gemma-4 was found. "
"This indicates the discovery logic or marker convention is broken."
)
except ImportError:
pass
pytest.skip(
"No patched MoE classes discovered. This is expected on older "
"transformers versions where the target MoE classes do not exist."
) |
||
|
|
||
| # Hard contract: extractor must be registered on every patched class. | ||
| missing = [ | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In older transformers cells where none of the current
TEMPORARY_PATCHEStargets import, this canary still treats any grouped-MoE-looking class as proof that the patch surface exists. For example,transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeExpertshas the samegate_up_proj/down_proj3Dnn.Parameterdeclarations in 4.57.x, but this repo has noqwen2_moetemporary patch target, sopatchedremains empty and this branch raises instead of performing the intended skip. Please limit the canary to classes/modules actually targeted byTEMPORARY_PATCHES(or otherwise distinguish untargeted MoE families) so old-transformers CI cells do not fail falsely.Useful? React with 👍 / 👎.