tests: drift detector parity with unsloth-zoo (fix Core matrix RED on triton + vllm)#5421
Conversation
Two gaps surfaced when running tests/test_import_fixes_drift.py on a
fresh main install (transformers 4.57.6, trl 0.25.1, peft 0.19.1,
triton 3.5.1, vllm 0.15.1):
* triton_compiled_kernel test predicate was strict: only accepted
a class-level num_ctas. fix_triton_compiled_kernel_missing_attrs
installs the attrs via a wrapped __init__ (the post-3.6 shape),
so the detector fired DRIFT DETECTED even with the fix correctly
applied. Relax to also accept the wrapped-__init__ signature
(closure freevars / co_names probe). Mirrors zoo's already-relaxed
predicate (unsloth-zoo PR #639).
* tests/conftest.py applied ONLY the peft transformers_weight_conversion
stub fix via file-path loading. fix_vllm_guided_decoding_params /
fix_triton_compiled_kernel_missing_attrs / etc. never ran inside the
test process, so the corresponding drift detectors probed an
unpatched runtime state and pytest.fail'd. Replace the surgical
file-path loader with a guarded import unsloth (the GPU-free
harness above already pre-spoofs the device-type chain), so the
full import_fixes.py pass applies before pytest collects. Mirrors
unsloth-zoo's conftest pattern.
Local verification on transformers 4.57.6 + trl 0.25.1 + peft 0.19.1
+ triton 3.5.1 + vllm 0.15.1+cu130:
before: 16 passed, 2 failed (triton + vllm DRIFT DETECTED)
after: 18 passed, 0 failed
There was a problem hiding this comment.
Code Review
This pull request simplifies the test setup by importing unsloth directly to trigger upstream fixes and updates the Triton drift detection to inspect __init__ bytecode. Reviewers noted a potential AttributeError in the bytecode inspection and recommended narrowing the broad exception handler in the import logic to ModuleNotFoundError while adding logging.
| code = getattr(init, "__code__", None) | ||
| freevars = set(getattr(code, "co_freevars", ()) or ()) | ||
| co_names = set(getattr(code, "co_names", ()) or ()) | ||
| if "_orig_init" in freevars or {"num_ctas", "cluster_dims"}.issubset( | ||
| co_names | ||
| ): | ||
| return |
There was a problem hiding this comment.
| except Exception: | ||
| return | ||
| if pkg_spec is None or not pkg_spec.submodule_search_locations: | ||
| return | ||
| fix_path = os.path.join( | ||
| pkg_spec.submodule_search_locations[0], | ||
| "import_fixes.py", | ||
| ) | ||
| if not os.path.exists(fix_path): | ||
| return | ||
|
|
||
| mod_name = "unsloth.import_fixes" | ||
| _installed_skeleton = False | ||
| if mod_name in sys.modules: | ||
| mod = sys.modules[mod_name] | ||
| else: | ||
| # Submodule import needs SOME parent ``unsloth`` entry; reuse or | ||
| # install a bare skeleton and pop on exit so later ``import unsloth`` | ||
| # calls hit the real package init. | ||
| if "unsloth" not in sys.modules: | ||
| pkg = types.ModuleType("unsloth") | ||
| pkg.__path__ = list(pkg_spec.submodule_search_locations) | ||
| pkg.__spec__ = pkg_spec | ||
| pkg.__package__ = "unsloth" | ||
| pkg.__file__ = os.path.join( | ||
| pkg_spec.submodule_search_locations[0], | ||
| "__init__.py", | ||
| ) | ||
| sys.modules["unsloth"] = pkg | ||
| _installed_skeleton = True | ||
| spec = _ilu.spec_from_file_location(mod_name, fix_path) | ||
| if spec is None or spec.loader is None: | ||
| if _installed_skeleton: | ||
| sys.modules.pop("unsloth", None) | ||
| return | ||
| mod = _ilu.module_from_spec(spec) | ||
| sys.modules[mod_name] = mod | ||
| try: | ||
| spec.loader.exec_module(mod) | ||
| except Exception: | ||
| sys.modules.pop(mod_name, None) | ||
| if _installed_skeleton: | ||
| sys.modules.pop("unsloth", None) | ||
| return | ||
|
|
||
| fix = getattr(mod, "fix_peft_transformers_weight_conversion_import", None) | ||
| if fix is None: | ||
| if _installed_skeleton: | ||
| sys.modules.pop("unsloth", None) | ||
| return | ||
| try: | ||
| fix() | ||
| except Exception: | ||
| # Individual fix is internally guarded; don't take pytest down. | ||
| pass |
There was a problem hiding this comment.
Using a broad, silent exception handler (except Exception: pass) can mask unexpected issues such as syntax errors or logic bugs within the unsloth package initialization. Since the goal is to support environments where unsloth is not installed, catching ModuleNotFoundError specifically and checking the module name is more appropriate. Additionally, per the general rules, logging the exception at a debug level would aid in troubleshooting if the import fails for other reasons.
References
- Avoid using broad, silent exception handlers like
except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging. - When catching an
ImportErrorfor an optional dependency, prefer catching the more specificModuleNotFoundErrorand check the module name to avoid suppressing unrelated import errors.
…ai#5423) PR unslothai#5414's drift detectors and the corresponding import_fixes helpers were written against transformers 4.x. The Repo tests (CPU) step (which installs transformers>=4.51,<5.5 and currently resolves to 5.4.0) surfaces three real predicate gaps: * test_pretrained_model_enable_input_require_grads_uses_old_pattern fires DRIFT DETECTED whenever the source contains "for module in self.modules()". But the unsloth replacement that patch_enable_input_require_grads installs ALSO uses that pattern -- deliberately, just wrapped in try / except NotImplementedError. So the predicate cannot distinguish broken upstream from the working patch. Accept either pre-HF#41993 shape (no self.modules() loop) or the post-patch shape (loop + NotImplementedError handler). * test_transformers_torchcodec_available_flag_is_present asserts the pre-5.x module-level _torchcodec_available flag. transformers 5 replaced it with an lru_cache'd is_torchcodec_available() callable. Accept either symbol. Also update disable_torchcodec_if_broken to actually disable on 5.x: clear the cache and rebind the function to return False. Local verification: * transformers 4.57.6 + trl 0.25.1 + peft 0.19.1 + triton 3.5.1 + vllm 0.15.1+cu130 (the Core HF=4.57.6 cell shape): 18 passed. * transformers 5.8.1 + peft 0.19.1 + torch 2.9 CPU (the Repo tests (CPU) shape, drop trl / vllm / datasets / xformers): 12 passed, 6 skipped on missing optional libs, 0 failed. PR unslothai#5376's Repo tests (CPU) failure was a triple: - triton + enable_input_require_grads: fixed by merging current main (PR unslothai#5421's relaxed triton predicate + conftest 'import unsloth'). - torchcodec: fixed by THIS PR.
Summary
Closes two remaining DRIFT DETECTED failures on the Core matrix that PR #5414 + #5416 didn't fully close. After this PR all 18 drift detectors pass on a fresh main install (verified locally on transformers 4.57.6 + trl 0.25.1 + peft 0.19.1 + triton 3.5.1 + vllm 0.15.1+cu130).
Two gaps
1. triton predicate too strict
The unsloth detector for
fix_triton_compiled_kernel_missing_attrsonly accepts a class-levelnum_ctas. But the fix installs the attrs via a wrapped__init__(the post-3.6 shape) — so the detector firesDRIFT DETECTEDeven with the fix correctly applied. Relax to also accept the wrapped-__init__signature (closure freevars / co_names probe). Mirrors the same relaxation already on unsloth-zoo (unsloth-zoo PR #639).2. tests/conftest.py only applied the peft fix
The conftest applied ONLY
fix_peft_transformers_weight_conversion_import(via surgical file-path loading).fix_vllm_guided_decoding_params/fix_triton_compiled_kernel_missing_attrs/ etc. never ran inside the test process, so the corresponding drift detectors probed an unpatched runtime state and pytest.fail'd. Replace the surgical loader with a guardedimport unsloth(the GPU-free harness above already pre-spoofs the device-type chain via_preload_device_type+_patch_torch_cuda_for_import, soimport unslothsurvives on a CPU-only runner). Mirrors unsloth-zoo's conftest pattern.Local verification
Test plan
pytest tests/test_import_fixes_drift.pyshows 18/18 pass.import_fixes drift detectors (18 tests, HARD GATE)step.