Skip to content

tests: contain security-conftest network block; fix stale mlx paths; skip GPU import in trainer-exec-marker#648

Merged
danielhanchen merged 2 commits into
mainfrom
sec/zoo-ci-harness-fixes
May 15, 2026
Merged

tests: contain security-conftest network block; fix stale mlx paths; skip GPU import in trainer-exec-marker#648
danielhanchen merged 2 commits into
mainfrom
sec/zoo-ci-harness-fixes

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Summary

Three CI harness bugs surfaced once the unsloth-side matrix started running zoo's full pytest tree on every PR (the unsloth_zoo @ main - full pytest (CPU) step on unslothai/unsloth PR #5428). They account for ~50 of the 61 failures on that step. Pure harness fixes, no production code touched.

1. tests/security/conftest.py - contain the network-block scope

The network_blocker fixture was scope="session" + autouse=True and monkey-patched the module-global socket.socket. Once any security test ran, the patch stayed live for every subsequent test in the same pytest session, silently breaking every non-security test that needed the network. test_upstream_pinned_symbols_transformers alone has ~40 parametrised cases that fetch HF modeling source over HTTPS; each one failed with network access blocked by tests/security/conftest.py.

Drop the scope to function: setup/teardown happens per-test, cost is ~10us, and the blast radius is contained to the security tree only.

2. tests/test_upstream_pinned_symbols_accelerator.py - fix stale MLX paths

Two tests still read unsloth_zoo/mlx_trainer.py and unsloth_zoo/mlx_loader.py as flat modules. They were promoted to the mlx/ subpackage in e6d8f7f (mlx/trainer.py, mlx/loader.py); the tests have been failing with FileNotFoundError ever since. Accept either layout so the test survives the rename and still pins the deprecated-API regression both originally guarded.

3. tests/test_compiler_rewriter_exhaustive.py::test_unsloth_trainer_exec_marker - handle no-accelerator import

Did pytest.importorskip("unsloth") then a plain import unsloth.trainer. On a CPU-only CI runner without zoo's GPU-spoof harness applied, import unsloth raises NotImplementedError("Unsloth cannot find any torch accelerator? You need a GPU.") -- neither importorskip nor the existing ImportError branch handle that. Convert that exact exception to pytest.skip; the GPU cell still exercises the real import.

Verification

Single pytest session interleaving security + previously-blocked transformers tests:

python -m pytest tests/security \
  'tests/test_upstream_pinned_symbols_transformers.py::test_gemma3_modeling_required_classes[v5.5.0]' \
  'tests/test_upstream_pinned_symbols_transformers.py::test_ministral_attention_module_present[v5.5.0]' \
  -v
# 17 passed in 0.90s

Pre-fix the two transformers tests failed with the leaked network block; post-fix they pass.

Out of scope

This PR does not touch the remaining ~10 failures in the same CI step that are legitimate DRIFT DETECTED signals from test_temporary_patches_exhaustive.py and test_compiler_rewriter_exhaustive.py firing on transformers 5.5.0 (the pyproject's <=5.5.0 cap floats default to 5.5.0). Those need either zoo-side patch updates for 5.5.0 compat or a narrower pyproject upper bound -- the drift signal itself is working as designed and shouldn't be muted.

Test plan

  • python -m pytest tests/security -q -- 15 passed
  • python -m pytest tests/test_upstream_pinned_symbols_accelerator.py::test_mlx_trainer_uses_modern_memory_apis_only tests/test_upstream_pinned_symbols_accelerator.py::test_get_existing_mlx_quantization_detects_both_keys -v -- 2 passed
  • Single-session interleave above -- 17 passed

…+ skip GPU import in trainer-exec-marker

Three CI harness bugs surfaced once the unsloth-side matrix started
running zoo's full pytest tree on every PR (the
"unsloth_zoo @ main - full pytest (CPU)" step on unslothai/unsloth
PR #5428).

1. tests/security/conftest.py
   The network_blocker fixture was scope="session" + autouse=True and
   monkey-patched the module-global socket.socket. Once any security
   test ran, the patch stayed live for every subsequent test in the
   same pytest session - silently breaking every non-security test
   that needed the network (test_upstream_pinned_symbols_transformers
   alone has ~40 parametrised cases that fetch HF modeling source
   over HTTPS). Drop the scope to "function" so setup/teardown happens
   per-test; cost is ~10us and blast radius is contained to the
   security tree only.

2. tests/test_upstream_pinned_symbols_accelerator.py
   Two tests read unsloth_zoo/mlx_trainer.py and unsloth_zoo/mlx_loader.py
   as flat modules. They were promoted to the mlx/ subpackage in
   e6d8f7f (mlx/trainer.py, mlx/loader.py); the tests have been
   failing with FileNotFoundError ever since. Accept either layout so
   the test survives the rename and still pins the deprecated-API
   regression both originally guarded.

3. tests/test_compiler_rewriter_exhaustive.py::test_unsloth_trainer_exec_marker
   Did pytest.importorskip("unsloth") then a plain import unsloth.trainer.
   On a CPU-only CI runner without zoo's GPU-spoof harness applied,
   import unsloth raises NotImplementedError("Unsloth cannot find any
   torch accelerator? You need a GPU.") - which neither importorskip
   nor the existing ImportError branch handle. Convert that exact
   exception to a pytest.skip; the GPU cell still exercises the real
   import.

Local validation (single pytest session, proves the leak is gone):
  python -m pytest tests/security \
    'tests/test_upstream_pinned_symbols_transformers.py::test_gemma3_modeling_required_classes[v5.5.0]' \
    'tests/test_upstream_pinned_symbols_transformers.py::test_ministral_attention_module_present[v5.5.0]' \
    -v
  -> 17 passed in 0.90s

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2eb71f12b5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +2057 to +2059
except NotImplementedError as e:
if "accelerator" in str(e) or "GPU" in str(e):
pytest.skip(f"No accelerator visible to unsloth import: {e}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Move no-accelerator catch around importorskip

In the CPU-only CI case described here, import unsloth is executed by pytest.importorskip("unsloth") before this try block, and pytest only converts import failures into skips. When unsloth is installed but raises NotImplementedError because no accelerator is visible, the exception escapes before this new handler runs, so the test still fails instead of skipping; wrap the importorskip call or perform the initial import inside this handler.

Useful? React with 👍 / 👎.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes a regression where session-scoped network blockers interfered with other tests and updates MLX tests to support a new subpackage layout. It also introduces logic to skip unsloth-related tests on CPU-only runners. Feedback was provided regarding the placement of pytest.importorskip in tests/test_compiler_rewriter_exhaustive.py, as it currently sits outside the try block and will cause the test to crash before the skip logic can execute.

Comment on lines 2045 to 2047
pytest.importorskip("unsloth")
try:
import unsloth.trainer as trainer_mod

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The pytest.importorskip("unsloth") call at line 2045 will trigger the top-level import of the unsloth package. On CPU-only environments without a GPU-spoofing harness, this import raises the NotImplementedError that this PR intends to handle. Since this call is currently outside the try...except block, the test will still crash with an unhandled exception before reaching the skip logic. Moving the importorskip call inside the try block ensures the exception is caught and the test is skipped as intended.

Suggested change
pytest.importorskip("unsloth")
try:
import unsloth.trainer as trainer_mod
try:
pytest.importorskip("unsloth")
import unsloth.trainer as trainer_mod

…ortorskip path

Two follow-up fixes after the first push:

1. .github/workflows/consolidated-tests-ci.yml
   The Core matrix step installed torch but not torchvision. transformers
   chains torchvision in at module top for the qwen2_vl / qwen2_5_vl
   image processors, so the drift-existence probe in
   test_zoo_source_upstream_refs fired RED on
   "ModuleNotFoundError: No module named 'torchvision'". The drift
   message itself says "Either install the dep in CI or remove the zoo
   reference" -- installing is the right move since zoo legitimately
   references the image processor. CPU build only.

2. tests/test_compiler_rewriter_exhaustive.py::test_unsloth_trainer_exec_marker
   First pass wrapped the post-importorskip `import unsloth.trainer`
   call. But unsloth.__init__ raises NotImplementedError at line 144
   (`from ._gpu_init import *`), which means importorskip itself
   propagates the exception -- it only converts ImportError to skip.
   Move both `import unsloth` and `import unsloth.trainer` inside the
   try-block so the NotImplementedError branch catches both.

Local verification:
  python -m pytest tests/test_compiler_rewriter_exhaustive.py::test_unsloth_trainer_exec_marker -v
  -> 1 passed
@danielhanchen danielhanchen merged commit 9fe4396 into main May 15, 2026
17 of 19 checks passed
@danielhanchen danielhanchen deleted the sec/zoo-ci-harness-fixes branch May 15, 2026 04:15
danielhanchen added a commit that referenced this pull request May 15, 2026
* fix CI fallout from MLX subpackage refactor (#634)

Three small fixes triggered by tests on PRs #645 / unsloth#5426:

1) tests/test_upstream_pinned_symbols_accelerator.py read the legacy
   flat unsloth_zoo/mlx_trainer.py and mlx_loader.py paths directly.
   #634 moved those to unsloth_zoo/mlx/trainer.py and mlx/loader.py;
   point the tests at the new paths so they exercise the post-refactor
   sources.

2) unsloth_zoo/vision_utils.py:68 did an unconditional top-level
   import torchvision while torchvision is not a declared dependency.
   On a CPU-only install that lacks torchvision (e.g. zoo's Core CI
   matrix), every import of unsloth_zoo.vision_utils raises
   ModuleNotFoundError. Guard the import so the module loads; existing
   _read_video_torchvision call sites already only fire when video
   reading is requested.

3) tests/test_zoo_source_upstream_refs.py tries to resolve transformers
   qwen2_vl / qwen2_5_vl image-processing modules, both of which
   transitively `import torchvision`. Skip via pytest.importorskip
   when torchvision is not installed instead of failing the run.

Each fix is independent and minimal; no behavior change for installs
that have torchvision (the eager import path is identical), and no
behavior change for the MLX trainer / loader (the tests still pin
the exact same source-level invariants on the new file paths).

* tests/conftest: also stub unsloth.device_type on CPU-only runners

The post-#634 conftest stubs ``unsloth_zoo.device_type`` so importing
the package on a CPU-only CI runner doesn't trip ``get_device_type()``.
``test_unsloth_trainer_exec_marker`` (and any future test that does
``import unsloth.*``) goes one level up: it imports ``unsloth.trainer``,
which traverses ``unsloth/__init__.py`` → ``_gpu_init`` → ``unsloth/
device_type.py:get_device_type()``, raising the same NotImplementedError
zoo's stub was meant to suppress.

Parameterise ``_preload_real_device_type(package, prereqs)`` so the
same harness works for both packages, factor the fallback into
``_install_device_type_stub(name)``, and call both. unsloth's
``get_device_type()`` consumes ``unsloth_zoo`` helpers but has no
intra-unsloth prereqs, hence ``prereqs=()`` there.

* tests/conftest: don't stub unsloth.device_type after all

The previous commit (196a78b) stubbed unsloth.device_type so
test_unsloth_trainer_exec_marker could complete its
``import unsloth.trainer`` smoke. That makes ``import unsloth`` succeed
on CPU-only CI, which then runs
``unsloth/_gpu_init.py:_patch_trl_trainer()`` and rebinds
``trl.trainer.sft_trainer.SFTTrainer`` /
``transformers.models.ministral.MinistralAttention`` to Unsloth's
compiled wrappers.

``inspect.getsource(...)`` on those classes follows the wrapper's
``__init__.__code__.co_filename`` and returns the rewritten source --
which doesn't contain the literals the drift detectors anchor on
(``self._signature_columns``, ``self._prepare_dataset(``,
``class SFTTrainer``, named ``hidden_states``/``attention_mask``).
The HF=4.57.6 job went from -1 pre-existing failure
(``test_unsloth_trainer_exec_marker``, also failing on main) to -4 new
failures across ``test_MinistralAttention_forward_signature`` and the
three ``test_unsloth_rl_trainer_*`` checks. That's a regression.

Keep parameterised ``_preload_real_device_type`` and
``_install_device_type_stub`` -- they're harmless infrastructure and
useful if a future test needs the unsloth.device_type stub
specifically. Just don't fire the second preload at session setup.

``test_unsloth_trainer_exec_marker`` will continue to fail on
CPU-only CI as it does on main; that's a separate pre-existing issue
with ``unsloth/device_type.py`` raising on accelerator-less hosts and
should be fixed upstream in unsloth itself, not papered over in zoo's
conftest.

* device_type: mirror UNSLOTH_ALLOW_CPU=1 gate from unsloth #5429

unsloth/_gpu_init.py:125 imports DEVICE_TYPE from unsloth_zoo, not
local unsloth/device_type.py, so the env-var gate has to live here
too for `import unsloth.trainer` to succeed on CPU-only CI.

The companion unsloth PR landed as cb15a7a (#5429). Pairs with the
conftest change in this branch that sets UNSLOTH_ALLOW_CPU=1 before
triggering `import unsloth` from _apply_upstream_import_fixes_for_tests.

@functools.cache on get_device_type plus the module-level
DEVICE_TYPE = get_device_type() assignment mean the os.environ.get
runs exactly once per process. Production hosts with a real accelerator
hit the `cuda`/`xpu`/`hip` branches above and never reach the env-var
short-circuit.

* prod: graceful no-op for two transformers 5.x runtime bugs

Two upstream changes broke zoo at runtime (not just CI drift detection):

1. transformers 5.x removed transformers.cache_utils.HybridCache. zoo's
   utils.py falls back to `HybridCache = typing.Any`, which made
   `gemma.py:260 isinstance(past_key_values, HybridCache)` raise
   TypeError ("isinstance() arg 2 must be a type, a tuple of types,
   or a union") at runtime on the gemma mask path.

   Fix: expose HAS_HYBRID_CACHE alongside the existing HybridCache
   fallback in utils.py, and gate the isinstance call in gemma.py:260
   on the flag. When HybridCache is genuinely missing, the elif chain
   falls through to the dynamic-cache branch -- same observable
   behavior as the pre-5.x path that never matched the type.

2. transformers 5.x rewrote PreTrainedModel.save_pretrained. The
   source-string anchors zoo's LoRA-merge-on-save optimization patches
   in `merge_and_dequantize_lora` (state_dict_split assignment,
   `state_dict[tensor].contiguous()`, `def save_pretrained`, and the
   filename_to_tensors for-loop on the push_to_hub path) are all gone.
   Each downstream `raise RuntimeError("Failed to find ...")` would
   fire, killing `merge_and_dequantize_lora(push_to_hub=True)` on 5.x.

   Fix: scan all required anchors upfront in saving_utils.py. If any
   are missing on installed transformers, emit a one-time warnings.warn
   and fall back to vanilla `model.save_pretrained(...)`. The end-user
   sees the warning, no crash, and is told to call
   `model.merge_and_unload()` (or equivalent) before saving if they
   want merged LoRA weights on disk. The per-anchor RuntimeError
   raises downstream are kept as defense-in-depth for partial drift
   (one anchor renamed, others intact) that the upfront check might
   miss.

The two corresponding drift detectors in tests/ are rewritten in the
companion commit as positive assertions: 4.57.6 keeps the strict
existence check; 5.x asserts the anchor is gone AND zoo's fallback
covers it (`HAS_HYBRID_CACHE` flag / `_required_anchors` list).

* tests: version-gate transformers-5.x drift detectors

Sixteen drift detectors in the Core HF=default / HF=latest matrix
slices were pointing at upstream symbols / source strings that
transformers 5.x intentionally removed:

* 9 `temporary_patches/*_for_causal_lm_forward_named_params` --
  cache_position moved from named forward param into
  **kwargs: Unpack[TransformersKwargs] (DeepseekV3, GptOss, CsmDepth,
  Csm, Qwen3Moe, Qwen3Next, Qwen3VLMoe, MinistralModel, GraniteMoeHybrid).
* 4 compiler/source rewriter probes -- output_attentions branching
  removed, MoE routing_weights cast refactored, Trainer
  is_torch_tpu_available/is_torch_xla_available both removed,
  _update_causal_mask hasattr probe all-False.
* 1 routing_weights.to substring probe in compiler_rewriter_exhaustive.
* 1 GptOssConfig dedent-compare -- rope_theta/initial_context_length/
  rope_scaling collapsed into rope_parameters dict.
* 1 GptOssConfig kwarg signature -- same rope_theta rename.

For all 16, the zoo runtime patch already gracefully no-ops on 5.x
(verified by the Explore audit: try/except + relaxed patch_function +
hasattr guards + str.replace's silent no-op-on-miss). Skip the
detector on 5.x with a clear "<symbol> removed on transformers
{version}" message; keep it strict on 4.57.6 where real drift could
still surface.

Two more drift detectors -- test_hybrid_cache_class_present and the
two saving_utils.py pinned-string tests -- become positive assertions:
on 5.x they confirm zoo's prod fix correctly identifies the missing
anchor (HAS_HYBRID_CACHE is False / `_required_anchors` covers the
missing strings) and the graceful fallback is wired up.

Conftest also sets UNSLOTH_ALLOW_CPU=1 before the existing
`import unsloth` trigger, which unblocks `test_unsloth_trainer_exec_marker`
on CPU-only CI runners (the companion unsloth PR #5429 lets that
import succeed without an accelerator).

* tests: also gate test_MinistralAttention_forward_signature on 5.x

Missed this one in 18c9f62. Same root cause as MinistralModel:
transformers 5.x reflowed MinistralAttention.forward, zoo's
``patch_function(..., match_level='relaxed')`` falls back to a
``(self, *args, **kwargs)`` wrapper, the named-param probe sees the
wrapper signature and drift-fails. Runtime call still works because
the wrapper forwards via kwargs. Skip on 5.x, keep strict on 4.57.6.

* tests: probe Ministral stash + skip relaxed-wrapper case

Zoo's ministral.py:94-96 wraps the actual implementation in a generic
``def forward(self, *args, **kwargs): return _full_forward(...)``
adapter before calling patch_function -- this lets `check_args_kwargs`
accept params transformers 5.x removed (cache_position) by routing
them through **kwargs. The side effect: `inspect.signature(
MinistralAttention.forward)` after patching shows the wrapper, not the
real named params.

This test used to pass only because the test runner couldn't fully
trigger zoo's TEMPORARY_PATCHES loop (`unsloth/models/_utils.py:580`
runs at import time but unsloth import was being silently aborted).
With UNSLOTH_ALLOW_CPU=1 now making `import unsloth` succeed, the
patch fires correctly and the live signature is the wrapper.

Fix: probe the `_original_modeling_ministral_MinistralAttention_forward`
stash that `patch_function(store_original=True)` writes, falling back
to the live attribute. If the live attr is the relaxed wrapper and
there's no stash, the named-param probe isn't meaningful -- the
runtime kwargs contract is enforced elsewhere. Skip with a clear
message. 4.57.6: stash is present and the named-param probe still
runs strict; real drift in the upstream signature still surfaces.

* tests: gate qwen2_5_vl image-processor drift on 5.x

transformers 5.x dropped the slow image processors entirely
(image_processing_qwen2_5_vl.py and image_processing_qwen2_5_vl_fast.py
are both gone in v5.5.0). Qwen2.5-VL now reuses Qwen2VLImageProcessor
directly. zoo's misc.py:1500-1506 patch site for Qwen2_5_VLImageProcessor
is try/except ImportError-wrapped, so it silently no-ops on 5.x and the
runtime shim still fires via the Qwen2VLImageProcessor patch at
misc.py:1485-1498 (the class Qwen2.5-VL inherits at runtime).

The Qwen2.5-VL image-processor path was previously hidden by
``pytest.importorskip("torchvision")`` -- PR #648 added torchvision to
the upstream-regression matrix install, which unmasked the drift.
Runtime is unaffected. Add the same 5.x skip pattern used by the other
16 gated detectors so the dead-code patch site doesn't block CI.

4.57.6 keeps the strict path-existence check.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant