[tests] [studio] Fix VLM detection for transformers v5 by danielhanchen · Pull Request #11 · unslothai/unsloth-staging-1

danielhanchen · 2026-04-06T09:50:22Z

Staging mirror of unslothai#4868

Original PR: unslothai#4868
Author: Datta0

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.

Original description

Fixes: unslothai#4859

This PR contains test changes only (1 files). Code changes are in a separate PR.

Test files:

studio/backend/tests/test_transformers_version.py

for more information, see https://pre-commit.ci

…guard for PR unslothai#5754 Round 9 reviewer flagged a pile of handoff asymmetries: every GPU-owning lifecycle change (training, export, chat, images) needed its own bespoke unload sequence and they had drifted out of sync. Some skipped llama-server is_active; some missed safetensors loading_models; export and training did not check is_export_active. Backend handoff (P1) * routes/inference.py: new _release_chat_for / _release_export_for helpers. Both treat llama-server as held when is_loaded OR is_active, safetensors as held when active_model_name OR loading_models is non-empty, and export as held when current_checkpoint OR is_export_active. Both helpers run their unloads in worker threads so async routes do not block the event loop. * routes/training.py: replaces its bespoke inline llama / safe / export unload sequence with await _release_chat_for / _release_ export_for. * routes/export.py: same swap for the chat unload chain (export still does NOT call _release_export_for on itself). * routes/inference.py GGUF + standard chat-load paths: now use _release_export_for to drop a settled export, and the standard path's llama unload now also handles is_active=True (round 9 review #8). Backend reject-on-active export (P1 #5) * routes/inference.py: new _raise_if_export_active. Symmetric with _raise_if_training_active: a long-running export is refused with HTTP 409 instead of being silently killed when /images/load or /load arrives. Diffusion / images load and both chat-load paths call it. * core/inference/diffusion.py _release_other_gpu_owners_for_ diffusion: no longer tears down an in-flight export job. Only drops a SETTLED export checkpoint (current_checkpoint populated, is_export_active False). Round 9 review #5 -- the previous behavior could terminate an in-flight export and leave a partial output artifact. Token leak via logger.exception (P1 #6) * core/inference/diffusion.py: load-failure logging now uses logger.error(..., exc_msg) with the already-scrubbed string and exc_info=False. logger.exception() with the raw Exception would expose any hf_... token that diffusers / huggingface_hub embedded in the message or traceback locals, defeating the earlier in-flight scrub. Dependency pinning (P1 #11) * pyproject.toml: huggingfacenotorch optional extra now pins diffusers>=0.37.0. Previously the floor was only set in studio/backend/requirements/no-torch-runtime.txt, so a normal pip install would resolve diffusers 0.36.0 (no Flux2KleinPipeline) and the default curated FLUX.2 klein Images model would fail at runtime. Cache-delete exact match (P1 #14) * routes/models.py /delete-cached: llama.cpp and safetensors guards now match on exact repo-id (case-insensitive) instead of prefix. Diffusion guard already does this; the chat guards were the remaining surface where loading org/model-v2 blocked deleting org/model.

P1 #1 + #2 + #6: extended the chat / diffusion / training identifier hardening to every export-side request model. ExportCommonOptions (parent of ExportMergedModelRequest / ExportBaseModelRequest / ExportLoRAAdapterRequest) now applies _no_control_chars and _reject_embedded_hf_token to repo_id and base_model_id; ExportGGUFRequest gets the same on its repo_id plus a control-char check on quantization_method; and LoadCheckpointRequest validates checkpoint_path. Previously "/api/export/*" accepted newline-smuggled identifiers and URL-form ``hf_xxxxx`` tokens that flowed into log lines. P1 #3 + #4: ``_run_with_helper`` and ``_run_multi_pass_advisor`` now use a shared ``_gpu_workload_busy_for_helper`` that gates on diffusion (round 22 already), training, AND export. The round 22 guard only checked diffusion, so the dataset helper / advisor could still load llama-server on top of an active training run or a resident export checkpoint. Each step fails closed (unverifiable status counts as busy) so the user's primary workload is preserved. P1 #5: PublishDatasetRequest in models/data_recipe.py also applies the identifier hardening to repo_id; the publish path previously accepted control characters and URL-form tokens. P1 #7-10: added _validate_logged_identifier helper to routes/models.py and applied it to the path / query parameter endpoints that flow into logger.info(...) calls -- ``/config/{model_name}``, ``/check-vision/{model_name}``, ``/check-embedding/{model_name}``, ``/gguf-variants``. Mapped the validator's ValueError to HTTP 422 so the client sees the same shape as a Pydantic validation failure. P2 #11 + #12: ``Loading diffusion model %s`` and ``Diffusion load failed for %s`` log lines route ``repo_id`` / ``effective_base`` through ``_display_repo_id`` (collapses absolute local paths to the leaf, still scrubs HF tokens) instead of plain ``_redact_hf_tokens``. The error path was already collapsed in the user-facing 400 / RuntimeError, but the structured-log lines kept the full path. All 97 diffusion + training-validation + related tests pass locally.

P1 #1: ``_gpu_workload_busy_for_helper`` in ``utils/datasets/llm_assist.py`` now also gates on the GGUF chat backend (llama-server) AND the safetensors chat backend. Round 23 extended it to training + export but missed Chat, so a helper / advisor GGUF could still race a loaded chat model for VRAM. Both checks fail closed when status is unverifiable. P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff unloads so the diffusion release runs BEFORE the chat releases. A wedged diffusion unload used to fire AFTER chat was already gone, so the user lost both on a single failure. Drop chat last so an earlier failure preserves it. Applied to ``/training/start`` (training.py), ``/export/load`` (export.py), ``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch (routes/inference.py). P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens ``model_path`` and ``gguf_variant`` via the shared ``_validate_logged_identifier`` helper, so control characters and URL-form HF tokens can no longer log-line-smuggle. P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and ``variant`` the same way. P1 #9: ``/download-progress`` ``repo_id`` query parameter is also hardened; the value flows into log lines deep inside ``_get_repo_size_cached`` on lookup failure. P1 #11: ``CheckFormatRequest.dataset_name`` and ``AiAssistMappingRequest.{dataset_name, model_name}`` in ``models/datasets.py`` now apply the same control-char + embedded-HF-token validators, matching every other public request-body model. All 115 diffusion + training-validation + cached_gguf + export + inference model-validation tests pass locally. (P1 #6 native-path-lease enforcement for diffusion local paths and P1 #12 React Compiler frontend lint deferred -- both need focused design / frontend touchups separate from this batch.)

P1 #1 + #2 + #6: extended the chat / diffusion / training identifier hardening to every export-side request model. ExportCommonOptions (parent of ExportMergedModelRequest / ExportBaseModelRequest / ExportLoRAAdapterRequest) now applies _no_control_chars and _reject_embedded_hf_token to repo_id and base_model_id; ExportGGUFRequest gets the same on its repo_id plus a control-char check on quantization_method; and LoadCheckpointRequest validates checkpoint_path. Previously "/api/export/*" accepted newline-smuggled identifiers and URL-form ``hf_xxxxx`` tokens that flowed into log lines. P1 #3 + #4: ``_run_with_helper`` and ``_run_multi_pass_advisor`` now use a shared ``_gpu_workload_busy_for_helper`` that gates on diffusion (round 22 already), training, AND export. The round 22 guard only checked diffusion, so the dataset helper / advisor could still load llama-server on top of an active training run or a resident export checkpoint. Each step fails closed (unverifiable status counts as busy) so the user's primary workload is preserved. P1 #5: PublishDatasetRequest in models/data_recipe.py also applies the identifier hardening to repo_id; the publish path previously accepted control characters and URL-form tokens. P1 #7-10: added _validate_logged_identifier helper to routes/models.py and applied it to the path / query parameter endpoints that flow into logger.info(...) calls -- ``/config/{model_name}``, ``/check-vision/{model_name}``, ``/check-embedding/{model_name}``, ``/gguf-variants``. Mapped the validator's ValueError to HTTP 422 so the client sees the same shape as a Pydantic validation failure. P2 #11 + #12: ``Loading diffusion model %s`` and ``Diffusion load failed for %s`` log lines route ``repo_id`` / ``effective_base`` through ``_display_repo_id`` (collapses absolute local paths to the leaf, still scrubs HF tokens) instead of plain ``_redact_hf_tokens``. The error path was already collapsed in the user-facing 400 / RuntimeError, but the structured-log lines kept the full path. All 97 diffusion + training-validation + related tests pass locally.

P1 #1: ``_gpu_workload_busy_for_helper`` in ``utils/datasets/llm_assist.py`` now also gates on the GGUF chat backend (llama-server) AND the safetensors chat backend. Round 23 extended it to training + export but missed Chat, so a helper / advisor GGUF could still race a loaded chat model for VRAM. Both checks fail closed when status is unverifiable. P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff unloads so the diffusion release runs BEFORE the chat releases. A wedged diffusion unload used to fire AFTER chat was already gone, so the user lost both on a single failure. Drop chat last so an earlier failure preserves it. Applied to ``/training/start`` (training.py), ``/export/load`` (export.py), ``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch (routes/inference.py). P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens ``model_path`` and ``gguf_variant`` via the shared ``_validate_logged_identifier`` helper, so control characters and URL-form HF tokens can no longer log-line-smuggle. P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and ``variant`` the same way. P1 #9: ``/download-progress`` ``repo_id`` query parameter is also hardened; the value flows into log lines deep inside ``_get_repo_size_cached`` on lookup failure. P1 #11: ``CheckFormatRequest.dataset_name`` and ``AiAssistMappingRequest.{dataset_name, model_name}`` in ``models/datasets.py`` now apply the same control-char + embedded-HF-token validators, matching every other public request-body model. All 115 diffusion + training-validation + cached_gguf + export + inference model-validation tests pass locally. (P1 #6 native-path-lease enforcement for diffusion local paths and P1 #12 React Compiler frontend lint deferred -- both need focused design / frontend touchups separate from this batch.)

Twelve P1 findings from round 26 reviewer aggregate, plus the CI revert of round 25 P1 #5 to a less invasive location. 1. requirements/studio.txt + requirements/single-env/constraints.txt: revert the round 25 huggingface-hub bump (broke Studio Update CI, Mac Studio Update CI, Mac Studio UI CI, Studio UI CI all with ResolutionImpossible against transformers==4.57.6 which requires hub<1.0). Standard install path stays on the well-tested 4.57.6 + 0.36.2 + trl 0.23.1 trio. 2. requirements/no-torch-runtime.txt + pyproject.toml [huggingfacenotorch]: bump huggingface_hub floor from >=0.34.0 to >=1.3.0,<2.0 -- this is where the actual transformers 5.x + hub 0.36.2 broken combo can land because the file installs --no-deps. transformers 5.x calls hub.is_offline_mode which only exists in hub 1.x. 3. utils/datasets/llm_assist.py: revert round 25 P1 #4 (helper/advisor sharing the global llama backend) which introduced three regressions: a chat-evict load race after the busy precheck, a finally-block that could unload a user chat model, and an identifier mismatch the delete guard could not canonicalize. Go back to PRIVATE LlamaCppBackend instances and expose the active helper/advisor repos through a new thread-safe registry (helper_advisor_owns_repo / _register_helper_advisor_repo / _unregister_helper_advisor_repo) so DELETE /api/models/delete-cached can still block the rmtree. 4. routes/models.py delete_cached_model: check the new helper/advisor registry up front and 409 if a helper/advisor still owns the target repo. Closes round 26 P1 #13 and #14 (helper/advisor identifiers were prefixed and would never equal the raw repo id). 5. routes/models.py get_lora_base_model: validate lora_path with _validate_logged_identifier before it is reflected in 404 detail and error logs (round 26 P1 #12). 6. routes/inference.py /unload: round 21 P1 #3 added a "or not is_loaded" fallback that let an unload of owner/B cancel a pending llama load of owner/A. Replace it with a narrow llama_is_starting_without_identifier branch that only fires when llama-server is mid-startup with neither identifier set (round 26 P1 #5). 7. routes/inference.py /unload: poll loading_model_identifier for up to 5 s after asyncio.to_thread(unload_model) so a legitimate pending-load cancel does not 503 because the load thread has not yet observed _cancel_event in its finally (round 26 P2 #15). 8. models/training.py TrainingStartRequest: extend identifier hardening to hf_dataset, subset, train_split, eval_split. Round 22 only guarded model_name (round 26 P1 #10). 9. models/data_recipe.py SeedInspectRequest: add _no_control_chars + _reject_embedded_hf_token field_validators on dataset_name (round 26 P1 #11). Tests: 105 targeted (diffusion + cached_gguf + llama_cpp_cache + inference_model_validation + models_get_model_config) and 1768 broader backend tests pass locally. Pre-existing test_desktop_auth.py, test_studio_api.py, and test_training_worker_flash_attn.py failures reproduce on HEAD without these changes.

Four actionable findings from round 30. Skipped P1 #1 / #2 / #3 (huggingface-hub bump in studio.txt / single-env / colab-new) because the live B200 Studio that successfully generated FLUX.2 klein images runs the exact combo the reviewer flags as broken: huggingface_hub 0.36.2 + transformers 4.57.6 + diffusers 0.37.1 Flux2KleinPipeline: True (imports cleanly) The is_offline_mode ImportError only fires with transformers 5.x, and the standard install path pins transformers==4.57.6 via constraints. The round 26 fix bumped no-torch-runtime.txt + pyproject huggingfacenotorch where the --no-deps install path can land on transformers 5.x; that remains the correct surface. 1. core/inference/diffusion.py: preflight transformers + accelerate via importlib.util.find_spec BEFORE any destructive GPU-owner unload. Diffusers can expose stub pipeline classes when transformers / accelerate are missing, so the load used to drop chat first and fail later inside from_pretrained. find_spec keeps existing tests that stub these modules passing because no real module is executed (round 30 P1 #11). 2. models/export.py ExportGGUFRequest.quantization_method: extend the embedded HF token validator to this field too. Round 23 added the control-char guard but not the token guard; the value is forwarded into worker command lines and reflected in error / success text (round 30 P1 #5). 3. models/data_recipe.py SeedInspectUploadRequest: add _no_control_chars + _reject_embedded_hf_token field_validators to filename and to each entry of file_names. Mirrors the sibling SeedInspectRequest.dataset_name hardening (round 30 P1 #6). 4. frontend/src/features/images/images-page.tsx: defer the initial refreshStatus() call via queueMicrotask so the synchronous setRefreshingStatus(true) inside it does not trip the react-hooks/set-state-in-effect lint on mount (round 30 P2 #12). Deferred (need larger surgery / out of scope for this round): P1 #4 native_path_lease for diffusion local-path loads P1 #7-#10 helper/advisor + public-start window mutual lock symmetry Tests: 98 targeted (diffusion + cached_gguf + inference_validation) pass locally; frontend npm run typecheck passes.

Two round-33 reviewer findings: hub-floor consistency and the multipart upload filename validator gap. Dependencies: reverted the round-26 huggingface_hub>=1.3.0 floor in no-torch-runtime.txt and pyproject.toml (round 33 P1 #1-#5, 4/12 vote consensus). studio.txt forces huggingface_hub==0.36.2 to match the transformers==4.57.6 pin in extras-no-deps.txt, so the 1.3.0 floor was internally inconsistent. Reviewers reproduced the resolver conflict on a fresh install. Empirical justification (re-verified on the live B200 host before the revert): huggingface_hub 0.36.2 + transformers 4.57.6 + diffusers 0.37.1 imports Flux2KleinPipeline cleanly and runs end-to-end image generation. transformers 4.57.6 carries its own transformers.utils.hub.is_offline_mode and does not actually need huggingface_hub.is_offline_mode at import time. The original bump was guarding against the (never-realised) transformers 5.x path, which extras-no-deps explicitly pins away. Validation: multipart /seed/upload-unstructured-file now applies the same _no_control_chars and _reject_embedded_hf_token checks to file.filename that SeedInspectUploadRequest.filename already applies in the JSON variant (round 33 P1 #7). The filename is reflected back to the client, persisted in the per-file meta JSON, and echoed by error responses, so the JSON-side hardening must not be asymmetric with the multipart path. Skipped (consistent with prior rounds): * Find_spec vs full import (R33 P1 #6): preserves test compatibility with the huggingface_hub stub fixture. * React hooks set-state-in-effect lint (R33 P1 #8): codebase has 146 pre-existing violations of the same rule; studio-frontend-ci does not gate on lint. * Direct DiffusionBackend.load_model bypass (R33 P1 #9): the route is the only production entry point, and the backend helper now publishes its own diffusion-backend pending tag (round 32 P1 #3). Direct-caller hardening would require duplicating the lease check into load_model itself, which is out of scope for the route-layer security boundary. * One-segment Hub IDs (R33 P2 #10): strict 2-segment Hub id check is intentional; one-segment names are not valid Hub ids. * Cwd-relative shadow of Hub IDs (R33 P2 #11): documented side-channel tradeoff accepted in round 31 commit msg. 97 targeted backend tests pass.

Datta0 and others added 3 commits April 6, 2026 05:53

Fix VLM detection for transformers v5

248ca24

[pre-commit.ci] auto fixes from pre-commit.com hooks

fbfa817

for more information, see https://pre-commit.ci

Split: keep only 1 file(s)

12539ee

danielhanchen closed this Apr 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[tests] [studio] Fix VLM detection for transformers v5#11

[tests] [studio] Fix VLM detection for transformers v5#11
danielhanchen wants to merge 3 commits into
mainfrom
pr-4868-tests

danielhanchen commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

danielhanchen commented Apr 6, 2026

Staging mirror of unslothai#4868

Original description

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants