fix: skip redundant HfFileSystem().glob() calls in loader.py by danielhanchen · Pull Request #13 · unslothai/unsloth-staging-1

danielhanchen · 2026-04-06T11:13:01Z

Staging mirror of unslothai#4852

Original PR: unslothai#4852
Author: rolandtannous

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.

Original description

Summary

Guard SUPPORTS_LLAMA32 glob blocks in FastLanguageModel.from_pretrained and FastModel.from_pretrained with is_model and is_peft so the HfFileSystem HTTP call is only made when both configs could actually exist
Prevents indefinite hangs on slow/unreliable networks since the glob result is redundant when either AutoConfig or PeftConfig already failed to load
Adds 7 unit tests verifying glob is skipped/called in the correct scenarios

Test plan

pytest tests/test_loader_glob_skip.py -v — all 7 tests pass
Train any HF model (e.g. unsloth/Qwen3.5-2B) — should no longer hang on slow networks
Train a LoRA adapter repo — should still detect conflicting configs correctly
Train a local model path — behavior unchanged

Guard the SUPPORTS_LLAMA32 glob blocks with `is_model and is_peft` so the HfFileSystem HTTP call is only made when both configs could actually exist. This prevents indefinite hangs on slow/unreliable networks since the glob result is redundant when either AutoConfig or PeftConfig already failed to load.

for more information, see https://pre-commit.ci

Tests for the glob skip guard belong in their own PR to keep the loader change minimal and reviewable.

danielhanchen · 2026-04-06T11:13:09Z

/gemini review

danielhanchen · 2026-04-06T11:13:10Z

@codex review

chatgpt-codex-connector · 2026-04-06T11:14:58Z

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

gemini-code-assist

Code Review

This pull request updates the Llama 3.2 support logic in unsloth/models/loader.py to include checks for is_model and is_peft. Feedback was provided to refactor this duplicated logic into a helper function or variable to enhance maintainability.

gemini-code-assist · 2026-04-06T11:25:37Z


        # New transformers need to check manually.
-        if SUPPORTS_LLAMA32:
+        if SUPPORTS_LLAMA32 and is_model and is_peft:


The condition is_model and is_peft is checked here, but the same logic is repeated later in the file at line 1285. To improve maintainability and reduce code duplication, consider extracting this check into a helper function or a boolean variable.

gemini-code-assist · 2026-04-06T11:25:37Z


        # New transformers need to check manually.
-        if SUPPORTS_LLAMA32:
+        if SUPPORTS_LLAMA32 and is_model and is_peft:


This logic is duplicated from line 511. Please refactor this into a shared helper function to ensure consistency and maintainability.

P1 #1: ``_gpu_workload_busy_for_helper`` in ``utils/datasets/llm_assist.py`` now also gates on the GGUF chat backend (llama-server) AND the safetensors chat backend. Round 23 extended it to training + export but missed Chat, so a helper / advisor GGUF could still race a loaded chat model for VRAM. Both checks fail closed when status is unverifiable. P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff unloads so the diffusion release runs BEFORE the chat releases. A wedged diffusion unload used to fire AFTER chat was already gone, so the user lost both on a single failure. Drop chat last so an earlier failure preserves it. Applied to ``/training/start`` (training.py), ``/export/load`` (export.py), ``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch (routes/inference.py). P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens ``model_path`` and ``gguf_variant`` via the shared ``_validate_logged_identifier`` helper, so control characters and URL-form HF tokens can no longer log-line-smuggle. P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and ``variant`` the same way. P1 #9: ``/download-progress`` ``repo_id`` query parameter is also hardened; the value flows into log lines deep inside ``_get_repo_size_cached`` on lookup failure. P1 #11: ``CheckFormatRequest.dataset_name`` and ``AiAssistMappingRequest.{dataset_name, model_name}`` in ``models/datasets.py`` now apply the same control-char + embedded-HF-token validators, matching every other public request-body model. All 115 diffusion + training-validation + cached_gguf + export + inference model-validation tests pass locally. (P1 #6 native-path-lease enforcement for diffusion local paths and P1 #12 React Compiler frontend lint deferred -- both need focused design / frontend touchups separate from this batch.)

Twelve P1 findings from round 26 reviewer aggregate, plus the CI revert of round 25 P1 #5 to a less invasive location. 1. requirements/studio.txt + requirements/single-env/constraints.txt: revert the round 25 huggingface-hub bump (broke Studio Update CI, Mac Studio Update CI, Mac Studio UI CI, Studio UI CI all with ResolutionImpossible against transformers==4.57.6 which requires hub<1.0). Standard install path stays on the well-tested 4.57.6 + 0.36.2 + trl 0.23.1 trio. 2. requirements/no-torch-runtime.txt + pyproject.toml [huggingfacenotorch]: bump huggingface_hub floor from >=0.34.0 to >=1.3.0,<2.0 -- this is where the actual transformers 5.x + hub 0.36.2 broken combo can land because the file installs --no-deps. transformers 5.x calls hub.is_offline_mode which only exists in hub 1.x. 3. utils/datasets/llm_assist.py: revert round 25 P1 #4 (helper/advisor sharing the global llama backend) which introduced three regressions: a chat-evict load race after the busy precheck, a finally-block that could unload a user chat model, and an identifier mismatch the delete guard could not canonicalize. Go back to PRIVATE LlamaCppBackend instances and expose the active helper/advisor repos through a new thread-safe registry (helper_advisor_owns_repo / _register_helper_advisor_repo / _unregister_helper_advisor_repo) so DELETE /api/models/delete-cached can still block the rmtree. 4. routes/models.py delete_cached_model: check the new helper/advisor registry up front and 409 if a helper/advisor still owns the target repo. Closes round 26 P1 #13 and #14 (helper/advisor identifiers were prefixed and would never equal the raw repo id). 5. routes/models.py get_lora_base_model: validate lora_path with _validate_logged_identifier before it is reflected in 404 detail and error logs (round 26 P1 #12). 6. routes/inference.py /unload: round 21 P1 #3 added a "or not is_loaded" fallback that let an unload of owner/B cancel a pending llama load of owner/A. Replace it with a narrow llama_is_starting_without_identifier branch that only fires when llama-server is mid-startup with neither identifier set (round 26 P1 #5). 7. routes/inference.py /unload: poll loading_model_identifier for up to 5 s after asyncio.to_thread(unload_model) so a legitimate pending-load cancel does not 503 because the load thread has not yet observed _cancel_event in its finally (round 26 P2 #15). 8. models/training.py TrainingStartRequest: extend identifier hardening to hf_dataset, subset, train_split, eval_split. Round 22 only guarded model_name (round 26 P1 #10). 9. models/data_recipe.py SeedInspectRequest: add _no_control_chars + _reject_embedded_hf_token field_validators on dataset_name (round 26 P1 #11). Tests: 105 targeted (diffusion + cached_gguf + llama_cpp_cache + inference_model_validation + models_get_model_config) and 1768 broader backend tests pass locally. Pre-existing test_desktop_auth.py, test_studio_api.py, and test_training_worker_flash_attn.py failures reproduce on HEAD without these changes.

Five additional P1 findings round 27 reviewer flagged on top of the round 27 commit 6c528fb (Counter refcount + handoff visibility were already covered). Three remaining studio.txt / no-torch-runtime hub suggestions are NOT applied because they would re-break CI; the empirical evidence (round 26 commit 65ea3a2 restored CI green) takes precedence over the reviewer's stale-state suggestion. 1. models/training.py TrainingStartRequest: extend the embedded HF token validator to subset, train_split, eval_split. Round 26 only added the control-char guard to those three; the token guard was asymmetric and would accept owner/data\\nFAKE hf_abcdef... payloads through subset / split fields. 2. models/datasets.py CheckFormatRequest: extend both validators (control chars + embedded HF token) to subset and train_split. Same asymmetric-fix bug as #1. 3. models/data_recipe.py SeedInspectRequest: extend both validators to subset and split. Same pattern. 4. utils/datasets/llm_assist.py precache_helper_gguf: register the helper repo in the helper/advisor refcount registry around the hf_hub_download loop, then unregister in the finally. Without this, the FastAPI-startup background pre-cache could be racing a concurrent DELETE /api/models/delete-cached against the same cache directory. The runtime helper / advisor calls already register (round 26 P1 #13/#14) but the precache was the asymmetric gap. 5. routes/models.py _loaded_model_matches_deleted_path: match bidirectionally (active under target OR target under active) so deleting a child directory of a loaded local model (.../my-flux/ text_encoder while .../my-flux is loaded) trips the guard. Mirrors the diffusion delete-guard symmetric path-overlap check. Tests: 105 targeted (diffusion + cache + inference_validation) and the broader backend suite pass locally.

rolandtannous and others added 4 commits April 5, 2026 03:12

[pre-commit.ci] auto fixes from pre-commit.com hooks

c50591b

for more information, see https://pre-commit.ci

Remove test file from main PR - moved to separate PR

30db8f3

Tests for the glob skip guard belong in their own PR to keep the loader change minimal and reviewable.

Merge remote-tracking branch 'origin/main' into pr-4852-head

b83d57f

danielhanchen closed this Apr 6, 2026

gemini-code-assist Bot reviewed Apr 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: skip redundant HfFileSystem().glob() calls in loader.py#13

fix: skip redundant HfFileSystem().glob() calls in loader.py#13
danielhanchen wants to merge 4 commits into
mainfrom
pr-4852-head

danielhanchen commented Apr 6, 2026

Uh oh!

danielhanchen commented Apr 6, 2026

Uh oh!

danielhanchen commented Apr 6, 2026

Uh oh!

chatgpt-codex-connector Bot commented Apr 6, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 6, 2026

Uh oh!

gemini-code-assist Bot Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

danielhanchen commented Apr 6, 2026

Staging mirror of unslothai#4852

Original description

Summary

Test plan

Uh oh!

danielhanchen commented Apr 6, 2026

Uh oh!

danielhanchen commented Apr 6, 2026

Uh oh!

chatgpt-codex-connector Bot commented Apr 6, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants