Skip to content

fix: skip redundant HfFileSystem().glob() calls in loader.py#13

Closed
danielhanchen wants to merge 4 commits into
mainfrom
pr-4852-head
Closed

fix: skip redundant HfFileSystem().glob() calls in loader.py#13
danielhanchen wants to merge 4 commits into
mainfrom
pr-4852-head

Conversation

@danielhanchen
Copy link
Copy Markdown
Member

Staging mirror of unslothai#4852

Original PR: unslothai#4852
Author: rolandtannous

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.


Original description

Summary

  • Guard SUPPORTS_LLAMA32 glob blocks in FastLanguageModel.from_pretrained and FastModel.from_pretrained with is_model and is_peft so the HfFileSystem HTTP call is only made when both configs could actually exist
  • Prevents indefinite hangs on slow/unreliable networks since the glob result is redundant when either AutoConfig or PeftConfig already failed to load
  • Adds 7 unit tests verifying glob is skipped/called in the correct scenarios

Test plan

  • pytest tests/test_loader_glob_skip.py -v — all 7 tests pass
  • Train any HF model (e.g. unsloth/Qwen3.5-2B) — should no longer hang on slow networks
  • Train a LoRA adapter repo — should still detect conflicting configs correctly
  • Train a local model path — behavior unchanged

rolandtannous and others added 4 commits April 5, 2026 03:12
Guard the SUPPORTS_LLAMA32 glob blocks with `is_model and is_peft` so
the HfFileSystem HTTP call is only made when both configs could actually
exist. This prevents indefinite hangs on slow/unreliable networks since
the glob result is redundant when either AutoConfig or PeftConfig
already failed to load.
Tests for the glob skip guard belong in their own PR to keep
the loader change minimal and reviewable.
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen
Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the Llama 3.2 support logic in unsloth/models/loader.py to include checks for is_model and is_peft. Feedback was provided to refactor this duplicated logic into a helper function or variable to enhance maintainability.

Comment thread unsloth/models/loader.py

# New transformers need to check manually.
if SUPPORTS_LLAMA32:
if SUPPORTS_LLAMA32 and is_model and is_peft:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The condition is_model and is_peft is checked here, but the same logic is repeated later in the file at line 1285. To improve maintainability and reduce code duplication, consider extracting this check into a helper function or a boolean variable.

Comment thread unsloth/models/loader.py

# New transformers need to check manually.
if SUPPORTS_LLAMA32:
if SUPPORTS_LLAMA32 and is_model and is_peft:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic is duplicated from line 511. Please refactor this into a shared helper function to ensure consistency and maintainability.

danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_gpu_workload_busy_for_helper`` in
``utils/datasets/llm_assist.py`` now also gates on the GGUF chat
backend (llama-server) AND the safetensors chat backend. Round 23
extended it to training + export but missed Chat, so a helper /
advisor GGUF could still race a loaded chat model for VRAM.
Both checks fail closed when status is unverifiable.

P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff
unloads so the diffusion release runs BEFORE the chat releases.
A wedged diffusion unload used to fire AFTER chat was already
gone, so the user lost both on a single failure. Drop chat last
so an earlier failure preserves it. Applied to
``/training/start`` (training.py), ``/export/load`` (export.py),
``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch
(routes/inference.py).

P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens
``model_path`` and ``gguf_variant`` via the shared
``_validate_logged_identifier`` helper, so control characters
and URL-form HF tokens can no longer log-line-smuggle.

P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and
``variant`` the same way.

P1 #9: ``/download-progress`` ``repo_id`` query parameter is
also hardened; the value flows into log lines deep inside
``_get_repo_size_cached`` on lookup failure.

P1 #11: ``CheckFormatRequest.dataset_name`` and
``AiAssistMappingRequest.{dataset_name, model_name}`` in
``models/datasets.py`` now apply the same control-char +
embedded-HF-token validators, matching every other public
request-body model.

All 115 diffusion + training-validation + cached_gguf + export
+ inference model-validation tests pass locally.

(P1 #6 native-path-lease enforcement for diffusion local paths
and P1 #12 React Compiler frontend lint deferred -- both need
focused design / frontend touchups separate from this batch.)
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_gpu_workload_busy_for_helper`` in
``utils/datasets/llm_assist.py`` now also gates on the GGUF chat
backend (llama-server) AND the safetensors chat backend. Round 23
extended it to training + export but missed Chat, so a helper /
advisor GGUF could still race a loaded chat model for VRAM.
Both checks fail closed when status is unverifiable.

P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff
unloads so the diffusion release runs BEFORE the chat releases.
A wedged diffusion unload used to fire AFTER chat was already
gone, so the user lost both on a single failure. Drop chat last
so an earlier failure preserves it. Applied to
``/training/start`` (training.py), ``/export/load`` (export.py),
``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch
(routes/inference.py).

P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens
``model_path`` and ``gguf_variant`` via the shared
``_validate_logged_identifier`` helper, so control characters
and URL-form HF tokens can no longer log-line-smuggle.

P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and
``variant`` the same way.

P1 #9: ``/download-progress`` ``repo_id`` query parameter is
also hardened; the value flows into log lines deep inside
``_get_repo_size_cached`` on lookup failure.

P1 #11: ``CheckFormatRequest.dataset_name`` and
``AiAssistMappingRequest.{dataset_name, model_name}`` in
``models/datasets.py`` now apply the same control-char +
embedded-HF-token validators, matching every other public
request-body model.

All 115 diffusion + training-validation + cached_gguf + export
+ inference model-validation tests pass locally.

(P1 #6 native-path-lease enforcement for diffusion local paths
and P1 #12 React Compiler frontend lint deferred -- both need
focused design / frontend touchups separate from this batch.)
danielhanchen added a commit that referenced this pull request May 25, 2026
Twelve P1 findings from round 26 reviewer aggregate, plus the CI
revert of round 25 P1 #5 to a less invasive location.

1. requirements/studio.txt + requirements/single-env/constraints.txt:
   revert the round 25 huggingface-hub bump (broke Studio Update CI,
   Mac Studio Update CI, Mac Studio UI CI, Studio UI CI all with
   ResolutionImpossible against transformers==4.57.6 which requires
   hub<1.0). Standard install path stays on the well-tested 4.57.6 +
   0.36.2 + trl 0.23.1 trio.

2. requirements/no-torch-runtime.txt + pyproject.toml
   [huggingfacenotorch]: bump huggingface_hub floor from >=0.34.0 to
   >=1.3.0,<2.0 -- this is where the actual transformers 5.x +
   hub 0.36.2 broken combo can land because the file installs
   --no-deps. transformers 5.x calls hub.is_offline_mode which only
   exists in hub 1.x.

3. utils/datasets/llm_assist.py: revert round 25 P1 #4 (helper/advisor
   sharing the global llama backend) which introduced three
   regressions: a chat-evict load race after the busy precheck, a
   finally-block that could unload a user chat model, and an
   identifier mismatch the delete guard could not canonicalize. Go
   back to PRIVATE LlamaCppBackend instances and expose the active
   helper/advisor repos through a new thread-safe registry
   (helper_advisor_owns_repo / _register_helper_advisor_repo /
   _unregister_helper_advisor_repo) so DELETE /api/models/delete-cached
   can still block the rmtree.

4. routes/models.py delete_cached_model: check the new helper/advisor
   registry up front and 409 if a helper/advisor still owns the
   target repo. Closes round 26 P1 #13 and #14 (helper/advisor
   identifiers were prefixed and would never equal the raw repo id).

5. routes/models.py get_lora_base_model: validate lora_path with
   _validate_logged_identifier before it is reflected in 404 detail
   and error logs (round 26 P1 #12).

6. routes/inference.py /unload: round 21 P1 #3 added a "or not
   is_loaded" fallback that let an unload of owner/B cancel a pending
   llama load of owner/A. Replace it with a narrow
   llama_is_starting_without_identifier branch that only fires when
   llama-server is mid-startup with neither identifier set (round 26
   P1 #5).

7. routes/inference.py /unload: poll loading_model_identifier for up
   to 5 s after asyncio.to_thread(unload_model) so a legitimate
   pending-load cancel does not 503 because the load thread has not
   yet observed _cancel_event in its finally (round 26 P2 #15).

8. models/training.py TrainingStartRequest: extend identifier
   hardening to hf_dataset, subset, train_split, eval_split. Round 22
   only guarded model_name (round 26 P1 #10).

9. models/data_recipe.py SeedInspectRequest: add _no_control_chars +
   _reject_embedded_hf_token field_validators on dataset_name (round
   26 P1 #11).

Tests: 105 targeted (diffusion + cached_gguf + llama_cpp_cache +
inference_model_validation + models_get_model_config) and 1768
broader backend tests pass locally. Pre-existing
test_desktop_auth.py, test_studio_api.py, and
test_training_worker_flash_attn.py failures reproduce on HEAD
without these changes.
danielhanchen added a commit that referenced this pull request May 25, 2026
Five additional P1 findings round 27 reviewer flagged on top of the
round 27 commit 6c528fb (Counter refcount + handoff visibility were
already covered). Three remaining studio.txt / no-torch-runtime hub
suggestions are NOT applied because they would re-break CI; the
empirical evidence (round 26 commit 65ea3a2 restored CI green) takes
precedence over the reviewer's stale-state suggestion.

1. models/training.py TrainingStartRequest: extend the embedded HF
   token validator to subset, train_split, eval_split. Round 26 only
   added the control-char guard to those three; the token guard was
   asymmetric and would accept owner/data\\nFAKE hf_abcdef...
   payloads through subset / split fields.

2. models/datasets.py CheckFormatRequest: extend both validators
   (control chars + embedded HF token) to subset and train_split.
   Same asymmetric-fix bug as #1.

3. models/data_recipe.py SeedInspectRequest: extend both validators
   to subset and split. Same pattern.

4. utils/datasets/llm_assist.py precache_helper_gguf: register the
   helper repo in the helper/advisor refcount registry around the
   hf_hub_download loop, then unregister in the finally. Without
   this, the FastAPI-startup background pre-cache could be racing
   a concurrent DELETE /api/models/delete-cached against the same
   cache directory. The runtime helper / advisor calls already
   register (round 26 P1 #13/#14) but the precache was the
   asymmetric gap.

5. routes/models.py _loaded_model_matches_deleted_path: match
   bidirectionally (active under target OR target under active) so
   deleting a child directory of a loaded local model (.../my-flux/
   text_encoder while .../my-flux is loaded) trips the guard.
   Mirrors the diffusion delete-guard symmetric path-overlap check.

Tests: 105 targeted (diffusion + cache + inference_validation) and
the broader backend suite pass locally.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants