fix: skip redundant HfFileSystem().glob() calls in loader.py#13
Closed
danielhanchen wants to merge 4 commits into
Closed
fix: skip redundant HfFileSystem().glob() calls in loader.py#13danielhanchen wants to merge 4 commits into
danielhanchen wants to merge 4 commits into
Conversation
Guard the SUPPORTS_LLAMA32 glob blocks with `is_model and is_peft` so the HfFileSystem HTTP call is only made when both configs could actually exist. This prevents indefinite hangs on slow/unreliable networks since the glob result is redundant when either AutoConfig or PeftConfig already failed to load.
for more information, see https://pre-commit.ci
Tests for the glob skip guard belong in their own PR to keep the loader change minimal and reviewable.
Member
Author
|
/gemini review |
Member
Author
|
@codex review |
|
Codex Review: Didn't find any major issues. Chef's kiss. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
danielhanchen
added a commit
that referenced
this pull request
May 25, 2026
P1 #1: ``_gpu_workload_busy_for_helper`` in ``utils/datasets/llm_assist.py`` now also gates on the GGUF chat backend (llama-server) AND the safetensors chat backend. Round 23 extended it to training + export but missed Chat, so a helper / advisor GGUF could still race a loaded chat model for VRAM. Both checks fail closed when status is unverifiable. P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff unloads so the diffusion release runs BEFORE the chat releases. A wedged diffusion unload used to fire AFTER chat was already gone, so the user lost both on a single failure. Drop chat last so an earlier failure preserves it. Applied to ``/training/start`` (training.py), ``/export/load`` (export.py), ``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch (routes/inference.py). P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens ``model_path`` and ``gguf_variant`` via the shared ``_validate_logged_identifier`` helper, so control characters and URL-form HF tokens can no longer log-line-smuggle. P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and ``variant`` the same way. P1 #9: ``/download-progress`` ``repo_id`` query parameter is also hardened; the value flows into log lines deep inside ``_get_repo_size_cached`` on lookup failure. P1 #11: ``CheckFormatRequest.dataset_name`` and ``AiAssistMappingRequest.{dataset_name, model_name}`` in ``models/datasets.py`` now apply the same control-char + embedded-HF-token validators, matching every other public request-body model. All 115 diffusion + training-validation + cached_gguf + export + inference model-validation tests pass locally. (P1 #6 native-path-lease enforcement for diffusion local paths and P1 #12 React Compiler frontend lint deferred -- both need focused design / frontend touchups separate from this batch.)
danielhanchen
added a commit
that referenced
this pull request
May 25, 2026
P1 #1: ``_gpu_workload_busy_for_helper`` in ``utils/datasets/llm_assist.py`` now also gates on the GGUF chat backend (llama-server) AND the safetensors chat backend. Round 23 extended it to training + export but missed Chat, so a helper / advisor GGUF could still race a loaded chat model for VRAM. Both checks fail closed when status is unverifiable. P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff unloads so the diffusion release runs BEFORE the chat releases. A wedged diffusion unload used to fire AFTER chat was already gone, so the user lost both on a single failure. Drop chat last so an earlier failure preserves it. Applied to ``/training/start`` (training.py), ``/export/load`` (export.py), ``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch (routes/inference.py). P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens ``model_path`` and ``gguf_variant`` via the shared ``_validate_logged_identifier`` helper, so control characters and URL-form HF tokens can no longer log-line-smuggle. P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and ``variant`` the same way. P1 #9: ``/download-progress`` ``repo_id`` query parameter is also hardened; the value flows into log lines deep inside ``_get_repo_size_cached`` on lookup failure. P1 #11: ``CheckFormatRequest.dataset_name`` and ``AiAssistMappingRequest.{dataset_name, model_name}`` in ``models/datasets.py`` now apply the same control-char + embedded-HF-token validators, matching every other public request-body model. All 115 diffusion + training-validation + cached_gguf + export + inference model-validation tests pass locally. (P1 #6 native-path-lease enforcement for diffusion local paths and P1 #12 React Compiler frontend lint deferred -- both need focused design / frontend touchups separate from this batch.)
danielhanchen
added a commit
that referenced
this pull request
May 25, 2026
Twelve P1 findings from round 26 reviewer aggregate, plus the CI revert of round 25 P1 #5 to a less invasive location. 1. requirements/studio.txt + requirements/single-env/constraints.txt: revert the round 25 huggingface-hub bump (broke Studio Update CI, Mac Studio Update CI, Mac Studio UI CI, Studio UI CI all with ResolutionImpossible against transformers==4.57.6 which requires hub<1.0). Standard install path stays on the well-tested 4.57.6 + 0.36.2 + trl 0.23.1 trio. 2. requirements/no-torch-runtime.txt + pyproject.toml [huggingfacenotorch]: bump huggingface_hub floor from >=0.34.0 to >=1.3.0,<2.0 -- this is where the actual transformers 5.x + hub 0.36.2 broken combo can land because the file installs --no-deps. transformers 5.x calls hub.is_offline_mode which only exists in hub 1.x. 3. utils/datasets/llm_assist.py: revert round 25 P1 #4 (helper/advisor sharing the global llama backend) which introduced three regressions: a chat-evict load race after the busy precheck, a finally-block that could unload a user chat model, and an identifier mismatch the delete guard could not canonicalize. Go back to PRIVATE LlamaCppBackend instances and expose the active helper/advisor repos through a new thread-safe registry (helper_advisor_owns_repo / _register_helper_advisor_repo / _unregister_helper_advisor_repo) so DELETE /api/models/delete-cached can still block the rmtree. 4. routes/models.py delete_cached_model: check the new helper/advisor registry up front and 409 if a helper/advisor still owns the target repo. Closes round 26 P1 #13 and #14 (helper/advisor identifiers were prefixed and would never equal the raw repo id). 5. routes/models.py get_lora_base_model: validate lora_path with _validate_logged_identifier before it is reflected in 404 detail and error logs (round 26 P1 #12). 6. routes/inference.py /unload: round 21 P1 #3 added a "or not is_loaded" fallback that let an unload of owner/B cancel a pending llama load of owner/A. Replace it with a narrow llama_is_starting_without_identifier branch that only fires when llama-server is mid-startup with neither identifier set (round 26 P1 #5). 7. routes/inference.py /unload: poll loading_model_identifier for up to 5 s after asyncio.to_thread(unload_model) so a legitimate pending-load cancel does not 503 because the load thread has not yet observed _cancel_event in its finally (round 26 P2 #15). 8. models/training.py TrainingStartRequest: extend identifier hardening to hf_dataset, subset, train_split, eval_split. Round 22 only guarded model_name (round 26 P1 #10). 9. models/data_recipe.py SeedInspectRequest: add _no_control_chars + _reject_embedded_hf_token field_validators on dataset_name (round 26 P1 #11). Tests: 105 targeted (diffusion + cached_gguf + llama_cpp_cache + inference_model_validation + models_get_model_config) and 1768 broader backend tests pass locally. Pre-existing test_desktop_auth.py, test_studio_api.py, and test_training_worker_flash_attn.py failures reproduce on HEAD without these changes.
danielhanchen
added a commit
that referenced
this pull request
May 25, 2026
Five additional P1 findings round 27 reviewer flagged on top of the round 27 commit 6c528fb (Counter refcount + handoff visibility were already covered). Three remaining studio.txt / no-torch-runtime hub suggestions are NOT applied because they would re-break CI; the empirical evidence (round 26 commit 65ea3a2 restored CI green) takes precedence over the reviewer's stale-state suggestion. 1. models/training.py TrainingStartRequest: extend the embedded HF token validator to subset, train_split, eval_split. Round 26 only added the control-char guard to those three; the token guard was asymmetric and would accept owner/data\\nFAKE hf_abcdef... payloads through subset / split fields. 2. models/datasets.py CheckFormatRequest: extend both validators (control chars + embedded HF token) to subset and train_split. Same asymmetric-fix bug as #1. 3. models/data_recipe.py SeedInspectRequest: extend both validators to subset and split. Same pattern. 4. utils/datasets/llm_assist.py precache_helper_gguf: register the helper repo in the helper/advisor refcount registry around the hf_hub_download loop, then unregister in the finally. Without this, the FastAPI-startup background pre-cache could be racing a concurrent DELETE /api/models/delete-cached against the same cache directory. The runtime helper / advisor calls already register (round 26 P1 #13/#14) but the precache was the asymmetric gap. 5. routes/models.py _loaded_model_matches_deleted_path: match bidirectionally (active under target OR target under active) so deleting a child directory of a loaded local model (.../my-flux/ text_encoder while .../my-flux is loaded) trips the guard. Mirrors the diffusion delete-guard symmetric path-overlap check. Tests: 105 targeted (diffusion + cache + inference_validation) and the broader backend suite pass locally.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Staging mirror of unslothai#4852
Original PR: unslothai#4852
Author: rolandtannous
This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.
Original description
Summary
SUPPORTS_LLAMA32glob blocks inFastLanguageModel.from_pretrainedandFastModel.from_pretrainedwithis_model and is_peftso theHfFileSystemHTTP call is only made when both configs could actually existAutoConfigorPeftConfigalready failed to loadTest plan
pytest tests/test_loader_glob_skip.py -v— all 7 tests passunsloth/Qwen3.5-2B) — should no longer hang on slow networks