fix: skip redundant HfFileSystem().glob() calls in loader.py#12
fix: skip redundant HfFileSystem().glob() calls in loader.py#12danielhanchen wants to merge 4 commits into
Conversation
Guard the SUPPORTS_LLAMA32 glob blocks with `is_model and is_peft` so the HfFileSystem HTTP call is only made when both configs could actually exist. This prevents indefinite hangs on slow/unreliable networks since the glob result is redundant when either AutoConfig or PeftConfig already failed to load.
for more information, see https://pre-commit.ci
Tests for the glob skip guard belong in their own PR to keep the loader change minimal and reviewable.
|
/gemini review |
|
@codex review |
|
Codex Review: Didn't find any major issues. 👍 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Code Review
This pull request updates the conditional logic for Llama 3.2 support in unsloth/models/loader.py by adding is_model and is_peft flags to the checks. The reviewer suggests refactoring this duplicated logic into a helper function to improve maintainability and reduce code duplication.
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
danielhanchen
left a comment
There was a problem hiding this comment.
Thank you for the PR! The goal of this PR is to avoid redundant HfFileSystem().glob() network calls in the model loader. As a summary, this PR adds and is_model and is_peft guards to the two SUPPORTS_LLAMA32 blocks in FastLanguageModel.from_pretrained and FastModel.from_pretrained, so the expensive HTTP filesystem listing is only performed when both AutoConfig and PeftConfig have already succeeded.
Review Summary (12 independent reviewers)
| Reviewers | Severity | Finding |
|---|---|---|
| 12/12 | Info | both_exist written inside the SUPPORTS_LLAMA32 block is dead code -- it is never read after being set. The if both_exist: raise RuntimeError(...) check appears before the block (lines 493 and 1105), not after. This is a pre-existing issue, not introduced by this PR. |
| 12/12 | Info | The and is_model and is_peft guard is logically correct and safe. Since both_exist is already dead code after the block, the guard simply avoids a redundant network call with no behavior change. |
| 7/12 | Low | Test file tests/test_loader_glob_skip.py referenced in PR description is not present in this branch (moved to separate PR unslothai#4854). |
| 2/12 | Resolved | Two reviewers initially flagged a P1 regression (skipping the mixed-config-repo check when one parser fails). Resolved as false positive: the block only writes both_exist after the sole if both_exist: raise site, so skipping the block cannot suppress a live error path. |
Detailed Analysis
The change is a 2-line guard that prevents HfFileSystem().glob() from running when it is unnecessary. The logic flow is:
both_exist = (is_model and is_peft) and not SUPPORTS_LLAMA32(line 490/1103) -- alwaysFalsewhenSUPPORTS_LLAMA32=Trueif both_exist: raise RuntimeError(...)(line 493/1105) -- dead on modern transformersif SUPPORTS_LLAMA32 [and is_model and is_peft]:(line 511/1285) -- setsboth_existvia glob/local checkboth_existis never read again after step 3
Since step 4 means the block produces no observable effect, the PR's guard correctly skips unnecessary network calls without changing behavior.
Pre-existing issue (not blocking)
The SUPPORTS_LLAMA32 block appears to be intended to raise the "both config.json and adapter_config.json exist" error for modern transformers, but the if both_exist: raise was never placed after the block. This means the dual-config detection is silently broken on transformers > 4.45.0 regardless of this PR. A follow-up could either add the missing raise after the block or remove the dead code entirely.
Concrete suggestions for each finding below.
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
There was a problem hiding this comment.
[12/12 reviewers] This guard is correct -- it avoids the HfFileSystem().glob() call when either AutoConfig or PeftConfig already failed. Since both_exist is never read after this block (the if both_exist: raise at line 493 runs before this point), skipping the block has no behavioral impact.
Note: the entire block (lines 511-527) is effectively dead code on modern transformers because both_exist is written but never consumed afterward. The guard correctly avoids wasted network I/O in this dead path.
| if SUPPORTS_LLAMA32 and is_model and is_peft: | |
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
There was a problem hiding this comment.
[12/12 reviewers] Same as the FastLanguageModel change above -- the guard is safe. both_exist written on lines 1292/1300 is never read after this block (the check at line 1105 runs before).
| if SUPPORTS_LLAMA32 and is_model and is_peft: | |
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
|
/gemini review |
|
@codex review |
|
Codex Review: Didn't find any major issues. Breezy! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Code Review
This pull request updates the from_pretrained method in unsloth/models/loader.py to restrict the Llama 3.2 support check to cases where both is_model and is_peft are true. I have reviewed the feedback regarding potential logic inconsistencies and code duplication; while the suggested refactoring into a helper function is a valid improvement for maintainability, the current changes are accepted as they address the immediate scope of the PR.
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
There was a problem hiding this comment.
While this change correctly avoids a redundant network call, I've noticed two related points:
-
Potential Bug: The
both_existvariable is recalculated inside this block for newer transformers versions (SUPPORTS_LLAMA32) but is not subsequently used. This means that if a repository contains both a base model and a LoRA adapter, it would not raise an error as it does for older versions. This seems like a bug. -
Code Duplication: This entire logic block for loading and checking configurations is duplicated in
FastModel.from_pretrained(starting around line 1055).
I'd suggest refactoring this duplicated logic into a helper function. This would improve maintainability and allow fixing the potential bug in one place. The fix would involve ensuring that an error is raised if both configs exist, regardless of the transformers version.
P1: route-layer chat/diffusion/export releases were still asymmetric. Training start and export load called ``diff_backend.unload_model`` inside a best-effort try/except so a wedged diffusion backend let the next workload allocate over the top of the resident pipeline and OOM. Both now use the strict ``_release_diffusion_for`` helper from routes.inference, which raises HTTPException 503 on status/unload failure or post-check mismatch. P2 #9: diffusion load exceptions can include the absolute local repo / base / gguf path verbatim (FileNotFoundError, OSError from diffusers / safetensors). The path flows into ``_last_error``, which ``status()`` returns to every authenticated session. Collapse the known repo_id / effective_base / gguf_filename paths to their leaf name before storing the error, mirroring the ``_display_repo_id`` convention used for the public repo label. P2 #10: when ``repo_id`` is an absolute local path, ``detect_family`` matched _FAMILY_EXCLUDE deny lists against the full path, so models stored under a parent directory containing ``qwen-image-edit`` or ``3.5`` were misclassified as None. Reduce the family-detection needle to the leaf directory when the input looks like a filesystem path; Hub-style ``owner/repo`` ids continue to use the original needle so existing detection rules keep working. P2 #12: ``gguf_filename`` was missing from the ``_reject_embedded_hf_token`` validator. A URL-form quant path like ``https://hf_xxxxx@huggingface.co/.../flux.gguf`` would be stored on ``DiffusionBackend._gguf_filename`` and surface in status() / log lines. Extend the validator to gguf_filename so the token is dropped before it can leak. All 85 diffusion-relevant backend tests pass locally.
P1 #1 + #2 + #6: extended the chat / diffusion / training identifier hardening to every export-side request model. ExportCommonOptions (parent of ExportMergedModelRequest / ExportBaseModelRequest / ExportLoRAAdapterRequest) now applies _no_control_chars and _reject_embedded_hf_token to repo_id and base_model_id; ExportGGUFRequest gets the same on its repo_id plus a control-char check on quantization_method; and LoadCheckpointRequest validates checkpoint_path. Previously "/api/export/*" accepted newline-smuggled identifiers and URL-form ``hf_xxxxx`` tokens that flowed into log lines. P1 #3 + #4: ``_run_with_helper`` and ``_run_multi_pass_advisor`` now use a shared ``_gpu_workload_busy_for_helper`` that gates on diffusion (round 22 already), training, AND export. The round 22 guard only checked diffusion, so the dataset helper / advisor could still load llama-server on top of an active training run or a resident export checkpoint. Each step fails closed (unverifiable status counts as busy) so the user's primary workload is preserved. P1 #5: PublishDatasetRequest in models/data_recipe.py also applies the identifier hardening to repo_id; the publish path previously accepted control characters and URL-form tokens. P1 #7-10: added _validate_logged_identifier helper to routes/models.py and applied it to the path / query parameter endpoints that flow into logger.info(...) calls -- ``/config/{model_name}``, ``/check-vision/{model_name}``, ``/check-embedding/{model_name}``, ``/gguf-variants``. Mapped the validator's ValueError to HTTP 422 so the client sees the same shape as a Pydantic validation failure. P2 #11 + #12: ``Loading diffusion model %s`` and ``Diffusion load failed for %s`` log lines route ``repo_id`` / ``effective_base`` through ``_display_repo_id`` (collapses absolute local paths to the leaf, still scrubs HF tokens) instead of plain ``_redact_hf_tokens``. The error path was already collapsed in the user-facing 400 / RuntimeError, but the structured-log lines kept the full path. All 97 diffusion + training-validation + related tests pass locally.
P1 #1: ``_gpu_workload_busy_for_helper`` in ``utils/datasets/llm_assist.py`` now also gates on the GGUF chat backend (llama-server) AND the safetensors chat backend. Round 23 extended it to training + export but missed Chat, so a helper / advisor GGUF could still race a loaded chat model for VRAM. Both checks fail closed when status is unverifiable. P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff unloads so the diffusion release runs BEFORE the chat releases. A wedged diffusion unload used to fire AFTER chat was already gone, so the user lost both on a single failure. Drop chat last so an earlier failure preserves it. Applied to ``/training/start`` (training.py), ``/export/load`` (export.py), ``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch (routes/inference.py). P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens ``model_path`` and ``gguf_variant`` via the shared ``_validate_logged_identifier`` helper, so control characters and URL-form HF tokens can no longer log-line-smuggle. P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and ``variant`` the same way. P1 #9: ``/download-progress`` ``repo_id`` query parameter is also hardened; the value flows into log lines deep inside ``_get_repo_size_cached`` on lookup failure. P1 #11: ``CheckFormatRequest.dataset_name`` and ``AiAssistMappingRequest.{dataset_name, model_name}`` in ``models/datasets.py`` now apply the same control-char + embedded-HF-token validators, matching every other public request-body model. All 115 diffusion + training-validation + cached_gguf + export + inference model-validation tests pass locally. (P1 #6 native-path-lease enforcement for diffusion local paths and P1 #12 React Compiler frontend lint deferred -- both need focused design / frontend touchups separate from this batch.)
P1 #1 + #2 + #6: extended the chat / diffusion / training identifier hardening to every export-side request model. ExportCommonOptions (parent of ExportMergedModelRequest / ExportBaseModelRequest / ExportLoRAAdapterRequest) now applies _no_control_chars and _reject_embedded_hf_token to repo_id and base_model_id; ExportGGUFRequest gets the same on its repo_id plus a control-char check on quantization_method; and LoadCheckpointRequest validates checkpoint_path. Previously "/api/export/*" accepted newline-smuggled identifiers and URL-form ``hf_xxxxx`` tokens that flowed into log lines. P1 #3 + #4: ``_run_with_helper`` and ``_run_multi_pass_advisor`` now use a shared ``_gpu_workload_busy_for_helper`` that gates on diffusion (round 22 already), training, AND export. The round 22 guard only checked diffusion, so the dataset helper / advisor could still load llama-server on top of an active training run or a resident export checkpoint. Each step fails closed (unverifiable status counts as busy) so the user's primary workload is preserved. P1 #5: PublishDatasetRequest in models/data_recipe.py also applies the identifier hardening to repo_id; the publish path previously accepted control characters and URL-form tokens. P1 #7-10: added _validate_logged_identifier helper to routes/models.py and applied it to the path / query parameter endpoints that flow into logger.info(...) calls -- ``/config/{model_name}``, ``/check-vision/{model_name}``, ``/check-embedding/{model_name}``, ``/gguf-variants``. Mapped the validator's ValueError to HTTP 422 so the client sees the same shape as a Pydantic validation failure. P2 #11 + #12: ``Loading diffusion model %s`` and ``Diffusion load failed for %s`` log lines route ``repo_id`` / ``effective_base`` through ``_display_repo_id`` (collapses absolute local paths to the leaf, still scrubs HF tokens) instead of plain ``_redact_hf_tokens``. The error path was already collapsed in the user-facing 400 / RuntimeError, but the structured-log lines kept the full path. All 97 diffusion + training-validation + related tests pass locally.
P1 #1: ``_gpu_workload_busy_for_helper`` in ``utils/datasets/llm_assist.py`` now also gates on the GGUF chat backend (llama-server) AND the safetensors chat backend. Round 23 extended it to training + export but missed Chat, so a helper / advisor GGUF could still race a loaded chat model for VRAM. Both checks fail closed when status is unverifiable. P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff unloads so the diffusion release runs BEFORE the chat releases. A wedged diffusion unload used to fire AFTER chat was already gone, so the user lost both on a single failure. Drop chat last so an earlier failure preserves it. Applied to ``/training/start`` (training.py), ``/export/load`` (export.py), ``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch (routes/inference.py). P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens ``model_path`` and ``gguf_variant`` via the shared ``_validate_logged_identifier`` helper, so control characters and URL-form HF tokens can no longer log-line-smuggle. P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and ``variant`` the same way. P1 #9: ``/download-progress`` ``repo_id`` query parameter is also hardened; the value flows into log lines deep inside ``_get_repo_size_cached`` on lookup failure. P1 #11: ``CheckFormatRequest.dataset_name`` and ``AiAssistMappingRequest.{dataset_name, model_name}`` in ``models/datasets.py`` now apply the same control-char + embedded-HF-token validators, matching every other public request-body model. All 115 diffusion + training-validation + cached_gguf + export + inference model-validation tests pass locally. (P1 #6 native-path-lease enforcement for diffusion local paths and P1 #12 React Compiler frontend lint deferred -- both need focused design / frontend touchups separate from this batch.)
Twelve P1 findings from round 26 reviewer aggregate, plus the CI revert of round 25 P1 #5 to a less invasive location. 1. requirements/studio.txt + requirements/single-env/constraints.txt: revert the round 25 huggingface-hub bump (broke Studio Update CI, Mac Studio Update CI, Mac Studio UI CI, Studio UI CI all with ResolutionImpossible against transformers==4.57.6 which requires hub<1.0). Standard install path stays on the well-tested 4.57.6 + 0.36.2 + trl 0.23.1 trio. 2. requirements/no-torch-runtime.txt + pyproject.toml [huggingfacenotorch]: bump huggingface_hub floor from >=0.34.0 to >=1.3.0,<2.0 -- this is where the actual transformers 5.x + hub 0.36.2 broken combo can land because the file installs --no-deps. transformers 5.x calls hub.is_offline_mode which only exists in hub 1.x. 3. utils/datasets/llm_assist.py: revert round 25 P1 #4 (helper/advisor sharing the global llama backend) which introduced three regressions: a chat-evict load race after the busy precheck, a finally-block that could unload a user chat model, and an identifier mismatch the delete guard could not canonicalize. Go back to PRIVATE LlamaCppBackend instances and expose the active helper/advisor repos through a new thread-safe registry (helper_advisor_owns_repo / _register_helper_advisor_repo / _unregister_helper_advisor_repo) so DELETE /api/models/delete-cached can still block the rmtree. 4. routes/models.py delete_cached_model: check the new helper/advisor registry up front and 409 if a helper/advisor still owns the target repo. Closes round 26 P1 #13 and #14 (helper/advisor identifiers were prefixed and would never equal the raw repo id). 5. routes/models.py get_lora_base_model: validate lora_path with _validate_logged_identifier before it is reflected in 404 detail and error logs (round 26 P1 #12). 6. routes/inference.py /unload: round 21 P1 #3 added a "or not is_loaded" fallback that let an unload of owner/B cancel a pending llama load of owner/A. Replace it with a narrow llama_is_starting_without_identifier branch that only fires when llama-server is mid-startup with neither identifier set (round 26 P1 #5). 7. routes/inference.py /unload: poll loading_model_identifier for up to 5 s after asyncio.to_thread(unload_model) so a legitimate pending-load cancel does not 503 because the load thread has not yet observed _cancel_event in its finally (round 26 P2 #15). 8. models/training.py TrainingStartRequest: extend identifier hardening to hf_dataset, subset, train_split, eval_split. Round 22 only guarded model_name (round 26 P1 #10). 9. models/data_recipe.py SeedInspectRequest: add _no_control_chars + _reject_embedded_hf_token field_validators on dataset_name (round 26 P1 #11). Tests: 105 targeted (diffusion + cached_gguf + llama_cpp_cache + inference_model_validation + models_get_model_config) and 1768 broader backend tests pass locally. Pre-existing test_desktop_auth.py, test_studio_api.py, and test_training_worker_flash_attn.py failures reproduce on HEAD without these changes.
Four actionable findings from round 30. Skipped P1 #1 / #2 / #3 (huggingface-hub bump in studio.txt / single-env / colab-new) because the live B200 Studio that successfully generated FLUX.2 klein images runs the exact combo the reviewer flags as broken: huggingface_hub 0.36.2 + transformers 4.57.6 + diffusers 0.37.1 Flux2KleinPipeline: True (imports cleanly) The is_offline_mode ImportError only fires with transformers 5.x, and the standard install path pins transformers==4.57.6 via constraints. The round 26 fix bumped no-torch-runtime.txt + pyproject huggingfacenotorch where the --no-deps install path can land on transformers 5.x; that remains the correct surface. 1. core/inference/diffusion.py: preflight transformers + accelerate via importlib.util.find_spec BEFORE any destructive GPU-owner unload. Diffusers can expose stub pipeline classes when transformers / accelerate are missing, so the load used to drop chat first and fail later inside from_pretrained. find_spec keeps existing tests that stub these modules passing because no real module is executed (round 30 P1 #11). 2. models/export.py ExportGGUFRequest.quantization_method: extend the embedded HF token validator to this field too. Round 23 added the control-char guard but not the token guard; the value is forwarded into worker command lines and reflected in error / success text (round 30 P1 #5). 3. models/data_recipe.py SeedInspectUploadRequest: add _no_control_chars + _reject_embedded_hf_token field_validators to filename and to each entry of file_names. Mirrors the sibling SeedInspectRequest.dataset_name hardening (round 30 P1 #6). 4. frontend/src/features/images/images-page.tsx: defer the initial refreshStatus() call via queueMicrotask so the synchronous setRefreshingStatus(true) inside it does not trip the react-hooks/set-state-in-effect lint on mount (round 30 P2 #12). Deferred (need larger surgery / out of scope for this round): P1 #4 native_path_lease for diffusion local-path loads P1 #7-#10 helper/advisor + public-start window mutual lock symmetry Tests: 98 targeted (diffusion + cached_gguf + inference_validation) pass locally; frontend npm run typecheck passes.
Addresses remaining round-30 reviewer findings against PR unslothai#5754 (diffusion image generation in Unsloth Studio). The studio.txt / constraints.txt / colab-new hub-bump items (round 30 #1-#3) are intentionally skipped: the live B200 Studio install path with huggingface_hub==0.36.2, transformers==4.57.6 and diffusers==0.37.1 imports Flux2KleinPipeline cleanly and runs end-to-end image generation (see staging CI green on bec81b8 plus round 28-30 local validation suites). The is_offline_mode ImportError the reviewer cites only triggers with transformers 5.x against huggingface_hub 0.x; the constraints pin holds transformers at 4.x so the combo never materialises on the standard install path. Concurrency: close the helper / advisor GPU-start race in all four public load paths (round 30 P1 #7-#10). * Add a _PUBLIC_LOAD_PENDING_COUNT counter in utils/datasets/llm_assist.py, published under _HELPER_ADVISOR_START_LOCK by _raise_if_helper_advisor_busy and cleared by a paired _clear_public_load_window in routes/inference.py. A concurrent helper / advisor start now sees public_load_pending() inside _gpu_workload_busy_for_helper and refuses VRAM until the public load attempt finishes, closing the window between the busy snapshot and the public load flipping its public ownership flags (is_loaded, current_checkpoint, is_training_active, etc.). * Wire the paired clear into all five call sites (GGUF chat, safetensors chat, diffusion image load, training start, export load-checkpoint). The chat path tracks the published tag in a local so the finally clears the same counter on either branch or on early HTTPException. Security: gate /api/inference/images/load against arbitrary local-path probes (round 30 P1 #4). Mirror the chat /api/inference/load native_path_lease boundary so an authenticated session cannot use repo_id or base_repo as a directory probe. * Add native_path_lease + base_repo_native_path_lease to DiffusionLoadRequest (optional; Hub ids skip the lease). * Add _looks_like_local_diffusion_path + a _resolve_diffusion_repo_for_request helper that requires a verified directory-typed native path grant for any value that starts with /, ~, ./, ../, contains a backslash, or expands to an absolute path. The detector deliberately avoids Path.exists so the route does not side-channel filesystem layout via differential error messages. Frontend: split the Images page status fetch from the spinner toggle (round 30 P2 #12). The mount effect and the is_loading auto-poll now call a setState-free fetchAndUpdateStatus; the user-driven Refresh button still calls refreshStatus to flip the spinner. Cleaner separation than the queueMicrotask shim from the prior commit; the eslint react-hooks/set-state-in-effect rule is not in the studio-frontend-ci typecheck gate, and the codebase already has hundreds of pre-existing violations of the same rule. 98 targeted backend tests pass (test_diffusion_routes, test_diffusion_backend, test_inference_model_validation, test_models_get_model_config_case_resolution, test_data_recipe_seed, test_training_raw_support, test_export_log_cursor). Frontend typecheck passes.
Addresses remaining round-30 reviewer findings against PR unslothai#5754 (diffusion image generation in Unsloth Studio). The studio.txt / constraints.txt / colab-new hub-bump items (round 30 #1-#3) are intentionally skipped: the live B200 Studio install path with huggingface_hub==0.36.2, transformers==4.57.6 and diffusers==0.37.1 imports Flux2KleinPipeline cleanly and runs end-to-end image generation (see staging CI green on bec81b8 plus round 28-30 local validation suites). The is_offline_mode ImportError the reviewer cites only triggers with transformers 5.x against huggingface_hub 0.x; the constraints pin holds transformers at 4.x so the combo never materialises on the standard install path. Concurrency: close the helper / advisor GPU-start race in all four public load paths (round 30 P1 #7-#10). * Add a _PUBLIC_LOAD_PENDING_COUNT counter in utils/datasets/llm_assist.py, published under _HELPER_ADVISOR_START_LOCK by _raise_if_helper_advisor_busy and cleared by a paired _clear_public_load_window in routes/inference.py. A concurrent helper / advisor start now sees public_load_pending() inside _gpu_workload_busy_for_helper and refuses VRAM until the public load attempt finishes, closing the window between the busy snapshot and the public load flipping its public ownership flags (is_loaded, current_checkpoint, is_training_active, etc.). * Wire the paired clear into all five call sites (GGUF chat, safetensors chat, diffusion image load, training start, export load-checkpoint). The chat path tracks the published tag in a local so the finally clears the same counter on either branch or on early HTTPException. Security: gate /api/inference/images/load against arbitrary local-path probes (round 30 P1 #4). Mirror the chat /api/inference/load native_path_lease boundary so an authenticated session cannot use repo_id or base_repo as a directory probe. * Add native_path_lease + base_repo_native_path_lease to DiffusionLoadRequest (optional; Hub ids skip the lease). * Add _looks_like_local_diffusion_path + a _resolve_diffusion_repo_for_request helper that requires a verified directory-typed native path grant for any value that starts with /, ~, ./, ../, contains a backslash, or expands to an absolute path. The detector deliberately avoids Path.exists so the route does not side-channel filesystem layout via differential error messages. Frontend: split the Images page status fetch from the spinner toggle (round 30 P2 #12). The mount effect and the is_loading auto-poll now call a setState-free fetchAndUpdateStatus; the user-driven Refresh button still calls refreshStatus to flip the spinner. Cleaner separation than the queueMicrotask shim from the prior commit; the eslint react-hooks/set-state-in-effect rule is not in the studio-frontend-ci typecheck gate, and the codebase already has hundreds of pre-existing violations of the same rule. 98 targeted backend tests pass (test_diffusion_routes, test_diffusion_backend, test_inference_model_validation, test_models_get_model_config_case_resolution, test_data_recipe_seed, test_training_raw_support, test_export_log_cursor). Frontend typecheck passes.
Staging mirror of unslothai#4852
Original PR: unslothai#4852
Author: rolandtannous
This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.
Original description
Summary
SUPPORTS_LLAMA32glob blocks inFastLanguageModel.from_pretrainedandFastModel.from_pretrainedwithis_model and is_peftso theHfFileSystemHTTP call is only made when both configs could actually existAutoConfigorPeftConfigalready failed to loadTest plan
pytest tests/test_loader_glob_skip.py -v— all 7 tests passunsloth/Qwen3.5-2B) — should no longer hang on slow networks