Skip to content

fix: skip redundant HfFileSystem().glob() calls in loader.py#14

Closed
danielhanchen wants to merge 8 commits into
mainfrom
pr-4852-head
Closed

fix: skip redundant HfFileSystem().glob() calls in loader.py#14
danielhanchen wants to merge 8 commits into
mainfrom
pr-4852-head

Conversation

@danielhanchen
Copy link
Copy Markdown
Member

Staging mirror of unslothai#4852

Original PR: unslothai#4852
Author: rolandtannous

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.


Original description

Summary

  • Guard SUPPORTS_LLAMA32 glob blocks in FastLanguageModel.from_pretrained and FastModel.from_pretrained with is_model and is_peft so the HfFileSystem HTTP call is only made when both configs could actually exist
  • Prevents indefinite hangs on slow/unreliable networks since the glob result is redundant when either AutoConfig or PeftConfig already failed to load
  • Adds 7 unit tests verifying glob is skipped/called in the correct scenarios

Test plan

  • pytest tests/test_loader_glob_skip.py -v — all 7 tests pass
  • Train any HF model (e.g. unsloth/Qwen3.5-2B) — should no longer hang on slow networks
  • Train a LoRA adapter repo — should still detect conflicting configs correctly
  • Train a local model path — behavior unchanged

rolandtannous and others added 4 commits April 5, 2026 03:12
Guard the SUPPORTS_LLAMA32 glob blocks with `is_model and is_peft` so
the HfFileSystem HTTP call is only made when both configs could actually
exist. This prevents indefinite hangs on slow/unreliable networks since
the glob result is redundant when either AutoConfig or PeftConfig
already failed to load.
Tests for the glob skip guard belong in their own PR to keep
the loader change minimal and reviewable.
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen
Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Bravo.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates unsloth/models/loader.py to include is_model and is_peft checks when handling Llama 3.2 models. The review feedback identifies a logic bug where the both_exist flag is set but never utilized, rendering conflict detection ineffective. It is recommended to raise a RuntimeError immediately when both flags are true to resolve the bug and improve performance by avoiding redundant file system calls.

Comment thread unsloth/models/loader.py

# New transformers need to check manually.
if SUPPORTS_LLAMA32:
if SUPPORTS_LLAMA32 and is_model and is_peft:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The optimization to guard the manual check with is_model and is_peft is a good improvement for performance. However, there is a logic bug in the surrounding code: the both_exist flag set within this block (at line 527) is never checked or used to raise an error for Llama 3.2+ models. This means the conflict detection is currently non-functional for these models.

Furthermore, if is_model and is_peft are both True, it already indicates that both AutoConfig and PeftConfig were successfully loaded for the same model_name, which confirms the conflict. You could potentially avoid the expensive HfFileSystem().glob() call entirely by raising the RuntimeError immediately when both are True, which would also fix the bug and prevent hangs in conflict scenarios.

Comment thread unsloth/models/loader.py

# New transformers need to check manually.
if SUPPORTS_LLAMA32:
if SUPPORTS_LLAMA32 and is_model and is_peft:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the issue in FastLanguageModel.from_pretrained, the both_exist flag set at line 1300 is never subsequently checked, rendering the manual conflict detection for Llama 3.2+ models ineffective. Since is_model and is_peft being True already confirms that both configurations are present and loadable, consider raising the conflict error directly to avoid the redundant and potentially slow HfFileSystem().glob() call.

Copy link
Copy Markdown
Member Author

@danielhanchen danielhanchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! The goal of this PR is to avoid redundant HfFileSystem().glob() calls in loader.py when either AutoConfig or PeftConfig already failed to load. As a summary, this PR adds and is_model and is_peft guards to the two if SUPPORTS_LLAMA32: blocks in FastLanguageModel.from_pretrained (line 511) and FastModel.from_pretrained (line 1285), skipping the filesystem check when it cannot produce a meaningful result.

Ran 12 independent reviewers (Gemini, Codex, 7 parallel code reviewers, 3 independent code analysis agents). Consolidated findings below.

Reviewers Severity Finding
12/12 Correct The is_model and is_peft guard is semantically correct -- the glob is a no-op when either config load failed
5/12 High (pre-existing) both_exist is set inside the SUPPORTS_LLAMA32 block but never read afterward -- conflict-detection RuntimeError never fires on transformers > 4.45.0
7/12 P2 Test file test_loader_glob_skip.py missing from this branch (split to separate PR)
2/12 Medium (pre-existing) HfFileSystem.glob() has no try/except -- network errors in offline mode raise unhandled exception
1/12 Medium (pre-existing) FastModel.get_model_name at line 1325 missing token/trust_remote_code/load_in_fp8 args vs FastLanguageModel equivalent
1/12 Low (pre-existing) os.path.split on Windows with POSIX HfFileSystem paths returns full path instead of filename

The PR's two-line change is correct and safe. All 12 reviewers approve.

The most significant pre-existing issue (5/12 flagged): the SUPPORTS_LLAMA32 block sets both_exist but nothing reads it afterward. The if both_exist: raise RuntimeError(...) check at line 493/1105 is gated with and not SUPPORTS_LLAMA32, so it never fires on modern transformers. The SUPPORTS_LLAMA32 block was supposed to re-detect the conflict via filesystem checks, but the subsequent raise was never added. This means repos with both config.json and adapter_config.json pass silently on transformers > 4.45.0.

I will push a follow-up commit to address the dead both_exist code and tighten the guard.

danielhanchen and others added 2 commits April 6, 2026 12:07
- Use str.rsplit("/", 1) instead of os.path.split to extract filenames
  from HfFileSystem paths. HfFileSystem always returns POSIX-style paths,
  but os.path.split uses the OS separator, so on Windows the entire path
  was returned as the "filename" and the config name comparison always
  failed.
- Wrap the HfFileSystem().glob() call in try/except to gracefully handle
  network failures (offline mode, timeouts, unreachable Hub). On failure
  both_exist stays False, which is the safe default.
Comment thread unsloth/models/loader.py Fixed
Comment thread unsloth/models/loader.py Fixed
Comment thread unsloth/models/loader.py Fixed
Comment thread unsloth/models/loader.py Fixed
Comment thread unsloth/models/loader.py Fixed
Comment thread unsloth/models/loader.py Fixed
Comment thread unsloth/models/loader.py Fixed
Comment thread unsloth/models/loader.py Fixed
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refines the Llama 3.2 support check by incorporating model and PEFT status and improves cross-platform path handling for HfFileSystem by using manual string splitting. Feedback was provided regarding the use of broad except Exception: blocks, which may mask critical errors and should be replaced with more specific exception handling.

Comment thread unsloth/models/loader.py Outdated
Comment on lines +534 to +535
except Exception:
pass
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a bare except Exception: block is discouraged as it can mask unexpected errors, including keyboard interrupts or system-level exceptions. It is better to catch specific exceptions that are expected during the HfFileSystem operations.

Comment thread unsloth/models/loader.py Outdated
Comment on lines +1316 to +1317
except Exception:
pass
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a bare except Exception: block is discouraged as it can mask unexpected errors, including keyboard interrupts or system-level exceptions. It is better to catch specific exceptions that are expected during the HfFileSystem operations.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Keep them coming!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Member Author

@danielhanchen danielhanchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! The goal of this PR is to avoid unnecessary HfFileSystem().glob() calls in the loader when either AutoConfig or PeftConfig already failed. As a summary, this PR guards the SUPPORTS_LLAMA32 glob blocks with is_model and is_peft, replaces os.path.split with str.rsplit("/", 1) for POSIX path handling, and adds try/except Exception: pass around the glob calls.

Ran 13 independent reviewers (Gemini, 7x reviewer.py, 3x subagent, Codex). Consolidated findings:

Reviewers Severity Finding
8/13 High except Exception: pass is too broad -- silently swallows auth errors, programming bugs, and network failures with no logging
5/13 High (pre-existing) both_exist is dead code -- assigned in the SUPPORTS_LLAMA32 block but never read afterward; the RuntimeError at L493 fires before the block
3/13 High The HfFileSystem().glob() call is semantically redundant -- when is_model=True and is_peft=True, both config.json and adapter_config.json must exist (they were just loaded by AutoConfig and PeftConfig respectively)
2/13 Low The Windows os.path.split breakage claim is overstated -- ntpath.split handles forward slashes via altsep
13/13 OK The is_model and is_peft guard is logically correct and eliminates 3 out of 4 unnecessary network calls

Recommended fix: Since the glob is redundant (both files are proven to exist when is_model and is_peft), replace the remote glob call with a direct both_exist = True assignment. This eliminates the try/except entirely and avoids the network call completely -- a better solution than guarding it.

            else:
                # Both AutoConfig and PeftConfig loaded successfully from this
                # remote repo, so both config.json and adapter_config.json
                # definitely exist -- no need for an extra HfFileSystem network call.
                both_exist = True

Concrete suggestions for each finding in inline comments below.

Comment thread unsloth/models/loader.py Outdated
):
both_exist = True
except Exception:
pass
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[8/13 reviewers] The except Exception: pass is too broad. It silently swallows MemoryError, AttributeError, TypeError, auth failures (RepositoryNotFoundError, GatedRepoError), and any other programming errors with zero logging. This makes failures very hard to diagnose.

However, this entire remote branch is unnecessary. When is_model=True and is_peft=True, both config.json (loaded by AutoConfig) and adapter_config.json (loaded by PeftConfig) are proven to exist. The HfFileSystem().glob() call is redundant information gathering.

Suggested change
pass
else:
# Both AutoConfig and PeftConfig loaded successfully from this
# remote repo, so both config.json and adapter_config.json
# definitely exist -- no need for an extra HfFileSystem network call.
both_exist = True

This eliminates the network call entirely, removes the need for try/except, and is provably correct.

Comment thread unsloth/models/loader.py Outdated
):
both_exist = True
except Exception:
pass
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[8/13 reviewers] Same issue as above -- the except Exception: pass and the entire glob call are redundant. Since is_model=True and is_peft=True, both config files are proven to exist.

Suggested change
pass
else:
# Both AutoConfig and PeftConfig loaded successfully from this
# remote repo, so both config.json and adapter_config.json
# definitely exist -- no need for an extra HfFileSystem network call.
both_exist = True

Comment thread unsloth/models/loader.py

# New transformers need to check manually.
if SUPPORTS_LLAMA32:
if SUPPORTS_LLAMA32 and is_model and is_peft:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[5/13 reviewers] Pre-existing issue (not introduced by this PR): both_exist is dead code after this block.

The flow is:

  1. L490: both_exist = (is_model and is_peft) and not SUPPORTS_LLAMA32 -- always False when SUPPORTS_LLAMA32=True
  2. L493: if both_exist: raise RuntimeError(...) -- fires BEFORE this block, only on old transformers
  3. L511-535: This block sets both_exist but it is NEVER READ again afterward

The RuntimeError for "both configs exist" effectively never fires on modern transformers. This is a pre-existing silent safety bypass. The fix would be to add if both_exist: raise RuntimeError(...) after this block, but that changes behavior and should be a separate PR.

When is_model and is_peft are both True, AutoConfig and PeftConfig
have already loaded successfully, proving both config.json and
adapter_config.json exist. The HfFileSystem network call to re-verify
this was redundant and could cause hangs on slow networks.

Replace the glob + try/except block with a direct both_exist = True
assignment.
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen
Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

Comment thread unsloth/models/loader.py Fixed
Comment thread unsloth/models/loader.py Fixed
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the from_pretrained method in unsloth/models/loader.py to include is_model and is_peft in the Llama 3.2 support check. It also removes unnecessary network calls to HfFileSystem for remote repositories, assuming configuration files exist if the initial load was successful. A review comment suggests refactoring the duplicated logic into a helper function to reduce redundancy.

Comment thread unsloth/models/loader.py

# New transformers need to check manually.
if SUPPORTS_LLAMA32:
if SUPPORTS_LLAMA32 and is_model and is_peft:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The condition is_model and is_peft is checked here and again in the if not is_model and not is_peft block on line 525. This logic is duplicated in the from_pretrained method for both FastLanguageModel and FastModel. Consider refactoring this into a helper function to improve maintainability and reduce code duplication.

HfFileSystem was only used for the glob() calls that were replaced
with direct both_exist = True assignments in the previous commit.
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen
Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

Comment thread unsloth/models/loader.py
# Both AutoConfig and PeftConfig loaded successfully from this
# remote repo, so both config.json and adapter_config.json
# definitely exist -- no need for an extra HfFileSystem network call.
both_exist = True
Comment thread unsloth/models/loader.py
# Both AutoConfig and PeftConfig loaded successfully from this
# remote repo, so both config.json and adapter_config.json
# definitely exist -- no need for an extra HfFileSystem network call.
both_exist = True
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes the model loading process by removing redundant HfFileSystem network calls and refining the Llama 3.2 support check to include is_model and is_peft flags. The reviewer suggests refactoring the repeated conditional logic into a helper function or variable to improve code maintainability.

Comment thread unsloth/models/loader.py

# New transformers need to check manually.
if SUPPORTS_LLAMA32:
if SUPPORTS_LLAMA32 and is_model and is_peft:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The condition SUPPORTS_LLAMA32 and is_model and is_peft is repeated in multiple places. Consider refactoring this into a helper function or a boolean variable to improve maintainability and readability.

Comment thread unsloth/models/loader.py

# New transformers need to check manually.
if SUPPORTS_LLAMA32:
if SUPPORTS_LLAMA32 and is_model and is_peft:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The condition SUPPORTS_LLAMA32 and is_model and is_peft is repeated in multiple places. Consider refactoring this into a helper function or a boolean variable to improve maintainability and readability.

@danielhanchen
Copy link
Copy Markdown
Member Author

Fixes pushed to original PR unslothai#4852. Closing staging copy.

danielhanchen added a commit that referenced this pull request May 25, 2026
…guard for PR unslothai#5754

Round 9 reviewer flagged a pile of handoff asymmetries: every
GPU-owning lifecycle change (training, export, chat, images) needed
its own bespoke unload sequence and they had drifted out of sync.
Some skipped llama-server is_active; some missed safetensors
loading_models; export and training did not check is_export_active.

Backend handoff (P1)
  * routes/inference.py: new _release_chat_for / _release_export_for
    helpers. Both treat llama-server as held when is_loaded OR
    is_active, safetensors as held when active_model_name OR
    loading_models is non-empty, and export as held when
    current_checkpoint OR is_export_active. Both helpers run their
    unloads in worker threads so async routes do not block the
    event loop.
  * routes/training.py: replaces its bespoke inline llama / safe /
    export unload sequence with await _release_chat_for / _release_
    export_for.
  * routes/export.py: same swap for the chat unload chain (export
    still does NOT call _release_export_for on itself).
  * routes/inference.py GGUF + standard chat-load paths: now use
    _release_export_for to drop a settled export, and the standard
    path's llama unload now also handles is_active=True (round 9
    review #8).

Backend reject-on-active export (P1 #5)
  * routes/inference.py: new _raise_if_export_active. Symmetric
    with _raise_if_training_active: a long-running export is
    refused with HTTP 409 instead of being silently killed when
    /images/load or /load arrives. Diffusion / images load and
    both chat-load paths call it.
  * core/inference/diffusion.py _release_other_gpu_owners_for_
    diffusion: no longer tears down an in-flight export job. Only
    drops a SETTLED export checkpoint (current_checkpoint
    populated, is_export_active False). Round 9 review #5 -- the
    previous behavior could terminate an in-flight export and
    leave a partial output artifact.

Token leak via logger.exception (P1 #6)
  * core/inference/diffusion.py: load-failure logging now uses
    logger.error(..., exc_msg) with the already-scrubbed string
    and exc_info=False. logger.exception() with the raw Exception
    would expose any hf_... token that diffusers / huggingface_hub
    embedded in the message or traceback locals, defeating the
    earlier in-flight scrub.

Dependency pinning (P1 #11)
  * pyproject.toml: huggingfacenotorch optional extra now pins
    diffusers>=0.37.0. Previously the floor was only set in
    studio/backend/requirements/no-torch-runtime.txt, so a normal
    pip install would resolve diffusers 0.36.0 (no
    Flux2KleinPipeline) and the default curated FLUX.2 klein
    Images model would fail at runtime.

Cache-delete exact match (P1 #14)
  * routes/models.py /delete-cached: llama.cpp and safetensors
    guards now match on exact repo-id (case-insensitive) instead
    of prefix. Diffusion guard already does this; the chat guards
    were the remaining surface where loading org/model-v2
    blocked deleting org/model.
danielhanchen added a commit that referenced this pull request May 25, 2026
Twelve P1 findings from round 26 reviewer aggregate, plus the CI
revert of round 25 P1 #5 to a less invasive location.

1. requirements/studio.txt + requirements/single-env/constraints.txt:
   revert the round 25 huggingface-hub bump (broke Studio Update CI,
   Mac Studio Update CI, Mac Studio UI CI, Studio UI CI all with
   ResolutionImpossible against transformers==4.57.6 which requires
   hub<1.0). Standard install path stays on the well-tested 4.57.6 +
   0.36.2 + trl 0.23.1 trio.

2. requirements/no-torch-runtime.txt + pyproject.toml
   [huggingfacenotorch]: bump huggingface_hub floor from >=0.34.0 to
   >=1.3.0,<2.0 -- this is where the actual transformers 5.x +
   hub 0.36.2 broken combo can land because the file installs
   --no-deps. transformers 5.x calls hub.is_offline_mode which only
   exists in hub 1.x.

3. utils/datasets/llm_assist.py: revert round 25 P1 #4 (helper/advisor
   sharing the global llama backend) which introduced three
   regressions: a chat-evict load race after the busy precheck, a
   finally-block that could unload a user chat model, and an
   identifier mismatch the delete guard could not canonicalize. Go
   back to PRIVATE LlamaCppBackend instances and expose the active
   helper/advisor repos through a new thread-safe registry
   (helper_advisor_owns_repo / _register_helper_advisor_repo /
   _unregister_helper_advisor_repo) so DELETE /api/models/delete-cached
   can still block the rmtree.

4. routes/models.py delete_cached_model: check the new helper/advisor
   registry up front and 409 if a helper/advisor still owns the
   target repo. Closes round 26 P1 #13 and #14 (helper/advisor
   identifiers were prefixed and would never equal the raw repo id).

5. routes/models.py get_lora_base_model: validate lora_path with
   _validate_logged_identifier before it is reflected in 404 detail
   and error logs (round 26 P1 #12).

6. routes/inference.py /unload: round 21 P1 #3 added a "or not
   is_loaded" fallback that let an unload of owner/B cancel a pending
   llama load of owner/A. Replace it with a narrow
   llama_is_starting_without_identifier branch that only fires when
   llama-server is mid-startup with neither identifier set (round 26
   P1 #5).

7. routes/inference.py /unload: poll loading_model_identifier for up
   to 5 s after asyncio.to_thread(unload_model) so a legitimate
   pending-load cancel does not 503 because the load thread has not
   yet observed _cancel_event in its finally (round 26 P2 #15).

8. models/training.py TrainingStartRequest: extend identifier
   hardening to hf_dataset, subset, train_split, eval_split. Round 22
   only guarded model_name (round 26 P1 #10).

9. models/data_recipe.py SeedInspectRequest: add _no_control_chars +
   _reject_embedded_hf_token field_validators on dataset_name (round
   26 P1 #11).

Tests: 105 targeted (diffusion + cached_gguf + llama_cpp_cache +
inference_model_validation + models_get_model_config) and 1768
broader backend tests pass locally. Pre-existing
test_desktop_auth.py, test_studio_api.py, and
test_training_worker_flash_attn.py failures reproduce on HEAD
without these changes.
danielhanchen added a commit that referenced this pull request May 25, 2026
Five additional P1 findings round 27 reviewer flagged on top of the
round 27 commit 6c528fb (Counter refcount + handoff visibility were
already covered). Three remaining studio.txt / no-torch-runtime hub
suggestions are NOT applied because they would re-break CI; the
empirical evidence (round 26 commit 65ea3a2 restored CI green) takes
precedence over the reviewer's stale-state suggestion.

1. models/training.py TrainingStartRequest: extend the embedded HF
   token validator to subset, train_split, eval_split. Round 26 only
   added the control-char guard to those three; the token guard was
   asymmetric and would accept owner/data\\nFAKE hf_abcdef...
   payloads through subset / split fields.

2. models/datasets.py CheckFormatRequest: extend both validators
   (control chars + embedded HF token) to subset and train_split.
   Same asymmetric-fix bug as #1.

3. models/data_recipe.py SeedInspectRequest: extend both validators
   to subset and split. Same pattern.

4. utils/datasets/llm_assist.py precache_helper_gguf: register the
   helper repo in the helper/advisor refcount registry around the
   hf_hub_download loop, then unregister in the finally. Without
   this, the FastAPI-startup background pre-cache could be racing
   a concurrent DELETE /api/models/delete-cached against the same
   cache directory. The runtime helper / advisor calls already
   register (round 26 P1 #13/#14) but the precache was the
   asymmetric gap.

5. routes/models.py _loaded_model_matches_deleted_path: match
   bidirectionally (active under target OR target under active) so
   deleting a child directory of a loaded local model (.../my-flux/
   text_encoder while .../my-flux is loaded) trips the guard.
   Mirrors the diffusion delete-guard symmetric path-overlap check.

Tests: 105 targeted (diffusion + cache + inference_validation) and
the broader backend suite pass locally.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants