fix: skip redundant HfFileSystem().glob() calls in loader.py#14
fix: skip redundant HfFileSystem().glob() calls in loader.py#14danielhanchen wants to merge 8 commits into
Conversation
Guard the SUPPORTS_LLAMA32 glob blocks with `is_model and is_peft` so the HfFileSystem HTTP call is only made when both configs could actually exist. This prevents indefinite hangs on slow/unreliable networks since the glob result is redundant when either AutoConfig or PeftConfig already failed to load.
for more information, see https://pre-commit.ci
Tests for the glob skip guard belong in their own PR to keep the loader change minimal and reviewable.
|
/gemini review |
|
@codex review |
|
Codex Review: Didn't find any major issues. Bravo. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Code Review
This pull request updates unsloth/models/loader.py to include is_model and is_peft checks when handling Llama 3.2 models. The review feedback identifies a logic bug where the both_exist flag is set but never utilized, rendering conflict detection ineffective. It is recommended to raise a RuntimeError immediately when both flags are true to resolve the bug and improve performance by avoiding redundant file system calls.
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
There was a problem hiding this comment.
The optimization to guard the manual check with is_model and is_peft is a good improvement for performance. However, there is a logic bug in the surrounding code: the both_exist flag set within this block (at line 527) is never checked or used to raise an error for Llama 3.2+ models. This means the conflict detection is currently non-functional for these models.
Furthermore, if is_model and is_peft are both True, it already indicates that both AutoConfig and PeftConfig were successfully loaded for the same model_name, which confirms the conflict. You could potentially avoid the expensive HfFileSystem().glob() call entirely by raising the RuntimeError immediately when both are True, which would also fix the bug and prevent hangs in conflict scenarios.
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
There was a problem hiding this comment.
Similar to the issue in FastLanguageModel.from_pretrained, the both_exist flag set at line 1300 is never subsequently checked, rendering the manual conflict detection for Llama 3.2+ models ineffective. Since is_model and is_peft being True already confirms that both configurations are present and loadable, consider raising the conflict error directly to avoid the redundant and potentially slow HfFileSystem().glob() call.
danielhanchen
left a comment
There was a problem hiding this comment.
Thank you for the PR! The goal of this PR is to avoid redundant HfFileSystem().glob() calls in loader.py when either AutoConfig or PeftConfig already failed to load. As a summary, this PR adds and is_model and is_peft guards to the two if SUPPORTS_LLAMA32: blocks in FastLanguageModel.from_pretrained (line 511) and FastModel.from_pretrained (line 1285), skipping the filesystem check when it cannot produce a meaningful result.
Ran 12 independent reviewers (Gemini, Codex, 7 parallel code reviewers, 3 independent code analysis agents). Consolidated findings below.
| Reviewers | Severity | Finding |
|---|---|---|
| 12/12 | Correct | The is_model and is_peft guard is semantically correct -- the glob is a no-op when either config load failed |
| 5/12 | High (pre-existing) | both_exist is set inside the SUPPORTS_LLAMA32 block but never read afterward -- conflict-detection RuntimeError never fires on transformers > 4.45.0 |
| 7/12 | P2 | Test file test_loader_glob_skip.py missing from this branch (split to separate PR) |
| 2/12 | Medium (pre-existing) | HfFileSystem.glob() has no try/except -- network errors in offline mode raise unhandled exception |
| 1/12 | Medium (pre-existing) | FastModel.get_model_name at line 1325 missing token/trust_remote_code/load_in_fp8 args vs FastLanguageModel equivalent |
| 1/12 | Low (pre-existing) | os.path.split on Windows with POSIX HfFileSystem paths returns full path instead of filename |
The PR's two-line change is correct and safe. All 12 reviewers approve.
The most significant pre-existing issue (5/12 flagged): the SUPPORTS_LLAMA32 block sets both_exist but nothing reads it afterward. The if both_exist: raise RuntimeError(...) check at line 493/1105 is gated with and not SUPPORTS_LLAMA32, so it never fires on modern transformers. The SUPPORTS_LLAMA32 block was supposed to re-detect the conflict via filesystem checks, but the subsequent raise was never added. This means repos with both config.json and adapter_config.json pass silently on transformers > 4.45.0.
I will push a follow-up commit to address the dead both_exist code and tighten the guard.
- Use str.rsplit("/", 1) instead of os.path.split to extract filenames
from HfFileSystem paths. HfFileSystem always returns POSIX-style paths,
but os.path.split uses the OS separator, so on Windows the entire path
was returned as the "filename" and the config name comparison always
failed.
- Wrap the HfFileSystem().glob() call in try/except to gracefully handle
network failures (offline mode, timeouts, unreachable Hub). On failure
both_exist stays False, which is the safe default.
for more information, see https://pre-commit.ci
|
/gemini review |
|
@codex review |
There was a problem hiding this comment.
Code Review
This pull request refines the Llama 3.2 support check by incorporating model and PEFT status and improves cross-platform path handling for HfFileSystem by using manual string splitting. Feedback was provided regarding the use of broad except Exception: blocks, which may mask critical errors and should be replaced with more specific exception handling.
| except Exception: | ||
| pass |
| except Exception: | ||
| pass |
|
Codex Review: Didn't find any major issues. Keep them coming! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
danielhanchen
left a comment
There was a problem hiding this comment.
Thank you for the PR! The goal of this PR is to avoid unnecessary HfFileSystem().glob() calls in the loader when either AutoConfig or PeftConfig already failed. As a summary, this PR guards the SUPPORTS_LLAMA32 glob blocks with is_model and is_peft, replaces os.path.split with str.rsplit("/", 1) for POSIX path handling, and adds try/except Exception: pass around the glob calls.
Ran 13 independent reviewers (Gemini, 7x reviewer.py, 3x subagent, Codex). Consolidated findings:
| Reviewers | Severity | Finding |
|---|---|---|
| 8/13 | High | except Exception: pass is too broad -- silently swallows auth errors, programming bugs, and network failures with no logging |
| 5/13 | High (pre-existing) | both_exist is dead code -- assigned in the SUPPORTS_LLAMA32 block but never read afterward; the RuntimeError at L493 fires before the block |
| 3/13 | High | The HfFileSystem().glob() call is semantically redundant -- when is_model=True and is_peft=True, both config.json and adapter_config.json must exist (they were just loaded by AutoConfig and PeftConfig respectively) |
| 2/13 | Low | The Windows os.path.split breakage claim is overstated -- ntpath.split handles forward slashes via altsep |
| 13/13 | OK | The is_model and is_peft guard is logically correct and eliminates 3 out of 4 unnecessary network calls |
Recommended fix: Since the glob is redundant (both files are proven to exist when is_model and is_peft), replace the remote glob call with a direct both_exist = True assignment. This eliminates the try/except entirely and avoids the network call completely -- a better solution than guarding it.
else:
# Both AutoConfig and PeftConfig loaded successfully from this
# remote repo, so both config.json and adapter_config.json
# definitely exist -- no need for an extra HfFileSystem network call.
both_exist = True
Concrete suggestions for each finding in inline comments below.
| ): | ||
| both_exist = True | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
[8/13 reviewers] The except Exception: pass is too broad. It silently swallows MemoryError, AttributeError, TypeError, auth failures (RepositoryNotFoundError, GatedRepoError), and any other programming errors with zero logging. This makes failures very hard to diagnose.
However, this entire remote branch is unnecessary. When is_model=True and is_peft=True, both config.json (loaded by AutoConfig) and adapter_config.json (loaded by PeftConfig) are proven to exist. The HfFileSystem().glob() call is redundant information gathering.
| pass | |
| else: | |
| # Both AutoConfig and PeftConfig loaded successfully from this | |
| # remote repo, so both config.json and adapter_config.json | |
| # definitely exist -- no need for an extra HfFileSystem network call. | |
| both_exist = True |
This eliminates the network call entirely, removes the need for try/except, and is provably correct.
| ): | ||
| both_exist = True | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
[8/13 reviewers] Same issue as above -- the except Exception: pass and the entire glob call are redundant. Since is_model=True and is_peft=True, both config files are proven to exist.
| pass | |
| else: | |
| # Both AutoConfig and PeftConfig loaded successfully from this | |
| # remote repo, so both config.json and adapter_config.json | |
| # definitely exist -- no need for an extra HfFileSystem network call. | |
| both_exist = True |
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
There was a problem hiding this comment.
[5/13 reviewers] Pre-existing issue (not introduced by this PR): both_exist is dead code after this block.
The flow is:
- L490:
both_exist = (is_model and is_peft) and not SUPPORTS_LLAMA32-- alwaysFalsewhenSUPPORTS_LLAMA32=True - L493:
if both_exist: raise RuntimeError(...)-- fires BEFORE this block, only on old transformers - L511-535: This block sets
both_existbut it is NEVER READ again afterward
The RuntimeError for "both configs exist" effectively never fires on modern transformers. This is a pre-existing silent safety bypass. The fix would be to add if both_exist: raise RuntimeError(...) after this block, but that changes behavior and should be a separate PR.
When is_model and is_peft are both True, AutoConfig and PeftConfig have already loaded successfully, proving both config.json and adapter_config.json exist. The HfFileSystem network call to re-verify this was redundant and could cause hangs on slow networks. Replace the glob + try/except block with a direct both_exist = True assignment.
|
/gemini review |
|
@codex review |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Code Review
This pull request updates the from_pretrained method in unsloth/models/loader.py to include is_model and is_peft in the Llama 3.2 support check. It also removes unnecessary network calls to HfFileSystem for remote repositories, assuming configuration files exist if the initial load was successful. A review comment suggests refactoring the duplicated logic into a helper function to reduce redundancy.
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
There was a problem hiding this comment.
The condition is_model and is_peft is checked here and again in the if not is_model and not is_peft block on line 525. This logic is duplicated in the from_pretrained method for both FastLanguageModel and FastModel. Consider refactoring this into a helper function to improve maintainability and reduce code duplication.
HfFileSystem was only used for the glob() calls that were replaced with direct both_exist = True assignments in the previous commit.
|
/gemini review |
|
@codex review |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
| # Both AutoConfig and PeftConfig loaded successfully from this | ||
| # remote repo, so both config.json and adapter_config.json | ||
| # definitely exist -- no need for an extra HfFileSystem network call. | ||
| both_exist = True |
| # Both AutoConfig and PeftConfig loaded successfully from this | ||
| # remote repo, so both config.json and adapter_config.json | ||
| # definitely exist -- no need for an extra HfFileSystem network call. | ||
| both_exist = True |
There was a problem hiding this comment.
Code Review
This pull request optimizes the model loading process by removing redundant HfFileSystem network calls and refining the Llama 3.2 support check to include is_model and is_peft flags. The reviewer suggests refactoring the repeated conditional logic into a helper function or variable to improve code maintainability.
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
|
|
||
| # New transformers need to check manually. | ||
| if SUPPORTS_LLAMA32: | ||
| if SUPPORTS_LLAMA32 and is_model and is_peft: |
|
Fixes pushed to original PR unslothai#4852. Closing staging copy. |
…guard for PR unslothai#5754 Round 9 reviewer flagged a pile of handoff asymmetries: every GPU-owning lifecycle change (training, export, chat, images) needed its own bespoke unload sequence and they had drifted out of sync. Some skipped llama-server is_active; some missed safetensors loading_models; export and training did not check is_export_active. Backend handoff (P1) * routes/inference.py: new _release_chat_for / _release_export_for helpers. Both treat llama-server as held when is_loaded OR is_active, safetensors as held when active_model_name OR loading_models is non-empty, and export as held when current_checkpoint OR is_export_active. Both helpers run their unloads in worker threads so async routes do not block the event loop. * routes/training.py: replaces its bespoke inline llama / safe / export unload sequence with await _release_chat_for / _release_ export_for. * routes/export.py: same swap for the chat unload chain (export still does NOT call _release_export_for on itself). * routes/inference.py GGUF + standard chat-load paths: now use _release_export_for to drop a settled export, and the standard path's llama unload now also handles is_active=True (round 9 review #8). Backend reject-on-active export (P1 #5) * routes/inference.py: new _raise_if_export_active. Symmetric with _raise_if_training_active: a long-running export is refused with HTTP 409 instead of being silently killed when /images/load or /load arrives. Diffusion / images load and both chat-load paths call it. * core/inference/diffusion.py _release_other_gpu_owners_for_ diffusion: no longer tears down an in-flight export job. Only drops a SETTLED export checkpoint (current_checkpoint populated, is_export_active False). Round 9 review #5 -- the previous behavior could terminate an in-flight export and leave a partial output artifact. Token leak via logger.exception (P1 #6) * core/inference/diffusion.py: load-failure logging now uses logger.error(..., exc_msg) with the already-scrubbed string and exc_info=False. logger.exception() with the raw Exception would expose any hf_... token that diffusers / huggingface_hub embedded in the message or traceback locals, defeating the earlier in-flight scrub. Dependency pinning (P1 #11) * pyproject.toml: huggingfacenotorch optional extra now pins diffusers>=0.37.0. Previously the floor was only set in studio/backend/requirements/no-torch-runtime.txt, so a normal pip install would resolve diffusers 0.36.0 (no Flux2KleinPipeline) and the default curated FLUX.2 klein Images model would fail at runtime. Cache-delete exact match (P1 #14) * routes/models.py /delete-cached: llama.cpp and safetensors guards now match on exact repo-id (case-insensitive) instead of prefix. Diffusion guard already does this; the chat guards were the remaining surface where loading org/model-v2 blocked deleting org/model.
Twelve P1 findings from round 26 reviewer aggregate, plus the CI revert of round 25 P1 #5 to a less invasive location. 1. requirements/studio.txt + requirements/single-env/constraints.txt: revert the round 25 huggingface-hub bump (broke Studio Update CI, Mac Studio Update CI, Mac Studio UI CI, Studio UI CI all with ResolutionImpossible against transformers==4.57.6 which requires hub<1.0). Standard install path stays on the well-tested 4.57.6 + 0.36.2 + trl 0.23.1 trio. 2. requirements/no-torch-runtime.txt + pyproject.toml [huggingfacenotorch]: bump huggingface_hub floor from >=0.34.0 to >=1.3.0,<2.0 -- this is where the actual transformers 5.x + hub 0.36.2 broken combo can land because the file installs --no-deps. transformers 5.x calls hub.is_offline_mode which only exists in hub 1.x. 3. utils/datasets/llm_assist.py: revert round 25 P1 #4 (helper/advisor sharing the global llama backend) which introduced three regressions: a chat-evict load race after the busy precheck, a finally-block that could unload a user chat model, and an identifier mismatch the delete guard could not canonicalize. Go back to PRIVATE LlamaCppBackend instances and expose the active helper/advisor repos through a new thread-safe registry (helper_advisor_owns_repo / _register_helper_advisor_repo / _unregister_helper_advisor_repo) so DELETE /api/models/delete-cached can still block the rmtree. 4. routes/models.py delete_cached_model: check the new helper/advisor registry up front and 409 if a helper/advisor still owns the target repo. Closes round 26 P1 #13 and #14 (helper/advisor identifiers were prefixed and would never equal the raw repo id). 5. routes/models.py get_lora_base_model: validate lora_path with _validate_logged_identifier before it is reflected in 404 detail and error logs (round 26 P1 #12). 6. routes/inference.py /unload: round 21 P1 #3 added a "or not is_loaded" fallback that let an unload of owner/B cancel a pending llama load of owner/A. Replace it with a narrow llama_is_starting_without_identifier branch that only fires when llama-server is mid-startup with neither identifier set (round 26 P1 #5). 7. routes/inference.py /unload: poll loading_model_identifier for up to 5 s after asyncio.to_thread(unload_model) so a legitimate pending-load cancel does not 503 because the load thread has not yet observed _cancel_event in its finally (round 26 P2 #15). 8. models/training.py TrainingStartRequest: extend identifier hardening to hf_dataset, subset, train_split, eval_split. Round 22 only guarded model_name (round 26 P1 #10). 9. models/data_recipe.py SeedInspectRequest: add _no_control_chars + _reject_embedded_hf_token field_validators on dataset_name (round 26 P1 #11). Tests: 105 targeted (diffusion + cached_gguf + llama_cpp_cache + inference_model_validation + models_get_model_config) and 1768 broader backend tests pass locally. Pre-existing test_desktop_auth.py, test_studio_api.py, and test_training_worker_flash_attn.py failures reproduce on HEAD without these changes.
Five additional P1 findings round 27 reviewer flagged on top of the round 27 commit 6c528fb (Counter refcount + handoff visibility were already covered). Three remaining studio.txt / no-torch-runtime hub suggestions are NOT applied because they would re-break CI; the empirical evidence (round 26 commit 65ea3a2 restored CI green) takes precedence over the reviewer's stale-state suggestion. 1. models/training.py TrainingStartRequest: extend the embedded HF token validator to subset, train_split, eval_split. Round 26 only added the control-char guard to those three; the token guard was asymmetric and would accept owner/data\\nFAKE hf_abcdef... payloads through subset / split fields. 2. models/datasets.py CheckFormatRequest: extend both validators (control chars + embedded HF token) to subset and train_split. Same asymmetric-fix bug as #1. 3. models/data_recipe.py SeedInspectRequest: extend both validators to subset and split. Same pattern. 4. utils/datasets/llm_assist.py precache_helper_gguf: register the helper repo in the helper/advisor refcount registry around the hf_hub_download loop, then unregister in the finally. Without this, the FastAPI-startup background pre-cache could be racing a concurrent DELETE /api/models/delete-cached against the same cache directory. The runtime helper / advisor calls already register (round 26 P1 #13/#14) but the precache was the asymmetric gap. 5. routes/models.py _loaded_model_matches_deleted_path: match bidirectionally (active under target OR target under active) so deleting a child directory of a loaded local model (.../my-flux/ text_encoder while .../my-flux is loaded) trips the guard. Mirrors the diffusion delete-guard symmetric path-overlap check. Tests: 105 targeted (diffusion + cache + inference_validation) and the broader backend suite pass locally.
Staging mirror of unslothai#4852
Original PR: unslothai#4852
Author: rolandtannous
This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.
Original description
Summary
SUPPORTS_LLAMA32glob blocks inFastLanguageModel.from_pretrainedandFastModel.from_pretrainedwithis_model and is_peftso theHfFileSystemHTTP call is only made when both configs could actually existAutoConfigorPeftConfigalready failed to loadTest plan
pytest tests/test_loader_glob_skip.py -v— all 7 tests passunsloth/Qwen3.5-2B) — should no longer hang on slow networks