Skip to content

Studio: local diffusion image generation page#5754

Open
danielhanchen wants to merge 94 commits into
mainfrom
studio-diffusion-images
Open

Studio: local diffusion image generation page#5754
danielhanchen wants to merge 94 commits into
mainfrom
studio-diffusion-images

Conversation

@danielhanchen
Copy link
Copy Markdown
Member

Adds a self-contained Images page in Studio for generating images from local diffusion GGUFs. Pairs with the diffusion GGUFs already published under https://huggingface.co/collections/unsloth/unsloth-diffusion-ggufs (FLUX.2, FLUX.2 klein, FLUX.1, Qwen-Image), and works with anything else the diffusers GGUFQuantizationConfig path supports.

Backend

  • studio/backend/core/inference/diffusion.py introduces a DiffusionBackend singleton that loads diffusion checkpoints in either the standard diffusers layout or the single-file GGUF layout. GGUF files are dynamically dequantised on-device via diffusers.GGUFQuantizationConfig, then the rest of the pipeline (VAE, text encoders, scheduler) is pulled from the matching diffusers base repo so end users only ever need one local file plus the metadata repo. Auto-picks bfloat16 on CUDA, float16 on MPS, and float32 on CPU; falls back to CPU offload on CUDA to fit FLUX.2 dev on a 16 GB card.
  • studio/backend/routes/inference.py adds four routes mirroring the llama-server lifecycle:
    • POST /api/inference/images/load
    • POST /api/inference/images/generate
    • POST /api/inference/images/unload
    • GET /api/inference/images/status
  • studio/backend/models/inference.py adds DiffusionLoadRequest, DiffusionGenerateRequest, and DiffusionGenerateResponse pydantic schemas. The generate validator rejects non-multiple-of-8 sizes up front so callers see a clean 422 rather than a cryptic VAE crash.
  • studio/backend/requirements/no-torch-runtime.txt pins gguf alongside the existing diffusers entry so GGUFQuantizationConfig works out of the box.
  • studio/backend/tests/test_diffusion_backend.py and studio/backend/tests/test_diffusion_routes.py add 27 CPU-only unit tests covering family detection, validation, lifecycle, and the full FastAPI round trip with diffusers stubbed via sys.modules. Stays well under one minute on a CPU runner.

Frontend

  • studio/frontend/src/features/images/ adds a standalone images-page.tsx with a curated FLUX.2 picker (klein 4B / klein 9B / dev / FLUX.1 dev), a HF token field for gated repos, prompt + negative prompt, resolution presets (square 1024/768/512, 832x1216, 1216x832), steps and guidance sliders, an optional seed, and a result gallery that renders the base64 PNGs inline with the run parameters as a caption.
  • studio/frontend/src/features/images/api.ts is the typed thin client over the four backend routes using the existing authFetch + readFastApiError helpers.
  • studio/frontend/src/app/routes/images.tsx lazy-loads the page on /images behind requireAuth.
  • studio/frontend/src/components/app-sidebar.tsx adds a PaintBrush02Icon nav item between Recipes and Export. Hidden in chat-only mode.
  • studio/frontend/src/app/router.tsx registers the new route in the existing addChildren list.

CI validation

Validated on ubuntu-latest, macos-14, and windows-latest on a staging fork (unslothai/unsloth-staging-1 PR #88) before opening this PR. The 27 backend tests pass on all three runners with the minimal dep set listed in the diff. No CI files are changed in this PR; the existing Backend CI and consolidated tests workflows will auto-pick up the new test files via their tests/** path filters.

Test plan

  • python -m pytest studio/backend/tests/test_diffusion_backend.py studio/backend/tests/test_diffusion_routes.py (27 passed)
  • Live image generation on a B200 with unsloth/FLUX.2-klein-4B-GGUF (running locally now, screenshots will be attached in the PR review)
  • Cross-platform CI on the staging fork (Ubuntu / macOS / Windows)

Backend
- core/inference/diffusion.py: DiffusionBackend singleton that loads
  diffusion GGUFs from Hugging Face via diffusers.GGUFQuantizationConfig
  and runs them on the active CUDA / MPS / CPU device. Supports FLUX.2,
  FLUX.2 klein, FLUX.1, Qwen-Image, Stable Diffusion 3, and SDXL.
- routes/inference.py: POST /api/inference/images/load,
  POST /api/inference/images/generate, POST /api/inference/images/unload,
  GET /api/inference/images/status mirroring the llama-server lifecycle.
- models/inference.py: DiffusionLoadRequest, DiffusionGenerateRequest,
  DiffusionGenerateResponse pydantic schemas with prompt / step / size
  validation up front so callers get clear 422s rather than VAE crashes.
- requirements/no-torch-runtime.txt: pin gguf alongside the existing
  diffusers entry so GGUFQuantizationConfig works out of the box.
- tests/test_diffusion_backend.py + tests/test_diffusion_routes.py:
  27 unit tests covering family detection, validation, lifecycle, and
  the full FastAPI round trip with the backend stubbed. No torch /
  diffusers / GPU required to run.

Frontend
- features/images/: standalone images-page.tsx with curated model picker
  (FLUX.2 klein 4B / 9B, FLUX.2 dev, FLUX.1 dev), HF token field,
  prompt + negative prompt, resolution presets, steps + guidance
  sliders, seed input, and a result gallery that renders base64 PNGs
  inline.
- app/routes/images.tsx: lazy /images route wired into router.tsx.
- components/app-sidebar.tsx: PaintBrush02Icon nav item between
  Recipes and Export, hidden in chat-only mode.
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a local diffusion image generation backend and a corresponding UI page. It supports loading Hugging Face checkpoints in standard or GGUF formats, with specific support for the Flux and Qwen-Image families. The backend manages model lifecycles (load, generate, unload) to optimize GPU memory usage. Review feedback highlighted the use of the deprecated asyncio.get_event_loop() function in two locations, recommending asyncio.get_running_loop() instead.

Comment thread studio/backend/routes/inference.py Outdated
"""
backend = _get_diffusion_backend()
try:
status = await asyncio.get_event_loop().run_in_executor(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The asyncio.get_event_loop() function is deprecated since Python 3.10. Use asyncio.get_running_loop() instead to retrieve the current running event loop.

Suggested change
status = await asyncio.get_event_loop().run_in_executor(
status = await asyncio.get_running_loop().run_in_executor(

with _singleton_lock:
if _singleton is None:
_singleton = DiffusionBackend()
return _singleton
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The asyncio.get_event_loop() function is deprecated since Python 3.10. Use asyncio.get_running_loop() instead to retrieve the current running event loop.

Suggested change
return _singleton
loop = asyncio.get_running_loop()

danielhanchen and others added 4 commits May 24, 2026 14:38
SectionCard requires an icon prop. Pass GpuIcon, PaintBrush02Icon,
and SparklesIcon for the three sections so tsc -b stops failing on
TS2741 'Property icon is missing'.
Backend
- Fix FLUX.2 klein family default base_repo: black-forest-labs/FLUX.2-klein
  does not exist on the Hub. Point at the Apache 2.0 4B Base instead so
  the from_pretrained call works out of the box for ungated users.
- Serialise concurrent load_model calls with a dedicated _load_lock so
  two /images/load requests cannot both reach pipeline_cls.from_pretrained
  at the same time (would double-spend VRAM and corrupt _pipe).
- When the caller passes a full diffusers repo (no gguf_filename),
  use repo_id directly instead of silently substituting the family
  default. Closes the load-the-wrong-model regression flagged by review.
- Drop negative_prompt from the pipeline call when the loaded pipeline
  does not accept it (FLUX.2 / FLUX.2 klein). Inspect __call__ via
  inspect.signature so we do not maintain a manual class list.
- Best-effort unload the chat backend (llama-server) before a diffusion
  load so a 24 GB consumer GPU can swap between chat and diffusion
  without manual unload steps.

Frontend
- Replace the four curated entries with the actual filenames published
  on the Hub (lowercase flux-2-klein-Nb-Q4_K_S.gguf and flux2-dev*).
- Add an explicit base_repo per curated entry so the backend never
  falls back to the family default for the curated picker.
- Add the Apache 2.0 FLUX.2 klein base 4B entry so first-time users
  have an ungated, no-token-required default.
- Hide the negative prompt field for FLUX.2 / FLUX.2 klein and show a
  small explanatory note instead.

Tests
- Add 6 new backend tests: base_repo override, full-repo (no GGUF)
  no-substitution, concurrent serialise race, signature-based kwarg
  filter, negative_prompt strip on FLUX.2, negative_prompt preserved
  on supporting pipelines. 33 tests passing.
- unload_model now takes _load_lock so it cannot race with an in-flight
  load_model and have the load thread overwrite cleared state after
  unload returned is_loaded=false.
- Move stable-diffusion-xl out of _FAMILIES into _FULL_REPO_FAMILIES.
  SDXL uses a UNet (no transformer GGUF path is wired); listing it in
  the GGUF families panel was misleading. SDXL full-repo loads still
  work via family_override='stable-diffusion-xl'.
- Result gallery now uses h-auto + object-contain so portrait /
  landscape outputs render at their true aspect ratio instead of
  being cropped into a square thumbnail.
danielhanchen added a commit to unslothai/unsloth-staging-1 that referenced this pull request May 24, 2026
_release_chat_backend_for_diffusion now unloads both the GGUF
chat backend (llama-server) and the safetensors / HF chat backend
(get_inference_backend) before a diffusion load. Mirror the
behaviour on the chat-load side: both the Unsloth/transformers
load path and the GGUF load path now unload the diffusion pipeline
before claiming GPU memory. Closes the OOM-on-swap path flagged
by reviewers in both directions.
- _smart_base_repo: pick 9B base for unsloth/FLUX.2-klein-9B-GGUF
  and -base- variants per the repo id, instead of always falling
  back to the 4B family default.
- pipe_kwargs use_safetensors=True so diffusers refuses pickle .bin
  weights at load time (defends against compromised base_repo).
- Release the previous pipeline BEFORE allocating the new one so
  peak VRAM stays at one model's worth instead of two on swap.
- Reject empty gguf_filename when repo_id ends with -GGUF; the prior
  behavior tried from_pretrained on a GGUF-only repo and 500'd deep
  in diffusers with a confusing model-index error.
- Status returns gguf_filename (basename) instead of gguf_path so
  the local cache path / username does not leak to authenticated
  Studio sessions.
- requirements/no-torch-runtime.txt: pin diffusers>=0.37.0 so older
  installs cannot resolve a version without Flux2KleinPipeline.
- Frontend curated distilled klein entries now point at the
  matching non-base diffusers repos (FLUX.2-klein-4B / -9B) per
  the published model cards. Update api.ts to mirror the renamed
  status field.
danielhanchen added a commit to unslothai/unsloth-staging-1 that referenced this pull request May 25, 2026
pre-commit-ci Bot and others added 2 commits May 25, 2026 00:04
- routes/export.py load_checkpoint now unloads the diffusion
  pipeline alongside the existing inference + training unloads, so
  an export load after Images does not OOM the export subprocess.
- Remove the 'sd3.5' alias from the stable-diffusion-3 family.
  SD3.5 needs its own family + base_repo (and its own smoke test);
  pairing it with the SD3 Medium base produced a misleading load.
@danielhanchen
Copy link
Copy Markdown
Member Author

Iteration update on the PR, post-review fixes:

Review-driven fixes

  • Backend FLUX.2 klein default base_repo now points at a real Hub repo (black-forest-labs/FLUX.2-klein-base-4B), not the bare FLUX.2-klein slug that 404s.
  • Added _smart_base_repo so unsloth/FLUX.2-klein-9B-GGUF without an explicit base_repo resolves to the matching 9B base, not the 4B fallback.
  • Reject repo_id ending in -GGUF with no gguf_filename instead of letting from_pretrained raise a confusing model-index error deep in diffusers.
  • Custom full diffusers repo path no longer silently substitutes the family default; it now calls from_pretrained on the user-supplied repo_id.
  • pipeline.from_pretrained now passes use_safetensors=True so pickle-backed .bin weights are refused at load time.
  • load_model is serialised by a dedicated _load_lock; unload_model takes the same lock so unload cannot race with an in-flight load.
  • Previous diffusion pipeline is released before allocating the new one so peak VRAM stays at one model's worth during swap.
  • Symmetric chat handoff: GGUF chat load and standard Unsloth chat load both unload an active diffusion pipeline first, and the export load path does the same. _release_chat_backend_for_diffusion now unloads both the GGUF and safetensors chat backends.
  • _pipe_accepts_kwarg filters negative_prompt out for pipelines whose __call__ does not accept it (FLUX.2 / FLUX.2 klein) instead of letting the request 500.
  • Status returns gguf_filename (basename) instead of gguf_path (full local cache path) to avoid leaking the system username on default POSIX layouts.
  • SDXL moved into a separate _FULL_REPO_FAMILIES table; the GGUF status panel no longer advertises SDXL GGUF support that is not wired.
  • Removed the sd3.5 alias from the SD3 Medium family (different transformer config + base repo; needs its own family before we ship it).
  • Result gallery uses h-auto + object-contain so portrait / landscape outputs render at the true aspect ratio.

Frontend

  • Curated picker entries now use the canonical Hub filenames (flux-2-klein-4b-Q4_K_S.gguf, etc) and explicit per-entry base_repo so users hit the right Hub repo without relying on the auto-pick.
  • Added FLUX.2 klein base 4B (Apache 2.0) as the default first entry so first-time users have an ungated, no-token-required option.
  • Negative prompt field is hidden on FLUX.2 / FLUX.2 klein with a short explanatory note.

Verification

  • 38 backend unit tests pass locally (pytest tests/test_diffusion_backend.py tests/test_diffusion_routes.py).
  • Frontend npm run build succeeds.
  • Live /api/inference/images/{status,load,generate,unload} exercised against a running Studio instance on a B200; status JSON returns the new gguf_filename field and the curated picker renders correctly.
  • Cross-platform CI on unslothai/unsloth-staging-1 (Ubuntu / macOS 14 / Windows): all three runners green after the round 1 + round 2 fixes.
  • Browser screenshots: https://huggingface.co/datasets/danielhanchen/minimax-2.7-analysis/tree/main/pr5754_diffusion_images_round3

_release_chat_backend_for_diffusion was importing
get_inference_backend from core.inference.inference (the in-subprocess
class) and calling unload_model() without the required model_name
argument. The TypeError was swallowed and the active chat model
stayed resident, defeating the chat-to-diffusion lifecycle handoff.

Switch to the orchestrator's accessor at core.inference and pass
active_model_name through, mirroring the GGUF chat-load path. Add a
regression test that stubs both backends and verifies unload_model
is called with the active model name.
danielhanchen added a commit to unslothai/unsloth-staging-1 that referenced this pull request May 25, 2026
pre-commit-ci Bot and others added 3 commits May 25, 2026 00:23
- detect_family adds _FAMILY_EXCLUDE so 'stable-diffusion-3.5' no
  longer matches the SD3 Medium family and 'qwen-image-edit' no
  longer matches Qwen-Image. Both were misleading silent loads.
- from_single_file now forwards config=<effective_base>,
  subfolder='transformer', and the HF token. Diffusers-format GGUFs
  (FLUX.2 klein, Qwen-Image, SD3) need the matching base config or
  the transformer load picks the wrong shapes; gated GGUFs need the
  token both for download and config read.
- Move _release_chat_backend_for_diffusion + new
  _release_other_gpu_owners_for_diffusion to AFTER the GGUF download
  and pipeline class lookup so a typo or transient Hub error does
  not kill the user's currently-loaded chat model. Peak VRAM still
  stays at one model's worth because the releases run right before
  from_pretrained.
- _release_other_gpu_owners_for_diffusion: shut down the export
  subprocess and any active training subprocess before a diffusion
  load. Symmetric with the export load path.
- routes/training.py: unload diffusion before starting training so
  the new subprocess does not race FLUX/Qwen for VRAM.
- routes/export.py: also unload the GGUF llama-server before export
  load (the existing inference-backend unload only covered the
  safetensors path).
danielhanchen added a commit to unslothai/unsloth-staging-1 that referenced this pull request May 25, 2026
danielhanchen and others added 8 commits May 25, 2026 00:33
When a swap load fails after the previous pipeline is released,
status() previously reported is_loaded=false on top of the OLD
repo/family/base_repo metadata, which the frontend then rendered
as a misleading 'still loaded: X' label. Clear all metadata
atomically with the pipe drop so a failed swap reports a clean
empty status plus last_error. Add regression test.
… precision for PR #5754

- DiffusionBackend.status() now takes _lock so frontend polling
  cannot observe a torn snapshot mid-swap.
- Scrub hf_token / pipe_kwargs / single_file_kwargs from frame
  locals before logger.exception() so rich tracebacks and structlog
  formatters that render locals do not leak hf_... tokens into logs.
- routes/models.py delete_cached_repo: refuse to delete the cache
  underlying a currently-loaded diffusion pipeline (both the GGUF
  repo and the matching diffusers base_repo). Symmetric with the
  existing chat-load + GGUF guard.
- Frontend seed validation: reject non-integer and out-of-safe-
  integer-range inputs instead of silently rounding via Number(),
  which would otherwise send a different seed than what the user
  typed.
…e prompt (PR #5754)

QwenImagePipeline and FluxPipeline treat guidance_scale as the
distilled CFG factor and expose true_cfg_scale as the real
classifier-free guidance knob. Negative prompts only steer the
output when true_cfg_scale > 1, so forwarding only guidance_scale
left Qwen-Image on the default true_cfg_scale=4.0 and the user's
slider value silently ineffective for negative prompts.

When the loaded pipeline accepts both negative_prompt and
true_cfg_scale and the caller supplies a non-empty negative
prompt, forward guidance_scale through both kwargs so the
negative prompt actually steers generation. When no negative
prompt is supplied, true_cfg_scale is left at the model default
to avoid switching distilled CFG models into real-CFG mode (which
would double inference cost and degrade quality).

Adds two regression tests covering the forward-when-negative and
skip-when-no-negative paths.


The Studio backend's no-torch-runtime.txt is installed via
pip --no-deps so the diffusion stack's transitive imports must
be pinned explicitly. huggingface_hub's blob downloader (used by
diffusers.GGUFQuantizationConfig and by every from_single_file
call) imports requests + urllib3 + charset_normalizer at module
load time; a fresh --no-deps install would 500 on the first
/api/inference/images/load with PackageNotFoundError: 'requests'.

Adds requests, urllib3, and charset_normalizer to the
transitive-deps block next to the existing httpx chain.
…5754

Round 5 reviewer findings, mostly symmetric-lifecycle and input
validation gaps the earlier rounds left open.

Backend lifecycle (P1)
  * routes/training.py: training start now also unloads the GGUF
    llama-server subprocess; was previously only unloading the
    safetensors backend, so starting training while a GGUF chat
    model was loaded kept the subprocess pinned to VRAM.
  * routes/inference.py: new _raise_if_training_active helper. Both
    GGUF and standard chat loads, plus /api/inference/images/load,
    now refuse with HTTP 409 when training is active instead of
    silently stopping training to free VRAM.
  * core/inference/diffusion.py: _release_other_gpu_owners_for_
    diffusion no longer stops active training. The route layer
    refuses the request first, so reaching the helper with training
    live would only happen from programmatic backend calls; better
    to surface OOM than terminate a long training run.
  * core/inference/diffusion.py: BF16 dtype is now gated on
    torch.cuda.is_bf16_supported. Pascal/Turing GPUs report
    is_available()=True but lack BF16 ALUs; FLUX kernels then fail
    inside from_pretrained. Falls back to FP16 instead of refusing.
  * core/inference/diffusion.py: GGUF transformer allocation and
    pipeline allocation now run AFTER releasing chat/export GPU
    owners; previously from_single_file ran first and could OOM
    before the intended VRAM handoff happened.
  * routes/models.py: /delete-cached now also blocks delete when
    diffusion is_loading=True (not just is_loaded); concurrent
    delete during hf_hub_download / from_single_file would have
    raced the rmtree.
  * routes/models.py: /delete-finetuned now also checks the
    diffusion backend before unlinking a Studio outputs/exports
    path. A user who exported a FLUX LoRA locally and loaded it via
    /images/load could previously rmtree the directory the
    diffusion backend was reading from.

Backend correctness / safety (P2)
  * core/inference/diffusion.py: _FAMILY_EXCLUDE for qwen-image now
    also covers qwen_image_edit / qwenimageedit underscore spellings
    so '...qwen_image_edit-GGUF' no longer misdetects as Qwen-Image.
  * core/inference/diffusion.py: detect_family now scans
    _FULL_REPO_FAMILIES in addition to _FAMILIES, so SDXL repos
    (stabilityai/stable-diffusion-xl-base-1.0) are auto-detected
    instead of failing with 'Could not infer a diffusion family'.
  * core/inference/diffusion.py: generate_image now uses a separate
    _generate_lock for the pipeline forward instead of holding
    _lock for the whole call. status() polls and concurrent unload
    requests no longer block for the full minutes-long generation.
  * routes/models.py: diffusion delete guard now uses exact repo-id
    match instead of prefix match; previously loading 'org/model-v2'
    would block deleting unrelated cached 'org/model'.
  * models/inference.py: DiffusionLoadRequest now rejects ASCII
    control characters in repo_id / gguf_filename / base_repo /
    family via field_validator (closes log-injection surface from
    authenticated callers). Also caps lengths at 256 chars.
  * models/inference.py: DiffusionGenerateRequest seed is now
    bounded to the int64/uint64 range; previously a huge seed
    (e.g. 2**100) passed Pydantic then crashed inside
    torch.Generator.manual_seed with 'Overflow when unpacking long
    long'.

Frontend (P2)
  * features/images/images-page.tsx: Custom HF repo panel now
    exposes a Pipeline family override dropdown; previously the
    backend supported it via DiffusionLoadRequest.family but the UI
    had no way to send it, so custom repos whose names did not
    contain a hard-coded substring failed to load.
  * features/images/images-page.tsx: handleLoad now re-fetches
    status on error. The backend clears its old pipeline before
    allocating the replacement; a failed swap previously left the
    UI showing 'Loaded:' with Generate enabled until manual
    refresh.

Tests (10 new)
  * underscore qwen-image-edit exclusion + SDXL full-repo detection
  * BF16 fallback when is_bf16_supported() returns False
  * status() does not block while generate_image holds _generate_lock
  * route layer rejects control chars in repo_id
  * route layer rejects 2**100 seeds (uint64-max boundary accepted)
  * route layer happy-path with negative-prompt true_cfg_scale
    forwarding (Qwen/Flux) and skip-when-no-neg (distilled CFG)
pre-commit-ci Bot and others added 10 commits May 25, 2026 13:13
Round 27 findings (Opus parallel concurrency + frontend reviews).

Backend P1 fixes:

1. utils/datasets/llm_assist.py: the round 26 helper/advisor active
   registry used a plain set, so two concurrent helper / advisor
   loads of the same DEFAULT_HELPER_MODEL_REPO would both
   set.add() (no-op the second time) and then the first finally
   set.discard() would underflow the registration while the second
   call was still mmap'ing the GGUF. Switch to a Counter with
   proper refcount increment/decrement so the repo stays registered
   until the last user releases it.

2. routes/inference.py _release_chat_for and
   core/inference/diffusion.py _release_chat_backend_for_diffusion:
   helper/advisor GGUF runs on a PRIVATE LlamaCppBackend (round 26
   P1 #1), so the global llama checks below could not see them.
   A user-driven /training/start, /export/load-checkpoint, or
   /images/load would skip the unload and allocate FLUX VRAM on top
   of the helper's resident weights, OOMing on 16-24 GB consumer
   GPUs. Both release paths now consult helper_advisor_busy() and
   fail 503 (or RuntimeError for the in-backend path) so the user
   retries instead of double-owning VRAM.

Frontend P2 fixes:

3. studio/frontend/src/features/images/images-page.tsx: handleUnload
   now calls refreshStatus() in the catch path so a partial unload
   (503 from the backend) does not leave the UI showing a stale
   "Loaded:" label. Matches the handleLoad pattern.

4. images-page.tsx: when status.is_loading is true, auto-poll
   refreshStatus every 2 s so the user sees real progress instead
   of a frozen "Loading..." label until they manually click Refresh.

5. images-page.tsx: aria-label="Inference steps" / "Guidance scale"
   on the two sliders so screen readers can announce them.

6. images-page.tsx: defensive (r.guidance_scale ?? 0).toFixed(1)
   in the results caption so a future backend that serialises
   NaN/None for guidance does not throw at render.

Tests: 105 targeted (diffusion + cached_gguf + inference_validation)
and 1768 broader backend tests pass locally. Frontend
`npm run typecheck` passes.
Five additional P1 findings round 27 reviewer flagged on top of the
round 27 commit 6c528fb (Counter refcount + handoff visibility were
already covered). Three remaining studio.txt / no-torch-runtime hub
suggestions are NOT applied because they would re-break CI; the
empirical evidence (round 26 commit 65ea3a2 restored CI green) takes
precedence over the reviewer's stale-state suggestion.

1. models/training.py TrainingStartRequest: extend the embedded HF
   token validator to subset, train_split, eval_split. Round 26 only
   added the control-char guard to those three; the token guard was
   asymmetric and would accept owner/data\\nFAKE hf_abcdef...
   payloads through subset / split fields.

2. models/datasets.py CheckFormatRequest: extend both validators
   (control chars + embedded HF token) to subset and train_split.
   Same asymmetric-fix bug as #1.

3. models/data_recipe.py SeedInspectRequest: extend both validators
   to subset and split. Same pattern.

4. utils/datasets/llm_assist.py precache_helper_gguf: register the
   helper repo in the helper/advisor refcount registry around the
   hf_hub_download loop, then unregister in the finally. Without
   this, the FastAPI-startup background pre-cache could be racing
   a concurrent DELETE /api/models/delete-cached against the same
   cache directory. The runtime helper / advisor calls already
   register (round 26 P1 #13/#14) but the precache was the
   asymmetric gap.

5. routes/models.py _loaded_model_matches_deleted_path: match
   bidirectionally (active under target OR target under active) so
   deleting a child directory of a loaded local model (.../my-flux/
   text_encoder while .../my-flux is loaded) trips the guard.
   Mirrors the diffusion delete-guard symmetric path-overlap check.

Tests: 105 targeted (diffusion + cache + inference_validation) and
the broader backend suite pass locally.
Twelve actionable P1/P2 findings from round 28 reviewer aggregate.
Skipped #3 (studio.txt huggingface-hub bump) because the empirical
CI evidence in round 26 contradicts that suggestion: bumping the
pin there breaks installs that apply constraints.txt
(transformers==4.57.6 requires hub<1.0). The actual broken combo
only happens via the --no-deps no-torch path which is already
bumped in no-torch-runtime.txt and pyproject.toml huggingfacenotorch.

1. utils/datasets/llm_assist.py: split _HELPER_ADVISOR_REFCOUNT
   into CACHE vs GPU counters. helper_advisor_owns_repo (used by
   delete-cache) reads CACHE; helper_advisor_busy (used by public
   handoffs) reads GPU. precache_helper_gguf now registers with
   gpu_owner=False so a background pre-cache download does not
   503 every chat / training / export / diffusion load.

2. utils/datasets/llm_assist.py: introduce _HELPER_ADVISOR_START_LOCK
   and wrap the busy precheck + register pair in _run_with_helper
   and _run_multi_pass_advisor. Two concurrent helper / advisor
   invocations could both pass _gpu_workload_busy_for_helper before
   either registered, then OOM each other.

3. utils/datasets/llm_assist.py: _gpu_workload_busy_for_helper now
   also returns True when another helper/advisor already holds the
   private LlamaCppBackend.

4. routes/inference.py: add _raise_if_helper_advisor_busy(workload)
   that 503s when AI Assist owns the GPU. Wire it into both chat
   load branches (GGUF + safetensors) BEFORE the existing
   _release_export_for / _release_diffusion_for calls so we do not
   first tear down an idle export / diffusion just to fail on the
   helper check.

5. routes/training.py + routes/export.py + diffusion.load_model:
   call the helper-busy check FIRST before any release helper
   fires. Mirrors the chat-load ordering.

6. routes/inference.py _release_llama_for: poll
   loading_model_identifier for up to 5 s after unload_model() so a
   cancelled pending GGUF download has time to clear its
   identifier. Mirrors the same wait round 26 added to the explicit
   /api/inference/unload route.

7. core/inference/diffusion.py _release_chat_backend_for_diffusion:
   same 5 s settling wait for cancelled pending GGUF downloads.

8. models/inference.py LoadRequest: validate every llama_extra_args
   entry through _no_control_chars + _reject_embedded_hf_token.
   The list was forwarded verbatim to a logged llama-server command
   line, so a smuggled control char or hf_... token would land in
   logs and subprocess args.

9. routes/models.py /gguf-download-progress: apply
   _validate_logged_identifier to repo_id and variant, matching the
   round 24 hardening on the adjacent generic /download-progress.

10. routes/inference.py diffusion-load RuntimeError classifier:
    treat "AI Assist ..." messages as retryable 503 instead of 400
    (round 28 P2 #15). Mirrors the round 18/19 markers for chat
    unload failures.

Tests: 105 targeted + 1768 broader backend tests pass locally.
Five actionable findings from round 29 reviewer aggregate, plus an
origin/main merge that absorbs the chat_templates.py fix landed in
PR #5763. Skipped #4 / #5 (studio.txt + constraints.txt hub bump)
because CI evidence from round 26 contradicts that suggestion; the
real broken combo only happens via the --no-deps no-torch path
which is already bumped in no-torch-runtime.txt + pyproject.toml.

1. core/inference/diffusion.py: round 28 reordered
   _release_chat_backend_for_diffusion BEFORE
   _release_other_gpu_owners_for_diffusion to surface the helper /
   advisor busy check early, but that meant the chat unload inside
   _release_chat_backend_for_diffusion now fired before the
   training / export conflict check in the second helper. A direct
   backend caller (tests, scripts) or a route-precheck race with a
   newly-started training run would then unload the user's chat and
   then 409 with nothing loaded. Split the helper busy check into
   _raise_if_helper_advisor_busy_for_diffusion (cheap, no side
   effects), keep _release_chat_backend_for_diffusion as the
   actual chat unload with an opt-out flag, and reorder load_model
   to: (a) helper check, (b) training / export check + idle export
   shutdown, (c) chat unload. All raises now fire BEFORE any
   destructive unload.

2. Merge origin/main: absorbs af6504f (PR #5763
   chat_templates.py find() guards + the new
   tests/python/test_construct_chat_template_validation.py
   regression test). Removes the 101-line stale-rebase silent
   revert that round 29 reviewer 5 and 8 flagged.

3. frontend/src/features/images/images-page.tsx: supportsNegativePrompt
   now also honours customFamily when no model is loaded yet, so a
   Custom HF repo with family flux.2 / flux.2-klein correctly hides
   the negative prompt field instead of silently sending it.

4. routes/inference.py /images/generate: report the ACTUAL PNG
   width / height from PIL Image.size instead of echoing back the
   requested payload values. FLUX-family pipelines round to
   vae_scale_factor * 2, so a request for 520x520 lands as 512x512
   internally; metadata now matches the bytes on the wire.

Tests: 98 targeted (diffusion + cached_gguf + inference_validation)
and frontend npm run typecheck pass locally.
Four actionable findings from round 30. Skipped P1 #1 / #2 / #3
(huggingface-hub bump in studio.txt / single-env / colab-new) because
the live B200 Studio that successfully generated FLUX.2 klein images
runs the exact combo the reviewer flags as broken:
    huggingface_hub 0.36.2 + transformers 4.57.6 + diffusers 0.37.1
    Flux2KleinPipeline: True (imports cleanly)
The is_offline_mode ImportError only fires with transformers 5.x, and
the standard install path pins transformers==4.57.6 via constraints.
The round 26 fix bumped no-torch-runtime.txt + pyproject huggingfacenotorch
where the --no-deps install path can land on transformers 5.x; that
remains the correct surface.

1. core/inference/diffusion.py: preflight transformers + accelerate
   via importlib.util.find_spec BEFORE any destructive GPU-owner
   unload. Diffusers can expose stub pipeline classes when
   transformers / accelerate are missing, so the load used to drop
   chat first and fail later inside from_pretrained. find_spec
   keeps existing tests that stub these modules passing because no
   real module is executed (round 30 P1 #11).

2. models/export.py ExportGGUFRequest.quantization_method: extend
   the embedded HF token validator to this field too. Round 23
   added the control-char guard but not the token guard; the value
   is forwarded into worker command lines and reflected in error /
   success text (round 30 P1 #5).

3. models/data_recipe.py SeedInspectUploadRequest: add
   _no_control_chars + _reject_embedded_hf_token field_validators
   to filename and to each entry of file_names. Mirrors the sibling
   SeedInspectRequest.dataset_name hardening (round 30 P1 #6).

4. frontend/src/features/images/images-page.tsx: defer the initial
   refreshStatus() call via queueMicrotask so the synchronous
   setRefreshingStatus(true) inside it does not trip the
   react-hooks/set-state-in-effect lint on mount (round 30 P2 #12).

Deferred (need larger surgery / out of scope for this round):
   P1 #4 native_path_lease for diffusion local-path loads
   P1 #7-#10 helper/advisor + public-start window mutual lock symmetry

Tests: 98 targeted (diffusion + cached_gguf + inference_validation)
pass locally; frontend npm run typecheck passes.
danielhanchen added a commit to unslothai/unsloth-staging-1 that referenced this pull request May 25, 2026
Addresses remaining round-30 reviewer findings against PR unslothai#5754
(diffusion image generation in Unsloth Studio). The studio.txt /
constraints.txt / colab-new hub-bump items (round 30 #1-#3) are
intentionally skipped: the live B200 Studio install path with
huggingface_hub==0.36.2, transformers==4.57.6 and diffusers==0.37.1
imports Flux2KleinPipeline cleanly and runs end-to-end image
generation (see staging CI green on bec81b8 plus round 28-30
local validation suites). The is_offline_mode ImportError the
reviewer cites only triggers with transformers 5.x against
huggingface_hub 0.x; the constraints pin holds transformers at 4.x
so the combo never materialises on the standard install path.

Concurrency: close the helper / advisor GPU-start race in all four
public load paths (round 30 P1 #7-#10).
  * Add a _PUBLIC_LOAD_PENDING_COUNT counter in
    utils/datasets/llm_assist.py, published under
    _HELPER_ADVISOR_START_LOCK by _raise_if_helper_advisor_busy and
    cleared by a paired _clear_public_load_window in
    routes/inference.py. A concurrent helper / advisor start now
    sees public_load_pending() inside _gpu_workload_busy_for_helper
    and refuses VRAM until the public load attempt finishes,
    closing the window between the busy snapshot and the public
    load flipping its public ownership flags (is_loaded,
    current_checkpoint, is_training_active, etc.).
  * Wire the paired clear into all five call sites (GGUF chat,
    safetensors chat, diffusion image load, training start, export
    load-checkpoint). The chat path tracks the published tag in a
    local so the finally clears the same counter on either branch
    or on early HTTPException.

Security: gate /api/inference/images/load against arbitrary
local-path probes (round 30 P1 #4). Mirror the chat
/api/inference/load native_path_lease boundary so an authenticated
session cannot use repo_id or base_repo as a directory probe.
  * Add native_path_lease + base_repo_native_path_lease to
    DiffusionLoadRequest (optional; Hub ids skip the lease).
  * Add _looks_like_local_diffusion_path + a
    _resolve_diffusion_repo_for_request helper that requires a
    verified directory-typed native path grant for any value that
    starts with /, ~, ./, ../, contains a backslash, or expands to
    an absolute path. The detector deliberately avoids Path.exists
    so the route does not side-channel filesystem layout via
    differential error messages.

Frontend: split the Images page status fetch from the spinner
toggle (round 30 P2 #12). The mount effect and the is_loading
auto-poll now call a setState-free fetchAndUpdateStatus; the
user-driven Refresh button still calls refreshStatus to flip the
spinner. Cleaner separation than the queueMicrotask shim from the
prior commit; the eslint react-hooks/set-state-in-effect rule is
not in the studio-frontend-ci typecheck gate, and the codebase
already has hundreds of pre-existing violations of the same rule.

98 targeted backend tests pass (test_diffusion_routes,
test_diffusion_backend, test_inference_model_validation,
test_models_get_model_config_case_resolution, test_data_recipe_seed,
test_training_raw_support, test_export_log_cursor). Frontend
typecheck passes.
danielhanchen and others added 17 commits May 25, 2026 15:30
Addresses remaining round-30 reviewer findings against PR #5754
(diffusion image generation in Unsloth Studio). The studio.txt /
constraints.txt / colab-new hub-bump items (round 30 #1-#3) are
intentionally skipped: the live B200 Studio install path with
huggingface_hub==0.36.2, transformers==4.57.6 and diffusers==0.37.1
imports Flux2KleinPipeline cleanly and runs end-to-end image
generation (see staging CI green on bec81b8 plus round 28-30
local validation suites). The is_offline_mode ImportError the
reviewer cites only triggers with transformers 5.x against
huggingface_hub 0.x; the constraints pin holds transformers at 4.x
so the combo never materialises on the standard install path.

Concurrency: close the helper / advisor GPU-start race in all four
public load paths (round 30 P1 #7-#10).
  * Add a _PUBLIC_LOAD_PENDING_COUNT counter in
    utils/datasets/llm_assist.py, published under
    _HELPER_ADVISOR_START_LOCK by _raise_if_helper_advisor_busy and
    cleared by a paired _clear_public_load_window in
    routes/inference.py. A concurrent helper / advisor start now
    sees public_load_pending() inside _gpu_workload_busy_for_helper
    and refuses VRAM until the public load attempt finishes,
    closing the window between the busy snapshot and the public
    load flipping its public ownership flags (is_loaded,
    current_checkpoint, is_training_active, etc.).
  * Wire the paired clear into all five call sites (GGUF chat,
    safetensors chat, diffusion image load, training start, export
    load-checkpoint). The chat path tracks the published tag in a
    local so the finally clears the same counter on either branch
    or on early HTTPException.

Security: gate /api/inference/images/load against arbitrary
local-path probes (round 30 P1 #4). Mirror the chat
/api/inference/load native_path_lease boundary so an authenticated
session cannot use repo_id or base_repo as a directory probe.
  * Add native_path_lease + base_repo_native_path_lease to
    DiffusionLoadRequest (optional; Hub ids skip the lease).
  * Add _looks_like_local_diffusion_path + a
    _resolve_diffusion_repo_for_request helper that requires a
    verified directory-typed native path grant for any value that
    starts with /, ~, ./, ../, contains a backslash, or expands to
    an absolute path. The detector deliberately avoids Path.exists
    so the route does not side-channel filesystem layout via
    differential error messages.

Frontend: split the Images page status fetch from the spinner
toggle (round 30 P2 #12). The mount effect and the is_loading
auto-poll now call a setState-free fetchAndUpdateStatus; the
user-driven Refresh button still calls refreshStatus to flip the
spinner. Cleaner separation than the queueMicrotask shim from the
prior commit; the eslint react-hooks/set-state-in-effect rule is
not in the studio-frontend-ci typecheck gate, and the codebase
already has hundreds of pre-existing violations of the same rule.

98 targeted backend tests pass (test_diffusion_routes,
test_diffusion_backend, test_inference_model_validation,
test_models_get_model_config_case_resolution, test_data_recipe_seed,
test_training_raw_support, test_export_log_cursor). Frontend
typecheck passes.
Two universal-consensus round-31 reviewer findings.

Concurrency: /images/load was leaking the public-load pending
counter on any pre-finally HTTPException (round 31 P1 #1, 11/12
votes). _raise_if_helper_advisor_busy("diffusion") published the
counter, then _resolve_diffusion_repo_for_request ran outside the
clearing try/finally. A request like repo_id="/tmp/model" with no
native_path_lease returned 400 and left public_load_pending() true
until process restart, permanently blocking AI Assist. Fix mirrors
the training / export pattern: track diffusion_load_window_published
in an outer try, publish the flag right after the helper-busy
check succeeds, and clear in an outer finally that only fires when
the flag is set. This also closes round 31 P1 #6: a second
request's failure can no longer decrement a still-active first
request's counter, because the second request has not yet flipped
its own publish flag.

Security: _looks_like_local_diffusion_path missed cwd-relative
directories (round 31 P1 #2, 8/12 votes). DiffusionBackend.
load_model accepts repo_id="exports/my-flux" as a local directory
via Path(repo_id).expanduser().is_dir(), but the detector only
flagged values starting with /, ~, ./, ../, backslash, or
absolute. Tightened the detector to also reject:
  * weight-file suffixes (.gguf / .safetensors / .bin / .pt / .pth)
  * non-2-segment values (`owner`, `a/b/c`, `owner/`, `/repo`, `//`)
  * 2-segment values whose parts are `.` or `..`
  * 2-segment values that actually resolve to an existing local
    path under backend CWD (last-resort exists() probe).
The existence probe is a minor side-channel for an already-
authenticated caller, accepted in exchange for closing the silent
bypass of the new lease boundary. Valid Hub ids like
unsloth/FLUX.2-klein-base-4B-GGUF, microsoft/Phi-3.5-mini-instruct
still pass through unchanged.

Skipped (consistent with prior rounds):
  * R31 P1 #3 (Tauri / native lease enum missing
    `load-diffusion-model` op): architectural surface; defer until
    the Images page actually surfaces a local-path picker.
  * R31 P1 #4-#5, #8: studio.txt / constraints.txt / pyproject hub
    pins. Live B200 install path with huggingface_hub==0.36.2,
    transformers==4.57.6, diffusers==0.37.1 imports
    Flux2KleinPipeline cleanly. The is_offline_mode import error
    only triggers when transformers 5.x is paired with hub 0.x,
    which the constraints pin prevents.
  * R31 P1 #7 (find_spec vs real import): a full transformers
    import at module load breaks tests that stub huggingface_hub;
    find_spec is the existing tradeoff.

98 targeted backend tests pass (test_diffusion_routes,
test_diffusion_backend, test_inference_model_validation,
test_models_get_model_config_case_resolution, test_data_recipe_seed,
test_training_raw_support, test_export_log_cursor).
Three round-32 reviewer findings, plus documentation cleanup for
the local-path Tauri/FE plumbing gap.

Concurrency: direct DiffusionBackend.load_model callers now publish
the helper/advisor pending marker symmetrically (round 32 P1 #3).
_raise_if_helper_advisor_busy_for_diffusion gains an optional
publish_pending flag; load_model passes True so the destructive
unload window is gated by a "diffusion-backend" tag published
under _HELPER_ADVISOR_START_LOCK. The route layer's "diffusion"
tag and the backend's "diffusion-backend" tag refcount
independently (sum > 0 still blocks helper starts), so neither
side's clear can erase the other's still-active marker. The
existing _release_chat_backend_for_diffusion(check_helper_advisor=
True) path stays snapshot-only (publish_pending defaults False) so
test / direct callers of that helper do not leak a counter.

Validation: export save_directory now rejects ALL ASCII control
characters (round 32 P1, save_directory tab finding). The earlier
CR / LF only guard missed TAB / VT / FF / DEL, which a caller
could smuggle past the export worker's logged subprocess argv.

Documentation: DiffusionLoadRequest.repo_id and base_repo updated
to reflect that local-path support is gated on a Tauri /
frontend load-diffusion-model directory lease producer that has
not shipped yet (round 32 P1 #1 from multiple reviewers). The
backend lease boundary is correct; what is missing is the FE /
native side that mints the matching grant. Until that lands,
local paths through the Images route always 400 with "Native
path grant is required", which the docstring now spells out.

Skipped (consistent with prior rounds):
  * Hub-pin findings (R32 P1 #4-#6): live B200 install with
    huggingface_hub==0.36.2 + transformers==4.57.6 + diffusers==
    0.37.1 verifiably imports Flux2KleinPipeline. Empirical
    justification documented in R30 / R30 follow-up commit msgs.
  * Tauri / native enum surgery (R32 P1 #1, 6 votes): real
    architectural work but out of scope for this PR's Python
    surface. Documented now; FE / Rust ticket to follow.

98 targeted backend tests pass (test_diffusion_routes,
test_diffusion_backend, test_inference_model_validation,
test_models_get_model_config_case_resolution, test_data_recipe_seed,
test_training_raw_support, test_export_log_cursor).
Two round-33 reviewer findings: hub-floor consistency and the
multipart upload filename validator gap.

Dependencies: reverted the round-26 huggingface_hub>=1.3.0 floor
in no-torch-runtime.txt and pyproject.toml (round 33 P1 #1-#5,
4/12 vote consensus). studio.txt forces huggingface_hub==0.36.2
to match the transformers==4.57.6 pin in extras-no-deps.txt, so
the 1.3.0 floor was internally inconsistent. Reviewers
reproduced the resolver conflict on a fresh install.

Empirical justification (re-verified on the live B200 host before
the revert): huggingface_hub 0.36.2 + transformers 4.57.6 +
diffusers 0.37.1 imports Flux2KleinPipeline cleanly and runs
end-to-end image generation. transformers 4.57.6 carries its own
transformers.utils.hub.is_offline_mode and does not actually need
huggingface_hub.is_offline_mode at import time. The original bump
was guarding against the (never-realised) transformers 5.x path,
which extras-no-deps explicitly pins away.

Validation: multipart /seed/upload-unstructured-file now applies
the same _no_control_chars and _reject_embedded_hf_token checks
to file.filename that SeedInspectUploadRequest.filename already
applies in the JSON variant (round 33 P1 #7). The filename is
reflected back to the client, persisted in the per-file meta
JSON, and echoed by error responses, so the JSON-side hardening
must not be asymmetric with the multipart path.

Skipped (consistent with prior rounds):
  * Find_spec vs full import (R33 P1 #6): preserves test
    compatibility with the huggingface_hub stub fixture.
  * React hooks set-state-in-effect lint (R33 P1 #8): codebase
    has 146 pre-existing violations of the same rule;
    studio-frontend-ci does not gate on lint.
  * Direct DiffusionBackend.load_model bypass (R33 P1 #9): the
    route is the only production entry point, and the backend
    helper now publishes its own diffusion-backend pending tag
    (round 32 P1 #3). Direct-caller hardening would require
    duplicating the lease check into load_model itself, which
    is out of scope for the route-layer security boundary.
  * One-segment Hub IDs (R33 P2 #10): strict 2-segment Hub id
    check is intentional; one-segment names are not valid Hub
    ids.
  * Cwd-relative shadow of Hub IDs (R33 P2 #11): documented
    side-channel tradeoff accepted in round 31 commit msg.

97 targeted backend tests pass.
…5754

Backend CI on Python 3.11 failed 15 diffusion tests after R30's
accelerate preflight because the CI test environment does not
install accelerate, but the tests mock from_pretrained and never
exercise the CPU-offload path that actually needs it.

Gate the find_spec("accelerate") check on enable_model_cpu_offload
so the dependency is only required for the path that uses it.
transformers preflight stays unconditional (it is always touched
by from_pretrained). Tests with offload=False (the default) pass
without accelerate; production loads with offload=True still get
the fail-fast unload-protection guard the original round-30 fix
added.

97 targeted backend tests pass (test_diffusion_routes,
test_diffusion_backend, test_inference_model_validation,
test_data_recipe_seed, test_training_raw_support,
test_export_log_cursor).
…e for PR #5754

Round 34 P1: R33 removed the `huggingface_hub>=1.3.0,<2.0` line
entirely when the right revert was to restore the pre-PR
`huggingface_hub>=0.34.0` floor. install.sh --no-torch installs
this requirements file with --no-deps and does NOT install
studio.txt afterward, so without an explicit Hub line a no-torch
Studio install ends with no huggingface_hub at all and the new
diffusion + chat GGUF paths fail at import with
ModuleNotFoundError: huggingface_hub.

Restores the pre-PR floor and documents both the round-26 walk-back
and the round-34 reason the package line stays.
…idator for PR #5754

Re-fix the round-34 CI regression: the round-34 attempt gated the
accelerate preflight on enable_model_cpu_offload, but the parameter
defaults to True so tests that did not explicitly opt out still
hit the missing-accelerate path. Removed the accelerate preflight
entirely; transformers' PyTorch backend already pulls accelerate
as a hard dep on every supported install path, so the duplicate
find_spec guard is redundant in practice and the missing-package
case will still surface a clean ModuleNotFoundError from the
offload code itself if the user somehow lands there without it.

Round 34 P1 cross-block: extend the seed.py multipart filename
validators (round 33) to /api/datasets/upload. Both routes echo
the filename back to the client and persist it, so per the
asymmetric-fix rule the validators must match. Now rejects
control characters and embedded HF tokens in file.filename in
both upload entry points.

86 targeted backend tests pass.
Round 35 P1: _raise_if_helper_advisor_busy published a new public
pending marker without first checking public_load_pending(). Two
public workloads (e.g. training + diffusion) could both pass
their idle helper-busy snapshot concurrently, then both run
through destructive owner teardown before either flipped its
own visibility flag (is_training_active, current_checkpoint,
loading_model_identifier, diffusion is_loading).

Add the missing self-check under _HELPER_ADVISOR_START_LOCK so
the second public workload sees the first's pending marker and
gets a 503 retry instead of racing for VRAM. Helper / advisor
already checked public_load_pending() on its side via
_gpu_workload_busy_for_helper; this closes the symmetric public
-> public window.

86 backend tests pass + smoke test confirms second public load
is refused with 503 while first is pending, and the next public
load is permitted once the first clears.
Round 38 P1: R35e added the public_load_pending() check to the
route-side _raise_if_helper_advisor_busy, but the backend-side
_raise_if_helper_advisor_busy_for_diffusion (used by direct
DiffusionBackend.load_model callers AND called transitively from
the route via backend.load_model) never got the same parity
check. That left a window where:
  * /api/training/start published the "training" pending marker
    via the route helper
  * a script/test calling DiffusionBackend.load_model() directly
    passed the backend's helper-busy snapshot, never checked
    public_load_pending(), and proceeded to destructive owner
    teardown + GPU allocation while training was still pending.

Add the parity check with a kw-only `excluding` parameter on
public_load_pending so a route-wrapped backend call can ignore
the marker its own route already published (route publishes
"diffusion"; backend publishes the separate "diffusion-backend"
tag). load_model gains ignore_public_load_pending_workload to
thread the route's tag through; the diffusion route passes
"diffusion" so the backend's atomic check does not self-block
on the route's own publication.

Verified by smoke test: route-wrapped backend with excluding=
"diffusion" allowed during route's diffusion pending; direct
backend call refused with RuntimeError "Another GPU workload is
mid-handoff" when training is pending. 86 backend tests pass.
…daction for PR #5754

Round 39 review findings (2 P1 + 1 P2):

1. routes/datasets.py: validate raw multipart filename before sanitize so
   smuggled control chars (NUL, newline) are rejected at the same boundary
   as the JSON path in seed.py. Previously _sanitize_filename stripped the
   control chars first, letting raw inputs slip past the validator.

2. routes/inference.py: extend RuntimeError -> 503 mapping in /images/load
   to classify "Another GPU workload is mid-handoff" as retryable, so the
   backend-surfaced phrasing matches the route-level 503 already returned
   from _raise_if_helper_advisor_busy.

3. core/inference/diffusion.py: redact hf_ tokens in the local-path branch
   of _display_repo_id so a leaf directory named hf_<token> cannot leak
   into UI labels or structured logs.
… for PR #5754

Round 40 review findings (5 P1 + 1 P2 + 3 P3):

P1:
1. routes/export.py: wrap /export/merged, /export/base, /export/gguf,
   /export/lora in a public-load window so backend.export_*() running
   in a worker thread cannot be torn down by a concurrent workload that
   sees is_export_active() == False during the pre-active gap.
2. utils/datasets/llm_assist.py: add public_load_pending_for(workload)
   helper. routes/inference.py: _release_export_for now refuses 503
   when export is mid-handoff.
3. models/models.py: AddScanFolderRequest.path now rejects control
   characters and embedded hf_ tokens before being logged or reflected.
4. models/training.py: local_datasets and local_eval_datasets list
   entries get the same control-char / embedded-token validators that
   model_name / hf_dataset already have.
5. models/training.py: format_type joins the validator list (copied
   into training_kwargs and into trainer log lines).
6. models/export.py: _validate_save_directory now rejects embedded hf_
   tokens (already covered other identifier fields).

P2:
7. images-page.tsx:162: defer the mount fetchAndUpdateStatus call
   through setTimeout(..., 0) so it does not trip
   react-hooks/set-state-in-effect on scoped lint.

P3 cleanup:
8. core/inference/diffusion.py: drop unused gguf_basename assignment.
9. core/inference/diffusion.py + routes/inference.py: drop unused
   owned_names computation from the chat-release helpers; the final
   sweep intentionally no longer filters by that snapshot.
…oad race for PR #5754

Round 41 review findings (2 P1, 5/12 reviewers consensus on the dominant one):

1. routes/export.py: load_checkpoint already refuses 409 when training
   or another export is active, but /export/{merged,base,gguf,lora} and
   /cleanup went through _export_public_window without those checks.
   A user could start training, then trigger an export (or cleanup),
   and both would double-own the GPU. Factor the training-active and
   export-active guards into _raise_if_training_active_for_export and
   _raise_if_export_active_for_export, call them inside the context
   manager so all /export/* + /cleanup share the same fail-closed
   semantics as load_checkpoint, and wrap /cleanup with the window.

2. core/inference/diffusion.py: DiffusionBackend.unload_model cleared
   _pipe / _repo_id / _family / ... under _lock BEFORE _release(old)
   and _drain_cuda_cache. Between the lock release and cache drain,
   status() reported is_loaded=False / is_loading=False, so the
   helper-busy check (which OR-s those two) could let an AI Assist
   GGUF backend start while diffusion VRAM was still being freed.
   Set _loading=True inside the lock as a busy marker before clearing
   the slot, and only clear it in a finally after release + drain
   complete.
…5754

asyncio.get_event_loop() is deprecated in Python 3.10 and will be
removed. Replace with asyncio.get_running_loop() at five call sites
that all run inside async def functions where a running loop is
guaranteed: two async_generate helpers in diffusion.py and three
run_in_executor sites in routes/inference.py (audio synth, diffusion
load, streaming chat). Addresses the gemini-code-assist bot review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant