Studio: local diffusion image generation page#5754
Open
danielhanchen wants to merge 94 commits into
Open
Conversation
Backend - core/inference/diffusion.py: DiffusionBackend singleton that loads diffusion GGUFs from Hugging Face via diffusers.GGUFQuantizationConfig and runs them on the active CUDA / MPS / CPU device. Supports FLUX.2, FLUX.2 klein, FLUX.1, Qwen-Image, Stable Diffusion 3, and SDXL. - routes/inference.py: POST /api/inference/images/load, POST /api/inference/images/generate, POST /api/inference/images/unload, GET /api/inference/images/status mirroring the llama-server lifecycle. - models/inference.py: DiffusionLoadRequest, DiffusionGenerateRequest, DiffusionGenerateResponse pydantic schemas with prompt / step / size validation up front so callers get clear 422s rather than VAE crashes. - requirements/no-torch-runtime.txt: pin gguf alongside the existing diffusers entry so GGUFQuantizationConfig works out of the box. - tests/test_diffusion_backend.py + tests/test_diffusion_routes.py: 27 unit tests covering family detection, validation, lifecycle, and the full FastAPI round trip with the backend stubbed. No torch / diffusers / GPU required to run. Frontend - features/images/: standalone images-page.tsx with curated model picker (FLUX.2 klein 4B / 9B, FLUX.2 dev, FLUX.1 dev), HF token field, prompt + negative prompt, resolution presets, steps + guidance sliders, seed input, and a result gallery that renders base64 PNGs inline. - app/routes/images.tsx: lazy /images route wired into router.tsx. - components/app-sidebar.tsx: PaintBrush02Icon nav item between Recipes and Export, hidden in chat-only mode.
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
for more information, see https://pre-commit.ci
Contributor
There was a problem hiding this comment.
Code Review
This pull request introduces a local diffusion image generation backend and a corresponding UI page. It supports loading Hugging Face checkpoints in standard or GGUF formats, with specific support for the Flux and Qwen-Image families. The backend manages model lifecycles (load, generate, unload) to optimize GPU memory usage. Review feedback highlighted the use of the deprecated asyncio.get_event_loop() function in two locations, recommending asyncio.get_running_loop() instead.
| """ | ||
| backend = _get_diffusion_backend() | ||
| try: | ||
| status = await asyncio.get_event_loop().run_in_executor( |
Contributor
There was a problem hiding this comment.
| with _singleton_lock: | ||
| if _singleton is None: | ||
| _singleton = DiffusionBackend() | ||
| return _singleton |
Contributor
SectionCard requires an icon prop. Pass GpuIcon, PaintBrush02Icon, and SparklesIcon for the three sections so tsc -b stops failing on TS2741 'Property icon is missing'.
Backend - Fix FLUX.2 klein family default base_repo: black-forest-labs/FLUX.2-klein does not exist on the Hub. Point at the Apache 2.0 4B Base instead so the from_pretrained call works out of the box for ungated users. - Serialise concurrent load_model calls with a dedicated _load_lock so two /images/load requests cannot both reach pipeline_cls.from_pretrained at the same time (would double-spend VRAM and corrupt _pipe). - When the caller passes a full diffusers repo (no gguf_filename), use repo_id directly instead of silently substituting the family default. Closes the load-the-wrong-model regression flagged by review. - Drop negative_prompt from the pipeline call when the loaded pipeline does not accept it (FLUX.2 / FLUX.2 klein). Inspect __call__ via inspect.signature so we do not maintain a manual class list. - Best-effort unload the chat backend (llama-server) before a diffusion load so a 24 GB consumer GPU can swap between chat and diffusion without manual unload steps. Frontend - Replace the four curated entries with the actual filenames published on the Hub (lowercase flux-2-klein-Nb-Q4_K_S.gguf and flux2-dev*). - Add an explicit base_repo per curated entry so the backend never falls back to the family default for the curated picker. - Add the Apache 2.0 FLUX.2 klein base 4B entry so first-time users have an ungated, no-token-required default. - Hide the negative prompt field for FLUX.2 / FLUX.2 klein and show a small explanatory note instead. Tests - Add 6 new backend tests: base_repo override, full-repo (no GGUF) no-substitution, concurrent serialise race, signature-based kwarg filter, negative_prompt strip on FLUX.2, negative_prompt preserved on supporting pipelines. 33 tests passing.
for more information, see https://pre-commit.ci
- unload_model now takes _load_lock so it cannot race with an in-flight load_model and have the load thread overwrite cleared state after unload returned is_loaded=false. - Move stable-diffusion-xl out of _FAMILIES into _FULL_REPO_FAMILIES. SDXL uses a UNet (no transformer GGUF path is wired); listing it in the GGUF families panel was misleading. SDXL full-repo loads still work via family_override='stable-diffusion-xl'. - Result gallery now uses h-auto + object-contain so portrait / landscape outputs render at their true aspect ratio instead of being cropped into a square thumbnail.
danielhanchen
added a commit
to unslothai/unsloth-staging-1
that referenced
this pull request
May 24, 2026
_release_chat_backend_for_diffusion now unloads both the GGUF chat backend (llama-server) and the safetensors / HF chat backend (get_inference_backend) before a diffusion load. Mirror the behaviour on the chat-load side: both the Unsloth/transformers load path and the GGUF load path now unload the diffusion pipeline before claiming GPU memory. Closes the OOM-on-swap path flagged by reviewers in both directions.
- _smart_base_repo: pick 9B base for unsloth/FLUX.2-klein-9B-GGUF and -base- variants per the repo id, instead of always falling back to the 4B family default. - pipe_kwargs use_safetensors=True so diffusers refuses pickle .bin weights at load time (defends against compromised base_repo). - Release the previous pipeline BEFORE allocating the new one so peak VRAM stays at one model's worth instead of two on swap. - Reject empty gguf_filename when repo_id ends with -GGUF; the prior behavior tried from_pretrained on a GGUF-only repo and 500'd deep in diffusers with a confusing model-index error. - Status returns gguf_filename (basename) instead of gguf_path so the local cache path / username does not leak to authenticated Studio sessions. - requirements/no-torch-runtime.txt: pin diffusers>=0.37.0 so older installs cannot resolve a version without Flux2KleinPipeline. - Frontend curated distilled klein entries now point at the matching non-base diffusers repos (FLUX.2-klein-4B / -9B) per the published model cards. Update api.ts to mirror the renamed status field.
danielhanchen
added a commit
to unslothai/unsloth-staging-1
that referenced
this pull request
May 25, 2026
for more information, see https://pre-commit.ci
- routes/export.py load_checkpoint now unloads the diffusion pipeline alongside the existing inference + training unloads, so an export load after Images does not OOM the export subprocess. - Remove the 'sd3.5' alias from the stable-diffusion-3 family. SD3.5 needs its own family + base_repo (and its own smoke test); pairing it with the SD3 Medium base produced a misleading load.
Member
Author
|
Iteration update on the PR, post-review fixes: Review-driven fixes
Frontend
Verification
|
_release_chat_backend_for_diffusion was importing get_inference_backend from core.inference.inference (the in-subprocess class) and calling unload_model() without the required model_name argument. The TypeError was swallowed and the active chat model stayed resident, defeating the chat-to-diffusion lifecycle handoff. Switch to the orchestrator's accessor at core.inference and pass active_model_name through, mirroring the GGUF chat-load path. Add a regression test that stubs both backends and verifies unload_model is called with the active model name.
danielhanchen
added a commit
to unslothai/unsloth-staging-1
that referenced
this pull request
May 25, 2026
for more information, see https://pre-commit.ci
- detect_family adds _FAMILY_EXCLUDE so 'stable-diffusion-3.5' no longer matches the SD3 Medium family and 'qwen-image-edit' no longer matches Qwen-Image. Both were misleading silent loads. - from_single_file now forwards config=<effective_base>, subfolder='transformer', and the HF token. Diffusers-format GGUFs (FLUX.2 klein, Qwen-Image, SD3) need the matching base config or the transformer load picks the wrong shapes; gated GGUFs need the token both for download and config read. - Move _release_chat_backend_for_diffusion + new _release_other_gpu_owners_for_diffusion to AFTER the GGUF download and pipeline class lookup so a typo or transient Hub error does not kill the user's currently-loaded chat model. Peak VRAM still stays at one model's worth because the releases run right before from_pretrained. - _release_other_gpu_owners_for_diffusion: shut down the export subprocess and any active training subprocess before a diffusion load. Symmetric with the export load path. - routes/training.py: unload diffusion before starting training so the new subprocess does not race FLUX/Qwen for VRAM. - routes/export.py: also unload the GGUF llama-server before export load (the existing inference-backend unload only covered the safetensors path).
for more information, see https://pre-commit.ci
danielhanchen
added a commit
to unslothai/unsloth-staging-1
that referenced
this pull request
May 25, 2026
When a swap load fails after the previous pipeline is released, status() previously reported is_loaded=false on top of the OLD repo/family/base_repo metadata, which the frontend then rendered as a misleading 'still loaded: X' label. Clear all metadata atomically with the pipe drop so a failed swap reports a clean empty status plus last_error. Add regression test.
for more information, see https://pre-commit.ci
… precision for PR #5754 - DiffusionBackend.status() now takes _lock so frontend polling cannot observe a torn snapshot mid-swap. - Scrub hf_token / pipe_kwargs / single_file_kwargs from frame locals before logger.exception() so rich tracebacks and structlog formatters that render locals do not leak hf_... tokens into logs. - routes/models.py delete_cached_repo: refuse to delete the cache underlying a currently-loaded diffusion pipeline (both the GGUF repo and the matching diffusers base_repo). Symmetric with the existing chat-load + GGUF guard. - Frontend seed validation: reject non-integer and out-of-safe- integer-range inputs instead of silently rounding via Number(), which would otherwise send a different seed than what the user typed.
for more information, see https://pre-commit.ci
…e prompt (PR #5754) QwenImagePipeline and FluxPipeline treat guidance_scale as the distilled CFG factor and expose true_cfg_scale as the real classifier-free guidance knob. Negative prompts only steer the output when true_cfg_scale > 1, so forwarding only guidance_scale left Qwen-Image on the default true_cfg_scale=4.0 and the user's slider value silently ineffective for negative prompts. When the loaded pipeline accepts both negative_prompt and true_cfg_scale and the caller supplies a non-empty negative prompt, forward guidance_scale through both kwargs so the negative prompt actually steers generation. When no negative prompt is supplied, true_cfg_scale is left at the model default to avoid switching distilled CFG models into real-CFG mode (which would double inference cost and degrade quality). Adds two regression tests covering the forward-when-negative and skip-when-no-negative paths.
The Studio backend's no-torch-runtime.txt is installed via pip --no-deps so the diffusion stack's transitive imports must be pinned explicitly. huggingface_hub's blob downloader (used by diffusers.GGUFQuantizationConfig and by every from_single_file call) imports requests + urllib3 + charset_normalizer at module load time; a fresh --no-deps install would 500 on the first /api/inference/images/load with PackageNotFoundError: 'requests'. Adds requests, urllib3, and charset_normalizer to the transitive-deps block next to the existing httpx chain.
…5754 Round 5 reviewer findings, mostly symmetric-lifecycle and input validation gaps the earlier rounds left open. Backend lifecycle (P1) * routes/training.py: training start now also unloads the GGUF llama-server subprocess; was previously only unloading the safetensors backend, so starting training while a GGUF chat model was loaded kept the subprocess pinned to VRAM. * routes/inference.py: new _raise_if_training_active helper. Both GGUF and standard chat loads, plus /api/inference/images/load, now refuse with HTTP 409 when training is active instead of silently stopping training to free VRAM. * core/inference/diffusion.py: _release_other_gpu_owners_for_ diffusion no longer stops active training. The route layer refuses the request first, so reaching the helper with training live would only happen from programmatic backend calls; better to surface OOM than terminate a long training run. * core/inference/diffusion.py: BF16 dtype is now gated on torch.cuda.is_bf16_supported. Pascal/Turing GPUs report is_available()=True but lack BF16 ALUs; FLUX kernels then fail inside from_pretrained. Falls back to FP16 instead of refusing. * core/inference/diffusion.py: GGUF transformer allocation and pipeline allocation now run AFTER releasing chat/export GPU owners; previously from_single_file ran first and could OOM before the intended VRAM handoff happened. * routes/models.py: /delete-cached now also blocks delete when diffusion is_loading=True (not just is_loaded); concurrent delete during hf_hub_download / from_single_file would have raced the rmtree. * routes/models.py: /delete-finetuned now also checks the diffusion backend before unlinking a Studio outputs/exports path. A user who exported a FLUX LoRA locally and loaded it via /images/load could previously rmtree the directory the diffusion backend was reading from. Backend correctness / safety (P2) * core/inference/diffusion.py: _FAMILY_EXCLUDE for qwen-image now also covers qwen_image_edit / qwenimageedit underscore spellings so '...qwen_image_edit-GGUF' no longer misdetects as Qwen-Image. * core/inference/diffusion.py: detect_family now scans _FULL_REPO_FAMILIES in addition to _FAMILIES, so SDXL repos (stabilityai/stable-diffusion-xl-base-1.0) are auto-detected instead of failing with 'Could not infer a diffusion family'. * core/inference/diffusion.py: generate_image now uses a separate _generate_lock for the pipeline forward instead of holding _lock for the whole call. status() polls and concurrent unload requests no longer block for the full minutes-long generation. * routes/models.py: diffusion delete guard now uses exact repo-id match instead of prefix match; previously loading 'org/model-v2' would block deleting unrelated cached 'org/model'. * models/inference.py: DiffusionLoadRequest now rejects ASCII control characters in repo_id / gguf_filename / base_repo / family via field_validator (closes log-injection surface from authenticated callers). Also caps lengths at 256 chars. * models/inference.py: DiffusionGenerateRequest seed is now bounded to the int64/uint64 range; previously a huge seed (e.g. 2**100) passed Pydantic then crashed inside torch.Generator.manual_seed with 'Overflow when unpacking long long'. Frontend (P2) * features/images/images-page.tsx: Custom HF repo panel now exposes a Pipeline family override dropdown; previously the backend supported it via DiffusionLoadRequest.family but the UI had no way to send it, so custom repos whose names did not contain a hard-coded substring failed to load. * features/images/images-page.tsx: handleLoad now re-fetches status on error. The backend clears its old pipeline before allocating the replacement; a failed swap previously left the UI showing 'Loaded:' with Generate enabled until manual refresh. Tests (10 new) * underscore qwen-image-edit exclusion + SDXL full-repo detection * BF16 fallback when is_bf16_supported() returns False * status() does not block while generate_image holds _generate_lock * route layer rejects control chars in repo_id * route layer rejects 2**100 seeds (uint64-max boundary accepted) * route layer happy-path with negative-prompt true_cfg_scale forwarding (Qwen/Flux) and skip-when-no-neg (distilled CFG)
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Round 27 findings (Opus parallel concurrency + frontend reviews). Backend P1 fixes: 1. utils/datasets/llm_assist.py: the round 26 helper/advisor active registry used a plain set, so two concurrent helper / advisor loads of the same DEFAULT_HELPER_MODEL_REPO would both set.add() (no-op the second time) and then the first finally set.discard() would underflow the registration while the second call was still mmap'ing the GGUF. Switch to a Counter with proper refcount increment/decrement so the repo stays registered until the last user releases it. 2. routes/inference.py _release_chat_for and core/inference/diffusion.py _release_chat_backend_for_diffusion: helper/advisor GGUF runs on a PRIVATE LlamaCppBackend (round 26 P1 #1), so the global llama checks below could not see them. A user-driven /training/start, /export/load-checkpoint, or /images/load would skip the unload and allocate FLUX VRAM on top of the helper's resident weights, OOMing on 16-24 GB consumer GPUs. Both release paths now consult helper_advisor_busy() and fail 503 (or RuntimeError for the in-backend path) so the user retries instead of double-owning VRAM. Frontend P2 fixes: 3. studio/frontend/src/features/images/images-page.tsx: handleUnload now calls refreshStatus() in the catch path so a partial unload (503 from the backend) does not leave the UI showing a stale "Loaded:" label. Matches the handleLoad pattern. 4. images-page.tsx: when status.is_loading is true, auto-poll refreshStatus every 2 s so the user sees real progress instead of a frozen "Loading..." label until they manually click Refresh. 5. images-page.tsx: aria-label="Inference steps" / "Guidance scale" on the two sliders so screen readers can announce them. 6. images-page.tsx: defensive (r.guidance_scale ?? 0).toFixed(1) in the results caption so a future backend that serialises NaN/None for guidance does not throw at render. Tests: 105 targeted (diffusion + cached_gguf + inference_validation) and 1768 broader backend tests pass locally. Frontend `npm run typecheck` passes.
Five additional P1 findings round 27 reviewer flagged on top of the round 27 commit 6c528fb (Counter refcount + handoff visibility were already covered). Three remaining studio.txt / no-torch-runtime hub suggestions are NOT applied because they would re-break CI; the empirical evidence (round 26 commit 65ea3a2 restored CI green) takes precedence over the reviewer's stale-state suggestion. 1. models/training.py TrainingStartRequest: extend the embedded HF token validator to subset, train_split, eval_split. Round 26 only added the control-char guard to those three; the token guard was asymmetric and would accept owner/data\\nFAKE hf_abcdef... payloads through subset / split fields. 2. models/datasets.py CheckFormatRequest: extend both validators (control chars + embedded HF token) to subset and train_split. Same asymmetric-fix bug as #1. 3. models/data_recipe.py SeedInspectRequest: extend both validators to subset and split. Same pattern. 4. utils/datasets/llm_assist.py precache_helper_gguf: register the helper repo in the helper/advisor refcount registry around the hf_hub_download loop, then unregister in the finally. Without this, the FastAPI-startup background pre-cache could be racing a concurrent DELETE /api/models/delete-cached against the same cache directory. The runtime helper / advisor calls already register (round 26 P1 #13/#14) but the precache was the asymmetric gap. 5. routes/models.py _loaded_model_matches_deleted_path: match bidirectionally (active under target OR target under active) so deleting a child directory of a loaded local model (.../my-flux/ text_encoder while .../my-flux is loaded) trips the guard. Mirrors the diffusion delete-guard symmetric path-overlap check. Tests: 105 targeted (diffusion + cache + inference_validation) and the broader backend suite pass locally.
Twelve actionable P1/P2 findings from round 28 reviewer aggregate. Skipped #3 (studio.txt huggingface-hub bump) because the empirical CI evidence in round 26 contradicts that suggestion: bumping the pin there breaks installs that apply constraints.txt (transformers==4.57.6 requires hub<1.0). The actual broken combo only happens via the --no-deps no-torch path which is already bumped in no-torch-runtime.txt and pyproject.toml huggingfacenotorch. 1. utils/datasets/llm_assist.py: split _HELPER_ADVISOR_REFCOUNT into CACHE vs GPU counters. helper_advisor_owns_repo (used by delete-cache) reads CACHE; helper_advisor_busy (used by public handoffs) reads GPU. precache_helper_gguf now registers with gpu_owner=False so a background pre-cache download does not 503 every chat / training / export / diffusion load. 2. utils/datasets/llm_assist.py: introduce _HELPER_ADVISOR_START_LOCK and wrap the busy precheck + register pair in _run_with_helper and _run_multi_pass_advisor. Two concurrent helper / advisor invocations could both pass _gpu_workload_busy_for_helper before either registered, then OOM each other. 3. utils/datasets/llm_assist.py: _gpu_workload_busy_for_helper now also returns True when another helper/advisor already holds the private LlamaCppBackend. 4. routes/inference.py: add _raise_if_helper_advisor_busy(workload) that 503s when AI Assist owns the GPU. Wire it into both chat load branches (GGUF + safetensors) BEFORE the existing _release_export_for / _release_diffusion_for calls so we do not first tear down an idle export / diffusion just to fail on the helper check. 5. routes/training.py + routes/export.py + diffusion.load_model: call the helper-busy check FIRST before any release helper fires. Mirrors the chat-load ordering. 6. routes/inference.py _release_llama_for: poll loading_model_identifier for up to 5 s after unload_model() so a cancelled pending GGUF download has time to clear its identifier. Mirrors the same wait round 26 added to the explicit /api/inference/unload route. 7. core/inference/diffusion.py _release_chat_backend_for_diffusion: same 5 s settling wait for cancelled pending GGUF downloads. 8. models/inference.py LoadRequest: validate every llama_extra_args entry through _no_control_chars + _reject_embedded_hf_token. The list was forwarded verbatim to a logged llama-server command line, so a smuggled control char or hf_... token would land in logs and subprocess args. 9. routes/models.py /gguf-download-progress: apply _validate_logged_identifier to repo_id and variant, matching the round 24 hardening on the adjacent generic /download-progress. 10. routes/inference.py diffusion-load RuntimeError classifier: treat "AI Assist ..." messages as retryable 503 instead of 400 (round 28 P2 #15). Mirrors the round 18/19 markers for chat unload failures. Tests: 105 targeted + 1768 broader backend tests pass locally.
for more information, see https://pre-commit.ci
Five actionable findings from round 29 reviewer aggregate, plus an origin/main merge that absorbs the chat_templates.py fix landed in PR #5763. Skipped #4 / #5 (studio.txt + constraints.txt hub bump) because CI evidence from round 26 contradicts that suggestion; the real broken combo only happens via the --no-deps no-torch path which is already bumped in no-torch-runtime.txt + pyproject.toml. 1. core/inference/diffusion.py: round 28 reordered _release_chat_backend_for_diffusion BEFORE _release_other_gpu_owners_for_diffusion to surface the helper / advisor busy check early, but that meant the chat unload inside _release_chat_backend_for_diffusion now fired before the training / export conflict check in the second helper. A direct backend caller (tests, scripts) or a route-precheck race with a newly-started training run would then unload the user's chat and then 409 with nothing loaded. Split the helper busy check into _raise_if_helper_advisor_busy_for_diffusion (cheap, no side effects), keep _release_chat_backend_for_diffusion as the actual chat unload with an opt-out flag, and reorder load_model to: (a) helper check, (b) training / export check + idle export shutdown, (c) chat unload. All raises now fire BEFORE any destructive unload. 2. Merge origin/main: absorbs af6504f (PR #5763 chat_templates.py find() guards + the new tests/python/test_construct_chat_template_validation.py regression test). Removes the 101-line stale-rebase silent revert that round 29 reviewer 5 and 8 flagged. 3. frontend/src/features/images/images-page.tsx: supportsNegativePrompt now also honours customFamily when no model is loaded yet, so a Custom HF repo with family flux.2 / flux.2-klein correctly hides the negative prompt field instead of silently sending it. 4. routes/inference.py /images/generate: report the ACTUAL PNG width / height from PIL Image.size instead of echoing back the requested payload values. FLUX-family pipelines round to vae_scale_factor * 2, so a request for 520x520 lands as 512x512 internally; metadata now matches the bytes on the wire. Tests: 98 targeted (diffusion + cached_gguf + inference_validation) and frontend npm run typecheck pass locally.
for more information, see https://pre-commit.ci
Four actionable findings from round 30. Skipped P1 #1 / #2 / #3 (huggingface-hub bump in studio.txt / single-env / colab-new) because the live B200 Studio that successfully generated FLUX.2 klein images runs the exact combo the reviewer flags as broken: huggingface_hub 0.36.2 + transformers 4.57.6 + diffusers 0.37.1 Flux2KleinPipeline: True (imports cleanly) The is_offline_mode ImportError only fires with transformers 5.x, and the standard install path pins transformers==4.57.6 via constraints. The round 26 fix bumped no-torch-runtime.txt + pyproject huggingfacenotorch where the --no-deps install path can land on transformers 5.x; that remains the correct surface. 1. core/inference/diffusion.py: preflight transformers + accelerate via importlib.util.find_spec BEFORE any destructive GPU-owner unload. Diffusers can expose stub pipeline classes when transformers / accelerate are missing, so the load used to drop chat first and fail later inside from_pretrained. find_spec keeps existing tests that stub these modules passing because no real module is executed (round 30 P1 #11). 2. models/export.py ExportGGUFRequest.quantization_method: extend the embedded HF token validator to this field too. Round 23 added the control-char guard but not the token guard; the value is forwarded into worker command lines and reflected in error / success text (round 30 P1 #5). 3. models/data_recipe.py SeedInspectUploadRequest: add _no_control_chars + _reject_embedded_hf_token field_validators to filename and to each entry of file_names. Mirrors the sibling SeedInspectRequest.dataset_name hardening (round 30 P1 #6). 4. frontend/src/features/images/images-page.tsx: defer the initial refreshStatus() call via queueMicrotask so the synchronous setRefreshingStatus(true) inside it does not trip the react-hooks/set-state-in-effect lint on mount (round 30 P2 #12). Deferred (need larger surgery / out of scope for this round): P1 #4 native_path_lease for diffusion local-path loads P1 #7-#10 helper/advisor + public-start window mutual lock symmetry Tests: 98 targeted (diffusion + cached_gguf + inference_validation) pass locally; frontend npm run typecheck passes.
for more information, see https://pre-commit.ci
danielhanchen
added a commit
to unslothai/unsloth-staging-1
that referenced
this pull request
May 25, 2026
Addresses remaining round-30 reviewer findings against PR unslothai#5754 (diffusion image generation in Unsloth Studio). The studio.txt / constraints.txt / colab-new hub-bump items (round 30 #1-#3) are intentionally skipped: the live B200 Studio install path with huggingface_hub==0.36.2, transformers==4.57.6 and diffusers==0.37.1 imports Flux2KleinPipeline cleanly and runs end-to-end image generation (see staging CI green on bec81b8 plus round 28-30 local validation suites). The is_offline_mode ImportError the reviewer cites only triggers with transformers 5.x against huggingface_hub 0.x; the constraints pin holds transformers at 4.x so the combo never materialises on the standard install path. Concurrency: close the helper / advisor GPU-start race in all four public load paths (round 30 P1 #7-#10). * Add a _PUBLIC_LOAD_PENDING_COUNT counter in utils/datasets/llm_assist.py, published under _HELPER_ADVISOR_START_LOCK by _raise_if_helper_advisor_busy and cleared by a paired _clear_public_load_window in routes/inference.py. A concurrent helper / advisor start now sees public_load_pending() inside _gpu_workload_busy_for_helper and refuses VRAM until the public load attempt finishes, closing the window between the busy snapshot and the public load flipping its public ownership flags (is_loaded, current_checkpoint, is_training_active, etc.). * Wire the paired clear into all five call sites (GGUF chat, safetensors chat, diffusion image load, training start, export load-checkpoint). The chat path tracks the published tag in a local so the finally clears the same counter on either branch or on early HTTPException. Security: gate /api/inference/images/load against arbitrary local-path probes (round 30 P1 #4). Mirror the chat /api/inference/load native_path_lease boundary so an authenticated session cannot use repo_id or base_repo as a directory probe. * Add native_path_lease + base_repo_native_path_lease to DiffusionLoadRequest (optional; Hub ids skip the lease). * Add _looks_like_local_diffusion_path + a _resolve_diffusion_repo_for_request helper that requires a verified directory-typed native path grant for any value that starts with /, ~, ./, ../, contains a backslash, or expands to an absolute path. The detector deliberately avoids Path.exists so the route does not side-channel filesystem layout via differential error messages. Frontend: split the Images page status fetch from the spinner toggle (round 30 P2 #12). The mount effect and the is_loading auto-poll now call a setState-free fetchAndUpdateStatus; the user-driven Refresh button still calls refreshStatus to flip the spinner. Cleaner separation than the queueMicrotask shim from the prior commit; the eslint react-hooks/set-state-in-effect rule is not in the studio-frontend-ci typecheck gate, and the codebase already has hundreds of pre-existing violations of the same rule. 98 targeted backend tests pass (test_diffusion_routes, test_diffusion_backend, test_inference_model_validation, test_models_get_model_config_case_resolution, test_data_recipe_seed, test_training_raw_support, test_export_log_cursor). Frontend typecheck passes.
Addresses remaining round-30 reviewer findings against PR #5754 (diffusion image generation in Unsloth Studio). The studio.txt / constraints.txt / colab-new hub-bump items (round 30 #1-#3) are intentionally skipped: the live B200 Studio install path with huggingface_hub==0.36.2, transformers==4.57.6 and diffusers==0.37.1 imports Flux2KleinPipeline cleanly and runs end-to-end image generation (see staging CI green on bec81b8 plus round 28-30 local validation suites). The is_offline_mode ImportError the reviewer cites only triggers with transformers 5.x against huggingface_hub 0.x; the constraints pin holds transformers at 4.x so the combo never materialises on the standard install path. Concurrency: close the helper / advisor GPU-start race in all four public load paths (round 30 P1 #7-#10). * Add a _PUBLIC_LOAD_PENDING_COUNT counter in utils/datasets/llm_assist.py, published under _HELPER_ADVISOR_START_LOCK by _raise_if_helper_advisor_busy and cleared by a paired _clear_public_load_window in routes/inference.py. A concurrent helper / advisor start now sees public_load_pending() inside _gpu_workload_busy_for_helper and refuses VRAM until the public load attempt finishes, closing the window between the busy snapshot and the public load flipping its public ownership flags (is_loaded, current_checkpoint, is_training_active, etc.). * Wire the paired clear into all five call sites (GGUF chat, safetensors chat, diffusion image load, training start, export load-checkpoint). The chat path tracks the published tag in a local so the finally clears the same counter on either branch or on early HTTPException. Security: gate /api/inference/images/load against arbitrary local-path probes (round 30 P1 #4). Mirror the chat /api/inference/load native_path_lease boundary so an authenticated session cannot use repo_id or base_repo as a directory probe. * Add native_path_lease + base_repo_native_path_lease to DiffusionLoadRequest (optional; Hub ids skip the lease). * Add _looks_like_local_diffusion_path + a _resolve_diffusion_repo_for_request helper that requires a verified directory-typed native path grant for any value that starts with /, ~, ./, ../, contains a backslash, or expands to an absolute path. The detector deliberately avoids Path.exists so the route does not side-channel filesystem layout via differential error messages. Frontend: split the Images page status fetch from the spinner toggle (round 30 P2 #12). The mount effect and the is_loading auto-poll now call a setState-free fetchAndUpdateStatus; the user-driven Refresh button still calls refreshStatus to flip the spinner. Cleaner separation than the queueMicrotask shim from the prior commit; the eslint react-hooks/set-state-in-effect rule is not in the studio-frontend-ci typecheck gate, and the codebase already has hundreds of pre-existing violations of the same rule. 98 targeted backend tests pass (test_diffusion_routes, test_diffusion_backend, test_inference_model_validation, test_models_get_model_config_case_resolution, test_data_recipe_seed, test_training_raw_support, test_export_log_cursor). Frontend typecheck passes.
for more information, see https://pre-commit.ci
Two universal-consensus round-31 reviewer findings. Concurrency: /images/load was leaking the public-load pending counter on any pre-finally HTTPException (round 31 P1 #1, 11/12 votes). _raise_if_helper_advisor_busy("diffusion") published the counter, then _resolve_diffusion_repo_for_request ran outside the clearing try/finally. A request like repo_id="/tmp/model" with no native_path_lease returned 400 and left public_load_pending() true until process restart, permanently blocking AI Assist. Fix mirrors the training / export pattern: track diffusion_load_window_published in an outer try, publish the flag right after the helper-busy check succeeds, and clear in an outer finally that only fires when the flag is set. This also closes round 31 P1 #6: a second request's failure can no longer decrement a still-active first request's counter, because the second request has not yet flipped its own publish flag. Security: _looks_like_local_diffusion_path missed cwd-relative directories (round 31 P1 #2, 8/12 votes). DiffusionBackend. load_model accepts repo_id="exports/my-flux" as a local directory via Path(repo_id).expanduser().is_dir(), but the detector only flagged values starting with /, ~, ./, ../, backslash, or absolute. Tightened the detector to also reject: * weight-file suffixes (.gguf / .safetensors / .bin / .pt / .pth) * non-2-segment values (`owner`, `a/b/c`, `owner/`, `/repo`, `//`) * 2-segment values whose parts are `.` or `..` * 2-segment values that actually resolve to an existing local path under backend CWD (last-resort exists() probe). The existence probe is a minor side-channel for an already- authenticated caller, accepted in exchange for closing the silent bypass of the new lease boundary. Valid Hub ids like unsloth/FLUX.2-klein-base-4B-GGUF, microsoft/Phi-3.5-mini-instruct still pass through unchanged. Skipped (consistent with prior rounds): * R31 P1 #3 (Tauri / native lease enum missing `load-diffusion-model` op): architectural surface; defer until the Images page actually surfaces a local-path picker. * R31 P1 #4-#5, #8: studio.txt / constraints.txt / pyproject hub pins. Live B200 install path with huggingface_hub==0.36.2, transformers==4.57.6, diffusers==0.37.1 imports Flux2KleinPipeline cleanly. The is_offline_mode import error only triggers when transformers 5.x is paired with hub 0.x, which the constraints pin prevents. * R31 P1 #7 (find_spec vs real import): a full transformers import at module load breaks tests that stub huggingface_hub; find_spec is the existing tradeoff. 98 targeted backend tests pass (test_diffusion_routes, test_diffusion_backend, test_inference_model_validation, test_models_get_model_config_case_resolution, test_data_recipe_seed, test_training_raw_support, test_export_log_cursor).
Three round-32 reviewer findings, plus documentation cleanup for the local-path Tauri/FE plumbing gap. Concurrency: direct DiffusionBackend.load_model callers now publish the helper/advisor pending marker symmetrically (round 32 P1 #3). _raise_if_helper_advisor_busy_for_diffusion gains an optional publish_pending flag; load_model passes True so the destructive unload window is gated by a "diffusion-backend" tag published under _HELPER_ADVISOR_START_LOCK. The route layer's "diffusion" tag and the backend's "diffusion-backend" tag refcount independently (sum > 0 still blocks helper starts), so neither side's clear can erase the other's still-active marker. The existing _release_chat_backend_for_diffusion(check_helper_advisor= True) path stays snapshot-only (publish_pending defaults False) so test / direct callers of that helper do not leak a counter. Validation: export save_directory now rejects ALL ASCII control characters (round 32 P1, save_directory tab finding). The earlier CR / LF only guard missed TAB / VT / FF / DEL, which a caller could smuggle past the export worker's logged subprocess argv. Documentation: DiffusionLoadRequest.repo_id and base_repo updated to reflect that local-path support is gated on a Tauri / frontend load-diffusion-model directory lease producer that has not shipped yet (round 32 P1 #1 from multiple reviewers). The backend lease boundary is correct; what is missing is the FE / native side that mints the matching grant. Until that lands, local paths through the Images route always 400 with "Native path grant is required", which the docstring now spells out. Skipped (consistent with prior rounds): * Hub-pin findings (R32 P1 #4-#6): live B200 install with huggingface_hub==0.36.2 + transformers==4.57.6 + diffusers== 0.37.1 verifiably imports Flux2KleinPipeline. Empirical justification documented in R30 / R30 follow-up commit msgs. * Tauri / native enum surgery (R32 P1 #1, 6 votes): real architectural work but out of scope for this PR's Python surface. Documented now; FE / Rust ticket to follow. 98 targeted backend tests pass (test_diffusion_routes, test_diffusion_backend, test_inference_model_validation, test_models_get_model_config_case_resolution, test_data_recipe_seed, test_training_raw_support, test_export_log_cursor).
for more information, see https://pre-commit.ci
Two round-33 reviewer findings: hub-floor consistency and the multipart upload filename validator gap. Dependencies: reverted the round-26 huggingface_hub>=1.3.0 floor in no-torch-runtime.txt and pyproject.toml (round 33 P1 #1-#5, 4/12 vote consensus). studio.txt forces huggingface_hub==0.36.2 to match the transformers==4.57.6 pin in extras-no-deps.txt, so the 1.3.0 floor was internally inconsistent. Reviewers reproduced the resolver conflict on a fresh install. Empirical justification (re-verified on the live B200 host before the revert): huggingface_hub 0.36.2 + transformers 4.57.6 + diffusers 0.37.1 imports Flux2KleinPipeline cleanly and runs end-to-end image generation. transformers 4.57.6 carries its own transformers.utils.hub.is_offline_mode and does not actually need huggingface_hub.is_offline_mode at import time. The original bump was guarding against the (never-realised) transformers 5.x path, which extras-no-deps explicitly pins away. Validation: multipart /seed/upload-unstructured-file now applies the same _no_control_chars and _reject_embedded_hf_token checks to file.filename that SeedInspectUploadRequest.filename already applies in the JSON variant (round 33 P1 #7). The filename is reflected back to the client, persisted in the per-file meta JSON, and echoed by error responses, so the JSON-side hardening must not be asymmetric with the multipart path. Skipped (consistent with prior rounds): * Find_spec vs full import (R33 P1 #6): preserves test compatibility with the huggingface_hub stub fixture. * React hooks set-state-in-effect lint (R33 P1 #8): codebase has 146 pre-existing violations of the same rule; studio-frontend-ci does not gate on lint. * Direct DiffusionBackend.load_model bypass (R33 P1 #9): the route is the only production entry point, and the backend helper now publishes its own diffusion-backend pending tag (round 32 P1 #3). Direct-caller hardening would require duplicating the lease check into load_model itself, which is out of scope for the route-layer security boundary. * One-segment Hub IDs (R33 P2 #10): strict 2-segment Hub id check is intentional; one-segment names are not valid Hub ids. * Cwd-relative shadow of Hub IDs (R33 P2 #11): documented side-channel tradeoff accepted in round 31 commit msg. 97 targeted backend tests pass.
…5754 Backend CI on Python 3.11 failed 15 diffusion tests after R30's accelerate preflight because the CI test environment does not install accelerate, but the tests mock from_pretrained and never exercise the CPU-offload path that actually needs it. Gate the find_spec("accelerate") check on enable_model_cpu_offload so the dependency is only required for the path that uses it. transformers preflight stays unconditional (it is always touched by from_pretrained). Tests with offload=False (the default) pass without accelerate; production loads with offload=True still get the fail-fast unload-protection guard the original round-30 fix added. 97 targeted backend tests pass (test_diffusion_routes, test_diffusion_backend, test_inference_model_validation, test_data_recipe_seed, test_training_raw_support, test_export_log_cursor).
…e for PR #5754 Round 34 P1: R33 removed the `huggingface_hub>=1.3.0,<2.0` line entirely when the right revert was to restore the pre-PR `huggingface_hub>=0.34.0` floor. install.sh --no-torch installs this requirements file with --no-deps and does NOT install studio.txt afterward, so without an explicit Hub line a no-torch Studio install ends with no huggingface_hub at all and the new diffusion + chat GGUF paths fail at import with ModuleNotFoundError: huggingface_hub. Restores the pre-PR floor and documents both the round-26 walk-back and the round-34 reason the package line stays.
…idator for PR #5754 Re-fix the round-34 CI regression: the round-34 attempt gated the accelerate preflight on enable_model_cpu_offload, but the parameter defaults to True so tests that did not explicitly opt out still hit the missing-accelerate path. Removed the accelerate preflight entirely; transformers' PyTorch backend already pulls accelerate as a hard dep on every supported install path, so the duplicate find_spec guard is redundant in practice and the missing-package case will still surface a clean ModuleNotFoundError from the offload code itself if the user somehow lands there without it. Round 34 P1 cross-block: extend the seed.py multipart filename validators (round 33) to /api/datasets/upload. Both routes echo the filename back to the client and persist it, so per the asymmetric-fix rule the validators must match. Now rejects control characters and embedded HF tokens in file.filename in both upload entry points. 86 targeted backend tests pass.
Round 35 P1: _raise_if_helper_advisor_busy published a new public pending marker without first checking public_load_pending(). Two public workloads (e.g. training + diffusion) could both pass their idle helper-busy snapshot concurrently, then both run through destructive owner teardown before either flipped its own visibility flag (is_training_active, current_checkpoint, loading_model_identifier, diffusion is_loading). Add the missing self-check under _HELPER_ADVISOR_START_LOCK so the second public workload sees the first's pending marker and gets a 503 retry instead of racing for VRAM. Helper / advisor already checked public_load_pending() on its side via _gpu_workload_busy_for_helper; this closes the symmetric public -> public window. 86 backend tests pass + smoke test confirms second public load is refused with 503 while first is pending, and the next public load is permitted once the first clears.
Round 38 P1: R35e added the public_load_pending() check to the
route-side _raise_if_helper_advisor_busy, but the backend-side
_raise_if_helper_advisor_busy_for_diffusion (used by direct
DiffusionBackend.load_model callers AND called transitively from
the route via backend.load_model) never got the same parity
check. That left a window where:
* /api/training/start published the "training" pending marker
via the route helper
* a script/test calling DiffusionBackend.load_model() directly
passed the backend's helper-busy snapshot, never checked
public_load_pending(), and proceeded to destructive owner
teardown + GPU allocation while training was still pending.
Add the parity check with a kw-only `excluding` parameter on
public_load_pending so a route-wrapped backend call can ignore
the marker its own route already published (route publishes
"diffusion"; backend publishes the separate "diffusion-backend"
tag). load_model gains ignore_public_load_pending_workload to
thread the route's tag through; the diffusion route passes
"diffusion" so the backend's atomic check does not self-block
on the route's own publication.
Verified by smoke test: route-wrapped backend with excluding=
"diffusion" allowed during route's diffusion pending; direct
backend call refused with RuntimeError "Another GPU workload is
mid-handoff" when training is pending. 86 backend tests pass.
for more information, see https://pre-commit.ci
…daction for PR #5754 Round 39 review findings (2 P1 + 1 P2): 1. routes/datasets.py: validate raw multipart filename before sanitize so smuggled control chars (NUL, newline) are rejected at the same boundary as the JSON path in seed.py. Previously _sanitize_filename stripped the control chars first, letting raw inputs slip past the validator. 2. routes/inference.py: extend RuntimeError -> 503 mapping in /images/load to classify "Another GPU workload is mid-handoff" as retryable, so the backend-surfaced phrasing matches the route-level 503 already returned from _raise_if_helper_advisor_busy. 3. core/inference/diffusion.py: redact hf_ tokens in the local-path branch of _display_repo_id so a leaf directory named hf_<token> cannot leak into UI labels or structured logs.
… for PR #5754 Round 40 review findings (5 P1 + 1 P2 + 3 P3): P1: 1. routes/export.py: wrap /export/merged, /export/base, /export/gguf, /export/lora in a public-load window so backend.export_*() running in a worker thread cannot be torn down by a concurrent workload that sees is_export_active() == False during the pre-active gap. 2. utils/datasets/llm_assist.py: add public_load_pending_for(workload) helper. routes/inference.py: _release_export_for now refuses 503 when export is mid-handoff. 3. models/models.py: AddScanFolderRequest.path now rejects control characters and embedded hf_ tokens before being logged or reflected. 4. models/training.py: local_datasets and local_eval_datasets list entries get the same control-char / embedded-token validators that model_name / hf_dataset already have. 5. models/training.py: format_type joins the validator list (copied into training_kwargs and into trainer log lines). 6. models/export.py: _validate_save_directory now rejects embedded hf_ tokens (already covered other identifier fields). P2: 7. images-page.tsx:162: defer the mount fetchAndUpdateStatus call through setTimeout(..., 0) so it does not trip react-hooks/set-state-in-effect on scoped lint. P3 cleanup: 8. core/inference/diffusion.py: drop unused gguf_basename assignment. 9. core/inference/diffusion.py + routes/inference.py: drop unused owned_names computation from the chat-release helpers; the final sweep intentionally no longer filters by that snapshot.
for more information, see https://pre-commit.ci
…oad race for PR #5754 Round 41 review findings (2 P1, 5/12 reviewers consensus on the dominant one): 1. routes/export.py: load_checkpoint already refuses 409 when training or another export is active, but /export/{merged,base,gguf,lora} and /cleanup went through _export_public_window without those checks. A user could start training, then trigger an export (or cleanup), and both would double-own the GPU. Factor the training-active and export-active guards into _raise_if_training_active_for_export and _raise_if_export_active_for_export, call them inside the context manager so all /export/* + /cleanup share the same fail-closed semantics as load_checkpoint, and wrap /cleanup with the window. 2. core/inference/diffusion.py: DiffusionBackend.unload_model cleared _pipe / _repo_id / _family / ... under _lock BEFORE _release(old) and _drain_cuda_cache. Between the lock release and cache drain, status() reported is_loaded=False / is_loading=False, so the helper-busy check (which OR-s those two) could let an AI Assist GGUF backend start while diffusion VRAM was still being freed. Set _loading=True inside the lock as a busy marker before clearing the slot, and only clear it in a finally after release + drain complete.
…5754 asyncio.get_event_loop() is deprecated in Python 3.10 and will be removed. Replace with asyncio.get_running_loop() at five call sites that all run inside async def functions where a running loop is guaranteed: two async_generate helpers in diffusion.py and three run_in_executor sites in routes/inference.py (audio synth, diffusion load, streaming chat). Addresses the gemini-code-assist bot review.
This was referenced May 29, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds a self-contained Images page in Studio for generating images from local diffusion GGUFs. Pairs with the diffusion GGUFs already published under https://huggingface.co/collections/unsloth/unsloth-diffusion-ggufs (FLUX.2, FLUX.2 klein, FLUX.1, Qwen-Image), and works with anything else the diffusers GGUFQuantizationConfig path supports.
Backend
studio/backend/core/inference/diffusion.pyintroduces aDiffusionBackendsingleton that loads diffusion checkpoints in either the standard diffusers layout or the single-file GGUF layout. GGUF files are dynamically dequantised on-device viadiffusers.GGUFQuantizationConfig, then the rest of the pipeline (VAE, text encoders, scheduler) is pulled from the matching diffusers base repo so end users only ever need one local file plus the metadata repo. Auto-picksbfloat16on CUDA,float16on MPS, andfloat32on CPU; falls back to CPU offload on CUDA to fit FLUX.2 dev on a 16 GB card.studio/backend/routes/inference.pyadds four routes mirroring the llama-server lifecycle:POST /api/inference/images/loadPOST /api/inference/images/generatePOST /api/inference/images/unloadGET /api/inference/images/statusstudio/backend/models/inference.pyaddsDiffusionLoadRequest,DiffusionGenerateRequest, andDiffusionGenerateResponsepydantic schemas. The generate validator rejects non-multiple-of-8 sizes up front so callers see a clean 422 rather than a cryptic VAE crash.studio/backend/requirements/no-torch-runtime.txtpinsggufalongside the existingdiffusersentry soGGUFQuantizationConfigworks out of the box.studio/backend/tests/test_diffusion_backend.pyandstudio/backend/tests/test_diffusion_routes.pyadd 27 CPU-only unit tests covering family detection, validation, lifecycle, and the full FastAPI round trip with diffusers stubbed viasys.modules. Stays well under one minute on a CPU runner.Frontend
studio/frontend/src/features/images/adds a standaloneimages-page.tsxwith a curated FLUX.2 picker (klein 4B / klein 9B / dev / FLUX.1 dev), a HF token field for gated repos, prompt + negative prompt, resolution presets (square 1024/768/512, 832x1216, 1216x832), steps and guidance sliders, an optional seed, and a result gallery that renders the base64 PNGs inline with the run parameters as a caption.studio/frontend/src/features/images/api.tsis the typed thin client over the four backend routes using the existingauthFetch+readFastApiErrorhelpers.studio/frontend/src/app/routes/images.tsxlazy-loads the page on/imagesbehindrequireAuth.studio/frontend/src/components/app-sidebar.tsxadds aPaintBrush02Iconnav item between Recipes and Export. Hidden in chat-only mode.studio/frontend/src/app/router.tsxregisters the new route in the existing addChildren list.CI validation
Validated on ubuntu-latest, macos-14, and windows-latest on a staging fork (unslothai/unsloth-staging-1 PR #88) before opening this PR. The 27 backend tests pass on all three runners with the minimal dep set listed in the diff. No CI files are changed in this PR; the existing
Backend CIand consolidated tests workflows will auto-pick up the new test files via theirtests/**path filters.Test plan
python -m pytest studio/backend/tests/test_diffusion_backend.py studio/backend/tests/test_diffusion_routes.py(27 passed)unsloth/FLUX.2-klein-4B-GGUF(running locally now, screenshots will be attached in the PR review)