Open
Changes from 2 commits
94 commits
b2b660f
Studio: add local diffusion image generation page
danielhanchen May 24, 2026
a08686c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 24, 2026
f8504e3
Studio: fix Images page SectionCard required icon prop
danielhanchen May 24, 2026
bf5c4ac
Fix/adjust diffusion review findings for PR #5754
danielhanchen May 24, 2026
669964f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 24, 2026
d6f2a23
Fix/adjust diffusion lifecycle + UI for PR #5754
danielhanchen May 24, 2026
faa6822
Fix/adjust diffusion symmetric chat handoff for PR #5754
danielhanchen May 24, 2026
8074a2b
Fix/adjust diffusion: smart base, safetensors, peak VRAM, GGUF guard
danielhanchen May 25, 2026
8c10cf5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
6089720
Fix/adjust diffusion: export unload + sd3.5 alias for PR #5754
danielhanchen May 25, 2026
1601b78
Fix safetensors chat backend unload for PR #5754
danielhanchen May 25, 2026
0f9b19b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
65f7a26
Fix/adjust diffusion lifecycle for round 3 findings (PR #5754)
danielhanchen May 25, 2026
fbf06be
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
f44b55c
Fix/adjust diffusion: clear stale metadata on failed swap for PR #5754
danielhanchen May 25, 2026
18b50c1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
0f3ed08
Fix/adjust diffusion: token leak + cache guard + locked status + seed…
danielhanchen May 25, 2026
ec507c5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
fb0a31b
Fix/adjust diffusion: forward true_cfg_scale on Qwen/Flux for negativ…
danielhanchen May 25, 2026
f3f3f06
Fix/adjust diffusion: pin requests chain in no-deps runtime for PR #5754
danielhanchen May 25, 2026
f06895b
Fix/adjust diffusion: round 5 lifecycle + validation hardening for PR…
danielhanchen May 25, 2026
8858104
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
04de106
Fix/adjust diffusion: round 6 race-free lifecycle + delete guards for…
danielhanchen May 25, 2026
0fd9e90
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
fa8efaf
Fix/adjust diffusion: round 7 swap-aware guards + race-free generate …
danielhanchen May 25, 2026
92eccc3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
c1f9aac
Fix/adjust diffusion: round 8 async unloads + tighter handoffs for PR…
danielhanchen May 25, 2026
0ae2055
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
1193c81
Fix/adjust diffusion: round 9 shared release helpers + export-active …
danielhanchen May 25, 2026
b34fc62
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
641cdcc
Fix/adjust diffusion: round 10 fix export-active asymmetry + GGUF cha…
danielhanchen May 25, 2026
1698b66
Fix/adjust diffusion: tolerate ExportBackend without is_export_active…
danielhanchen May 25, 2026
4b1b149
Fix/adjust diffusion: round 11 export-active defense-in-depth + state…
danielhanchen May 25, 2026
921c602
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
8b8980a
Fix/adjust diffusion: round 12 local-path GGUF + per-variant delete +…
danielhanchen May 25, 2026
d8b785a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
ae41bfd
Fix/adjust diffusion: round 13 P1+P2 batch for PR #5754
danielhanchen May 25, 2026
ff98c6d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
54adfdf
Fix _resolve_local_gguf_child traversal check for Windows for PR #5754
danielhanchen May 25, 2026
f501ab8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
e03ed3d
Fix/adjust diffusion: round 14 P1+P2 batch for PR #5754
danielhanchen May 25, 2026
a9b3d1a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
59aa75b
Fix/adjust diffusion: round 15 P1+P2+P3 batch for PR #5754
danielhanchen May 25, 2026
2f9bb69
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
05184ad
Fix llama_cpp source-inspection tests for split load_model for PR #5754
danielhanchen May 25, 2026
7c8f1eb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
2ef9b0e
Fix/adjust diffusion: round 16 P1+P2 batch for PR #5754
danielhanchen May 25, 2026
e948a96
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
6ac6757
Fix Windows test failures from round 16 changes for PR #5754
danielhanchen May 25, 2026
e2f41e4
Fix/adjust diffusion: round 17 P1+P2 batch for PR #5754
danielhanchen May 25, 2026
72ec670
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
da27143
Fix/adjust diffusion: round 18 P1+P2 batch for PR #5754
danielhanchen May 25, 2026
369573b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
c20ed25
Fix/adjust diffusion: round 19 P1+P2 batch for PR #5754
danielhanchen May 25, 2026
c520a47
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
ff3bad3
Fix/adjust diffusion: round 20 P1+P2 batch for PR #5754
danielhanchen May 25, 2026
04bd9b2
Fix/adjust diffusion: round 21 P1+P2 batch for PR #5754
danielhanchen May 25, 2026
63f3faf
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
09c5114
Fix/adjust diffusion: round 22 P1+P2 batch for PR #5754
danielhanchen May 25, 2026
c6c4378
Fix/adjust diffusion: round 23 P1+P2 batch for PR #5754
danielhanchen May 25, 2026
0a7fe59
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
48740c2
Fix/adjust diffusion: round 24 P1 batch for PR #5754
danielhanchen May 25, 2026
3df9386
Merge remote-tracking branch 'origin/main' into studio-diffusion-images
danielhanchen May 25, 2026
7b5fe1c
Fix/adjust diffusion: round 25 P1 batch for PR #5754
danielhanchen May 25, 2026
4785f76
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
fd7d334
Fix studio.txt vs constraints.txt huggingface-hub conflict (PR #5754)
danielhanchen May 25, 2026
65ea3a2
Fix/adjust diffusion: round 26 P1 batch for PR #5754
danielhanchen May 25, 2026
e17aea6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
6c528fb
Fix/adjust diffusion: round 27 P1 + P2 batch for PR #5754
danielhanchen May 25, 2026
79da5d9
Fix/adjust diffusion: round 27 follow-up P1 batch for PR #5754
danielhanchen May 25, 2026
c4c9e2a
Fix/adjust diffusion: round 28 P1 + P2 batch for PR #5754
danielhanchen May 25, 2026
760bd38
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
1f5f13c
Merge remote-tracking branch 'origin/main' into studio-diffusion-images
danielhanchen May 25, 2026
bec81b8
Fix/adjust diffusion: round 29 P1 + P2 batch for PR #5754
danielhanchen May 25, 2026
b8152a5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
3b60d40
Fix/adjust diffusion: round 30 P1 + P2 batch for PR #5754
danielhanchen May 25, 2026
91e3a28
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
cae3712
Fix/adjust diffusion: round 30 follow-up P1 batch for PR #5754
danielhanchen May 25, 2026
5350d4c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
0897494
Fix/adjust diffusion: round 31 P1 batch for PR #5754
danielhanchen May 25, 2026
90b51cc
Fix/adjust diffusion: round 32 P1 batch for PR #5754
danielhanchen May 25, 2026
a1bec65
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
e3ce1c8
Fix/adjust diffusion: round 33 P1 batch for PR #5754
danielhanchen May 25, 2026
4e1c622
Fix/adjust diffusion: gate accelerate preflight on cpu_offload for PR…
danielhanchen May 25, 2026
081377f
Fix/adjust diffusion: restore huggingface_hub line to no-torch-runtim…
danielhanchen May 25, 2026
09ca2b2
Fix/adjust diffusion: drop accelerate preflight + datasets upload val…
danielhanchen May 25, 2026
aeba18d
Fix/adjust diffusion: public_load_pending self-check for PR #5754
danielhanchen May 25, 2026
e30c5ed
Fix/adjust diffusion: backend public_load_pending parity for PR #5754
danielhanchen May 25, 2026
d0f4bb5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
f5186e2
Fix/adjust diffusion: filename validator order, 503 mapping, token re…
danielhanchen May 25, 2026
784a9ed
Fix/adjust diffusion: export public-load window, identifier hardening…
danielhanchen May 25, 2026
029ca74
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2026
ca68fd5
Fix/adjust diffusion: export active-state guards, cleanup window, unl…
danielhanchen May 25, 2026
07b0cf7
Replace asyncio.get_event_loop with asyncio.get_running_loop for PR #…
danielhanchen May 26, 2026
489 changes: 489 additions & 0 deletions studio/backend/core/inference/diffusion.py

Large diffs are not rendered by default.

67 changes: 67 additions & 0 deletions studio/backend/models/inference.py
@@ -1421,3 +1421,70 @@ class AnthropicMessagesResponse(BaseModel):
    stop_reason: Optional[str] = None
    stop_sequence: Optional[str] = None
    usage: AnthropicUsage = Field(default_factory = AnthropicUsage)


# ── Diffusion image generation ────────────────────────────────────


class DiffusionLoadRequest(BaseModel):
    """Load a diffusion image-generation model.

    repo_id is the HF repo (either GGUF-only or full diffusers layout).
    gguf_filename selects the quant when repo_id is a GGUF repo.
    base_repo overrides the auto-picked diffusers base used for the
    VAE / text encoders when loading a GGUF-only repo.
    """

    repo_id: str = Field(..., description = "HF repo id")
    gguf_filename: Optional[str] = Field(
        None, description = "GGUF filename inside repo_id (Q4_K_S, Q8_0, ...)"
    )
    base_repo: Optional[str] = Field(
        None,
        description = "Diffusers base repo to source VAE + text encoders from",
    )
    family: Optional[str] = Field(
        None,
        description = "Force pipeline family: flux.2-klein | flux.2 | flux.1 | qwen-image | stable-diffusion-3 | stable-diffusion-xl",
    )
    hf_token: Optional[str] = Field(
        None, description = "HuggingFace token for gated models"
    )
    enable_model_cpu_offload: bool = Field(
        True,
        description = "Offload submodules to CPU between forwards. Trades a small speed hit for ~6 GB less VRAM on FLUX-class models.",
    )


class DiffusionGenerateRequest(BaseModel):
    """Generate a single image from the currently-loaded diffusion model."""

    prompt: str = Field(..., min_length = 1, max_length = 4000)
    negative_prompt: Optional[str] = Field(None, max_length = 4000)
    num_inference_steps: int = Field(24, ge = 1, le = 200)
    guidance_scale: float = Field(3.5, ge = 0.0, le = 20.0)
    width: int = Field(1024, ge = 64, le = 2048)
    height: int = Field(1024, ge = 64, le = 2048)
    seed: Optional[int] = Field(
        None, description = "Deterministic seed for reproducible outputs"
    )

    @field_validator("width", "height")
    @classmethod
    def _multiple_of_eight(cls, v: int) -> int:
        if v % 8:
            raise ValueError("width and height must be multiples of 8")
        return v


class DiffusionGenerateResponse(BaseModel):
    image_b64: str = Field(..., description = "Base64-encoded PNG")
    image_mime: str = "image/png"
    width: int
    height: int
    num_inference_steps: int
    guidance_scale: float
    seed: Optional[int] = None
    duration_ms: int
    model: Optional[str] = None
    family: Optional[str] = None
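
The constraints that DiffusionGenerateRequest declares can be restated without Pydantic to make the accepted ranges easy to check by hand. The following is a minimal sketch, not the PR's code: `GenerateParams` and `is_valid` are hypothetical names, `negative_prompt` is left out for brevity, and the bounds are copied from the `Field(...)` declarations in the diff above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenerateParams:
    """Hypothetical stand-in for DiffusionGenerateRequest (not the PR's class)."""

    prompt: str
    width: int = 1024
    height: int = 1024
    num_inference_steps: int = 24
    guidance_scale: float = 3.5
    seed: Optional[int] = None

    def __post_init__(self) -> None:
        # Mirrors min_length = 1, max_length = 4000 on the prompt field.
        if not 1 <= len(self.prompt) <= 4000:
            raise ValueError("prompt must be 1 to 4000 characters")
        for name in ("width", "height"):
            value = getattr(self, name)
            # Mirrors ge = 64, le = 2048.
            if not 64 <= value <= 2048:
                raise ValueError(f"{name} must be between 64 and 2048")
            # Mirrors the _multiple_of_eight validator.
            if value % 8:
                raise ValueError(f"{name} must be a multiple of 8")
        # Mirrors ge = 1, le = 200.
        if not 1 <= self.num_inference_steps <= 200:
            raise ValueError("num_inference_steps must be between 1 and 200")
        # Mirrors ge = 0.0, le = 20.0.
        if not 0.0 <= self.guidance_scale <= 20.0:
            raise ValueError("guidance_scale must be between 0.0 and 20.0")


def is_valid(**kwargs) -> bool:
    """Return True when kwargs would pass the checks above."""
    try:
        GenerateParams(**kwargs)
    except ValueError:
        return False
    return True
```

Under these assumptions a 1024 x 1024 request passes, while a width of 1028 is rejected because it is inside the 64 to 2048 range but not a multiple of 8.
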
3 changes: 3 additions & 0 deletions studio/backend/requirements/no-torch-runtime.txt
@@ -46,6 +46,9 @@ peft>=0.18.0,!=0.11.0
huggingface_hub>=0.34.0
hf_transfer
diffusers
# Required by diffusers.GGUFQuantizationConfig (used by the Images page
# to load FLUX.2 / FLUX.1 / Qwen-Image / SDXL GGUFs from the Hub).
gguf

# Transitive deps required because this file is installed with --no-deps.
# Without these, `from transformers import AutoConfig` fails at import time.
127 changes: 127 additions & 0 deletions studio/backend/routes/inference.py
@@ -213,6 +213,9 @@ def _friendly_error(exc: Exception) -> str:
    ListOpenAIContainersResponse,
    OpenAIContainerRequest,
    OpenAIContainerSummary,
    DiffusionLoadRequest,
    DiffusionGenerateRequest,
    DiffusionGenerateResponse,
)
from core.inference.anthropic_compat import (
    anthropic_messages_to_openai,
@@ -1584,6 +1587,130 @@ async def generate_audio(
    )


# =====================================================================
# Diffusion image generation (/images/*)
# =====================================================================
#
# Lifecycle mirrors the GGUF chat backend: explicit load -> generate ->
# unload. Diffusion pipelines compete for the same GPU as llama-server,
# so callers on < 24 GB GPUs should unload the chat model first.


def _get_diffusion_backend():
    """Lazy import so non-diffusion installs do not pay the diffusers
    cost at process start. The backend itself is a process-wide
    singleton; reusing it across requests keeps pipeline state alive."""
    from core.inference.diffusion import get_diffusion_backend

    return get_diffusion_backend()


@router.post("/images/load")
async def diffusion_load(
    payload: DiffusionLoadRequest,
    current_subject: str = Depends(get_current_subject),
):
    """Load a diffusion image-generation model.

    Pass either a full diffusers repo or a GGUF-only repo plus the
    desired ``gguf_filename``. Returns the new status payload (same
    shape as ``/images/status``).
    """
    backend = _get_diffusion_backend()
    try:
        status = await asyncio.get_event_loop().run_in_executor(


medium

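asyncio.get_event_loop() is not deprecated outright, but since Python 3.10 it can emit a DeprecationWarning when no event loop is running, and the asyncio documentation prefers asyncio.get_running_loop() in coroutines and callbacks. Use asyncio.get_running_loop() here to retrieve the current running event loop.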

Suggested change
- status = await asyncio.get_event_loop().run_in_executor(
+ status = await asyncio.get_running_loop().run_in_executor(
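
The pattern this suggestion points at can be sketched in isolation. The example below is an illustration under stated assumptions, not the PR's handler: `blocking_load` is a hypothetical stand-in for `backend.load_model`, and `"example/repo"` is a placeholder id. It shows a coroutine fetching its own loop with `asyncio.get_running_loop()` and handing blocking work to the default thread pool.

```python
import asyncio
import time


def blocking_load(repo_id: str) -> dict:
    """Hypothetical stand-in for backend.load_model: blocks the calling thread."""
    time.sleep(0.05)
    return {"loaded": True, "repo_id": repo_id}


async def load_without_blocking(repo_id: str) -> dict:
    # get_running_loop() raises RuntimeError when no loop is running instead
    # of implicitly creating one, so misuse outside a coroutine fails loudly.
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, lambda: blocking_load(repo_id))


def run_demo() -> dict:
    """Drive the coroutine from synchronous code."""
    return asyncio.run(load_without_blocking("example/repo"))
```

Because the handler is already an `async def`, a running loop is guaranteed at that point, so the swap should not change behavior there; it mainly removes the ambiguous fallback path of `get_event_loop()`.
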

            None,
            lambda: backend.load_model(
                repo_id = payload.repo_id,
                gguf_filename = payload.gguf_filename,
                base_repo = payload.base_repo,
                family_override = payload.family,
                hf_token = payload.hf_token,
                enable_model_cpu_offload = payload.enable_model_cpu_offload,
            ),
        )
        return JSONResponse(content = status)
    except RuntimeError as exc:
        raise HTTPException(status_code = 400, detail = str(exc))
    except Exception as exc:
        logger.exception("Diffusion load failed")
        raise HTTPException(status_code = 500, detail = str(exc))


@router.post("/images/unload")
async def diffusion_unload(
    current_subject: str = Depends(get_current_subject),
):
    """Unload the current diffusion model and free GPU memory."""
    backend = _get_diffusion_backend()
    return backend.unload_model()


@router.get("/images/status")
async def diffusion_status(
    current_subject: str = Depends(get_current_subject),
):
    """Return diffusion backend status (loaded, family, device, etc.)."""
    backend = _get_diffusion_backend()
    return backend.status()


@router.post("/images/generate", response_model = DiffusionGenerateResponse)
async def diffusion_generate(
    payload: DiffusionGenerateRequest,
    current_subject: str = Depends(get_current_subject),
):
    """Generate a single image from the loaded diffusion model.

    Returns a base64 PNG plus the generation parameters that produced
    it so the frontend can render the result and the user can reproduce
    it via the same seed.
    """
    backend = _get_diffusion_backend()
    if not backend.is_loaded:
        raise HTTPException(
            status_code = 400,
            detail = "No diffusion model is loaded. POST /api/inference/images/load first.",
        )

    start = time.time()
    try:
        from core.inference.diffusion import async_generate, encode_png_base64

        image = await async_generate(
            backend,
            prompt = payload.prompt,
            negative_prompt = payload.negative_prompt,
            num_inference_steps = payload.num_inference_steps,
            guidance_scale = payload.guidance_scale,
            width = payload.width,
            height = payload.height,
            seed = payload.seed,
        )
    except ValueError as exc:
        raise HTTPException(status_code = 400, detail = str(exc))
    except RuntimeError as exc:
        raise HTTPException(status_code = 400, detail = str(exc))
    except Exception as exc:
        logger.exception("Diffusion generation failed")
        raise HTTPException(status_code = 500, detail = str(exc))

    duration_ms = int((time.time() - start) * 1000)
    status = backend.status()
    return DiffusionGenerateResponse(
        image_b64 = encode_png_base64(image),
        image_mime = "image/png",
        width = payload.width,
        height = payload.height,
        num_inference_steps = payload.num_inference_steps,
        guidance_scale = payload.guidance_scale,
        seed = payload.seed,
        duration_ms = duration_ms,
        model = status.get("repo_id"),
        family = status.get("family"),
    )


# =====================================================================
# OpenAI-Compatible Chat Completions (/chat/completions)
# =====================================================================
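
On the consuming side, the `/images/generate` response above carries the image as base64 text in `image_b64` with `image_mime` set to `image/png`. The sketch below shows one way a client might turn that payload back into bytes. `decode_image` is a hypothetical helper, not part of the PR, and it assumes the response has already been parsed from JSON into a dict.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_image(response: dict) -> bytes:
    """Decode image_b64 from a DiffusionGenerateResponse-shaped dict."""
    if response.get("image_mime") != "image/png":
        raise ValueError("unexpected image_mime")
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    raw = base64.b64decode(response["image_b64"], validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG")
    return raw
```

The returned bytes can be written to a `.png` file as-is. The signature check is a cheap sanity test only and does not verify that the rest of the file is a well-formed image.
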