
Studio: clearer error for diffusion GGUFs loaded as chat models #5857

Merged
danielhanchen merged 4 commits into main from fix/studio-diffusion-gguf-load-error on May 31, 2026


@danielhanchen

Member

Summary

Fixes #5842. Loading a diffusion / image GGUF (FLUX, Qwen-Image, ...) through Studio's model loader fails with a misleading message:

Failed to load model: llama-server failed to start. Check that the GGUF file is valid and you have enough memory.

The file is valid and there is plenty of free memory. The real reason is in the llama-server log:

error loading model: unknown model architecture: 'qwen_image'

llama.cpp proper has no diffusion architectures, so it cannot run these GGUFs as chat models. The generic message sent the reporter off debugging memory and file corruption, neither of which was the problem.

Change

LlamaCppBackend.load_model now classifies the captured llama-server output before raising, via a small pure _classify_llama_start_failure staticmethod (so it is unit-testable without spawning a subprocess):

  • Known diffusion architectures (qwen_image, flux, sd1, sdxl, sd3, aura, hidream, cosmos, ltxv, hyvid, wan) get a message pointing at Studio's Images page. This pairs with #5754 (Studio: local diffusion image generation page), which adds local image generation for exactly these diffusion GGUFs (FLUX, Qwen-Image).
  • Any other unknown model architecture: '<arch>' failure gets a precise "this model type is not supported by llama-server" message that names the architecture instead of blaming memory.
  • The existing Ollama compatibility message and the out-of-memory / missing-binary fallback are unchanged.

Tests

studio/backend/tests/test_llama_cpp_start_failure_classification.py covers Qwen-Image, every diffusion arch, a non-diffusion unknown arch, the Ollama path, generic OOM, and empty output.

python -m pytest studio/backend/tests/test_llama_cpp_start_failure_classification.py studio/backend/tests/test_llama_cpp_wait_for_health.py -q
21 passed

All 267 llama_cpp backend tests pass.

llama-server fails with "unknown model architecture: '<arch>'" when a
diffusion / image GGUF (FLUX, Qwen-Image, ...) is loaded as a chat model.
The file is valid and there is plenty of memory, but load_model reported
the generic "invalid file or out of memory" message, sending users to
debug the wrong problem (#5842).

Classify the captured llama-server output instead: route known diffusion
architectures to Studio's Images page, give other unknown architectures a
precise "unsupported architecture" message, and keep the existing Ollama
compatibility and out-of-memory messages unchanged.

Adds unit tests for the classification.
@chatgpt-codex-connector


Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@gemini-code-assist (bot) left a comment

Contributor


Code Review

This pull request refactors and improves error reporting when llama-server fails to start by introducing a dedicated helper method, _classify_llama_start_failure, along with comprehensive unit tests. It specifically detects when a user attempts to load a diffusion/image GGUF (such as FLUX or Qwen-Image) as a chat model and directs them to the Images page instead of showing a misleading out-of-memory error. The review feedback points out a potential issue where substring matching for architecture names (like 'wan') could lead to false positives (e.g., matching 'taiwan'), and suggests using exact matching instead.

arch = arch_match.group(1)
if any(d in arch for d in LlamaCppBackend._DIFFUSION_ARCHES):
    return (
        f"'{arch}' is a diffusion (image-generation) GGUF, which "

Contributor


Priority: medium

Using substring matching (any(d in arch for d in ...)), especially with short architecture names like 'wan', can lead to false positives. For example, any future or custom text model architecture containing 'wan' (such as 'taiwan') would be incorrectly classified as a diffusion model, directing the user to the Images page with a misleading error message. Since GGUF architecture identifiers are standardized, exact matching is much safer and more robust.

Suggested change
- if any(d in arch for d in LlamaCppBackend._DIFFUSION_ARCHES):
+ if arch in LlamaCppBackend._DIFFUSION_ARCHES:
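The false positive the review describes is easy to reproduce in isolation. This sketch uses a hypothetical subset of the arch list and two illustrative helper functions; it only contrasts the two matching strategies and is not code from the PR.

```python
# Hypothetical subset of the diffusion arch identifiers, for illustration.
DIFFUSION_ARCHES = frozenset({"wan", "sd1", "flux", "qwen_image"})


def is_diffusion_substring(arch: str) -> bool:
    # Original check: true if ANY token appears anywhere inside `arch`.
    return any(d in arch for d in DIFFUSION_ARCHES)


def is_diffusion_exact(arch: str) -> bool:
    # Suggested check: `arch` must equal one of the identifiers exactly.
    return arch in DIFFUSION_ARCHES


for arch in ("wan", "taiwan", "qwen_image"):
    print(arch, is_diffusion_substring(arch), is_diffusion_exact(arch))
# wan True True
# taiwan True False
# qwen_image True True
```

Only `taiwan` differs: the substring test matches the short token `wan` inside it, while the exact test does not. Exact matching works here because `general.architecture` is a fixed identifier rather than free text.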

@Imagineer99

Imagineer99 commented May 29, 2026

Collaborator
In this PR the warning works as expected (screenshot attached).

danielhanchen and others added 2 commits May 31, 2026 08:48
…rch loads

- Match general.architecture exactly (frozenset) instead of substring, so a
  chat arch that merely contains a short token like "wan" or "sd1" (e.g.
  "taiwan") is no longer misrouted to the Images page
- Keep the "use Ollama directly" hint for non-diffusion Ollama models that
  fail with unknown model architecture; diffusion still routes to Images
- Add lumina2 to the diffusion arch list
- Give the test structlog stub a get_logger so it no longer breaks sibling
  tests collected in the same pytest process
- Add regression tests for the substring and Ollama-ordering cases
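The structlog fix in the commit above matters because `sys.modules` is shared by every test module collected in one pytest process: a stub `structlog` without `get_logger` also breaks any sibling module that calls `structlog.get_logger()`. This minimal sketch assumes the tests stub structlog through `sys.modules`; `install_structlog_stub` and `_NullLogger` are hypothetical names, not the repository's helpers.

```python
import sys
import types


class _NullLogger:
    """Stand-in logger: any structlog-style method call is a no-op."""

    def __getattr__(self, name):
        # info/debug/warning/error/bind/... all resolve to this no-op, which
        # returns the logger so chained calls like .bind(...).info(...) work.
        def _noop(*args, **kwargs):
            return self

        return _noop


def install_structlog_stub() -> types.ModuleType:
    """Install a minimal `structlog` stub unless a usable one is loaded."""
    existing = sys.modules.get("structlog")
    if existing is not None and hasattr(existing, "get_logger"):
        # Real structlog (or an already-complete stub): leave it alone.
        return existing
    stub = types.ModuleType("structlog")
    # The key fix: sibling modules calling structlog.get_logger() keep working.
    stub.get_logger = lambda *args, **kwargs: _NullLogger()
    sys.modules["structlog"] = stub
    return stub
```

Reusing an existing module that already has `get_logger` avoids clobbering the real package when another test imported it first, so the stub only fills the gap.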
@danielhanchen danielhanchen merged commit 6cc2220 into main May 31, 2026
32 checks passed
@danielhanchen danielhanchen deleted the fix/studio-diffusion-gguf-load-error branch May 31, 2026 09:24


Development

Successfully merging this pull request may close these issues.

[Bug] Unsloth Studio unclear errors loading models. Out of memory or invalid model?
