Studio: clearer error for diffusion GGUFs loaded as chat models by danielhanchen · Pull Request #5857 · unslothai/unsloth

danielhanchen · 2026-05-29T12:18:35Z

Summary

Fixes #5842. Loading a diffusion / image GGUF (FLUX, Qwen-Image, ...) through Studio's model loader fails with a misleading message:

Failed to load model: llama-server failed to start. Check that the GGUF file is valid and you have enough memory.

The file is valid and there is plenty of free memory. The real reason is in the llama-server log:

error loading model: unknown model architecture: 'qwen_image'

llama.cpp proper has no diffusion architectures, so it cannot run these GGUFs as chat models. The generic message sent the reporter to debug memory and file corruption that were never the problem.

Change

LlamaCppBackend.load_model now classifies the captured llama-server output before raising, via a small pure _classify_llama_start_failure staticmethod (so it is unit-testable without spawning a subprocess):

- Known diffusion architectures (qwen_image, flux, sd1, sdxl, sd3, aura, hidream, cosmos, ltxv, hyvid, wan) get a message pointing at Studio's Images page. Pairs with #5754 (Studio: local diffusion image generation page), which adds local diffusion GGUF image generation for exactly these models (FLUX, Qwen-Image).
- Any other unknown model architecture: '<arch>' gets a precise "this model type is not supported by llama-server" message that names the architecture, instead of blaming memory.
- The existing Ollama compatibility message and the out-of-memory / missing-binary fallback are unchanged.
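
The classification described above could be sketched roughly as follows. This is a minimal illustration, not the merged code: the function name, the regular expression, the wording of the messages past the opening phrase, and the fallback constant are all assumptions, and the Ollama compatibility branch is left out for brevity. The sketch uses exact set membership for the architecture check.

```python
import re

# Architecture names copied from the list above; the container is an assumption.
DIFFUSION_ARCHES = frozenset({
    "qwen_image", "flux", "sd1", "sdxl", "sd3", "aura",
    "hidream", "cosmos", "ltxv", "hyvid", "wan",
})

# Matches the llama-server log line quoted in the summary.
_UNKNOWN_ARCH_RE = re.compile(r"unknown model architecture: '([^']+)'")

# Generic fallback, worded like the original misleading message.
GENERIC_MESSAGE = (
    "llama-server failed to start. Check that the GGUF file is valid "
    "and you have enough memory."
)


def classify_start_failure(output: str) -> str:
    """Map captured llama-server output to a user-facing message (sketch)."""
    match = _UNKNOWN_ARCH_RE.search(output or "")
    if match is None:
        # No architecture error in the log: keep the generic fallback.
        return GENERIC_MESSAGE
    arch = match.group(1)
    if arch in DIFFUSION_ARCHES:
        # Hypothetical wording; only the opening phrase appears in the diff.
        return (
            f"'{arch}' is a diffusion (image-generation) GGUF, which "
            "llama-server cannot run as a chat model. "
            "Use Studio's Images page instead."
        )
    return f"Model architecture '{arch}' is not supported by llama-server."
```

Because the function only inspects a string, it can be exercised directly with captured log text, which is the property the PR relies on for unit testing without spawning a subprocess.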

Tests

studio/backend/tests/test_llama_cpp_start_failure_classification.py covers Qwen-Image, every diffusion arch, a non-diffusion unknown arch, the Ollama path, generic OOM, and empty output.

python -m pytest studio/backend/tests/test_llama_cpp_start_failure_classification.py studio/backend/tests/test_llama_cpp_wait_for_health.py -q
21 passed

All 267 llama_cpp backend tests pass.

llama-server fails with "unknown model architecture: '<arch>'" when a diffusion / image GGUF (FLUX, Qwen-Image, ...) is loaded as a chat model. The file is valid and there is plenty of memory, but load_model reported the generic "invalid file or out of memory" message, sending users to debug the wrong problem (#5842). Classify the captured llama-server output instead: route known diffusion architectures to Studio's Images page, give other unknown architectures a precise "unsupported architecture" message, and keep the existing Ollama compatibility and out-of-memory messages unchanged. Adds unit tests for the classification.

chatgpt-codex-connector · 2026-05-29T12:18:41Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.


gemini-code-assist

Code Review

This pull request refactors and improves error reporting when llama-server fails to start by introducing a dedicated helper method, _classify_llama_start_failure, along with comprehensive unit tests. It specifically detects when a user attempts to load a diffusion/image GGUF (such as FLUX or Qwen-Image) as a chat model and directs them to the Images page instead of showing a misleading out-of-memory error. The review feedback points out a potential issue where substring matching for architecture names (like 'wan') could lead to false positives (e.g., matching 'taiwan'), and suggests using exact matching instead.

gemini-code-assist · 2026-05-29T12:19:37Z

+            arch = arch_match.group(1)
+            if any(d in arch for d in LlamaCppBackend._DIFFUSION_ARCHES):
+                return (
+                    f"'{arch}' is a diffusion (image-generation) GGUF, which "


Using substring matching (any(d in arch for d in ...)), especially with short architecture names like 'wan', can lead to false positives. For example, any future or custom text model architecture containing 'wan' (such as 'taiwan') would be incorrectly classified as a diffusion model, directing the user to the Images page with a misleading error message. Since GGUF architecture identifiers are standardized, exact matching is much safer and more robust.

Suggested change

-                    f"'{arch}' is a diffusion (image-generation) GGUF, which "
+            if arch in LlamaCppBackend._DIFFUSION_ARCHES:
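
The difference between the two checks can be seen in a small standalone comparison. The set below is a shortened, illustrative stand-in for `_DIFFUSION_ARCHES`, and the helper names are hypothetical.

```python
# Shortened stand-in for the real architecture list (illustration only).
DIFFUSION_ARCHES = frozenset({"flux", "sd1", "wan"})


def is_diffusion_substring(arch: str) -> bool:
    # Original approach: true if any known name occurs anywhere in arch.
    return any(d in arch for d in DIFFUSION_ARCHES)


def is_diffusion_exact(arch: str) -> bool:
    # Suggested approach: true only if arch is itself a known name.
    return arch in DIFFUSION_ARCHES


for arch in ("wan", "taiwan"):
    print(arch, is_diffusion_substring(arch), is_diffusion_exact(arch))
# prints:
# wan True True
# taiwan True False
```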

Imagineer99 · 2026-05-29T12:45:45Z

In this PR, the warning works as expected.

…rch loads
- Match general.architecture exactly (frozenset) instead of substring, so a chat arch that merely contains a short token like "wan" or "sd1" (e.g. "taiwan") is no longer misrouted to the Images page
- Keep the "use Ollama directly" hint for non-diffusion Ollama models that fail with unknown model architecture; diffusion still routes to Images
- Add lumina2 to the diffusion arch list
- Give the test structlog stub a get_logger so it no longer breaks sibling tests collected in the same pytest process
- Add regression tests for the substring and Ollama-ordering cases
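
On the structlog point: a stub module registered in `sys.modules` stays there for the rest of the process, so sibling test modules that import `structlog` afterwards receive the stub and fail if it lacks `get_logger`. The sketch below shows one way such a stub could be built. It is an assumption about the general technique, not the test file's actual code, and `_NullLogger` and `make_structlog_stub` are hypothetical names.

```python
import sys
import types


class _NullLogger:
    """Accepts any logging-style method call and discards it."""

    def __getattr__(self, name):
        def _noop(*args, **kwargs):
            return None
        return _noop


def make_structlog_stub() -> types.ModuleType:
    """Build a minimal stand-in module that exposes get_logger."""
    stub = types.ModuleType("structlog")
    stub.get_logger = lambda *args, **kwargs: _NullLogger()
    return stub


# Register the stub only if no structlog module is already loaded, so a real
# installation (or another test's stub) is left untouched.
sys.modules.setdefault("structlog", make_structlog_stub())
```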


danielhanchen requested a review from rolandtannous as a code owner May 29, 2026 12:18

[pre-commit.ci] auto fixes from pre-commit.com hooks

c2fa477

for more information, see https://pre-commit.ci

danielhanchen mentioned this pull request May 29, 2026

Fix: [Bug] Unsloth Studio unclear errors loading models. Out of memory or invalid model? #5847

Closed

gemini-code-assist Bot reviewed May 29, 2026


danielhanchen and others added 2 commits May 31, 2026 08:48

[pre-commit.ci] auto fixes from pre-commit.com hooks

10a4b0d

for more information, see https://pre-commit.ci

danielhanchen merged commit 6cc2220 into main May 31, 2026
32 checks passed

danielhanchen deleted the fix/studio-diffusion-gguf-load-error branch May 31, 2026 09:24

danielhanchen merged 4 commits into main from fix/studio-diffusion-gguf-load-error
