Studio: clearer error for diffusion GGUFs loaded as chat models #5857
llama-server fails with "unknown model architecture: '<arch>'" when a diffusion / image GGUF (FLUX, Qwen-Image, ...) is loaded as a chat model. The file is valid and there is plenty of memory, but load_model reported the generic "invalid file or out of memory" message, sending users to debug the wrong problem (#5842). Classify the captured llama-server output instead: route known diffusion architectures to Studio's Images page, give other unknown architectures a precise "unsupported architecture" message, and keep the existing Ollama compatibility and out-of-memory messages unchanged. Adds unit tests for the classification.
Code Review
This pull request refactors and improves error reporting when llama-server fails to start by introducing a dedicated helper method, _classify_llama_start_failure, along with comprehensive unit tests. It specifically detects when a user attempts to load a diffusion/image GGUF (such as FLUX or Qwen-Image) as a chat model and directs them to the Images page instead of showing a misleading out-of-memory error. The review feedback points out a potential issue where substring matching for architecture names (like 'wan') could lead to false positives (e.g., matching 'taiwan'), and suggests using exact matching instead.
    arch = arch_match.group(1)
    if any(d in arch for d in LlamaCppBackend._DIFFUSION_ARCHES):
        return (
            f"'{arch}' is a diffusion (image-generation) GGUF, which "
Using substring matching (any(d in arch for d in ...)), especially with short architecture names like 'wan', can lead to false positives. For example, any future or custom text model architecture containing 'wan' (such as 'taiwan') would be incorrectly classified as a diffusion model, directing the user to the Images page with a misleading error message. Since GGUF architecture identifiers are standardized, exact matching is much safer and more robust.
    f"'{arch}' is a diffusion (image-generation) GGUF, which "
    if arch in LlamaCppBackend._DIFFUSION_ARCHES:
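To make the reviewer's point concrete, here is a minimal sketch comparing the two matching strategies. The set contents are taken from the architecture list in this PR (plus lumina2 from the follow-up commit); the module-level name `DIFFUSION_ARCHES` and both helper functions are hypothetical and exist only for illustration, they are not the PR's actual code.

```python
# Hypothetical stand-in for LlamaCppBackend._DIFFUSION_ARCHES.
DIFFUSION_ARCHES = frozenset(
    {
        "qwen_image", "flux", "sd1", "sdxl", "sd3", "aura",
        "hidream", "cosmos", "ltxv", "hyvid", "wan", "lumina2",
    }
)


def is_diffusion_substring(arch: str) -> bool:
    """Original approach: true if any known name occurs anywhere in arch."""
    return any(d in arch for d in DIFFUSION_ARCHES)


def is_diffusion_exact(arch: str) -> bool:
    """Suggested approach: true only if arch is exactly a known name."""
    return arch in DIFFUSION_ARCHES


if __name__ == "__main__":
    for name in ("wan", "taiwan", "flux"):
        print(name, is_diffusion_substring(name), is_diffusion_exact(name))
```

With substring matching, the made-up architecture name "taiwan" is treated as diffusion because it ends in "wan"; with exact membership it is not, while "wan" and "flux" are still recognized.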
…rch loads
- Match general.architecture exactly (frozenset) instead of substring, so a chat arch that merely contains a short token like "wan" or "sd1" (e.g. "taiwan") is no longer misrouted to the Images page
- Keep the "use Ollama directly" hint for non-diffusion Ollama models that fail with unknown model architecture; diffusion still routes to Images
- Add lumina2 to the diffusion arch list
- Give the test structlog stub a get_logger so it no longer breaks sibling tests collected in the same pytest process
- Add regression tests for the substring and Ollama-ordering cases

Summary
Fixes #5842. Loading a diffusion / image GGUF (FLUX, Qwen-Image, ...) through Studio's model loader fails with a misleading generic "invalid file or out of memory" message.
The file is valid and there is plenty of free memory. The real reason is in the llama-server log, which reports unknown model architecture: '<arch>'.
llama.cpp proper has no diffusion architectures, so it cannot run these GGUFs as chat models. The generic message sent the reporter to debug memory and file corruption that were never the problem.
Change
LlamaCppBackend.load_model now classifies the captured llama-server output before raising, via a small pure _classify_llama_start_failure staticmethod (so it is unit-testable without spawning a subprocess):
- Known diffusion architectures (qwen_image, flux, sd1, sdxl, sd3, aura, hidream, cosmos, ltxv, hyvid, wan) get a message pointing at Studio's Images page. Pairs with "Studio: local diffusion image generation page" (#5754), which adds local diffusion GGUF image generation for exactly these (FLUX, Qwen-Image).
- Any other unknown model architecture: '<arch>' gets a precise "this model type is not supported by llama-server" message that names the architecture, instead of blaming memory.
Tests
studio/backend/tests/test_llama_cpp_start_failure_classification.py covers Qwen-Image, every diffusion arch, a non-diffusion unknown arch, the Ollama path, generic OOM, and empty output. All 267 llama_cpp backend tests pass.
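For readers who want a feel for the classification order described in this PR, the following is a rough sketch, not the merged implementation. It assumes the failure is detected with a regular expression over the captured output and that the caller already knows whether the model came from Ollama. The function name `classify_start_failure`, the `is_ollama_model` parameter, and all message wording beyond the fragments quoted in this PR are hypothetical.

```python
import re

# Pattern for the llama-server log line quoted in this PR.
_UNKNOWN_ARCH_RE = re.compile(r"unknown model architecture: '([^']+)'")

# Architecture names from the PR description, plus lumina2 from the follow-up.
DIFFUSION_ARCHES = frozenset(
    {
        "qwen_image", "flux", "sd1", "sdxl", "sd3", "aura",
        "hidream", "cosmos", "ltxv", "hyvid", "wan", "lumina2",
    }
)


def classify_start_failure(output: str, is_ollama_model: bool = False) -> str:
    """Map captured llama-server output to a user-facing message (sketch)."""
    arch_match = _UNKNOWN_ARCH_RE.search(output or "")
    if arch_match:
        arch = arch_match.group(1)
        # 1. Diffusion architectures always route to the Images page.
        if arch in DIFFUSION_ARCHES:
            return (
                f"'{arch}' is a diffusion (image-generation) GGUF, which "
                "llama-server cannot load as a chat model. "
                "Use Studio's Images page instead."
            )
        # 2. Non-diffusion Ollama models keep the Ollama hint (assumed wording).
        if is_ollama_model:
            return (
                f"'{arch}' is not supported by llama-server. "
                "Use Ollama directly for this model."
            )
        # 3. Any other unknown architecture gets a precise message.
        return f"This model type ('{arch}') is not supported by llama-server."
    # 4. Fallback: the pre-existing generic message (assumed wording).
    return "llama-server failed to start: invalid file or out of memory."
```

Checking diffusion before the Ollama hint mirrors the ordering noted in the follow-up commit: a diffusion GGUF pulled through Ollama should still be sent to the Images page.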