fix(mtmd): handle Gemma 4 audio projector embedding size by abetlen · Pull Request #24091 · ggml-org/llama.cpp

abetlen · 2026-06-03T21:23:04Z

Overview

PR #24077 seems to have introduced a regression that breaks loading of the 4B Gemma 4 model because of its audio embedding projector.

Closes #24084

All credit to @MrSimonC; I'm just confirming the fix works and providing the patch here.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO

tha80 · 2026-06-03T22:36:31Z

Can confirm that this simple one-line PR fixes the regression for me. 👍

Sbenazar · 2026-06-04T02:12:31Z

Built this branch and can confirm it fixes the crash on an AMD Radeon 780M (RADV, Vulkan). Before the patch, loading gemma-4-E4B-it together with its mmproj aborted in clip_n_mmproj_embd with Unknown projector type. With the GEMMA4A case added, llama-server loads and E4B audio transcribes fine. Thanks for the quick turnaround.
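The failure mode described above can be illustrated with a minimal sketch: a switch over projector types with no label for the audio projector falls through to the default branch. Everything here is a simplified stand-in rather than the actual llama.cpp code. `sketch_ctx` and `proj_out_dim` are hypothetical, the real function aborts instead of throwing, and the value the real Gemma 4 case returns is not visible in this thread.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-ins for illustration; not the real llama.cpp definitions.
enum projector_type {
    PROJECTOR_TYPE_GEMMA4V,
    PROJECTOR_TYPE_GEMMA4UV,
    PROJECTOR_TYPE_GEMMA4A,
};

struct sketch_ctx {
    projector_type proj_type;
    int64_t        proj_out_dim; // placeholder for whatever the real case returns
};

// Before the patch: GEMMA4A has no case label, so it reaches the default branch.
int64_t n_mmproj_embd_before(const sketch_ctx & ctx) {
    switch (ctx.proj_type) {
        case PROJECTOR_TYPE_GEMMA4V:
        case PROJECTOR_TYPE_GEMMA4UV:
            return ctx.proj_out_dim;
        default:
            // The real function aborts; an exception keeps this sketch testable.
            throw std::runtime_error("Unknown projector type");
    }
}

// After the patch: GEMMA4A shares the block with the two vision projector types.
int64_t n_mmproj_embd_after(const sketch_ctx & ctx) {
    switch (ctx.proj_type) {
        case PROJECTOR_TYPE_GEMMA4V:
        case PROJECTOR_TYPE_GEMMA4UV:
        case PROJECTOR_TYPE_GEMMA4A:
            return ctx.proj_out_dim;
        default:
            throw std::runtime_error("Unknown projector type");
    }
}
```

With a context whose type is `PROJECTOR_TYPE_GEMMA4A`, the first function throws and the second returns the stored size, which mirrors the load-time crash and its fix.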

CISC · 2026-06-04T07:56:12Z

            return ctx->model.mm_input_proj_w->ne[0];
        case PROJECTOR_TYPE_GEMMA4V:
        case PROJECTOR_TYPE_GEMMA4UV:
+        case PROJECTOR_TYPE_GEMMA4A:


Not incorrect, but this case used to be further down (where PROJECTOR_TYPE_GEMMA4UA is now) and used hparams.projection_dim instead. I don't know if there's a specific reason for that. cc @ngxson

Hmm, OK, it seems all of them should be in the same code block here.

The ctx->model.hparams.projection_dim code branch should be removed; it was an oversight from the initial PR for Gemma 4 audio. Well, that's the nasty thing about reviewing a large AI-generated PR.
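A rough sketch of the consolidation suggested here, assuming all four Gemma 4 projector types end up in one block that reads the size from the projector weight tensor and no longer consults `projection_dim`. The types below are hypothetical stand-ins, and using `ne[0]` is only a placeholder: the dimension the real case returns is not shown in the excerpt above.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-ins for illustration; not the real llama.cpp definitions.
enum projector_type {
    PROJECTOR_TYPE_GEMMA4V,
    PROJECTOR_TYPE_GEMMA4UV,
    PROJECTOR_TYPE_GEMMA4A,
    PROJECTOR_TYPE_GEMMA4UA,
    PROJECTOR_TYPE_OTHER,
};

struct sketch_tensor {
    int64_t ne[4]; // shape array, in the style of a ggml tensor
};

struct sketch_model {
    projector_type proj_type;
    sketch_tensor  proj_w;          // stand-in for the projector weight tensor
    int32_t        projection_dim;  // hyperparameter that is no longer consulted
};

// One block for every Gemma 4 projector type, with a single source for the size.
int64_t n_mmproj_embd_unified(const sketch_model & m) {
    switch (m.proj_type) {
        case PROJECTOR_TYPE_GEMMA4V:
        case PROJECTOR_TYPE_GEMMA4UV:
        case PROJECTOR_TYPE_GEMMA4A:
        case PROJECTOR_TYPE_GEMMA4UA:
            return m.proj_w.ne[0]; // placeholder dimension, see the note above
        default:
            throw std::runtime_error("Unknown projector type");
    }
}
```

Keeping one branch means the audio and vision variants cannot drift apart if a hyperparameter is missing or differs from the tensor shape actually stored in the mmproj file.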

ngxson · 2026-06-04T09:14:24Z

@ggml-org/maintainers can someone please give approval(s), thanks!

* origin/master: (57 commits)
  server : disable on-device spec checkpoints (ggml-org#24108)
  arg: fix double mtp downloads (ggml-org#24128)
  webui: [a11y] fix keyboard navigation issues in chat interface and sidebar (ggml-org#23132)
  Move duplicated imatrix code into single common imatrix-loader.cpp (ggml-org#22445)
  ui: Fixed packages (ggml-org#24119)
  ui: added single line reasoning preview (ggml-org#23601)
  return filter to save memory (ggml-org#24125)
  convert: Fix Gemma 4 Unified conversion (ggml-org#24118)
  ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (ggml-org#22209)
  server: avoid unnecessary checkpoint restore when new tokens are present (ggml-org#24110)
  agents: refactor, include more guidelines (ggml-org#24111)
  webui: fix tool selector toggle/counter, key tools by stable identity (ggml-org#24065)
  build : use umbrella Headers directory for XCFramework module map (ggml-org#23974)
  server : add header to tools/server/server-http.h (ggml-org#24089)
  cmake: skip cvector-generator and export-lora when CPU backend is disabled (ggml-org#24053)
  fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091)
  readme : add status badges (ggml-org#24104)
  tests : refactor test-save-load-state to accept token input (ggml-org#24073)
  metal : reduce rset heartbeat from 500ms -> 5ms (ggml-org#24074)
  ggml-webgpu: FlashAttention refactor + standardize quantization support (ggml-org#23834)
  ...

)
* mtmd: handle Gemma 4 audio projector embedding size
* rm projection_dim from clip_n_mmproj_embd
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
(cherry picked from commit e3ba22d)

#168)
* mtmd, model: allow skip build_vit() (ggml-org#24077)
* add model
* nits
(cherry picked from commit a731805)
* mtmd: fix Gemma 4 unified FPE (ggml-org#24088)
(cherry picked from commit 94a220c)
* mtmd: enable non-causal vision for gemma 4 unified (ggml-org#24082)
(cherry picked from commit c8d6a00)
* fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091)
* mtmd: handle Gemma 4 audio projector embedding size
* rm projection_dim from clip_n_mmproj_embd
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
(cherry picked from commit e3ba22d)
* convert: Fix Gemma 4 Unified conversion (ggml-org#24118)
* Fix Gemma 4 Unified conversion
* Set audio hidden size to audio_embed_dim
(cherry picked from commit e802356)
* ggml-metal: fall back to CPU for im2col when KH*KW exceeds threadgroup limit

The Metal im2col kernel launches KH*KW threads per threadgroup (one per kernel element). For large conv kernels (e.g. the Gemma 4 unified vision (gemma4uv) patch embedding), KH*KW exceeds the Apple GPU 1024-thread cap and the kernel hits a runtime GGML_ASSERT instead of producing a result. Guard supports_op so an oversized im2col is declined; the backend scheduler then runs that one op on CPU while the rest of the graph stays on the GPU.

Fixes Gemma 4 12B vision on the Metal backend (verified end-to-end: loads mmproj + describes an image correctly on an M5 Max).
---------
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Andrei <abetlen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
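The im2col guard described in that commit message can be sketched as follows, assuming the only condition is the KH*KW thread count against the 1024 cap quoted there. The names `im2col_shape`, `gpu_supports_im2col`, and `pick_device` are hypothetical and do not correspond to the actual ggml-metal functions.

```cpp
#include <cstdint>

// Thread cap per threadgroup as quoted in the commit message.
constexpr int64_t k_max_threads_per_threadgroup = 1024;

// Hypothetical description of an im2col op: kernel height and width only.
struct im2col_shape {
    int64_t KH;
    int64_t KW;
};

// Hypothetical supports_op-style check: decline the op when one thread per
// kernel element would exceed the threadgroup cap.
bool gpu_supports_im2col(const im2col_shape & s) {
    return s.KH * s.KW <= k_max_threads_per_threadgroup;
}

enum class device { gpu, cpu };

// Stand-in for the scheduler decision: a declined op runs on the CPU.
device pick_device(const im2col_shape & s) {
    return gpu_supports_im2col(s) ? device::gpu : device::cpu;
}
```

Declining the op up front moves the decision to scheduling time, so the oversized kernel never launches and never reaches the runtime assert.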

mtmd: handle Gemma 4 audio projector embedding size

b4e8bba

abetlen requested a review from a team as a code owner June 3, 2026 21:23

github-actions Bot added the examples label Jun 3, 2026

CISC reviewed Jun 4, 2026

rm projection_dim from clip_n_mmproj_embd

fd6cad5

CISC approved these changes Jun 4, 2026

ggerganov approved these changes Jun 4, 2026

ngxson merged commit e3ba22d into ggml-org:master Jun 4, 2026
24 of 25 checks passed

ngxson mentioned this pull request Jun 4, 2026

Eval bug: Gemma 4 E4B crashes with corrupt stack on 94a220cd6 #24100

Closed

TheTom mentioned this pull request Jun 5, 2026

mtmd: port gemma4uv/gemma4ua support — fixes Gemma 4 12B vision (#163) TheTom/llama-cpp-turboquant#168

Merged

LostRuins mentioned this pull request Jun 5, 2026

mtmd, model: add Gemma 4 "unified" variant #24077

Merged

ngxson merged 2 commits into ggml-org:master from abetlen:fix/gemma4a-mmproj-embd

6 participants
