
fix(mtmd): handle Gemma 4 audio projector embedding size #24091

Merged
ngxson merged 2 commits into ggml-org:master from abetlen:fix/gemma4a-mmproj-embd
Jun 4, 2026

Collaborator

@abetlen abetlen commented Jun 3, 2026

Overview

PR #24077 seems to have introduced a regression that broke loading of the 4B Gemma 4 model because of its audio embedding projector.

Closes #24084

All credit to @MrSimonC here; I'm just confirming that the fix works and providing the patch.

Requirements

@abetlen abetlen requested a review from a team as a code owner June 3, 2026 21:23
Contributor

tha80 commented Jun 3, 2026

Can confirm that this simple one-line PR fixes the regression for me. 👍


Sbenazar commented Jun 4, 2026

Built this branch and can confirm it fixes the crash on an AMD Radeon 780M (RADV, Vulkan). Before the patch, loading gemma-4-E4B-it together with its mmproj aborted in clip_n_mmproj_embd with Unknown projector type. With the GEMMA4A case added, llama-server loads and E4B audio transcribes fine. Thanks for the quick turnaround.

Comment thread tools/mtmd/clip.cpp
return ctx->model.mm_input_proj_w->ne[0];
case PROJECTOR_TYPE_GEMMA4V:
case PROJECTOR_TYPE_GEMMA4UV:
case PROJECTOR_TYPE_GEMMA4A:
Member


Not incorrect, but this used to be further down (where PROJECTOR_TYPE_GEMMA4UA now is) and used hparams.projection_dim instead. Don't know if there's a specific reason for that? cc/ @ngxson

Contributor


Hmm, OK, it seems like all of them should be in the same code block here.

The ctx->model.hparams.projection_dim code branch should be removed; it was an oversight from the initial PR for Gemma 4 audio. Well, that's the nasty thing about reviewing a large AI-generated PR.
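The grouping discussed in this thread can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual clip.cpp code: the `toy_*` types, the `proj_w` field and `PROJECTOR_TYPE_OTHER` are hypothetical stand-ins, and only the four Gemma 4 enumerator names come from the thread. The excerpt above does not show which tensor the Gemma 4 group reads, so the sketch assumes a single projector weight whose `ne[0]` gives the embedding width.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-ins for illustration only; these are NOT the real
// llama.cpp definitions. Only the Gemma 4 enumerator names come from the PR.
enum projector_type {
    PROJECTOR_TYPE_GEMMA4V,
    PROJECTOR_TYPE_GEMMA4UV,
    PROJECTOR_TYPE_GEMMA4A,
    PROJECTOR_TYPE_GEMMA4UA,
    PROJECTOR_TYPE_OTHER, // placeholder for a type with no matching case
};

struct toy_tensor {
    int64_t ne[4]; // ne[0] is read as the embedding width
};

struct toy_model {
    projector_type proj_type;
    toy_tensor     proj_w; // hypothetical name for the projector weight
};

// All Gemma 4 projector types share one case group and take the width from
// the weight tensor's shape, so no separate projection_dim branch is needed.
int64_t toy_n_mmproj_embd(const toy_model & model) {
    switch (model.proj_type) {
        case PROJECTOR_TYPE_GEMMA4V:
        case PROJECTOR_TYPE_GEMMA4UV:
        case PROJECTOR_TYPE_GEMMA4A:
        case PROJECTOR_TYPE_GEMMA4UA:
            return model.proj_w.ne[0];
        default:
            // The real function aborts here; throwing keeps the sketch testable.
            throw std::runtime_error("Unknown projector type");
    }
}

// Usage example: an audio projector with an arbitrary width of 2560
// resolves through the grouped case.
inline void toy_usage() {
    toy_model audio{PROJECTOR_TYPE_GEMMA4A, {{2560, 1, 1, 1}}};
    assert(toy_n_mmproj_embd(audio) == 2560);
}
```

A type with no matching case label falls through to `default`, which appears to be the failure mode reported for GEMMA4A before the patch.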

Contributor

ngxson commented Jun 4, 2026

@ggml-org/maintainers can someone please give approval(s), thanks!

@ngxson ngxson merged commit e3ba22d into ggml-org:master Jun 4, 2026
24 of 25 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jun 4, 2026
* origin/master: (57 commits)
server : disable on-device spec checkpoints (ggml-org#24108)
arg: fix double mtp downloads (ggml-org#24128)
webui: [a11y] fix keyboard navigation issues in chat interface and sidebar (ggml-org#23132)
Move duplicated imatrix code into single common imatrix-loader.cpp (ggml-org#22445)
ui: Fixed packages (ggml-org#24119)
ui: added single line reasoning preview (ggml-org#23601)
return filter to save memory (ggml-org#24125)
convert: Fix Gemma 4 Unified conversion (ggml-org#24118)
ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (ggml-org#22209)
server: avoid unnecessary checkpoint restore when new tokens are present (ggml-org#24110)
agents: refactor, include more guidelines (ggml-org#24111)
webui: fix tool selector toggle/counter, key tools by stable identity (ggml-org#24065)
build : use umbrella Headers directory for XCFramework module map (ggml-org#23974)
server : add header to tools/server/server-http.h (ggml-org#24089)
cmake: skip cvector-generator and export-lora when CPU backend is disabled (ggml-org#24053)
fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091)
readme : add status badges (ggml-org#24104)
tests : refactor test-save-load-state to accept token input (ggml-org#24073)
metal : reduce rset heartbeat from 500ms -> 5ms (ggml-org#24074)
ggml-webgpu: FlashAttention refactor + standardize quantization support (ggml-org#23834)
...
TheTom pushed a commit to TheTom/llama-cpp-turboquant that referenced this pull request Jun 5, 2026
)

* mtmd: handle Gemma 4 audio projector embedding size

* rm projection_dim from clip_n_mmproj_embd

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
(cherry picked from commit e3ba22d)
TheTom added a commit to TheTom/llama-cpp-turboquant that referenced this pull request Jun 5, 2026
#168)

* mtmd, model: allow skip build_vit() (ggml-org#24077)

* add model

* nits

(cherry picked from commit a731805)

* mtmd: fix Gemma 4 unified FPE (ggml-org#24088)

(cherry picked from commit 94a220c)

* mtmd: enable non-causal vision for gemma 4 unified (ggml-org#24082)

(cherry picked from commit c8d6a00)

* fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091)

* mtmd: handle Gemma 4 audio projector embedding size

* rm projection_dim from clip_n_mmproj_embd

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
(cherry picked from commit e3ba22d)

* convert: Fix Gemma 4 Unified conversion (ggml-org#24118)

* Fix Gemma 4 Unified conversion

* Set audio hidden size to audio_embed_dim

(cherry picked from commit e802356)

* ggml-metal: fall back to CPU for im2col when KH*KW exceeds threadgroup limit

The Metal im2col kernel launches KH*KW threads per threadgroup (one per
kernel element). For large conv kernels — e.g. the Gemma 4 unified vision
(gemma4uv) patch embedding — KH*KW exceeds the Apple GPU 1024-thread cap and
the kernel hits a runtime GGML_ASSERT instead of producing a result.

Guard supports_op so an oversized im2col is declined; the backend scheduler
then runs that one op on CPU while the rest of the graph stays on the GPU.

Fixes Gemma 4 12B vision on the Metal backend (verified end-to-end:
loads mmproj + describes an image correctly on an M5 Max).

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Andrei <abetlen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
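
The im2col guard described in the commit message above might look something like the sketch below. It is an assumption-laden illustration rather than the ggml-metal code: `toy_im2col_op`, `toy_metal_supports_im2col` and `toy_pick_backend` are hypothetical names, and the 1024-thread cap is hard-coded from the commit message instead of being queried from the device.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of an im2col op; not the ggml tensor/op layout.
struct toy_im2col_op {
    int64_t KH; // kernel height
    int64_t KW; // kernel width
};

// Per-threadgroup thread cap quoted in the commit message (assumed fixed here).
constexpr int64_t TOY_MAX_THREADS_PER_THREADGROUP = 1024;

// The kernel launches one thread per kernel element, so it can only run
// when KH*KW fits within a single threadgroup.
bool toy_metal_supports_im2col(const toy_im2col_op & op) {
    return op.KH * op.KW <= TOY_MAX_THREADS_PER_THREADGROUP;
}

enum class toy_backend { GPU, CPU };

// Stand-in for the scheduler's fallback: an op the GPU declines runs on CPU.
toy_backend toy_pick_backend(const toy_im2col_op & op) {
    return toy_metal_supports_im2col(op) ? toy_backend::GPU : toy_backend::CPU;
}

// Usage example: a 48x48 kernel needs 2304 threads, so it is sent to the CPU.
inline void toy_im2col_usage() {
    assert(toy_pick_backend({48, 48}) == toy_backend::CPU);
}
```

Declining the op up front, instead of asserting inside the kernel launch, lets only that one op move to the CPU while the rest of the graph stays on the GPU, as the commit message describes.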


Development

Successfully merging this pull request may close these issues.

Eval bug: llama-server SIGABRT when loading Gemma 4 mmproj with audio encoder

6 participants