fix(mtmd): handle Gemma 4 audio projector embedding size (#24091)
Can confirm that this simple one-line PR fixes the regression for me. 👍
Built this branch and can confirm it fixes the crash on an AMD Radeon 780M (RADV, Vulkan). Before the patch, loading gemma-4-E4B-it together with its mmproj aborted in clip_n_mmproj_embd with "Unknown projector type". With the GEMMA4A case added, llama-server loads and E4B audio transcribes fine. Thanks for the quick turnaround.
        return ctx->model.mm_input_proj_w->ne[0];
    case PROJECTOR_TYPE_GEMMA4V:
    case PROJECTOR_TYPE_GEMMA4UV:
    case PROJECTOR_TYPE_GEMMA4A:
Not incorrect, but this case used to be further down (where PROJECTOR_TYPE_GEMMA4UA now is), using hparams.projection_dim instead. I don't know if there's a specific reason for that? cc @ngxson
Hmm, OK, it seems like all of them should be in the same code block here.
The ctx->model.hparams.projection_dim code branch should be removed; it was an oversight from the initial PR for Gemma 4 audio. Well, that's the nasty thing about reviewing a large AI-generated PR.
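To make the thread easier to follow, here is a minimal, self-contained sketch of the dispatch pattern under discussion: a switch over the projector type in which the three Gemma 4 cases share one branch, and any unlisted type is rejected. Everything in it is an illustrative assumption rather than the actual mtmd code: `fake_tensor`, `fake_clip_model`, and `n_mmproj_embd_sketch` are hypothetical names, the return expression for the Gemma 4 branch is assumed (the excerpt above does not show it), and the sketch throws an exception where the real function aborts.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified stand-ins for the real projector enum and model structs.
enum projector_type {
    PROJECTOR_TYPE_GEMMA4V,
    PROJECTOR_TYPE_GEMMA4UV,
    PROJECTOR_TYPE_GEMMA4A,
    PROJECTOR_TYPE_UNKNOWN, // placeholder for any type the switch does not list
};

struct fake_tensor {
    int64_t ne[4]; // tensor dimensions; ne[0] is the first one
};

struct fake_clip_model {
    projector_type      proj_type;
    const fake_tensor * mm_input_proj_w; // projection weight (assumed to exist for all Gemma 4 cases)
};

// Sketch of an embedding-size lookup: all Gemma 4 projector types share one branch.
int n_mmproj_embd_sketch(const fake_clip_model & model) {
    switch (model.proj_type) {
        case PROJECTOR_TYPE_GEMMA4V:
        case PROJECTOR_TYPE_GEMMA4UV:
        case PROJECTOR_TYPE_GEMMA4A:
            assert(model.mm_input_proj_w != nullptr);
            // Assumption: the size is read from the projection weight's first dimension.
            return static_cast<int>(model.mm_input_proj_w->ne[0]);
        default:
            // A type missing from the switch ends up here; this mirrors the
            // "Unknown projector type" failure reported before the patch.
            throw std::runtime_error("Unknown projector type");
    }
}
```

The point of grouping the case labels is that a newly added projector type either joins an existing branch explicitly or fails loudly in the default branch, which is the failure mode reported before GEMMA4A was listed.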
@ggml-org/maintainers can someone please give approval(s), thanks!
* origin/master: (57 commits) server : disable on-device spec checkpoints (ggml-org#24108) arg: fix double mtp downloads (ggml-org#24128) webui: [a11y] fix keyboard navigation issues in chat interface and sidebar (ggml-org#23132) Move duplicated imatrix code into single common imatrix-loader.cpp (ggml-org#22445) ui: Fixed packages (ggml-org#24119) ui: added single line reasoning preview (ggml-org#23601) return filter to save memory (ggml-org#24125) convert: Fix Gemma 4 Unified conversion (ggml-org#24118) ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (ggml-org#22209) server: avoid unnecessary checkpoint restore when new tokens are present (ggml-org#24110) agents: refactor, include more guidelines (ggml-org#24111) webui: fix tool selector toggle/counter, key tools by stable identity (ggml-org#24065) build : use umbrella Headers directory for XCFramework module map (ggml-org#23974) server : add header to tools/server/server-http.h (ggml-org#24089) cmake: skip cvector-generator and export-lora when CPU backend is disabled (ggml-org#24053) fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091) readme : add status badges (ggml-org#24104) tests : refactor test-save-load-state to accept token input (ggml-org#24073) metal : reduce rset heartbeat from 500ms -> 5ms (ggml-org#24074) ggml-webgpu: FlashAttention refactor + standardize quantization support (ggml-org#23834) ...
) * mtmd: handle Gemma 4 audio projector embedding size * rm projection_dim from clip_n_mmproj_embd --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co> (cherry picked from commit e3ba22d)
#168) * mtmd, model: allow skip build_vit() (ggml-org#24077) * add model * nits (cherry picked from commit a731805) * mtmd: fix Gemma 4 unified FPE (ggml-org#24088) (cherry picked from commit 94a220c) * mtmd: enable non-causal vision for gemma 4 unified (ggml-org#24082) (cherry picked from commit c8d6a00) * fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091) * mtmd: handle Gemma 4 audio projector embedding size * rm projection_dim from clip_n_mmproj_embd --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co> (cherry picked from commit e3ba22d) * convert: Fix Gemma 4 Unified conversion (ggml-org#24118) * Fix Gemma 4 Unified conversion * Set audio hidden size to audio_embed_dim (cherry picked from commit e802356) * ggml-metal: fall back to CPU for im2col when KH*KW exceeds threadgroup limit The Metal im2col kernel launches KH*KW threads per threadgroup (one per kernel element). For large conv kernels — e.g. the Gemma 4 unified vision (gemma4uv) patch embedding — KH*KW exceeds the Apple GPU 1024-thread cap and the kernel hits a runtime GGML_ASSERT instead of producing a result. Guard supports_op so an oversized im2col is declined; the backend scheduler then runs that one op on CPU while the rest of the graph stays on the GPU. Fixes Gemma 4 12B vision on the Metal backend (verified end-to-end: loads mmproj + describes an image correctly on an M5 Max). --------- Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> Co-authored-by: Andrei <abetlen@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Overview
PR #24077 seems to have introduced a regression that broke loading of the 4B Gemma 4 model because of its audio embedding projector.
Closes #24084
All credit to @MrSimonC; I am just confirming that the fix works and providing the patch here.
Requirements