mtmd: port gemma4uv/gemma4ua support — fixes Gemma 4 12B vision (#163) #168

Merged: TheTom merged 6 commits into feature/turboquant-kv-cache from feat/gemma4uv on Jun 5, 2026

TheTom (Owner) commented Jun 5, 2026

Summary

Fixes #163. The fork could not load the new Gemma 4 12B mmproj: it failed with "load_hparams: unknown projector type: gemma4uv" because the fork lacked the gemma4uv (unified, encoder-less vision) and gemma4ua (unified audio) projectors. The SIGFPE in the original report is the upstream symptom; on this fork the model never got that far.

This cherry-picks the upstream fix ggml-org#24077 ("mtmd, model: allow skip build_vit()", commit a731805) into the fork.
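
As a rough illustration of why a missing projector produces this kind of load error, here is a minimal sketch of a name-to-enum lookup that rejects unrecognized names. The enum, the function name parse_projector_type, and the table are hypothetical and written only for this example; they are not the fork's or upstream's actual code, and only the projector names and the error wording come from the report above.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical enum for illustration; not the project's real definitions.
enum class projector_type { gemma4uv, gemma4ua };

// Map the projector name found in the mmproj metadata to an enum value.
// A name that is missing from the table is rejected before any graph is built,
// which is the situation the fork was in before this port.
inline projector_type parse_projector_type(const std::string & name) {
    static const std::map<std::string, projector_type> known = {
        {"gemma4uv", projector_type::gemma4uv}, // unified, encoder-less vision
        {"gemma4ua", projector_type::gemma4ua}, // unified audio
    };
    const auto it = known.find(name);
    if (it == known.end()) {
        throw std::runtime_error("unknown projector type: " + name);
    }
    return it->second;
}
```

Under this sketch, porting a projector amounts to adding its table entry and the graph code behind it; until then, loading stops at the lookup.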

Conflicts resolved

The cherry-pick applied as a clean 3-way merge except for 3 non-mtmd files (converter + vocab). Notably, the normalizer_lowercase lines that appeared in the diff context are pre-existing upstream code (from an earlier commit the fork hasn't synced), not part of ggml-org#24077. I dropped them so the port carries only ggml-org#24077's actual changes (suppress_tokens, the gemma4uv/ua graphs, projector enums, skip-build_vit).

Testing — verified end to end on CUDA (GB10, sm_121)

Downloaded ggml-org/gemma-4-12B-it-GGUF (Q4_K_M + mmproj-Q8_0) and ran llama-mtmd-cli on a real photo:

$ llama-mtmd-cli -m gemma-4-12B-it-Q4_K_M.gguf --mmproj mmproj-gemma-4-12B-it-Q8_0.gguf \
    --image cat.jpg -p "Describe this image in one sentence." --jinja
> A close-up, slightly blurry shot of a tabby cat's face with large dark eyes and a pink nose.
  • mmproj loads (gemma4uv projector recognized, no SIGFPE)
  • image encodes, model describes it accurately
  • Builds clean on CUDA (sm_121) and Metal, 0 errors on both
  • Text-only path unaffected

Supersedes #166 (the standalone d_head guard is included here via ggml-org#24077's clip.cpp hunk).

Credit: @guarismo for the report, upstream ggml-org#24077.

* add model

* nits

(cherry picked from commit a731805)
abetlen and others added 5 commits June 4, 2026 20:44
)

* mtmd: handle Gemma 4 audio projector embedding size

* rm projection_dim from clip_n_mmproj_embd

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
(cherry picked from commit e3ba22d)
* Fix Gemma 4 Unified conversion

* Set audio hidden size to audio_embed_dim

(cherry picked from commit e802356)
…p limit

The Metal im2col kernel launches KH*KW threads per threadgroup (one per
kernel element). For large conv kernels — e.g. the Gemma 4 unified vision
(gemma4uv) patch embedding — KH*KW exceeds the Apple GPU 1024-thread cap and
the kernel hits a runtime GGML_ASSERT instead of producing a result.

Guard supports_op so an oversized im2col is declined; the backend scheduler
then runs that one op on CPU while the rest of the graph stays on the GPU.

Fixes Gemma 4 12B vision on the Metal backend (verified end-to-end:
loads mmproj + describes an image correctly on an M5 Max).
TheTom (Owner, Author) commented Jun 5, 2026

Updated — the branch now fully supports Gemma 4 12B vision on both backends.

Commit stack (6):

Metal needed an extra fix. The gemma4uv patch-embedding conv decomposes to im2col, and the Metal im2col kernel launches KH*KW threads per threadgroup. For Gemma 4's large patch conv, KH*KW exceeds the Apple GPU 1024-thread cap, so it hit a runtime GGML_ASSERT(KH*KW <= max_threads) (this assert exists upstream too, and there is no upstream fix). The guard makes Metal decline the oversized im2col so the scheduler runs that one op on CPU; the rest of the graph stays on the GPU.
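
The guard and the resulting CPU fallback can be sketched roughly as follows. This is a simplified model under stated assumptions, not the ggml Metal backend or scheduler API: op_desc, gpu_supports_op, and assign_devices are hypothetical names, and the kernel sizes mentioned afterwards are illustrative values rather than Gemma 4's actual patch size.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// All names here are hypothetical; this is not the ggml Metal backend API.
struct op_desc {
    std::string name; // e.g. "im2col", "mul_mat"
    int64_t     kh;   // kernel height (0 for non-conv ops)
    int64_t     kw;   // kernel width  (0 for non-conv ops)
};

enum class device { gpu, cpu };

// The kernel uses one thread per kernel element, so KH*KW must fit in a
// single threadgroup. Declining here replaces the runtime assert.
inline bool gpu_supports_op(const op_desc & op, int64_t max_threads) {
    if (op.name == "im2col") {
        return op.kh * op.kw <= max_threads;
    }
    return true; // this sketch assumes every other op is supported
}

// Ops the GPU declines fall back to the CPU; everything else stays on the GPU.
inline std::vector<device> assign_devices(const std::vector<op_desc> & graph,
                                          int64_t max_threads) {
    std::vector<device> out;
    out.reserve(graph.size());
    for (const auto & op : graph) {
        out.push_back(gpu_supports_op(op, max_threads) ? device::gpu : device::cpu);
    }
    return out;
}
```

With a 1024-thread cap, a 32x32 kernel (1024 threads) is still accepted in this sketch, while a 48x48 kernel (2304 threads) is declined and assigned to the CPU; only that op moves, which is the point of guarding at the supports-op level instead of failing the whole graph.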

Verified end-to-end on both:

  • CUDA (GB10, sm_121): loads mmproj + describes a cat image correctly
  • Metal (M5 Max): same, EXIT 0, accurate description (the "CLIP graph uses unsupported operators" warning confirms the im2col→CPU split)

Builds clean on both backends, text-only path unaffected.

TheTom merged commit a62320c into feature/turboquant-kv-cache on Jun 5, 2026
32 of 53 checks passed
Linked issue (may be closed by merging this pull request): Eval bug: gemma-4-12B-it-GGUF crashes with Floating point exception