
fix(mtmd): floating point exception in new Gemma 4 12B model #24088

Merged
ngxson merged 1 commit into ggml-org:master from abetlen:fix/gemma4-fpe-exception on Jun 3, 2026

Conversation

@abetlen
Collaborator

@abetlen abetlen commented Jun 3, 2026

Overview

The new Gemma 4 12B model does not use an encoder, so n_head and d_head are both 0. Computing d_head and kq_scale then divides by zero, which triggers a floating point exception.

Closes #24085
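
For readers unfamiliar with this failure mode, here is a minimal sketch of the kind of guard the overview implies. It is not the actual patch: the struct `attn_params` and the function `compute_attn_params` are hypothetical names, and the assumption that d_head is derived as n_embd / n_head and kq_scale as 1/sqrt(d_head) is an inference from the description above. An integer division by zero is undefined behavior in C++ and typically surfaces as SIGFPE ("Floating point exception") on x86.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical container for the derived attention parameters.
struct attn_params {
    int32_t d_head;   // per-head dimension; stays 0 when there is no encoder
    float   kq_scale; // 1/sqrt(d_head); stays 0 when there is no encoder
};

// Derive d_head and kq_scale, skipping both divisions when the divisor is 0.
// Without the first guard, n_embd / n_head would be an integer division by
// zero whenever the model has no encoder (n_head == 0).
static attn_params compute_attn_params(int32_t n_embd, int32_t n_head) {
    attn_params p = {0, 0.0f};
    if (n_head > 0) {
        p.d_head = n_embd / n_head;
    }
    if (p.d_head > 0) {
        p.kq_scale = 1.0f / std::sqrt(static_cast<float>(p.d_head));
    }
    return p;
}
```

With this shape, an encoder-less model (n_head == 0) gets d_head == 0 and kq_scale == 0.0f instead of crashing, while models with an encoder are unaffected.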

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes, gpt5.5 xhigh in codex to diagnose the initial floating point exception

@abetlen abetlen requested a review from a team as a code owner June 3, 2026 19:26
@abetlen abetlen changed the title mtmd: fix floating point exception in new Gemma 4 12B model fix(mtmd): floating point exception in new Gemma 4 12B model Jun 3, 2026
@ngxson
Contributor

ngxson commented Jun 3, 2026

nice, thanks! cc @ggml-org/maintainers for the 2nd approval

@ngxson ngxson merged commit 94a220c into ggml-org:master Jun 3, 2026
25 checks passed
@eminence

eminence commented Jun 3, 2026

Less than 90 minutes from bug report to bug fix? Awesome job everyone! 🎉

TheTom pushed a commit to TheTom/llama-cpp-turboquant that referenced this pull request Jun 5, 2026
TheTom added a commit to TheTom/llama-cpp-turboquant that referenced this pull request Jun 5, 2026
#168)

* mtmd, model: allow skip build_vit() (ggml-org#24077)

* add model

* nits

(cherry picked from commit a731805)

* mtmd: fix Gemma 4 unified FPE (ggml-org#24088)

(cherry picked from commit 94a220c)

* mtmd: enable non-causal vision for gemma 4 unified (ggml-org#24082)

(cherry picked from commit c8d6a00)

* fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091)

* mtmd: handle Gemma 4 audio projector embedding size

* rm projection_dim from clip_n_mmproj_embd

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
(cherry picked from commit e3ba22d)

* convert: Fix Gemma 4 Unified conversion (ggml-org#24118)

* Fix Gemma 4 Unified conversion

* Set audio hidden size to audio_embed_dim

(cherry picked from commit e802356)

* ggml-metal: fall back to CPU for im2col when KH*KW exceeds threadgroup limit

The Metal im2col kernel launches KH*KW threads per threadgroup (one per
kernel element). For large conv kernels — e.g. the Gemma 4 unified vision
(gemma4uv) patch embedding — KH*KW exceeds the Apple GPU 1024-thread cap and
the kernel hits a runtime GGML_ASSERT instead of producing a result.

Guard supports_op so an oversized im2col is declined; the backend scheduler
then runs that one op on CPU while the rest of the graph stays on the GPU.

Fixes Gemma 4 12B vision on the Metal backend (verified end-to-end:
loads mmproj + describes an image correctly on an M5 Max).

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Andrei <abetlen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
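
The im2col guard described in the ggml-metal commit message above can be sketched as follows. This is an illustration only, not the actual ggml-metal code: the function name `im2col_fits_threadgroup` and the constant `MAX_THREADS_PER_THREADGROUP` are hypothetical, and the 1024-thread limit is taken from the commit message.

```cpp
#include <cstdint>

// Assumed per-threadgroup thread cap, as stated in the commit message.
constexpr int64_t MAX_THREADS_PER_THREADGROUP = 1024;

// Hypothetical predicate for a supports_op-style check: the kernel launches
// one thread per kernel element (KH*KW), so the op is only accepted when
// that product fits within a single threadgroup.
static bool im2col_fits_threadgroup(int64_t kh, int64_t kw) {
    return kh * kw <= MAX_THREADS_PER_THREADGROUP;
}
```

When a check like this returns false, the backend declines the op and, per the commit message, the scheduler runs that single op on the CPU while the rest of the graph stays on the GPU.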

Development

Successfully merging this pull request may close these issues.

Eval bug: gemma-4-12B-it-GGUF crashes with Floating point exception

4 participants