mtmd, model: add Gemma 4 "unified" variant #24077

Merged: ngxson merged 2 commits into master from xsn/g4u on Jun 3, 2026

Conversation

ngxson (Contributor) commented on Jun 3, 2026:

Overview

More info about this PR will be added soon

Requirements

ngxson requested review from a team and CISC as code owners on Jun 3, 2026, 14:36
github-actions (bot) added the labels model (Model specific), examples, and python (python script changes) on Jun 3, 2026
ngxson (Contributor, Author) commented on Jun 3, 2026:

@CISC I need to merge this now, but I can push fixes in a follow-up PR if you spot any problems.

ngxson merged commit a731805 into master on Jun 3, 2026
28 checks passed
ngxson changed the title from "mtmd, model: allow skip build_vit()" to "mtmd, model: add Gemma 4 "unified" variant" on Jun 3, 2026
tha80 (Contributor) commented on Jun 3, 2026:

For those of you waiting for more information:

https://developers.googleblog.com/gemma-4-12b-the-developer-guide/

Comment thread on tools/mtmd/clip.cpp:
        return ctx->model.mm_fc_w->ne[1];
    case PROJECTOR_TYPE_LFM2A:
        return ctx->model.position_embeddings->ne[0];
    case PROJECTOR_TYPE_GEMMA4A:
A collaborator commented:

Hello, why is case PROJECTOR_TYPE_GEMMA4A: removed? Now loading gemma E4B will give GGML_ABORT("Unknown projector type");

Another participant replied:

> Hello, why is case PROJECTOR_TYPE_GEMMA4A: removed? Now loading gemma E4B will give GGML_ABORT("Unknown projector type");

Same issue here.

A collaborator replied:

@pilonull it was fixed in #24091 a while after my earlier comment.
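
The failure mode discussed in this thread can be sketched as follows. This is a minimal illustration, not the actual clip.cpp code: the enum `projector_type`, the struct `fake_model`, and both functions are hypothetical stand-ins, the embedding size is an arbitrary placeholder, and a thrown exception stands in for GGML_ABORT so the behavior can be observed without terminating the process.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for the projector types named in the thread.
enum class projector_type { LFM2A, GEMMA4A, GEMMA4UV };

// Hypothetical model that holds only the value this sketch needs.
struct fake_model {
    int n_embd = 1024; // arbitrary placeholder, not a real Gemma 4 dimension
};

// Variant with no GEMMA4A case: that type falls through to the default
// branch, which is where the real code would hit
// GGML_ABORT("Unknown projector type").
int n_mmproj_embd_missing_case(const fake_model & m, projector_type t) {
    assert(m.n_embd > 0);
    switch (t) {
        case projector_type::LFM2A:
        case projector_type::GEMMA4UV:
            return m.n_embd;
        default:
            throw std::runtime_error("Unknown projector type");
    }
}

// Variant with the case present: every supported type returns a size.
int n_mmproj_embd_with_case(const fake_model & m, projector_type t) {
    assert(m.n_embd > 0);
    switch (t) {
        case projector_type::LFM2A:
        case projector_type::GEMMA4A:
        case projector_type::GEMMA4UV:
            return m.n_embd;
    }
    throw std::runtime_error("Unknown projector type");
}
```

Calling the first variant with `projector_type::GEMMA4A` throws, while the second returns the embedding size, which mirrors the difference between the reported regression and the state after the follow-up fix.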

LostRuins added a commit to LostRuins/koboldcpp that referenced this pull request Jun 4, 2026
Comment thread on tools/mtmd/clip.cpp:
            } break;
        case PROJECTOR_TYPE_GEMMA4UV:
            {
                model.mm_input_proj_w = get_tensor(TN_MM_INP_PROJ);
A member commented:

Why is this without weight?

The PR author replied:

it's already included in the name, but I think that should be refactored at some point:

#define TN_MM_INP_PROJ     "mm.input_projection.weight"
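
One way the refactor mentioned here might look is sketched below. It is only an illustration under stated assumptions: the constant `TN_MM_INP_PROJ_BASE` and the helper `tensor_name` are hypothetical and do not appear in the PR. Only the string "mm.input_projection.weight" comes from the discussion.

```cpp
#include <cassert>
#include <string>

// From the discussion: the ".weight" suffix is baked into the macro itself.
#define TN_MM_INP_PROJ "mm.input_projection.weight"

// Hypothetical alternative: the constant holds only the base name...
static const char * const TN_MM_INP_PROJ_BASE = "mm.input_projection";

// ...and a hypothetical helper appends the suffix, so the call site states
// explicitly whether it wants the weight or the bias tensor.
inline std::string tensor_name(const std::string & base, const std::string & suffix) {
    assert(!base.empty() && !suffix.empty());
    return base + "." + suffix;
}
```

With this shape, `tensor_name(TN_MM_INP_PROJ_BASE, "weight")` yields the same string as the current macro, and the reviewer's question about the missing "weight" would be answered at the call site.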

LostRuins added a commit to LostRuins/koboldcpp that referenced this pull request Jun 4, 2026
This reverts commit da0bb97.
TheTom pushed a commit to TheTom/llama-cpp-turboquant that referenced this pull request Jun 5, 2026
* add model

* nits

(cherry picked from commit a731805)
TheTom added a commit to TheTom/llama-cpp-turboquant that referenced this pull request Jun 5, 2026
#168)

* mtmd, model: allow skip build_vit() (ggml-org#24077)

* add model

* nits

(cherry picked from commit a731805)

* mtmd: fix Gemma 4 unified FPE (ggml-org#24088)

(cherry picked from commit 94a220c)

* mtmd: enable non-causal vision for gemma 4 unified (ggml-org#24082)

(cherry picked from commit c8d6a00)

* fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091)

* mtmd: handle Gemma 4 audio projector embedding size

* rm projection_dim from clip_n_mmproj_embd

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
(cherry picked from commit e3ba22d)

* convert: Fix Gemma 4 Unified conversion (ggml-org#24118)

* Fix Gemma 4 Unified conversion

* Set audio hidden size to audio_embed_dim

(cherry picked from commit e802356)

* ggml-metal: fall back to CPU for im2col when KH*KW exceeds threadgroup limit

The Metal im2col kernel launches KH*KW threads per threadgroup (one per
kernel element). For large conv kernels — e.g. the Gemma 4 unified vision
(gemma4uv) patch embedding — KH*KW exceeds the Apple GPU 1024-thread cap and
the kernel hits a runtime GGML_ASSERT instead of producing a result.

Guard supports_op so an oversized im2col is declined; the backend scheduler
then runs that one op on CPU while the rest of the graph stays on the GPU.

Fixes Gemma 4 12B vision on the Metal backend (verified end-to-end:
loads mmproj + describes an image correctly on an M5 Max).

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Andrei <abetlen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
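
The Metal im2col guard described in the commit message above can be sketched as a simple predicate. This is a minimal sketch, not the ggml-metal implementation: the function name `im2col_fits_threadgroup` and the constant name are hypothetical, and the 1024 limit is taken from the commit message rather than queried from a device.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 1024 threads per threadgroup, the cap cited in the commit message.
constexpr int64_t MAX_THREADS_PER_THREADGROUP = 1024;

// Returns true when a kernel that launches one thread per kernel element
// (KH*KW threads per threadgroup) stays within the threadgroup limit.
// A supports_op-style check could return false otherwise, so that the
// scheduler runs that one op on another backend such as the CPU.
inline bool im2col_fits_threadgroup(int64_t kh, int64_t kw) {
    assert(kh > 0 && kw > 0);
    return kh * kw <= MAX_THREADS_PER_THREADGROUP;
}
```

Under this check a 32x32 kernel (exactly 1024 elements) is still accepted, while anything larger is declined instead of reaching a runtime assertion.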
Labels: examples, model (Model specific), python (python script changes)

7 participants