mtmd : build_vit GQA support by sfallah · Pull Request #23782 · ggml-org/llama.cpp

sfallah · 2026-05-27T14:55:48Z

Overview

This PR adds GQA support to the shared build_vit helper.

build_vit now reads hparams.n_head_kv and uses it for the K/V reshape. If n_head_kv is unset or zero, it falls back to n_head, preserving the existing MHA behavior.

Additional information

This closes a gap for GQA ViT models. Any GGUF that sets clip.vision.attention.head_count_kv can now work through the shared build_vit path instead of needing model-specific handling.

Existing callers are not affected. Current GGUFs that use build_vit do not set clip.vision.attention.head_count_kv, so n_head_kv defaults to n_head and the generated graph remains unchanged for those models.

A safety guard was added for the fused-QKV branch: it asserts that n_head_kv == n_head. That layout cannot represent GQA without splitting the fused tensor at a non-n_embd boundary.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES - AI assistance was used to draft the code changes under my direction, and to review the PR description. I reviewed and tested every change, and I take responsibility for the full contents of this PR.

ngxson

code change looks good, but AI-generated comments look excessive

I don't get why we have to add 4 lines of comment just to explain a trivial line of code

ngxson · 2026-05-27T15:40:40Z

+    // "clip.vision.attention.head_count_kv" (see KEY_N_HEAD_KV). It is 0 for
+    // every existing build_vit caller (those GGUFs do not set the key), so the
+    // fallback to n_head preserves MHA behaviour by default.
+    const int n_kv_head = hparams.n_head_kv > 0 ? hparams.n_head_kv : n_head;


the best way is to default n_head_kv to n_head then allow the loader to override it with gguf value. that's the exact behavior of n_head_kv inside llama.cpp

Sorry about AI generated comment.
I have now tidied it up.
n_head_kv defaults to n_head in load_hparams as you suggested.

removed AI-generated comment

ngxson · 2026-05-28T09:46:15Z

@ggml-org/maintainers can I have the 2nd approval? thanks!

removed AI-generated comment

* origin/master: (32 commits) hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (ggml-org#23835) mtmd-debug: add color and rainbow mode (ggml-org#23829) mtmd: fix gemma 4 projector pre_norm (ggml-org#23822) opencl: move backend info printing into its own function (ggml-org#23702) ci : run ui publish on ubuntu-slim (ggml-org#23818) ui: fix audio and video modality detection (ggml-org#23756) ci : releases use Github-hosted builds for the UI (ggml-org#23823) app : improve help output (ggml-org#23805) mtmd: n_head_kv defaults to n_head (ggml-org#23782) mtmd: fix gemma 4 audio rms norm eps (ggml-org#23815) ci : change Vulkan builds to Release to reduce ccache (ggml-org#23820) arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-file (ggml-org#23167) test-llama-archs: fix table format [no release] (ggml-org#23810) ggml: auto apply iGPU flag CUDA/HIP if integrated device (ggml-org#23007) mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for … (ggml-org#23729) CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (ggml-org#23227) server: minor tweaks to use more cpp features (ggml-org#23785) hexagon: minor refresh for HMX FA and MM (ggml-org#23796) vulkan: fast path for walsh-hadamard transform (ggml-org#23687) chat : add Granite 4.1 chat template (ggml-org#23518) ...

removed AI-generated comment

github-actions Bot added the examples label May 27, 2026

sfallah mentioned this pull request May 27, 2026

mtmd: Add DeepSeekOCR 2 Support #20975

Merged

ngxson reviewed May 27, 2026

View reviewed changes

n_head_kv defaults to n_head

f88d2dc

removed AI-generated comment

sfallah force-pushed the sf/clip-vit-gqa branch from f061928 to f88d2dc Compare May 28, 2026 07:47

sfallah marked this pull request as ready for review May 28, 2026 07:51

sfallah requested a review from a team as a code owner May 28, 2026 07:51

sfallah requested a review from ngxson May 28, 2026 07:57

ngxson approved these changes May 28, 2026

View reviewed changes

CISC approved these changes May 28, 2026

View reviewed changes

ngxson merged commit 0b56d28 into ggml-org:master May 28, 2026
26 of 27 checks passed

adrianhoehne pushed a commit to adrianhoehne/llama.cpp that referenced this pull request May 28, 2026

mtmd: n_head_kv defaults to n_head (ggml-org#23782)

d43e708

removed AI-generated comment

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

mtmd: n_head_kv defaults to n_head (ggml-org#23782)

f19719c

removed AI-generated comment

turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026

mtmd: n_head_kv defaults to n_head (ggml-org#23782)

c0bc514

removed AI-generated comment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mtmd : build_vit GQA support#23782

mtmd : build_vit GQA support#23782
ngxson merged 1 commit into
ggml-org:masterfrom
sfallah:sf/clip-vit-gqa

sfallah commented May 27, 2026 •

edited

Loading

Uh oh!

ngxson left a comment

Uh oh!

ngxson May 27, 2026

Uh oh!

sfallah May 28, 2026

Uh oh!

ngxson commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

sfallah commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Additional information

Requirements

Uh oh!

ngxson left a comment

Choose a reason for hiding this comment

Uh oh!

ngxson May 27, 2026

Choose a reason for hiding this comment

Uh oh!

sfallah May 28, 2026

Choose a reason for hiding this comment

Uh oh!

ngxson commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sfallah commented May 27, 2026 •

edited

Loading