mtmd : build_vit GQA support #23782
Merged
ngxson
reviewed
May 27, 2026
Contributor
ngxson
left a comment
code change looks good, but AI-generated comments look excessive
I don't get why we have to add 4 lines of comment just to explain a trivial line of code
ngxson
reviewed
May 27, 2026
// "clip.vision.attention.head_count_kv" (see KEY_N_HEAD_KV). It is 0 for
// every existing build_vit caller (those GGUFs do not set the key), so the
// fallback to n_head preserves MHA behaviour by default.
const int n_kv_head = hparams.n_head_kv > 0 ? hparams.n_head_kv : n_head;
Contributor
the best way is to default n_head_kv to n_head then allow the loader to override it with gguf value. that's the exact behavior of n_head_kv inside llama.cpp
Contributor
Author
Sorry about AI generated comment.
I have now tidied it up.
n_head_kv defaults to n_head in load_hparams as you suggested.
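The "default first, then let the loader override" pattern discussed in this thread can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual mtmd loader: `vit_hparams`, `kv_metadata`, `get_optional_i32` and `load_hparams_sketch` are hypothetical names, a `std::map` stands in for GGUF metadata, and the `clip.vision.attention.head_count` key is assumed by analogy with the `head_count_kv` key quoted above.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the vision hparams; only the two fields discussed here.
struct vit_hparams {
    int32_t n_head    = 0;
    int32_t n_head_kv = 0;
};

// Hypothetical stand-in for GGUF metadata: key -> integer value.
using kv_metadata = std::map<std::string, int32_t>;

// Copies the value stored under `key` into `out` if the key exists.
// Leaves `out` untouched and returns false otherwise.
static bool get_optional_i32(const kv_metadata & meta, const std::string & key, int32_t & out) {
    const auto it = meta.find(key);
    if (it == meta.end()) {
        return false;
    }
    out = it->second;
    return true;
}

static vit_hparams load_hparams_sketch(const kv_metadata & meta) {
    vit_hparams hp;
    // Key name assumed for illustration.
    get_optional_i32(meta, "clip.vision.attention.head_count", hp.n_head);

    // Default to MHA: one K/V head per query head.
    hp.n_head_kv = hp.n_head;

    // Optional override: only changes n_head_kv when the file sets the key.
    get_optional_i32(meta, "clip.vision.attention.head_count_kv", hp.n_head_kv);
    return hp;
}
```

With this shape, the graph-building code can read `n_head_kv` directly and needs no `> 0` fallback of its own, because the field is always populated by the time the graph is built.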
removed AI-generated comment
ngxson
approved these changes
May 28, 2026
Contributor
@ggml-org/maintainers can I have the 2nd approval? thanks!
CISC
approved these changes
May 28, 2026
Overview

This PR adds GQA support to the shared `build_vit` helper. `build_vit` now reads `hparams.n_head_kv` and uses it for the K/V reshape. If `n_head_kv` is unset or zero, it falls back to `n_head`, preserving the existing MHA behavior.

Additional information
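As a rough illustration of what the K/V reshape described in the overview implies for tensor widths, the sketch below computes the per-head size and the Q and K/V projection widths for a given head configuration. It is not code from this PR: `attn_shapes`, `gqa_shapes` and `kv_head_for` are hypothetical helpers, and the mapping of consecutive query heads onto one shared K/V head is the common GQA convention, assumed here rather than taken from the source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type, not part of mtmd.
struct attn_shapes {
    int64_t d_head;   // width of a single attention head
    int64_t q_width;  // total width of the Q projection (d_head * n_head)
    int64_t kv_width; // total width of the K (or V) projection (d_head * n_head_kv)
    int64_t n_group;  // number of query heads sharing one K/V head
};

static attn_shapes gqa_shapes(int64_t n_embd, int64_t n_head, int64_t n_head_kv) {
    assert(n_head > 0 && n_head_kv > 0);
    assert(n_embd % n_head == 0);    // heads must tile the embedding evenly
    assert(n_head % n_head_kv == 0); // query heads must split evenly into groups

    const int64_t d_head = n_embd / n_head;
    return { d_head, d_head * n_head, d_head * n_head_kv, n_head / n_head_kv };
}

// Index of the K/V head used by query head `i_head`
// (assumes consecutive query heads share a K/V head).
static int64_t kv_head_for(int64_t i_head, int64_t n_head, int64_t n_head_kv) {
    return i_head / (n_head / n_head_kv);
}
```

For example, with `n_embd = 1024`, `n_head = 16` and `n_head_kv = 4`, each head is 64 wide, Q stays 1024 wide, and K and V shrink to 256 each. With `n_head_kv == n_head` the K/V width equals `n_embd`, which is the MHA case.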
This closes a gap for GQA ViT models. Any GGUF that sets `clip.vision.attention.head_count_kv` can now work through the shared `build_vit` path instead of needing model-specific handling.

Existing callers are not affected. Current GGUFs that use `build_vit` do not set `clip.vision.attention.head_count_kv`, so `n_head_kv` defaults to `n_head` and the generated graph remains unchanged for those models.

A safety guard was added for the fused-QKV branch: it asserts that `n_head_kv == n_head`. That layout cannot represent GQA without splitting the fused tensor at a non-`n_embd` boundary.

Requirements