model: Granite4 Vision by gabe-l-hart · Pull Request #23545 · ggml-org/llama.cpp

gabe-l-hart · 2026-05-22T20:39:13Z

Overview

This PR adds support for the Granite4VisionForConditionalGeneration mtmd architecture. It specifically targets the following models:

Additional information

The Granite4Vision models leverage several key architectural patterns that have not been previously supported:

Deepstack + Spatial projectors injected at non-contiguous points in the LLM layer stack
Llava-next encoder / assembler with learned newline token

Because of these two architectural patterns, this PR makes several key architectural shifts in the project:

Arch Changes in `libllama`

llama_hparams.n_deepstack_layers -> llama_hparams.deepstack_layers_arr
- This allows G4V to inject projector outputs at specific LLM layers
- Backwards-compatibility is maintained for existing models using n_deepstack_layers, specifically the Qwen3VL family, by loading deepstack_layers_arr as either a single-valued number or a multi-valued array
- Open Question: This type of try/catch backwards compatibility is not something I've seen elsewhere, so I want to see whether it is a strong enough anti-pattern that I should instead use a net-new hparam with overlapping meaning.
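The single-value vs. array loading described above could look roughly like the sketch below. It is a minimal illustration, not the code from this PR: `deepstack_kv`, `resolve_deepstack_layers`, and the convention that `-1` means "no injection at this layer" are all assumptions made for the example.

```cpp
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical stand-in for a GGUF value that may be a scalar or an array.
using deepstack_kv = std::variant<uint32_t, std::vector<int32_t>>;

// Returns one entry per LLM layer: the index of the deepstack embedding to
// inject at that layer, or -1 for "no injection" (an assumed convention).
static std::vector<int32_t> resolve_deepstack_layers(const deepstack_kv & kv, uint32_t n_layer) {
    std::vector<int32_t> mapping(n_layer, -1);
    if (const auto * n = std::get_if<uint32_t>(&kv)) {
        // Legacy single value N: the first N layers map to entries 0..N-1.
        for (uint32_t il = 0; il < *n && il < n_layer; ++il) {
            mapping[il] = (int32_t) il;
        }
    } else {
        // Array form: taken as-is, so injection points can be non-contiguous.
        const auto & arr = std::get<std::vector<int32_t>>(kv);
        for (uint32_t il = 0; il < n_layer && il < arr.size(); ++il) {
            mapping[il] = arr[il];
        }
    }
    return mapping;
}
```

With this shape, a legacy value of `2` on a 4-layer model resolves to `{0, 1, -1, -1}`, while an array such as `{-1, 0, -1, 1}` is passed through unchanged.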

Arch Changes in `mtmd`

~~Introduced a new class hierarchy for clip_assembler in clip-graph.h that parallels the clip_graph factory pattern~~
- ~~This class hierarchy will support models that have graph operations that need to happen after the individual image tiles have been encoded (eg llava-next style with learned newlines)~~
Introduced clip_image_f32.append_token field that can be used by individual graphs to determine how to handle injecting learned newlines for each image tile.
~~New public methods in clip.h to support the model-agnostic assembler logic~~
- ~~clip_image_assemble: This is the factory function for using the clip_assembler hierarchy to perform assembly~~
- ~~clip_n_assembled_output_tokens: This allows model-specific logic for counting output tokens based on how the assembly will work~~
~~New hparam section for values that will be explicitly shared between an LLM and its MMPROJ~~
- ~~This is needed to bind the embedding_scale value to both the LLM and the MMPROJ so that the base stream can be pre-scaled to invert the embedding scaling that happens in the LLM~~
^ not needed anymore given skip logic for embedding scale w/ input embeddings
clip_hparams.vision_feature_layer changed from an unordered_set to a vector to support strict ordering and duplicate values. The ordering maps to the order of the projectors, several of which pull from the same vision layer.
Hoist QFormer tensors in clip_model into a qf_block struct and hold a vector of them in clip_model
- This maintains backwards compatibility for Granite Speech, which uses a vector of size 1, while allowing G4V to support multiple blocks
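To illustrate why vision_feature_layer needs a vector rather than a set, here is a small sketch. The `projector_input` struct and both helper functions are hypothetical names used only for this example; the layer values in the usage note are made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical pairing of a projector with the vision layer it reads from.
struct projector_input {
    size_t  projector_idx;
    int32_t vision_layer;
};

// With a vector, entry i is the source layer for projector i, so both the
// order and any repeated layer indices are preserved.
static std::vector<projector_input> bind_projectors(const std::vector<int32_t> & feature_layers) {
    std::vector<projector_input> out;
    for (size_t i = 0; i < feature_layers.size(); ++i) {
        out.push_back({i, feature_layers[i]});
    }
    return out;
}

// What a set would retain: only the distinct layers, in no guaranteed order.
static size_t count_distinct(const std::vector<int32_t> & feature_layers) {
    return std::unordered_set<int32_t>(feature_layers.begin(), feature_layers.end()).size();
}
```

For an input such as `{-1, -1, 5, 3}`, `bind_projectors` yields four bindings (two of them reading the same layer), whereas a set would keep only three distinct values and lose the projector correspondence.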

Open Questions

Before merging, I want to address the following open questions:

~~Maintainer alignment on introduction of clip_assembler paradigm~~ Removed in favor of clip_image_f32.append_token
~~Maintainer alignment on hparam try/catch single vs multi value parsing paradigm~~
Mathematical alignment with alternate implementations (transformers and pure Claude implementation, see AI usage disclosure)
Is there a cleaner way to skip the f_embedding_scale in llama-graph.cpp if (and only if) the input embeddings have valid image embeddings that doesn't require multimodal knowledge to leak into the core library?

Requirements

I have read and agree with the contributing guidelines: YES
AI usage disclosure: YES

AI Usage Disclosure

AI was used a lot in the creation of this PR! That said, the bulk of the work was actually meshing the AI's efforts into the existing architecture in a way that caused the least possible friction. I've annotated each commit with an AI-usage line (see stats below). There were two key ways that AI was used:

Granite Vision teammate @EliSchwartz built a working version of this branch entirely using Claude Code (here). This was heavily used as a reference implementation to check the implementation here that was more closely aligned with project patterns. Sections of this were referenced/copied verbatim (see commits with Co-authored-by).
Various agent/model combinations were used to assist in design/refactor throughout the branch

NOTE: I also failed with AI a bunch of times. Most agent/model combos couldn't handle the complexity of the architectural merger between G4V's architecture quirks and the various components of mtmd.

git-ai-stats

╔══════════════════════════════════════════════════════════╗
║ GIT AI USAGE ANALYSIS ║
╚══════════════════════════════════════════════════════════╝

📊 COMMITS BY AGENT

--- Aggregate ---
Commits | Count

none | 53
OpenCode + qwen3.5:122b | 5
Claude Code + Opus 4.7 | 4
IBM Bob | 1
Claude Code, IBM Bob | 1
OpenCode + Qwen 3.6-35B | 1
Claude Code | 1

TOTAL | 66

📊 COMMITS BY USAGE TYPE

--- Aggregate ---
Commits | Count

none | 53
draft | 6
full | 7

TOTAL | 66

📈 LINES OF CODE BY AGENT

--- Aggregate ---
Agent | Commits | Additions | Deletions

none | 53 | 1355 | 886
OpenCode + qwen3.5:122b | 5 | 50 | 3
Claude Code + Opus 4.7 | 4 | 463 | 296
IBM Bob | 1 | 13 | 0
Claude Code, IBM Bob | 1 | 600 | 7
OpenCode + Qwen 3.6-35B | 1 | 4 | 0
Claude Code | 1 | 30 | 0

TOTAL | 66 | 2515 | 1192

📈 LINES OF CODE BY USAGE TYPE

--- Aggregate ---
Usage Type | Commits | Additions | Deletions

none | 53 | 1355 | 886
draft | 6 | 802 | 27
full | 7 | 358 | 279

TOTAL | 66 | 2515 | 1192

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

…ybrid Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

There are several awkward things here:
1. Most of these are essentially identical to the audio qformer tensors. On the c++ side, that's mapped using the prefix, so the rest of the GGUF name needs to align, but on the python side there's no prefix notion, so they all get duplicated.
2. There are a couple of net-new tensors for vision, in particular PROJ_NORM. In both speech and vision, the QF_PROJ_NORM is qualified as belonging to the qformer portion, but the GGUF name is simply proj_norm which conflicts with the ideal name for this new PROJ_NORM that is not qualified as part of the qformer. To get around this, I used "proj_layernorm" as the GGUF name.
Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

NOTE: Usage of these hasn't been updated to include prefix yet Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

We need to preserve the ordering of these feature index values so that they can be mapped to the sub-tensors within the stacked projectors. Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Branch: Granite4Vision AI-usage: full (OpenCode + qwen3.5:122b) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

This handles stacking the projector tensors and setting the new hparams Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

…ack layer arr Branch: Granite4Vision AI-usage: draft (OpenCode + qwen3.5:122b) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Branch: Granite4Vision AI-usage: full (OpenCode + qwen3.5:122b) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

This defaults to False, but allows a user to enable it programmatically instead of using the interactive prompt. Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Branch: Granite4Vision AI-usage: full (OpenCode + qwen3.5:122b) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

…e block This is cleaner than stacking them. The modeling file hard-codes single-layer qformers, so we can punt on the multiple multi-layer projectors problem. Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Branch: Granite4Vision AI-usage: draft (OpenCode + qwen3.5:122b) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

New hparams:
- KEY_PROJ_SAMPLE_QUERY_SIDE
- KEY_PROJ_SAMPLE_WINDOW_SIDE
- KEY_PROJ_SPATIAL_OFFSETS
New tensors:
- TN_MULTI_PROJ_IMG_POS
- TN_MULTI_PROJ_QUERY
- TN_MULTI_PROJ_LAYERNORM
- TN_MULTI_PROJ_LINEAR
- TN_MULTI_PROJ_NORM
Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

This appears to have been added during Qwen3 VL (ggml-org#16780), but it was never actually used. Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

The old logic hard coded a correspondence between the first N layers of the LLM and the 1->N entries in the input embeddings. Now, that relationship is maintained at loading time if the GGUF value is single-valued. If it is multi-valued, it loads directly allowing for deepstack layers to be spaced out throughout the model. Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

The alternative would be to use get_key_or_arr, but then the single value would be populated through the entire array and we'd need to detect that and update it with the right correspondence. Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

The use of ggml_add here assumes that the elements of inp_embd will be pre-arranged to be the full embedding length, with only the vision-masked portions non-zero from the projector. This matches how Qwen3VL does it. Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
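The assumption in the commit message above can be illustrated without ggml. The sketch below uses plain `std::vector<float>` buffers in place of tensors; `scatter_to_full` and `add_deepstack` are hypothetical helpers, and the element-wise loop merely stands in for an add over same-shaped tensors.

```cpp
#include <cstddef>
#include <vector>

// Buffers are row-major [n_tokens x n_embd], flattened into one vector.

// Scatter projector rows (one per image token) into a zero-filled buffer
// that covers every token position.
static std::vector<float> scatter_to_full(const std::vector<float> & proj,
                                          const std::vector<bool>  & is_image,
                                          size_t                     n_embd) {
    std::vector<float> full(is_image.size() * n_embd, 0.0f);
    size_t src = 0;
    for (size_t t = 0; t < is_image.size(); ++t) {
        if (!is_image[t]) {
            continue; // text position: stays zero
        }
        if ((src + 1) * n_embd > proj.size()) {
            break; // not enough projector rows; leave the rest as zero
        }
        for (size_t d = 0; d < n_embd; ++d) {
            full[t * n_embd + d] = proj[src * n_embd + d];
        }
        ++src;
    }
    return full;
}

// Element-wise add: because text positions are zero in `deepstack`, only
// the image-token rows of `hidden` change.
static void add_deepstack(std::vector<float> & hidden, const std::vector<float> & deepstack) {
    for (size_t i = 0; i < hidden.size() && i < deepstack.size(); ++i) {
        hidden[i] += deepstack[i];
    }
}
```

Because the non-image rows are zero, a plain add is enough and no masking step is needed at the injection point.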

Branch: Granite4Vision AI-usage: full (OpenCode + Qwen 3.6-35B) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

…ulti-proj Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

yikes! Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

…ld_attn Branch: Granite4Vision AI-usage: full (Bob, OpenCode + Qwen3.6-35b) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* origin/master: (32 commits) hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (ggml-org#23835) mtmd-debug: add color and rainbow mode (ggml-org#23829) mtmd: fix gemma 4 projector pre_norm (ggml-org#23822) opencl: move backend info printing into its own function (ggml-org#23702) ci : run ui publish on ubuntu-slim (ggml-org#23818) ui: fix audio and video modality detection (ggml-org#23756) ci : releases use Github-hosted builds for the UI (ggml-org#23823) app : improve help output (ggml-org#23805) mtmd: n_head_kv defaults to n_head (ggml-org#23782) mtmd: fix gemma 4 audio rms norm eps (ggml-org#23815) ci : change Vulkan builds to Release to reduce ccache (ggml-org#23820) arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-file (ggml-org#23167) test-llama-archs: fix table format [no release] (ggml-org#23810) ggml: auto apply iGPU flag CUDA/HIP if integrated device (ggml-org#23007) mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for … (ggml-org#23729) CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (ggml-org#23227) server: minor tweaks to use more cpp features (ggml-org#23785) hexagon: minor refresh for HMX FA and MM (ggml-org#23796) vulkan: fast path for walsh-hadamard transform (ggml-org#23687) chat : add Granite 4.1 chat template (ggml-org#23518) ...

* origin/master: vocab : support tokenizer for LFM2.5-8B-A1B (ggml-org#23826) graph : ensure DS32 kq_mask_lid is F32 (ggml-org#23864) server: remove obsolete scripts (ggml-org#23870) ci : update macos release to use macos-26 runner (ggml-org#23878) download: add option to skip_download (ggml-org#23059) mtmd: Add DeepSeekOCR 2 Support (ggml-org#20975) CUDA: Check PTX version on host side to guard PDL dispatch (ggml-org#23530) server: bump timeout to 3600s (ggml-org#23842) model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (ggml-org#23346) llama: use f16 mask for FA to save VRAM (ggml-org#23764) sync : ggml ggml : bump version to 0.13.1 (ggml/1523) ngram-mod : Add missing include (ggml-org#23857) llama: add llm_graph_input_mtp (ggml-org#23643) app : move licences to llama-app (ggml-org#23824) cuda : disables launch_fattn PDL enrollment due to compiler bug (ggml-org#23825) meta : Add missing `buffer` set in allreduce fallback !COMPUTE clear (ggml-org#23480)

This was inherited from the Claude Code implementation that pushed the negative index inversion down into the model file. Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

face. palm. :( Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* origin/master: server: in SSE mode, send HTTP headers when slot starts (ggml-org#23884) ggml-webgpu: Check earlier for WebGPU required features (ggml-org#23879) ggml-webgpu: add q4_0/q8_0 SET_ROWS (ggml-org#23760) server-bench : add speed-bench for speculative decoding benchmarking (ggml-org#23869) app: add llama update self updater (ggml-org#23865) ui: handle audio/vnd.wave as audio WAV file (ggml-org#23754)

gabe-l-hart · 2026-05-29T23:38:50Z

Ok, I think this is fully ready @ngxson. I found my two bugs that were causing the mathematical delta with Eli's version, so I've got exact matching output now!

* origin/master: (36 commits) vendor : update cpp-httplib to 0.46.1 (ggml-org#23980) llama: limit max outputs of `llama_context` (ggml-org#23861) metal: template GLU kernels to support f16/f32 (ggml-org#23882) vulkan: don't hold the device mutex while compiling pipelines (ggml-org#23641) vulkan: reduce host memory lock contention (ggml-org#23376) vocab: add normalizer.lowercase support to WPM (ggml-org#23899) TP: quantized KV cache support (ggml-org#23792) security : disable private disclosures (ggml-org#23963) model: Add EXAONE 4.5 implementations (ggml-org#21733) vulkan: Block-load Q3_K/Q6_K block data and subtract on 32b ints (ggml-org#23056) vulkan: Removed unused functions (ggml-org#23175) common : support manually triggering the reasoning budget end sequence (ggml-org#23949) ci : add missing Linux label to cpu-x64-high-perf runner (ggml-org#23958) [SYCL] Support Q4_1, Q5_0, Q5_1 in Flash-attention (ggml-org#23812) [SYCL] Add more types in GET_ROWS OP (ggml-org#23710) sycl : Optimize Q3_K mul_mat by reorder (ggml-org#23725) ci: remove redundant or duplicate jobs (ggml-org#23927) server : handle If-None-Match weak ETags (ggml-org#23916) ci : limit trigger paths for the CPU workflow (ggml-org#23938) vocab : add tokenizer support for jina-embeddings-v2-base-zh (ggml-org#18756) ...

gabe-l-hart · 2026-06-02T04:12:51Z

I think these test failures look unrelated since they're in test-backend-ops and this PR doesn't touch any kernels.

* origin/master: (57 commits) server : disable on-device spec checkpoints (ggml-org#24108) arg: fix double mtp downloads (ggml-org#24128) webui: [a11y] fix keyboard navigation issues in chat interface and sidebar (ggml-org#23132) Move duplicated imatrix code into single common imatrix-loader.cpp (ggml-org#22445) ui: Fixed packages (ggml-org#24119) ui: added single line reasoning preview (ggml-org#23601) return filter to save memory (ggml-org#24125) convert: Fix Gemma 4 Unified conversion (ggml-org#24118) ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (ggml-org#22209) server: avoid unnecessary checkpoint restore when new tokens are present (ggml-org#24110) agents: refactor, include more guidelines (ggml-org#24111) webui: fix tool selector toggle/counter, key tools by stable identity (ggml-org#24065) build : use umbrella Headers directory for XCFramework module map (ggml-org#23974) server : add header to tools/server/server-http.h (ggml-org#24089) cmake: skip cvector-generator and export-lora when CPU backend is disabled (ggml-org#24053) fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091) readme : add status badges (ggml-org#24104) tests : refactor test-save-load-state to accept token input (ggml-org#24073) metal : reduce rset heartbeat from 500ms -> 5ms (ggml-org#24074) ggml-webgpu: FlashAttention refactor + standardize quantization support (ggml-org#23834) ...

gabe-l-hart · 2026-06-04T19:41:32Z

@ngxson Gentle nudge. This PR should be ready for final review now.

AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

ngxson · 2026-06-04T21:39:40Z

+    std::unordered_set<uint32_t> unique_deepstack_idxs;
+    for (const auto val : hparams.deepstack_mapping_arr) {
+        if (val >= 0) {
+            unique_deepstack_idxs.insert(val);


It may be worth checking the upper bound for val too

Actually, this is just counting the number of unique values, so I'm not sure this is the right place to guard against malicious values. That should probably be right above while loading (maybe just an assertion that the values are within the right range)
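A rough sketch of the split discussed in this thread: range validation at load time, kept separate from the unique-value count. The function names and the `n_max` upper bound are assumptions for illustration; the thread does not state what the correct bound is.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Load-time guard: reject any value at or above an assumed upper bound.
// Negative values are allowed here and treated as "unused".
static void validate_deepstack_mapping(const std::vector<int32_t> & mapping, int32_t n_max) {
    for (size_t il = 0; il < mapping.size(); ++il) {
        const int32_t val = mapping[il];
        if (val >= n_max) {
            throw std::runtime_error("deepstack index " + std::to_string(val) +
                                     " out of range at layer " + std::to_string(il));
        }
    }
}

// Counting step, mirroring the quoted diff: only non-negative values count.
static size_t count_unique_deepstack(const std::vector<int32_t> & mapping) {
    std::unordered_set<int32_t> unique;
    for (const auto val : mapping) {
        if (val >= 0) {
            unique.insert(val);
        }
    }
    return unique.size();
}
```

Keeping the guard in the loader means the counting code can assume its input has already been checked.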

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

NOTE: format_string is not available in granite.cpp (and including clip-impl.h to get it doesn't compile, so I think it violates the intended encapsulation), so std::stringstream is the simplest answer. Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
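For reference, building a message with std::stringstream instead of a printf-style helper might look like the sketch below. The function name and message text are hypothetical and are not taken from granite.cpp.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Build a diagnostic string using only the standard library.
static std::string format_range_error(int32_t val, int32_t n_max) {
    std::stringstream ss;
    ss << "value " << val << " out of range [0, " << n_max << ")";
    return ss.str();
}
```

This avoids pulling a helper from another module's internal header, at the cost of slightly more verbose formatting code.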

gabe-l-hart · 2026-06-05T15:46:37Z

Thanks for all the review help @ngxson !

gabe-l-hart added 30 commits May 13, 2026 12:46

feat(convert): Get language model conversion working for 4.1 vision

12750a7

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

feat(convert): Skip multimodal tensors for GraniteMoeHybrid (vision 4.0)

47dd3b1

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

fix: Disable vocab padding for non-hybrid models that use GraniteMoeH…

b3a6914

…ybrid Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

feat: Add python side architecture name

5b23f80

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

feat: Add python-side plumbing for setting FEATURE_LAYERS hparam

a176cbf

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

feat: Add c++ side tensor naming defines

79412a4

NOTE: Usage of these hasn't been updated to include prefix yet Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

feat(mtmd): Add architecture label plumbing

7dda78f

Branch: Granite4Vision AI-usage: full (OpenCode + qwen3.5:122b) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

feat(wip): Add partial conversion for mmproj

5e6184f

This handles stacking the projector tensors and setting the new hparams Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

feat: Add gguf_writer and constant support for new hparams and deepst…

97600c7

…ack layer arr Branch: Granite4Vision AI-usage: draft (OpenCode + qwen3.5:122b) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

feat: Full conversion for mmproj w/ tensor mappings

f6d1975

Branch: Granite4Vision AI-usage: full (OpenCode + qwen3.5:122b) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

fix: Add lm_head skip for mmproj for 4.0

97e612a

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

fix: De-alias text_config architecture in convert_lora_to_gguf.py

2a969d3

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

feat: Add --trust-remote-code arg to convert_lora_to_gguf.py

2332686

This defaults to False, but allows a user to enable it programmatically instead of using the interactive prompt. Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

fix: De-alias model.language_model. -> model. for lora adapters

fc31cca

Branch: Granite4Vision AI-usage: full (OpenCode + qwen3.5:122b) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

fix: Extend language model tensor dealiasing in adapters

0b03ada

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

fix: Remove unnecessary registration for GraniteSpeech in language model

fb6075b

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

feat: Plumb through mm prefix formatting for qformer tensors

8e4c0b5

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

feat: Add spatial offsets array hparam conversion

14fd2cc

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

feat: Add stub plumbing for granite vision in mtmd

0feeb29

Branch: Granite4Vision AI-usage: draft (OpenCode + qwen3.5:122b) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

fix: Move deepstack_layer_arr to llm hparam instead of mmproj

234973d

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

fix: Remove IS_DEEPSTACK_LAYERS

cb05a27

This appears to have been added during Qwen3 VL (ggml-org#16780), but it was never actually used. Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

fix: add missing vision attn layernorm eps

acf0e98

Branch: Granite4Vision AI-usage: full (OpenCode + Qwen 3.6-35B) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

refactor: Hoist qformer tensors into qf_block and hold a vector for m…

5ce4b81

…ulti-proj Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

gabe-l-hart added 8 commits May 28, 2026 16:51

fix: Remove unused get_model api

54546ff

yikes! Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

refactor: Rearrange helpers for g4v to be private members and use bui…

70c2302

…ld_attn Branch: Granite4Vision AI-usage: full (Bob, OpenCode + Qwen3.6-35b) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

fix: Fix off-by-one in vision layer index

ecb247b

This was inherited from the Claude Code implementation that pushed the negative index inversion down into the model file. Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

fix: Fix norm/post_norm mixup in conversion

d8d37df

face. palm. :( Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

style: More descriptive tensor names

255f934

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

planetf1 mentioned this pull request Jun 4, 2026

Blog: Getting structured data out of images with Granite Vision 4.1 generative-computing/mellea-website#48

Draft

4 tasks

CISC reviewed Jun 4, 2026

Comment thread conversion/granite.py Outdated

Comment thread convert_lora_to_gguf.py Outdated

fix: Apply PR cleanup for new conversion changes

9eb1762

AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

gabe-l-hart force-pushed the Granite4Vision branch from 401dc66 to 9eb1762 Compare June 4, 2026 20:00

CISC approved these changes Jun 4, 2026

fix(convert): Remove duplicate V_ENC_EMBD_IMGNL

c5afa80

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

gabe-l-hart force-pushed the Granite4Vision branch from 74feacf to c5afa80 Compare June 4, 2026 21:38

ngxson reviewed Jun 4, 2026

gabe-l-hart added 3 commits June 4, 2026 16:22

refactor: append_token -> add_newline

c12a262

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

style: Comment cleanup

d3d5a08

Branch: Granite4Vision AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

ngxson approved these changes Jun 4, 2026

ngxson merged commit 64086f2 into ggml-org:master Jun 5, 2026
25 of 37 checks passed

gabe-l-hart deleted the Granite4Vision branch June 5, 2026 15:46

ngxson mentioned this pull request Jun 5, 2026

model: fix build failed #24193

Merged

simlay mentioned this pull request Jun 5, 2026

Compile bug: invalid use of non-static member function ‘uint32_t llama_hparams::n_layer() const #24194

Closed

model: Granite4 Vision #23545: ngxson merged 103 commits into ggml-org:master from gabe-l-hart:Granite4Vision
