mtmd: add mtmd_image_tokens_get_decoder_pos() API by ngxson · Pull Request #21851 · ggml-org/llama.cpp

ngxson · 2026-04-13T13:19:19Z

Overview

Add a new mtmd API: mtmd_image_tokens_get_decoder_pos()
Deprecate mtmd_image_tokens_get_nx/ny()

Additional information

Intended to support #21045.

The mentioned PR proposes a new API, mtmd_image_tokens_get_n_prefix, which is not very flexible. mtmd_image_tokens_get_decoder_pos addresses this by providing non-linear t,x,y positions for each token.

In the case of falcon-ocr, mtmd_image_tokens_get_decoder_pos returns a linear temporal t with x=0;y=0 for the first few text tokens, then switches to spatial positions for the image tokens:

<|image_cls|>: t=0,x=0,y=0
<|image_reg_1|>: t=1,x=0,y=0
...
<|image_reg_4|>: t=4,x=0,y=0
image patch 0: t=5,x=0,y=0
image patch 1: t=5,x=1,y=0
...
image patch N: t=5,x=nx,y=ny
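
To make the layout above concrete, here is a minimal, self-contained sketch that generates the same kind of position sequence. It is not the mtmd implementation: the struct `decoder_pos_sketch`, the function `sketch_prefix_then_grid`, and its parameters are hypothetical names made up for illustration. It also assumes row-major patch order and zero-based coordinates, so the final patch lands at x=nx-1, y=ny-1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical position triple for illustration; not the actual mtmd type.
struct decoder_pos_sketch {
    int32_t t; // temporal position
    int32_t x; // column within the patch grid
    int32_t y; // row within the patch grid
};

// Hypothetical helper: n_prefix text-like tokens get linear t with x=0,y=0,
// then every image patch shares t=n_prefix and takes its grid coordinates.
std::vector<decoder_pos_sketch> sketch_prefix_then_grid(int32_t n_prefix, int32_t nx, int32_t ny) {
    assert(n_prefix >= 0 && nx > 0 && ny > 0);

    std::vector<decoder_pos_sketch> out;
    out.reserve(static_cast<std::size_t>(n_prefix) + static_cast<std::size_t>(nx) * static_cast<std::size_t>(ny));

    // Prefix tokens (e.g. <|image_cls|> and the <|image_reg_*|> tokens): t advances, x and y stay 0.
    for (int32_t i = 0; i < n_prefix; ++i) {
        out.push_back({i, 0, 0});
    }

    // Image patches: t is fixed, x varies fastest (assumed row-major order).
    for (int32_t y = 0; y < ny; ++y) {
        for (int32_t x = 0; x < nx; ++x) {
            out.push_back({n_prefix, x, y});
        }
    }
    return out;
}
```

Calling `sketch_prefix_then_grid(5, nx, ny)` reproduces the falcon-ocr pattern listed above: five prefix tokens at t=0..4, followed by all patches at t=5. A single prefix count cannot express arbitrary per-token positions, which is the flexibility argument made for returning a position per token.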

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: no

ngxson · 2026-04-13T13:40:34Z

Tests seem ok:

[vision] OK:   ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0
[vision] OK:   ggml-org/SmolVLM2-2.2B-Instruct-GGUF:Q4_K_M
[vision] OK:   ggml-org/SmolVLM2-500M-Video-Instruct-GGUF:Q8_0
[vision] OK:   ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
[vision] OK:   THUDM/glm-edge-v-5b-gguf:Q4_K_M
[vision] OK:   second-state/Llava-v1.5-7B-GGUF:Q2_K
[vision] OK:   cjpais/llava-1.6-mistral-7b-gguf:Q3_K_M
[vision] OK:   ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
[vision] OK:   second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K
[vision] OK:   openbmb/MiniCPM-V-2_6-gguf:Q2_K
[vision] OK:   openbmb/MiniCPM-o-2_6-gguf:Q4_0
[vision] OK:   bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
[vision] OK:   ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
[vision] OK:   ggml-org/InternVL2_5-1B-GGUF:Q8_0
[vision] OK:   ggml-org/InternVL3-1B-Instruct-GGUF:Q8_0
[vision] OK:   ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[vision] OK:   ggml-org/LFM2-VL-450M-GGUF:Q8_0
[vision] OK:   ggml-org/granite-docling-258M-GGUF:Q8_0
[vision] OK:   ggml-org/LightOnOCR-1B-1025-GGUF:Q8_0
[vision] OK:   ggml-org/DeepSeek-OCR-GGUF:Q8_0
[vision] OK:   ggml-org/dots.ocr-GGUF:Q8_0
[vision] OK:   ggml-org/HunyuanOCR-GGUF:Q8_0
[vision] OK:   ggml-org/gemma-4-E2B-it-GGUF:Q8_0
[audio]  OK:   ggml-org/ultravox-v0_5-llama-3_2-1b-GGUF:Q8_0
[audio]  OK:   ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[audio]  OK:   ggml-org/Voxtral-Mini-3B-2507-GGUF:Q4_K_M
[audio]  OK:   ggml-org/LFM2-Audio-1.5B-GGUF:Q8_0
[audio]  OK:   ggml-org/gemma-4-E2B-it-GGUF:Q8_0

ngxson · 2026-04-14T13:38:57Z

pinging @ggml-org/maintainers for approval


mtmd: add mtmd_image_tokens_get_decoder_pos() API

ffaf3e0

ngxson requested review from a team and ggerganov as code owners April 13, 2026 13:19

ngxson added 2 commits April 13, 2026 15:25

consistent naming

5ac7f49

fix build

1ed9381

ngxson requested a review from a team April 13, 2026 13:40

github-actions bot added the testing and examples labels Apr 13, 2026

ngxson mentioned this pull request Apr 13, 2026

feat: add video support to mtmd #20224

Closed

ServeurpersoCom approved these changes Apr 14, 2026


pwilkin approved these changes Apr 14, 2026


ngxson merged commit 707c0b7 into ggml-org:master Apr 14, 2026
41 of 47 checks passed

avirajBevli mentioned this pull request Apr 15, 2026

model: add Falcon OCR support #21045

Open

mengqin pushed a commit to mengqin/llama.cpp that referenced this pull request Apr 20, 2026

mtmd: add mtmd_image_tokens_get_decoder_pos() API (ggml-org#21851)

704706b

* mtmd: add mtmd_image_tokens_get_decoder_pos() API * consistent naming * fix build

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Apr 21, 2026

mtmd: add mtmd_image_tokens_get_decoder_pos() API (ggml-org#21851)

1cc487c

* mtmd: add mtmd_image_tokens_get_decoder_pos() API * consistent naming * fix build

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026

mtmd: add mtmd_image_tokens_get_decoder_pos() API (ggml-org#21851)

0eb7be5

* mtmd: add mtmd_image_tokens_get_decoder_pos() API * consistent naming * fix build

rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026

mtmd: add mtmd_image_tokens_get_decoder_pos() API (ggml-org#21851)

319fe7d

* mtmd: add mtmd_image_tokens_get_decoder_pos() API * consistent naming * fix build

ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

mtmd: add mtmd_image_tokens_get_decoder_pos() API (ggml-org#21851)

623f208

* mtmd: add mtmd_image_tokens_get_decoder_pos() API * consistent naming * fix build

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

mtmd: add mtmd_image_tokens_get_decoder_pos() API (ggml-org#21851)

da5af28

* mtmd: add mtmd_image_tokens_get_decoder_pos() API * consistent naming * fix build

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

mtmd: add mtmd_image_tokens_get_decoder_pos() API (ggml-org#21851)

5d64b64

* mtmd: add mtmd_image_tokens_get_decoder_pos() API * consistent naming * fix build

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

mtmd: add mtmd_image_tokens_get_decoder_pos() API (ggml-org#21851)

962c701

* mtmd: add mtmd_image_tokens_get_decoder_pos() API * consistent naming * fix build

ngxson merged 3 commits into ggml-org:master from ngxson:xsn/mtmd_get_decoder_pos_api
