mtmd, server: add "placeholder bitmap" for counting tokens , add */input_tokens API by ngxson · Pull Request #23913 · ggml-org/llama.cpp

ngxson · 2026-05-30T14:21:08Z

Overview

Tokenizing / preprocessing multimodal input is much more CPU-intensive than tokenizing, because it runs on single thread. This can be wasteful if the user just want to count the number of tokens occupied by an image/audio chunk, without actually using the underlay data.

This PR allow create a "placeholder" bitmap that only contains the dimension, no data buffer will be allocated. Preprocessing ops (i.e. image manipulation) will skip processing it

New server APIs are also added to demonstrate this (support counting both tools input tokens and multimodal input tokens)

/v1/chat/completions/input_tokens
/v1/responses/input_tokens

In next PRs:

Skip preprocess audio
Move std/mean f32 to cgraph
Update for places where process_mtmd_prompt is being used for counting tokens

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: no

ngxson · 2026-05-30T16:27:53Z

not quite sure why the server linux CI fails (same on master branch), but I ran the test locally on my mac and it passes 100%

aldehir · 2026-05-31T00:42:03Z

ERROR unit/test_basic.py::test_server_start_simple - RuntimeError: Server process died with return code -4

Looks like the server process is getting killed with SIGILL (Illegal operation), -N maps to signal N according to https://docs.python.org/3/library/subprocess.html#subprocess.Popen.returncode

       Signal        x86/ARM     Alpha/   MIPS   PARISC   Notes
                   most others   SPARC
       ─────────────────────────────────────────────────────────────────
       SIGHUP           1           1       1       1
       SIGINT           2           2       2       2
       SIGQUIT          3           3       3       3
       SIGILL           4           4       4       4

CISC · 2026-05-31T11:23:42Z

Looks like the server process is getting killed with SIGILL (Illegal operation), -N maps to signal N according to https://docs.python.org/3/library/subprocess.html#subprocess.Popen.returncode

Yep, this is a ccache issue, I've deleted the caches and all is fine now.

ngxson · 2026-06-01T09:27:09Z

gentle ping @ggml-org/llama-server , I kinda need this to continue with other fixes

ngxson · 2026-06-05T22:57:34Z

@ggml-org/maintainers could someone give approval(s) for this PR please 🙏 need this one to unblock #21858

I've already tested it (tested with vision + audio input), plus unit tests are added so I guess that should ok

the mtmd/test.sh fails on some cases but that's the same result on master, that should be fixed in another PR:

[vision] OK:   ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0
[vision] OK:   ggml-org/SmolVLM2-2.2B-Instruct-GGUF:Q4_K_M
[vision] OK:   ggml-org/SmolVLM2-500M-Video-Instruct-GGUF:Q8_0
[vision] OK:   ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
[vision] OK:   THUDM/glm-edge-v-5b-gguf:Q4_K_M
[vision] OK:   second-state/Llava-v1.5-7B-GGUF:Q2_K
[vision] OK:   cjpais/llava-1.6-mistral-7b-gguf:Q3_K_M
[vision] FAIL: ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
[vision] OK:   second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K
[vision] OK:   openbmb/MiniCPM-V-2_6-gguf:Q2_K
[vision] OK:   openbmb/MiniCPM-o-2_6-gguf:Q4_0
[vision] OK:   bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
[vision] OK:   ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
[vision] OK:   ggml-org/InternVL2_5-1B-GGUF:Q8_0
[vision] OK:   ggml-org/InternVL3-1B-Instruct-GGUF:Q8_0
[vision] OK:   ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[vision] OK:   ggml-org/LFM2-VL-450M-GGUF:Q8_0
[vision] OK:   ggml-org/granite-docling-258M-GGUF:Q8_0
[vision] OK:   ggml-org/LightOnOCR-1B-1025-GGUF:Q8_0
[vision] OK:   ggml-org/DeepSeek-OCR-GGUF:Q8_0
[vision] OK:   ggml-org/dots.ocr-GGUF:Q8_0
[vision] OK:   ggml-org/HunyuanOCR-GGUF:Q8_0
[vision] FAIL: ggml-org/HunyuanVL-4B-GGUF:Q8_0
[vision] OK:   ggml-org/gemma-4-E2B-it-GGUF:Q8_0
[audio]  FAIL: ggml-org/ultravox-v0_5-llama-3_2-1b-GGUF:Q8_0
[audio]  FAIL: ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[audio]  FAIL: ggml-org/Voxtral-Mini-3B-2507-GGUF:Q4_K_M
[audio]  OK:   ggml-org/LFM2-Audio-1.5B-GGUF:Q8_0
[audio]  OK:   ggml-org/gemma-4-E2B-it-GGUF:Q8_0
[audio]  OK:   ggml-org/Qwen3-ASR-0.6B-GGUF:Q8_0

mtmd: add "placeholder bitmap" for counting tokens w/o preprocessing

924bbab

github-actions Bot added the examples label May 30, 2026

ngxson added 6 commits May 30, 2026 16:25

fast path skip preproc for placeholder

064c2d7

fix build

d1a098d

correct the api

58171a6

add server endpoint + tests

f1503cf

add object name

aec9eff

update docs

035d72c

github-actions Bot added python python script changes server labels May 30, 2026

ngxson changed the title ~~mtmd: add "placeholder bitmap" for counting tokens w/o preprocessing~~ mtmd, server: add "placeholder bitmap" for counting tokens , add */input_tokens API May 30, 2026

ngxson added 2 commits May 30, 2026 17:50

add proxy handling

3cb2d8c

fix build

447e418

ngxson marked this pull request as ready for review May 30, 2026 16:27

ngxson requested review from a team as code owners May 30, 2026 16:27

ngxson mentioned this pull request May 30, 2026

Xsn/mtmd placeholder chunks ngxson/llama.cpp#106

Open

ngxson added 4 commits May 30, 2026 18:58

fix audio input path

8f67dfb

use is_placeholder in process_mtmd_prompt()

8351aaf

nits

1945165

nits (2)

c72ef5c

aldehir mentioned this pull request May 31, 2026

ui: fix ETag truncation with MSVC compiler #23917

Merged

docs: clarify chat/completions/input_tokens is not official

53e3e88

ngxson mentioned this pull request Jun 5, 2026

mtmd: support input sequence of images (initial video support) #21858

Draft

3 tasks

ngxson added 2 commits June 6, 2026 00:18

Merge branch 'master' into xsn/mtmd_placeholder_chunks

acca080

fix merge problem

5b0cfdf

aldehir approved these changes Jun 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mtmd, server: add "placeholder bitmap" for counting tokens , add */input_tokens API#23913

mtmd, server: add "placeholder bitmap" for counting tokens , add */input_tokens API#23913
ngxson wants to merge 16 commits into
masterfrom
xsn/mtmd_placeholder_chunks

ngxson commented May 30, 2026 •

edited

Loading

Uh oh!

ngxson commented May 30, 2026

Uh oh!

aldehir commented May 31, 2026 •

edited

Loading

Uh oh!

CISC commented May 31, 2026

Uh oh!

ngxson commented Jun 1, 2026

Uh oh!

ngxson commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ngxson commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Requirements

Uh oh!

ngxson commented May 30, 2026

Uh oh!

aldehir commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CISC commented May 31, 2026

Uh oh!

ngxson commented Jun 1, 2026

Uh oh!

ngxson commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ngxson commented May 30, 2026 •

edited

Loading

aldehir commented May 31, 2026 •

edited

Loading