mtmd: add batching API by ngxson · Pull Request #24384 · ggml-org/llama.cpp

ngxson · 2026-06-09T23:06:24Z

Overview

Supersedes #24300

Also fixes #24380

Add a generic batching API to mtmd and wire it up to llama-server. The goal is to speed up llava-uhd-style models and, at the same time, improve video processing speed.

Current state:

llama-server can use it correctly
The mtmd API implementation is a mock-up; the proper logic still needs to be implemented

TODO:

add a notion of max batch size in mtmd
add a CLI argument for it
mtmd_batch_add_chunk should only accept inputs of the same size
wire up mtmd_batch_encode to use the 4th (batch) dimension, added in "mtmd: build_vit batching" #24352
blacklist / whitelist models that can support it: maybe only support build_vit() models for now
maybe update mtmd-cli to reflect the usage: not sure, a follow-up PR may be better
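The first three TODO items (a max batch size, accepting only same-size inputs, and stacking along a 4th batch dimension) can be illustrated with a small standalone sketch. It is not the mtmd API: `image_chunk`, `image_batch`, `add_chunk` and `stacked` are hypothetical names chosen for illustration, and the layout assumption (one image per slice of a 4D tensor, batch index outermost, as with a ggml tensor whose `ne[3]` is the batch) is ours, not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical preprocessed image/tile; NOT the real mtmd chunk type.
struct image_chunk {
    int nx = 0;              // width
    int ny = 0;              // height
    int nc = 3;              // channels
    std::vector<float> data; // nx * ny * nc values
};

// Hypothetical batch container sketching the TODO constraints.
class image_batch {
public:
    explicit image_batch(size_t max_batch_size) : max_batch_size_(max_batch_size) {}

    // Rejects the chunk if the batch is full, if its buffer size is inconsistent
    // with its shape, or if its shape differs from the first chunk in the batch.
    bool add_chunk(const image_chunk & chunk) {
        if (chunks_.size() >= max_batch_size_) return false;
        if (chunk.data.size() != (size_t) chunk.nx * chunk.ny * chunk.nc) return false;
        if (!chunks_.empty()) {
            const image_chunk & first = chunks_.front();
            if (chunk.nx != first.nx || chunk.ny != first.ny || chunk.nc != first.nc) return false;
        }
        chunks_.push_back(chunk);
        return true;
    }

    size_t size() const { return chunks_.size(); }

    // Concatenates all chunks into one contiguous buffer: image i starts at
    // offset i * nx * ny * nc, i.e. the batch index is the outermost dimension.
    std::vector<float> stacked() const {
        const size_t per_image = chunks_.empty() ? 0 : chunks_.front().data.size();
        std::vector<float> out;
        out.reserve(per_image * chunks_.size());
        for (const auto & c : chunks_) {
            out.insert(out.end(), c.data.begin(), c.data.end());
        }
        assert(out.size() == per_image * chunks_.size());
        return out;
    }

private:
    size_t max_batch_size_;
    std::vector<image_chunk> chunks_;
};
```

Requiring identical shapes is what makes a single contiguous 4D buffer possible; chunks of a different size would have to go into a separate batch (or be padded, which this sketch does not attempt).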

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: no

Commit: mtmd: add batching API (b62c305)

github-actions Bot added examples server labels Jun 9, 2026

ngxson mentioned this pull request Jun 9, 2026: "mtmd: DeepSeek-OCR multi-tile dynamic resolution batched encoding" #24300 (Closed)

ngxson wants to merge 1 commit into ggml-org:master from ngxson:xsn/mtmd_batch_api