mtmd: add batching API #24384

Draft · ngxson wants to merge 1 commit into ggml-org:master from ngxson:xsn/mtmd_batch_api

ngxson (Collaborator) commented Jun 9, 2026

Overview

Supersedes #24300.

Also fixes #24380.

Add a generic batching API to mtmd and wire it up to llama-server. The goal is to speed up llava-uhd-style models and, at the same time, improve video processing speed.
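To make the expected gain concrete, here is a minimal sketch that only counts vision-encoder passes; it is not mtmd code. `ImageSlice`, `encoder_passes_unbatched` and `encoder_passes_batched` are hypothetical names. The sketch assumes that only images of the same size can share a batch and that a batch holds at most `max_batch_size` images.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an image chunk: only the pixel dimensions matter here.
struct ImageSlice {
    int nx;  // width in pixels
    int ny;  // height in pixels
};

// Unbatched path: one encoder pass per slice.
std::size_t encoder_passes_unbatched(const std::vector<ImageSlice>& slices) {
    return slices.size();
}

// Batched path (assumption): slices with identical (nx, ny) are stacked into
// one batch, and a size group larger than max_batch_size is split further.
std::size_t encoder_passes_batched(const std::vector<ImageSlice>& slices,
                                   std::size_t max_batch_size) {
    assert(max_batch_size > 0);
    std::map<std::pair<int, int>, std::size_t> groups;  // (nx, ny) -> slice count
    for (const auto& s : slices) {
        ++groups[{s.nx, s.ny}];
    }
    std::size_t passes = 0;
    for (const auto& kv : groups) {
        // ceil(count / max_batch_size) passes for this size group
        passes += (kv.second + max_batch_size - 1) / max_batch_size;
    }
    return passes;
}
```

With made-up sizes, six 448×448 slices plus one 336×336 overview image take 7 passes unbatched but 2 passes with `max_batch_size = 8` (3 with `max_batch_size = 4`). Video frames decoded at a fixed resolution all fall into one size group, so N frames need ceil(N / max_batch_size) passes instead of N.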

Current state:

  • llama-server can use it correctly
  • the mtmd API implementation is a mock-up; the proper logic still needs to be implemented

TODO:

  • add a notion of max batch size to mtmd
  • add a CLI argument for it
  • mtmd_batch_add_chunk should only accept inputs of the same size
  • wire up mtmd_batch_encode to use the 4th (batch) dimension added in "mtmd: build_vit batching" #24352
  • blacklist/whitelist the models that can support it (maybe only support build_vit() models for now)
  • maybe update mtmd-cli to reflect the usage (not sure; a follow-up PR may be better)
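The same-size rule and the max batch size from the TODO list can be sketched as a small builder that packs images into one contiguous buffer with the batch index as the outermost (4th) dimension, similar to ggml's ne[0]..ne[3] ordering where ne[0] varies fastest. This is a hypothetical illustration, not the actual `mtmd_batch_add_chunk`/`mtmd_batch_encode` implementation; `ImageBatch` and its planar float [width, height, channels] input layout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical batch builder (not the mtmd API): it accepts images only while
// they match the size of the first image and the batch is not yet full.
class ImageBatch {
public:
    explicit ImageBatch(std::size_t max_batch_size) : max_batch_size_(max_batch_size) {
        assert(max_batch_size_ > 0);
    }

    // Planar float image: channels * ny * nx values, width varying fastest.
    // Returns false and leaves the batch unchanged if the image does not fit.
    bool add(int nx, int ny, int channels, const std::vector<float>& pixels) {
        if (nx <= 0 || ny <= 0 || channels <= 0) return false;
        if (n_images_ == max_batch_size_) return false;  // batch is full
        if (n_images_ > 0 && (nx != nx_ || ny != ny_ || channels != nc_)) {
            return false;  // size mismatch with images already in the batch
        }
        if (pixels.size() != static_cast<std::size_t>(nx) * ny * channels) return false;
        nx_ = nx;
        ny_ = ny;
        nc_ = channels;
        // Appending keeps the batch index as the outermost (4th) dimension:
        // element (x, y, c, b) lives at x + nx*(y + ny*(c + nc*b)).
        data_.insert(data_.end(), pixels.begin(), pixels.end());
        ++n_images_;
        return true;
    }

    std::size_t size() const { return n_images_; }

    // Read element (x, y, c) of image b from the packed 4-D buffer.
    float at(int x, int y, int c, std::size_t b) const {
        const std::size_t idx = static_cast<std::size_t>(x) +
            static_cast<std::size_t>(nx_) * (static_cast<std::size_t>(y) +
            static_cast<std::size_t>(ny_) * (static_cast<std::size_t>(c) +
            static_cast<std::size_t>(nc_) * b));
        return data_.at(idx);
    }

    const std::vector<float>& data() const { return data_; }

private:
    std::size_t max_batch_size_;
    std::size_t n_images_ = 0;
    int nx_ = 0, ny_ = 0, nc_ = 0;
    std::vector<float> data_;
};
```

Rejecting a mismatched image, rather than padding or resizing it, keeps every batch a dense 4-D block; under this design the caller flushes the current batch to the encoder and starts a new one whenever `add` returns false.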

Requirements


Development

Successfully merging this pull request may close these issues:

  • Avoid re-encoding mtmd chunk when prefill MTP context