webui: support video files as input by foldl · Pull Request #22830 · ggml-org/llama.cpp

foldl · 2026-05-08T07:49:05Z

Overview

Add support for video files as input. This may fix #20741.

The implementation closely follows the existing handling of audio files.

Note

To make video input work, at least 3 modifications are needed:

mtmd.
server.
webui.

This PR only updated WebUI.

Detailed Modifications

Add a menu item for uploading video files;
Show an icon in the chat input box (ChatAttachmentsListItemThumbnailFile) like ChatAttachmentsPreviewThumbnailStrip;
A new preview window for video files;
Video files are sent to the server through input_video (just like input_audio for audio files);
Two types of video files are defined (mp4 and ogg);
In the Model Information window, the video modality is shown as "Vision (Video)" and the vision modality is shown as "Vision (Image)";
Add a new bool field video to Modalities.
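
As a rough illustration of the last two items, the sketch below models the modality flags and the labels shown in the Model Information window. It is written in Python for brevity (the WebUI itself is TypeScript), and the class layout, the function name `modality_labels`, and the "Audio" label are assumptions, not code from the PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Modalities:
    # Field names other than `video` are assumed for illustration.
    vision: bool = False
    audio: bool = False
    video: bool = False  # the new bool field described in the list above


def modality_labels(m: Modalities) -> List[str]:
    """Return the display labels for the enabled modalities."""
    labels = []
    if m.vision:
        labels.append("Vision (Image)")
    if m.video:
        labels.append("Vision (Video)")
    if m.audio:
        labels.append("Audio")  # assumed label; the PR text does not state it
    return labels
```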

Test & Screenshots

I have tested this with chatllm.cpp.

Additional information

Some findings or thoughts that are out of the scope of this PR.

How to properly show the modalities of image-only, and image-video?
Video files often contain audio. At present, when files are sent to the server, the media type is inferred from the file extension rather than from the menu item the user clicked.
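
To make the second point concrete, here is a minimal Python sketch of extension-based inference. The mapping tables and the function name `infer_part_type` are hypothetical. Only the mp4 and ogg video types come from this PR, and the audio extensions are an assumption.

```python
from pathlib import Path
from typing import Optional

# Video types named in this PR.
VIDEO_EXTENSIONS = {".mp4", ".ogg"}
# Assumed audio types, for illustration only.
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def infer_part_type(filename: str) -> Optional[str]:
    """Pick the content part type from the file extension alone.

    The menu item the user clicked is not consulted, so a video file
    is always sent as video even if the user only wanted its audio.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "input_video"
    if suffix in AUDIO_EXTENSIONS:
        return "input_audio"
    return None
```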

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO. I coded this all by myself (copied and modified some existing code).

ngxson · 2026-05-08T17:44:54Z

IMO this can be an acceptable stop-gap solution. My one concern is that we will eventually have native video support in mtmd, so we should make sure the changes from this PR can be easily reverted when that happens.

foldl · 2026-05-08T22:39:30Z

This is for mtmd (see issue #20741). Why would this need to be reverted once video support in mtmd is ready?

allozaur · 2026-05-16T08:37:18Z

Please rebase this on the latest commit on master and resolve the conflicts.

ServeurpersoCom

LGTM, video has been added symmetrically to image and audio.
If a refactor is required, all three can be maintained in the same way.

chigkim · 2026-05-17T09:39:16Z

Is it possible to submit a video file via the OpenAI Chat Completions or Responses API? If so, would you mind showing an example snippet of the JSON payload?
Thanks!

ServeurpersoCom · 2026-05-17T09:57:00Z

Is it possible to submit a video file via the OpenAI Chat Completions or Responses API? If so, would you mind showing an example snippet of the JSON payload? Thanks!

The PR adds an input_video content part that mirrors input_audio exactly. Based on the diff (chat.service.ts and api.d.ts), the WebUI sends something like this on /v1/chat/completions:

{
  "model": "your-model",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this video." },
        {
          "type": "input_video",
          "input_video": {
            "data": "<base64 encoded video bytes>",
            "format": "mp4"
          }
        }
      ]
    }
  ]
}

format is one of "mp4", "ogg" or "auto". The server side needs to understand input_video for this to work end to end.
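
For clients that build this request themselves, a minimal Python sketch might look like the following. The helper name `build_video_request` and its parameters are hypothetical. It only assembles the payload shown above and does not contact a server.

```python
import base64
import json

ALLOWED_FORMATS = {"mp4", "ogg", "auto"}


def build_video_request(video_bytes: bytes, prompt: str,
                        model: str = "your-model", fmt: str = "mp4") -> dict:
    """Assemble a chat completion payload with one input_video part."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported video format: {fmt!r}")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "input_video",
                        "input_video": {
                            # Raw bytes are base64 encoded, as for input_audio.
                            "data": base64.b64encode(video_bytes).decode("utf-8"),
                            "format": fmt,
                        },
                    },
                ],
            }
        ],
    }


# Placeholder bytes stand in for a real video file.
payload = build_video_request(b"\x00\x01", "Describe this video.")
print(json.dumps(payload, indent=2))
```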

OPerepadia · 2026-05-19T08:51:31Z

Is it required to set a specific flag to enable video input?

I tried with Gemma 4 E4B but there is no option to upload a video file.

In the model info, Vision (Video) is not shown either

But E4B has video capability

unsloth/gemma-4-E4B-it-GGUF · Hugging Face

Extended Multimodalities – Processes Text, Image with variable aspect ratio and resolution support (all models), Video, and Audio (featured natively on the E2B and E4B models).

Starting the llama-server with this command

./build/bin/llama-server \
    -hf unsloth/gemma-4-E4B-it-GGUF:Q8_K_XL \
    -c 32768 \
    --n-gpu-layers auto \
    --mmproj-auto --mmproj-offload

foldl · 2026-05-19T09:11:54Z

@OPerepadia this is implemented ahead of mtmd. Once video support is ready in mtmd (along with a small update to the server), video input will work.

See also #20741.

chigkim · 2026-05-21T04:07:39Z

Thanks @ServeurpersoCom for the format.

However, if I send the following message, I get the error below.

The data field holds the video encoded with base64.b64encode. I encoded it the same way I encode images.

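import base64

def encode_video(path):
    # Read the raw video bytes and return them as a base64 string.
    with open(path, "rb") as video_file:
        content = video_file.read()
    return base64.b64encode(content).decode("utf-8")

data = encode_video(path)  # path: location of the video file on disk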

Here's a message.

{
  "role": "user",
  "content": [
    { "type": "text", "text": "Describe this video." },
    {
      "type": "input_video",
      "input_video": {
        "data": data,
        "format": "mp4"
      }
    }
  ]
}

srv operator(): got exception: {"error":{"code":400,"message":"unsupported content[].type","type":"invalid_request_error"}}

If I send the message without the video block below, the request succeeds, but, as expected, the model says it cannot find a video.

    {
      "type": "input_video",
      "input_video": {
        "data": data,
        "format": "mp4"
      }
    }

I tried with both qwen-3.6 and gemma-4.
Thanks for your help!

ServeurpersoCom · 2026-05-21T05:07:01Z

However, if I send the following message, I get this error below.

That's exactly the request llama-ui sends, your JSON is correct. The piece that's missing is the backend: the server's content parser doesn't handle input_video yet, so it rejects it. The client half is in, the server half isn't, which is why even the llama-ui itself can't do video end to end right now.
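
Until the server half lands, a client can at least recognise this rejection. The sketch below is a hedged example that assumes the HTTP response body carries the same JSON as the log line quoted earlier in the thread. The function name `is_unsupported_content_type` is hypothetical.

```python
import json


def is_unsupported_content_type(body: str) -> bool:
    """Return True if the body is the 400 error for an unknown content part type."""
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or not a JSON object: not the error we are looking for.
        return False
    if not isinstance(error, dict):
        return False
    message = str(error.get("message", ""))
    return error.get("code") == 400 and "unsupported content[].type" in message
```

A caller could use this check to tell the user that the backend does not accept input_video yet, rather than surfacing a generic request failure.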

foldl · 2026-05-21T05:10:59Z

@chigkim To make video input work, at least 3 modifications are needed:

mtmd.
server.
webui.

This PR only updated WebUI. If you want to see how the whole thing works, you can try it with chatllm.cpp (SmolVL, Gemma-4-E2B).

chigkim · 2026-05-21T11:01:00Z

Ah ok, for some reason I thought the entire workflow was ready. :)

foldl requested a review from a team as a code owner May 8, 2026 07:49

github-actions Bot added server/webui examples server labels May 8, 2026

allozaur assigned ServeurpersoCom May 16, 2026

webui: support video files as input

eb04056

foldl force-pushed the webui-video-files branch from 7713550 to eb04056 Compare May 16, 2026 12:53

ServeurpersoCom approved these changes May 16, 2026

github-actions Bot added the server/ui label May 16, 2026

allozaur approved these changes May 17, 2026

allozaur merged commit 4f13cb7 into ggml-org:master May 17, 2026
6 checks passed

kj-c0d3s mentioned this pull request May 17, 2026

Eval bug: System Memory leak 9010 #22925

Open
