
server: add /v1/responses support #1184

Merged: ikawrakow merged 2 commits into ikawrakow:main from RodriMora:feature/responses-api on Feb 14, 2026

Conversation

@RodriMora (Contributor) commented Jan 23, 2026

Summary

  • add /v1/responses endpoint by converting Responses payloads to chat-completions and emitting Responses-style SSE events
  • track response IDs/state in server slots and serialize Responses output/final payloads
  • document /v1/responses usage and add responses compatibility scenarios
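The conversion described in the first bullet can be sketched as follows. This is illustrative Python, not the PR's actual server code; the request field names (`instructions`, `input`, `max_output_tokens`) follow the public Responses API, and the helper name is made up:

```python
# Sketch (assumption, not the PR's implementation): map a /v1/responses
# request body onto a chat-completions request body.
def responses_to_chat_completions(req: dict) -> dict:
    messages = []
    # "instructions" becomes the system message.
    if req.get("instructions"):
        messages.append({"role": "system", "content": req["instructions"]})
    # "input" may be a plain string or a list of role/content items.
    inp = req.get("input")
    if isinstance(inp, str):
        messages.append({"role": "user", "content": inp})
    else:
        for item in inp or []:
            messages.append({"role": item.get("role", "user"),
                             "content": item.get("content", "")})
    out = {"messages": messages}
    # Responses' "max_output_tokens" corresponds to chat-completions' "max_tokens".
    if "max_output_tokens" in req:
        out["max_tokens"] = req["max_output_tokens"]
    if "model" in req:
        out["model"] = req["model"]
    return out
```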

Testing

  • cmake --build build --target llama-server (also builds with CUDA enabled on my system)
  • behave -i server.feature -n "OAI Responses Compatibility"

Notes

  • I tried to run the full server test suite, but it does not complete: the embeddings scenario fails due to the /embedding response shape. Since it also fails on the main branch, I can look into it in a separate PR.

Currently a draft while I do more testing. It already works via a regular curl request:

curl http://localhost:5000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "model": "MiniMax-M2.1",
    "input": "Say hello in one short sentence."
  }'
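For the streaming path (`"stream": true` in the request body), the summary above says the endpoint emits Responses-style SSE events. A minimal consumer sketch, assuming standard SSE framing and the public Responses API event names (e.g. `response.output_text.delta`); whether this server emits exactly those event names is not verified here:

```python
import json

# Sketch: split a raw SSE stream into (event, data) pairs. SSE frames are
# "event:" / "data:" lines terminated by a blank line.
def parse_sse_events(raw: str):
    event, data = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and event is not None:
            yield event, json.loads("\n".join(data)) if data else None
            event, data = None, []
```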

Based off ggml-org/llama.cpp#18486

Edit: I used an AI agent to help create the draft. This seems like a good use case for it, since it doesn't involve a new model or architecture (where AI assistance tends to yield bad results). I didn't see any contributing rules against it; if there are, feel free to close this.

@firecoperana (Collaborator)

I don't think these tests are well maintained; testing directly in llama-server should be enough.

@ikawrakow (Owner)

Is this still a draft?

@RodriMora RodriMora marked this pull request as ready for review February 2, 2026 11:02
@RodriMora (Contributor, Author)

Tool calling doesn't work perfectly, but I checked mainline llama.cpp and it has the same problems, so this PR has feature parity for regular text completion in both the streaming and non-streaming tests:

Launch command:

./build/bin/llama-server \
  --model /mnt/llms/models/bartowski/MiniMaxAI_MiniMax-M2.1-GGUF/MiniMaxAI_MiniMax-M2.1-Q6_K/MiniMaxAI_MiniMax-M2.1-Q6_K-00001-of-00005.gguf \
  --alias "MiniMax-M2.1" \
  --ctx-size 128000 \
  -ger \
  -ngl 99 \
  --host 0.0.0.0 \
  --port 5000 \
  --jinja -np 4

Test:

curl -s http://192.168.10.115:5000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "model": "MiniMax-M2.1",
    "instructions": "You are a helpful assistant.",
    "input": "Write a limerick about exceptions",
    "max_output_tokens": 32
  }'
{
  "completed_at": 1770028662,
  "created_at": 1770028662,
  "id": "resp_L1bEyCME6xhrIk2ePfniibiVoa6s6zNR",
  "model": "MiniMax-M2.1",
  "object": "response",
  "output": [
    {
      "id": "rs_F0JvQ8MRkVid9hpBBHxxUzzj51zXrK1L",
      "summary": [],
      "type": "reasoning",
      "content": [
        {
          "text": "The user wants a limerick about exceptions. A limerick is a humorous five-line poem with an AABBA rhyme scheme, where lines 1",
          "type": "reasoning_text"
        }
      ],
      "encrypted_content": "",
      "status": "completed"
    }
  ],
  "status": "completed",
  "usage": {
    "input_tokens": 29,
    "output_tokens": 32,
    "total_tokens": 61
  }
}
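A small helper (illustrative, not part of the PR) to extract the generated text from a Responses payload shaped like the one above, where each item in `output` carries a `content` array of typed text parts:

```python
# Sketch: collect the text parts out of a Responses "output" array.
# The part types "reasoning_text" (seen in the response above) and
# "output_text" (the public Responses API's message text type) are handled.
def collect_output_text(resp: dict) -> str:
    parts = []
    for item in resp.get("output", []):
        for part in item.get("content", []):
            if part.get("type") in ("output_text", "reasoning_text"):
                parts.append(part.get("text", ""))
    return "".join(parts)
```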

@firecoperana (Collaborator)

The model name is defaulted to "gpt-3.5-turbo-0613" when I leave the model empty in the request message. Mainline returns the correct model name. Otherwise, they look good.

@RodriMora (Contributor, Author)

> The model name is defaulted to "gpt-3.5-turbo-0613" when I leave the model empty in the request message. Mainline returns the correct model name. Otherwise, they look good.

Nice catch, thanks. If model is empty or missing, it now uses the loaded model's name instead of defaulting to "gpt-3.5-turbo-0613".

@firecoperana (Collaborator)

Yeah, it's a bug not related to this PR. I will fix later.

@firecoperana (Collaborator)

It is fine to merge now.

@ikawrakow ikawrakow merged commit 102f77b into ikawrakow:main Feb 14, 2026