UPSTREAM PR #19773: server : merge contiguous Responses input items into a single assistant message by loci-dev · Pull Request #1196 · auroralabs-loci/llama.cpp

loci-dev · 2026-02-21T02:59:52Z

Note

Source pull request: ggml-org/llama.cpp#19773

The Responses API endpoint constructs separate chat completion messages for each input item, except for reasoning. This causes problems with many chat templates that expect content, reasoning, and tool calls to appear in a single assistant message.

This PR merges contiguous assistant inputs into a single message before passing them to templates.

This also preserves reasoning content that isn't coupled with a tool call. A few models, such as Ministral 3, support interleaved reasoning within regular messages. Models that don't typically handle pruning the reasoning in their own templates.

ref: ggml-org/llama.cpp#19765 (comment)
fixes #19513

cc @bfroemel

loci-review · 2026-02-21T04:15:52Z

No meaningful performance changes were detected across 111693 analyzed functions in the following binaries: build.bin.llama-tts, build.bin.libmtmd.so, build.bin.libllama.so, build.bin.llama-cvector-generator, build.bin.llama-tokenize, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.libggml.so, build.bin.llama-bench, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

aldehir added 3 commits February 20, 2026 18:13

server : merge contiguous input items into a single assistant message

070b68d

cont : simplify tool call msg

38dc926

cont : reduce and combine content

304a0e9

loci-dev temporarily deployed to PROD__AL_DEMO February 21, 2026 02:59 — with GitHub Actions Inactive

loci-dev force-pushed the main branch 9 times, most recently from 61b4303 to ef246cc Compare March 1, 2026 02:17

loci-dev force-pushed the main branch 8 times, most recently from 0db6c47 to 8019888 Compare March 8, 2026 02:17

loci-dev force-pushed the main branch 7 times, most recently from 56aaa36 to 21147c2 Compare March 13, 2026 02:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UPSTREAM PR #19773: server : merge contiguous Responses input items into a single assistant message#1196

UPSTREAM PR #19773: server : merge contiguous Responses input items into a single assistant message#1196
loci-dev wants to merge 3 commits intomainfrom
loci/pr-19773-merge-response-items-to-chatcmpl

loci-dev commented Feb 21, 2026

Uh oh!

loci-review bot commented Feb 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

loci-dev commented Feb 21, 2026

Uh oh!

loci-review bot commented Feb 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants