Skip to content

UPSTREAM PR #19773: server : merge contiguous Responses input items into a single assistant message#1196

Open
loci-dev wants to merge 3 commits intomainfrom
loci/pr-19773-merge-response-items-to-chatcmpl
Open

UPSTREAM PR #19773: server : merge contiguous Responses input items into a single assistant message#1196
loci-dev wants to merge 3 commits intomainfrom
loci/pr-19773-merge-response-items-to-chatcmpl

Conversation

@loci-dev
Copy link

Note

Source pull request: ggml-org/llama.cpp#19773

The Responses API endpoint constructs separate chat completion messages for each input item, except for reasoning. This causes problems with many chat templates that expect content, reasoning, and tool calls to appear in a single assistant message.

This PR merges contiguous assistant inputs into a single message before passing them to templates.

This also preserves reasoning content that isn't coupled with a tool call. A few models, such as Ministral 3, support interleaved reasoning within regular messages. Models that don't typically handle pruning the reasoning in their own templates.

ref: ggml-org/llama.cpp#19765 (comment)
fixes #19513

cc @bfroemel

@loci-review
Copy link

loci-review bot commented Feb 21, 2026

No meaningful performance changes were detected across 111693 analyzed functions in the following binaries: build.bin.llama-tts, build.bin.libmtmd.so, build.bin.libllama.so, build.bin.llama-cvector-generator, build.bin.llama-tokenize, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.libggml.so, build.bin.llama-bench, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

@loci-dev loci-dev force-pushed the main branch 9 times, most recently from 61b4303 to ef246cc Compare March 1, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 8 times, most recently from 0db6c47 to 8019888 Compare March 8, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 7 times, most recently from 56aaa36 to 21147c2 Compare March 13, 2026 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants