UPSTREAM PR #16603: llama-cli: add support for reasoning #283

Open
loci-dev wants to merge 21 commits into main from
upstream-PR16603-branch_bandoti-llamacli-reasoning2

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#16603

This change adds a "partial formatter" that processes partially collected messages (mirroring the server's streaming logic) so that reasoning content can be rendered before the end-of-generation (EOG) token arrives.
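The idea behind such a partial formatter can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual implementation: the `<think>`/`</think>` delimiters and the `partial_parse`/`parse_partial` names are hypothetical stand-ins, and the real delimiters depend on the chat template in use.

```cpp
#include <string>

// Hypothetical result of splitting a partially streamed assistant message
// into a reasoning part and a final-content part.
struct partial_parse {
    std::string reasoning; // text seen inside the reasoning block so far
    std::string content;   // text after the reasoning block has closed
};

// Parse an accumulated (possibly incomplete) message. If the reasoning
// block is still open, everything after the opening tag is reasoning,
// which lets the caller render it before the EOG token arrives.
static partial_parse parse_partial(const std::string & accumulated,
                                   const std::string & open_tag  = "<think>",
                                   const std::string & close_tag = "</think>") {
    partial_parse out;
    const size_t open_pos = accumulated.find(open_tag);
    if (open_pos == std::string::npos) {
        out.content = accumulated; // no reasoning block at all
        return out;
    }
    const size_t body_pos  = open_pos + open_tag.size();
    const size_t close_pos = accumulated.find(close_tag, body_pos);
    if (close_pos == std::string::npos) {
        out.reasoning = accumulated.substr(body_pos); // block still open
    } else {
        out.reasoning = accumulated.substr(body_pos, close_pos - body_pos);
        out.content   = accumulated.substr(close_pos + close_tag.size());
    }
    return out;
}
```

Calling `parse_partial` on each newly accumulated chunk yields the reasoning text incrementally, which is what allows it to be displayed as it streams in.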

In addition, the chat_add_and_format lambda has been converted into a functor, which now calls common_chat_templates_apply directly to allow more robust template-application options.
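The lambda-to-functor move can be illustrated roughly like this. The `msg` type, the `chat_formatter` name, and the pluggable `apply` callback below are hypothetical stand-ins; in the PR the functor delegates to the real common_chat_templates_apply from llama.cpp's common code, whose actual signature differs.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical message type standing in for the real chat-message struct.
struct msg { std::string role, content; };

// Unlike an ad-hoc capturing lambda, a named functor has documented state
// and can be stored or passed around without type erasure.
struct chat_formatter {
    std::vector<msg> & messages;                                // running conversation history
    std::function<std::string(const std::vector<msg> &)> apply; // stand-in for template application

    // Append the message, then re-apply the chat template over the history.
    std::string operator()(const std::string & role, const std::string & content) {
        messages.push_back({role, content});
        return apply(messages);
    }
};
```

A functor like this also makes it straightforward to thread template-application options through as additional members rather than lambda captures.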

Logic has also been put in place to suppress the system/prompt tags, cleaning up the output.
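Tag suppression of this sort can be sketched as a display-side filter. The tag strings in the test below are illustrative only (loosely modeled on gpt-oss-style markers), and `strip_tags` is a hypothetical helper, not the PR's actual mechanism:

```cpp
#include <string>
#include <vector>

// Remove known template control tags from text before displaying it.
// The tag list is supplied by the caller; each chat template defines
// its own set of special markers.
static std::string strip_tags(std::string text, const std::vector<std::string> & tags) {
    for (const auto & tag : tags) {
        size_t pos;
        while ((pos = text.find(tag)) != std::string::npos) {
            text.erase(pos, tag.size());
        }
    }
    return text;
}
```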

Example output:

./build/bin/llama-cli.exe -m ./models/gpt-oss-20b-mxfp4.gguf -c 2048 -sys "You are a wizard" -p "please recite me a haiku about llamas" --jinja -co
[screenshot of example output]

@loci-dev loci-dev force-pushed the main branch 9 times, most recently from 462a79d to dd481ae on November 24, 2025 01:37
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 48ec5ba to 56f593b on December 2, 2025 09:12