UPSTREAM PR #16603: llama-cli: add support for reasoning#135

Closed
DajanaV wants to merge 19 commits into main from upstream-PR16603-branch_bandoti-llamacli-reasoning2
Conversation

@DajanaV DajanaV commented Nov 8, 2025

Mirrored from ggml-org/llama.cpp#16603

This change adds a "partial formatter" that processes partially accumulated messages (mirroring the server's streaming logic) so that reasoning content can be rendered before the EOG token arrives.
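As a rough illustration of the streaming idea (names here are hypothetical, not llama.cpp's actual API): the formatter is re-fed the full partially collected message each time new tokens arrive, and it returns only the not-yet-rendered suffix, so reasoning text can be printed incrementally instead of waiting for EOG.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of a "partial formatter". It tracks what has
// already been shown and, given the full accumulated message so far,
// returns only the newly printable delta.
struct partial_formatter {
    std::string rendered; // portion already emitted to the user

    std::string operator()(const std::string & accumulated) {
        if (accumulated.size() <= rendered.size()) {
            return ""; // nothing new since the last call
        }
        std::string delta = accumulated.substr(rendered.size());
        rendered = accumulated;
        return delta;
    }
};
```

In the real PR the formatter additionally distinguishes reasoning content from regular content via the chat parser; this sketch only shows the incremental-delta mechanism.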

In addition, the chat_add_and_format lambda has been converted to a functor, which now calls common_chat_templates_apply directly to allow for more robust template-application options.
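A minimal sketch of the lambda-to-functor refactor, with illustrative stand-in types (the real ones live in llama.cpp's common/chat code): holding state in a named struct rather than a capturing lambda makes it straightforward to thread extra template-application options through.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for a chat message; not llama.cpp's real type.
struct chat_msg {
    std::string role;
    std::string content;
};

// Hypothetical functor replacing the old chat_add_and_format lambda.
// It appends the message to the history and returns the newly
// formatted prompt suffix. The real code delegates to
// common_chat_templates_apply; this sketch uses simple markers.
struct chat_formatter {
    std::vector<chat_msg> & history;

    std::string operator()(const std::string & role, const std::string & content) {
        history.push_back({role, content});
        return "<|" + role + "|>" + content + "<|end|>";
    }
};
```
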

Logic has also been added to suppress the system/prompt tags, cleaning up the output.

Example output:

./build/bin/llama-cli.exe -m ./models/gpt-oss-20b-mxfp4.gguf -c 2048 -sys "You are a wizard" -p "please recite me a haiku about llamas" --jinja -co
[screenshot: example llama-cli output with rendered reasoning]

@DajanaV DajanaV force-pushed the main branch 9 times, most recently from 96c975c to aa2fc28 Compare November 9, 2025 16:08
@DajanaV DajanaV force-pushed the main branch 29 times, most recently from 24733fb to 4b4bb7c Compare November 13, 2025 12:15
@DajanaV DajanaV closed this Nov 13, 2025