Fixes #15247 | Update chat.cpp to support (at least) qwen3 reasoning + tool_choice = required #15248

ExtReMLapin · 2025-08-11T16:25:21Z

There was this issue that we would not use tool_choice at required with reasoning because of forced tool call imposed by grammar.

Grammar now allows the model to think first.

Reasoning = big brain

Tool calling = strong arms

Now you can be very smart and very strong

Fixes #15247

ExtReMLapin · 2025-08-11T16:31:44Z

All am I sure is that it fixes issues with Qwen3 + reasoning (enabled or disabled)+ tool calling.
I fear it also has to be implemented for others reasoning parsers (non hermes)

CC hermes2 contributor : @ochafik

ExtReMLapin · 2025-08-24T13:47:59Z

Back at the office on tuesday, re-reading the PR i might reconsider the logic around

thinking_forced_open

…equired

…already opened grammar)

…r accepting piece:`

ExtReMLapin · 2025-08-25T19:52:49Z

For a future PR, the following functions needs the same kind of patch :

common_chat_params_init_granite : same as qwen3, but i'm not sure as the thinking tags are not always there
common_chat_params_init_command_r7b : "<|START_THINKING|>", "<|END_THINKING|>"
common_chat_params_init_deepseek_r1 : same as qwen3
common_chat_params_init_gpt_oss ???

ready for review @ggerganov

Not sure exactly who I should ping as ochafik seems to be busy this week

common/chat.cpp

…it, just disable it, don't GGML_ABORT

… it's own issue

…equired

ExtReMLapin · 2025-09-23T11:08:34Z

Not sure who I should ping

@slaren

ggerganov · 2025-09-24T07:20:55Z

Same comment as #15019 (comment)

This is a smaller change, so I can take a look and merge this, but prefer if we have someone who would take over this part of the code.

ExtReMLapin · 2025-09-24T07:38:57Z

Got it, thanks for keeping me updated !

ExtReMLapin · 2025-10-14T12:25:53Z

If you are being held hostage, blink twice @ochafik

…equired

ochafik

Thanks @ExtReMLapin ! Looks very promising :-)

Could you add some tests in test-chat.cpp (will need to force git add models/templates/qwen3-something.jinja, see models/templates/README.md )

common/chat.cpp

…equired

Co-authored-by: Olivier Chafik <[email protected]>

ExtReMLapin · 2025-11-04T15:03:02Z

So I added two sets of tests, one in test-chat.cpp (100% ai generated, but tested) and I'm less that meh-ly conviced of their point.

I would have much more liked some kind of grammar check that filter/accepts/prune the reasoning part.

And this is why I added another test inside test_tool_call.py of the server tool, which is in my opinion not the perfect place because chat.cpp is not only used in the server, but it's an e2e test and much more straighforward at checking if things works.

Test results before this PR :

===================================================================================================== short test summary info ======================================================================================================
FAILED unit/test_tool_call.py::test_required_tool_with_reasoning[tool0-unsloth/Qwen3-0.6B-GGUF:Q4_K_M-None-deepseek-CompletionMode.NORMAL] - AssertionError: Expected reasoning content, but got None
FAILED unit/test_tool_call.py::test_required_tool_with_reasoning[tool0-unsloth/Qwen3-0.6B-GGUF:Q4_K_M-None-deepseek-CompletionMode.STREAMED] - AssertionError: Expected reasoning content, but got None
FAILED unit/test_tool_call.py::test_required_tool_with_reasoning[tool1-unsloth/Qwen3-0.6B-GGUF:Q4_K_M-None-deepseek-CompletionMode.NORMAL] - AssertionError: Expected reasoning content, but got None
FAILED unit/test_tool_call.py::test_required_tool_with_reasoning[tool1-unsloth/Qwen3-0.6B-GGUF:Q4_K_M-None-deepseek-CompletionMode.STREAMED] - AssertionError: Expected reasoning content, but got None
================================================================================================ 4 failed, 463 deselected in 5.79s =================================================================================================

[Inferior 1 (process 936362) detached]
terminate called after throwing an instance of 'std::runtime_error'
  what():  Test failed
Abandon (core dumped)

After :

unit/test_tool_call.py ....                                                                                                                                                                                                  [100%]

================================================================================================ 4 passed, 463 deselected in 6.07s =================================================================================================

[chat] All tests passed!

Update chat.cpp to support (at least) qwen3 + tool_choice = required

c6c4f7c

ExtReMLapin marked this pull request as ready for review August 11, 2025 16:31

ExtReMLapin mentioned this pull request Aug 11, 2025

Misc. bug: [chat] (hermes 2) Impossible de to use both tool_choice: "required" and reasoning #15247

Closed

ExtReMLapin added 2 commits August 11, 2025 23:21

refactored changes to follow string tern op

42937a5

fixing editorconfig-checker CI (tailing whitespace)

5796938

ExtReMLapin mentioned this pull request Aug 12, 2025

[Feature]: support tool and reasoning together vllm-project/vllm#14429

Closed

1 task

broadbit-hu mentioned this pull request Aug 21, 2025

Eval bug: Nondeterministic output with ROCm backend despite zero temperature #14727

Open

ExtReMLapin and others added 5 commits August 25, 2025 18:19

Merge branch 'ggml-org:master' into fix_qwen_reasoning_tool_calling_r…

de07a43

…equired

hermes 2 pro tool calling, better support for thinking (thinking tag …

79e4a7b

…already opened grammar)

qwen hermes tool calling : fixed grammar rules names

dbae921

fixed really weird grammar crash `Unexpected empty grammar stack afte…

86493dd

…r accepting piece:`

also apply the hotcrashfix here, just in case

bb5e352

ExtReMLapin commented Aug 25, 2025

View reviewed changes

common/chat.cpp Outdated Show resolved Hide resolved

Pierre F and others added 3 commits August 26, 2025 12:30

reverted changes done to grammar_lazy for hermes 2

6d5f561

if there is enable_thinking enabled but hermes model doesn't support …

352274e

…it, just disable it, don't GGML_ABORT

fix thinking-content eating closing think tag | ref ggml-org#8953

0e55830

ExtReMLapin marked this pull request as draft August 26, 2025 12:59

removed ? from grammar as it doesn't crash on linux, probably worth…

e62cd70

… it's own issue

ExtReMLapin mentioned this pull request Aug 27, 2025

Misc. bug: Tool calling CRASH : Unexpected empty grammar stack after accepting piece<tool_call> #15608

Closed

ExtReMLapin and others added 2 commits August 28, 2025 20:37

Merge branch 'ggml-org:master' into fix_qwen_reasoning_tool_calling_r…

2f28a1c

…equired

fixed crash with "auto" mode, trigger was missing

310701b

ExtReMLapin marked this pull request as ready for review August 29, 2025 12:49

This was referenced Aug 29, 2025

Model: Seed OSS thinking + tool call support #15552

Merged

feat: nemotron thinking & toolcalling support #15676

Merged

Merge branch 'ggml-org:master' into fix_qwen_reasoning_tool_calling_r…

5688afa

…equired

ggerganov requested a review from ochafik September 24, 2025 07:19

Merge branch 'ggml-org:master' into fix_qwen_reasoning_tool_calling_r…

dc75a57

…equired

ExtReMLapin mentioned this pull request Oct 20, 2025

[Bug]: Hybrid Attention models broken after switching to flashinfer 0.4 (tested on Granite 4.0 H, Qwen3-Next, Jamba-3B, Nemotron-H-8b) vllm-project/vllm#26936

Open

1 task

ochafik suggested changes Oct 30, 2025

View reviewed changes

common/chat.cpp Outdated Show resolved Hide resolved

ExtReMLapin and others added 4 commits November 3, 2025 10:37

Merge branch 'ggml-org:master' into fix_qwen_reasoning_tool_calling_r…

9381f69

…equired

Applied @ochafik 's suggested code after testing locally, no regression

73010a1

Co-authored-by: Olivier Chafik <[email protected]>

updated qwen3 0.6B chat template with official one

6441ad4

added tests

cc18ecc

ExtReMLapin requested a review from ggerganov as a code owner November 4, 2025 14:58

github-actions bot added testing Everything test related examples python python script changes server labels Nov 4, 2025

ExtReMLapin requested a review from ochafik November 4, 2025 16:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fixes #15247 | Update chat.cpp to support (at least) qwen3 reasoning + tool_choice = required #15248

Fixes #15247 | Update chat.cpp to support (at least) qwen3 reasoning + tool_choice = required #15248

ExtReMLapin commented Aug 11, 2025 •

edited

Loading

Uh oh!

ExtReMLapin commented Aug 11, 2025

Uh oh!

ExtReMLapin commented Aug 24, 2025

Uh oh!

ExtReMLapin commented Aug 25, 2025 •

edited

Loading

Uh oh!

Uh oh!

ExtReMLapin commented Sep 23, 2025

Uh oh!

ggerganov commented Sep 24, 2025

Uh oh!

ExtReMLapin commented Sep 24, 2025

Uh oh!

ExtReMLapin commented Oct 14, 2025

Uh oh!

ochafik left a comment •

edited

Loading

Uh oh!

Uh oh!

ExtReMLapin commented Nov 4, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fixes #15247 | Update chat.cpp to support (at least) qwen3 reasoning + tool_choice = required #15248

Are you sure you want to change the base?

Fixes #15247 | Update chat.cpp to support (at least) qwen3 reasoning + tool_choice = required #15248

Conversation

ExtReMLapin commented Aug 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ExtReMLapin commented Aug 11, 2025

Uh oh!

ExtReMLapin commented Aug 24, 2025

Uh oh!

ExtReMLapin commented Aug 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

ExtReMLapin commented Sep 23, 2025

Uh oh!

ggerganov commented Sep 24, 2025

Uh oh!

ExtReMLapin commented Sep 24, 2025

Uh oh!

ExtReMLapin commented Oct 14, 2025

Uh oh!

ochafik left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ExtReMLapin commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ExtReMLapin commented Aug 11, 2025 •

edited

Loading

ExtReMLapin commented Aug 25, 2025 •

edited

Loading

ochafik left a comment •

edited

Loading

ExtReMLapin commented Nov 4, 2025 •

edited

Loading