
Improve tool call success rate: Allow arbitrary tool call parameter ordering for up to 8 params#13

Closed
florianbrede-ayet wants to merge 21 commits into pwilkin:autoparser from florianbrede-ayet:autoparser

Conversation

@florianbrede-ayet

Thanks @pwilkin for working on this branch, it's a nice improvement over the fixed (and often partially broken) chat templates.

However, I noticed tool call failures, especially under mistral-vibe, which I could reliably reproduce (read_file with offset and limit).

This seems to happen for any qwen35 model, including 27b dense (albeit with a higher chance of correct order).

I debugged your autoparser with claude and found that tool calls enforce a strict parameter order.

Qwen, with its native XML tool calls, has a very high chance of generating the parameter entities in a particular order that does not necessarily match the order the autoparser expects (also tested at different temperatures and penalties; with different seeds, tool calls succeed only by chance).

To limit the number of permutations, I set "allow any order" to a hard cap of 8 parameters (falling back to sequential order otherwise).
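The cap described above can be sketched roughly as follows. This is an illustrative sketch, not the actual patch: `MAX_PERMUTED_PARAMS` and `ordering_rules` are hypothetical names, and the real implementation builds grammar rules rather than Python lists.

```python
from itertools import permutations
from math import factorial

# Hypothetical cap mirroring the PR's hard limit of 8 parameters.
MAX_PERMUTED_PARAMS = 8

def ordering_rules(params):
    """Return the parameter orderings the grammar would accept.

    At or below the cap, every permutation of the parameters is allowed;
    above it, fall back to the single declared (sequential) order so the
    number of grammar alternatives stays bounded.
    """
    if len(params) <= MAX_PERMUTED_PARAMS:
        return [list(p) for p in permutations(params)]
    return [list(params)]  # sequential fallback

# 8 parameters already yield 8! = 40320 alternative orderings.
assert len(ordering_rules(list("abcdefgh"))) == factorial(8)
# 9 parameters would need 362880, so only the declared order is kept.
assert ordering_rules(list("abcdefghi")) == [list("abcdefghi")]
```

The factorial growth is the reason for the hard cap: beyond a handful of parameters, enumerating every ordering in the grammar becomes impractical, so the sequential order is the pragmatic fallback.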

Disclosure: Code was mostly written by Opus, I ran the tests and built it against ROCm and tested it with several hundred tool calls. I don't have a CI/CD setup locally.



@pwilkin pwilkin self-requested a review as a code owner March 5, 2026 22:57
@pwilkin pwilkin force-pushed the autoparser branch 2 times, most recently from c8f7024 to 2595444 Compare March 6, 2026 12:07
@pwilkin
Owner

pwilkin commented Mar 6, 2026

See ggml-org#20171

@florianbrede-ayet
Author

@pwilkin I built your branch and ran my toolcall-tester.py, which I used before to verify my solution.
Your implementation passes 100/100 turns for all qwen35 models >= 27b, qwen3codernext, minimax m2.5 and gpt-oss-120b (as reference), so no objection.

I still suspect that your narrower implementation could produce tool call failures for XML-trained models.
But since none of my tests triggered a failure, it's more of an academic question.

I'll close this PR.

@pwilkin
Owner

pwilkin commented Mar 6, 2026

@florianbrede-ayet originally I took the same permutation-based approach as you, but the co-maintainers convinced me that it's needless complication in practice.

moonshadow-25 pushed a commit to moonshadow-25/llama.cpp that referenced this pull request Mar 14, 2026
* FlashAttention (pwilkin#13)

* Add inplace softmax

* Move rms_norm to split row approach

* Update debug for supports_op

* clean up debug statements

* neg f16xf32xip builds and runs, haven't actually run a model that uses the neg kernel yet though

* neg passes backend test

* unary operators pass ggml tests

* rms_norm double declaration bug atoned

* abides by editor-config

* removed vestigial files

* fixed autoconfig

* All operators (including xielu) working

* removed unnecessary checking if node->src[1] exists for unary operators

* responded and dealt with PR comments

* implemented REPL_Template support and removed bug in unary operators kernel

* formatted embed wgsl and ggml-webgpu.cpp

* Faster tensors (pwilkin#8)

Add fast matrix and matrix/vector multiplication.

* Use map for shader replacements instead of pair of strings

* Wasm (pwilkin#9)

* webgpu : fix build on emscripten

* more debugging stuff

* test-backend-ops: force single thread on wasm

* fix single-thread case for init_tensor_uniform

* use jspi

* add pthread

* test: remember to set n_thread for cpu backend

* Add buffer label and enable dawn-specific toggles to turn off some checks

* Intermediate state

* Fast working f16/f32 vec4

* Working float fast mul mat

* Clean up naming of mul_mat to match logical model, start work on q mul_mat

* Setup for subgroup matrix mat mul

* Basic working subgroup matrix

* Working subgroup matrix tiling

* Handle weirder sg matrix sizes (but still a multiple of sg matrix size)

* Working start to gemv

* working f16 accumulation with shared memory staging

* Print out available subgroup matrix configurations

* Vectorize dst stores for sg matrix shader

* Gemv working scalar

* Minor set_rows optimization (pwilkin#4)

* updated optimization, fixed errors

* non vectorized version now dispatches one thread per element

* Simplify

* Change logic for set_rows pipelines

---------

Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan>
Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

* Comment on dawn toggles

* Working subgroup matrix code for (semi)generic sizes

* Remove some comments

* Cleanup code

* Update dawn version and move to portable subgroup size

* Try to fix new dawn release

* Update subgroup size comment

* Only check for subgroup matrix configs if they are supported

* Add toggles for subgroup matrix/f16 support on nvidia+vulkan

* Make row/col naming consistent

* Refactor shared memory loading

* Move sg matrix stores to correct file

* Working q4_0

* Formatting

* Work with emscripten builds

* Fix test-backend-ops emscripten for f16/quantized types

* Use emscripten memory64 to support get_memory

* Add build flags and try ci

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

* Remove extra whitespace

* Move wasm single-thread logic out of test-backend-ops for cpu backend

* Disable multiple threads for emscripten single-thread builds in ggml_graph_plan

* Refactored pipelines and workgroup calculations (pwilkin#10)

* refactored pipelines

* refactored workgroup calculation

* removed commented out block of prior maps

* Clean up ceiling division pattern

---------

Co-authored-by: Neha Abbas <nehaabbas@eduroam-169-233-141-223.ucsc.edu>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

* Start work on flash attention

* Shader structure set up (many bugs still)

* debugging

* Working first test

* Working with head grouping, head sizes to 128, logit softcap, mask/sinks enabled, f32

* Generalize softmax to work with multiple subgroups, f16 accumulation, mask shared memory tiling

* Start work on integrating pre-wgsl

* Separate structs/initial shader compilation library into separate files

* Work on compilation choices for flashattention

* Work on subgroup matrix/tile size portability

* subgroup size agnostic online softmax

* Cleanups, quantization types

* more cleanup

* fix wasm build

* Refactor flashattention to increase parallelism, use direct loads for KV in some cases

* Checkpoint

* formatting

* Update to account for default kv cache padding

* formatting shader

* Add workflow for ggml-ci webgpu

* Try passing absolute path to dawn in ggml-ci

* Avoid error on device destruction, add todos for proper cleanup

* Fix unused warning

* Forgot one parameter unused

* Move some flashattn computation to f32 for correctness
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
