Skip to content

(internal) llamacpp-llm: chat template tests with Qwen3 model#1036

Closed
mialso wants to merge 57 commits into
mainfrom
internal/llm-dynamic-tools-tests-improved
Closed

(internal) llamacpp-llm: chat template tests with Qwen3 model#1036
mialso wants to merge 57 commits into
mainfrom
internal/llm-dynamic-tools-tests-improved

Conversation

@mialso

@mialso mialso commented Mar 20, 2026

Copy link
Copy Markdown
Contributor

asana task

Description

improve tests, as per comment

  • test_chat_template_utils.cpp: cover the isQwen3Model branch logic that was previously untested

mialso and others added 30 commits March 13, 2026 09:20
…MakeLists

The new source file was added to the test CMakeLists but missing from the addon and cli_tool targets, causing an undefined symbol linker error on CI win64 builds.

Made-with: Cursor
Reorder TextLlmContext members so threadpools are declared before llamaInit_. C++ destroys members in reverse declaration order, so llamaInit_ (which calls llama_free) now runs while threadpools are still alive, preventing use-after-free when llama_free accesses attached threadpool pointers.

Made-with: Cursor
…exit

The ThreadPoolDeleter was doing ggml backend registry lookups during destruction, which is fragile during process teardown when the registry may already be torn down. Additionally, threadpools attached to llama_context could be freed before the context itself, causing use-after-free. Fix: cache ggml_threadpool_free fn pointer at construction time, and add explicit destructor that detaches threadpools before freeing them.

Made-with: Cursor
When a prefill run leaves nPast_ > 0 and the next run is a non-cached single-shot, the stale KV cache and dynamic-tools bookkeeping (nPastBeforeTools_, nConversationOnlyTokens_) caused token duplication and incorrect cache trimming. Clear state eagerly when shouldResetAfterInference is true and nPast_ is non-zero.

Made-with: Cursor
…_end

When tools_at_end is true and a session continues without explicit save between turns, old tool+response tokens remained in the KV cache. New tool tokens were appended, causing conflicting tool definitions.

Add a guard in processPrompt() that trims from nPastBeforeTools_ to nPast_ before eval when stale tool tokens are detected. Includes new dynamic-tools integration tests covering changing tools, same tools, and single-shot regression.

Made-with: Cursor
…e selection race

The toolsAtEnd flag was set via setToolsAtEnd() after context creation,
but getChatTemplateForModel() was called during construction — always
seeing toolsAtEnd=0 and selecting the wrong Qwen3 template.

Pass the flag through createContext() into TextLlmContext and
MtmdLlmContext constructors so the correct template is selected
from the start. Also restore the conditional template selection
in ChatTemplateUtils that was previously hardcoded.
Add stripInternalBlocks() helper to testToolRemoval.js and
benchToolsPlacement.js to remove <tool_call> and <think> blocks
from assistant responses before including them in conversation
history. Prevents model from pattern-matching on old tool calls
and hallucinating removed tools.

Also extend benchToolsPlacement to 20 turns and add HTML chart.
olyasir and others added 25 commits March 16, 2026 15:10
Implement all 10 reviewer requests from PR #706 (jesusmb1995, gianni-cor).

| # | Reviewer | Request | Result |
|---|---------|---------|--------|
| R1 | @jesusmb1995 | Extract DynamicToolsState class | Done - new class in LlmContext.hpp with toolsAtEnd_, nConversationOnlyTokens_, nPastBeforeTools_, recordToolBoundary(), reset() |
| R2 | @jesusmb1995 | Collapse 3 virtual methods into single dynamicToolsState() accessor | Done - removed setToolsAtEnd, getNPastBeforeTools, setNPastBeforeTools virtuals; added dynamicToolsState() non-virtual accessor on base class |
| R3 | @gianni-cor | Remove redundant setToolsAtEnd() after createContext() | Done - removed the 4-line block in LlamaModel::init() |
| R4 | @gianni-cor | Add assert: nConversationOnlyTokens_ <= inputTokens.size() | Done - added in TextLlmContext::tokenizeChat |
| R5 | @gianni-cor | Reset nConversationOnlyTokens_ in TextLlmContext::resetState | Done - both contexts now call dynamicToolsState().reset() which resets both values |
| R6 | @gianni-cor | Guard tools_at_end for non-Qwen3 models | Done - architecture check after config parsing, logs warning and disables flag |
| R7 | @gianni-cor | Fix off-by-A trim error (disable add_generation_prompt) | Done - both TextLlmContext and MtmdLlmContext save/restore add_generation_prompt=false during no-tools tokenization |
| R8 | @gianni-cor | Add cold-start reset in MtmdLlmContext::tokenizeChat | Done - dynamicToolsState().reset() added at cold-start path |
| R9 | @gianni-cor | Cap firstMsgTokens_ after post-eval trim | Done - setFirstMsgTokens(getNPast()) if inflated after trim |
| R10 | @gianni-cor | Remove duplicate toolsAtEnd_ from LlamaModel | Done - runtime code in processPromptImpl queries dynamicToolsState().toolsAtEnd() instead of state_->toolsAtEnd_ |

Made-with: Cursor
…le source of truth in DynamicToolsState

Made-with: Cursor
N3: Save/restore inputs.use_jinja around no-tools tokenization to
    prevent getPrompt() Jinja fallback from corrupting the flag.
N4: Remove dead Jinja template variables (ns.multi_step_tool,
    ns.last_query_index) from Qwen3ToolsDynamicTemplate.
N5: Add missing assert(conversationOnlyTokens <= totalTokens) in
    MtmdLlmContext::tokenizeChat, matching TextLlmContext.
N6: Document Qwen3-only model support in tools-at-end.md.
N7: Merge duplicate if(nPast_==0 && !isCacheLoaded) blocks in
    TextLlmContext::tokenizeChat.
N8: Remove unnecessary save/restore of inputs.tools and
    inputs.add_generation_prompt (locals not read after).

Also: merge main into feature branch, move dynamic-tools changelog
to separate 0.13.1 entry.

Made-with: Cursor
The file packages/ocr-onnx/big_and_clear_watermarks.png was unintentionally staged during merge conflict resolution.

Made-with: Cursor
Base automatically changed from feature/llm-dynamic-tools to main March 21, 2026 09:09
@mialso

mialso commented Mar 24, 2026

Copy link
Copy Markdown
Contributor Author

closing in flavor of #1121 since this has unsigned commits

@mialso mialso closed this Mar 24, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ❌ PENDING

**Requirements:**
- 1 Team Member approval ❌ (0/1)
- 1 Team Lead OR Management approval ❌ (0/1)



---
*This comment is automatically updated when reviews change.*

@mialso mialso deleted the internal/llm-dynamic-tools-tests-improved branch April 15, 2026 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants