(internal) llamacpp-llm: chat template tests with Qwen3 model#1036
Closed
mialso wants to merge 57 commits into
Closed
(internal) llamacpp-llm: chat template tests with Qwen3 model#1036mialso wants to merge 57 commits into
mialso wants to merge 57 commits into
Conversation
…MakeLists The new source file was added to the test CMakeLists but missing from the addon and cli_tool targets, causing an undefined symbol linker error on CI win64 builds. Made-with: Cursor
Made-with: Cursor
Reorder TextLlmContext members so threadpools are declared before llamaInit_. C++ destroys members in reverse declaration order, so llamaInit_ (which calls llama_free) now runs while threadpools are still alive, preventing use-after-free when llama_free accesses attached threadpool pointers. Made-with: Cursor
…t (linux)" This reverts commit 7d9c237.
…exit The ThreadPoolDeleter was doing ggml backend registry lookups during destruction, which is fragile during process teardown when the registry may already be torn down. Additionally, threadpools attached to llama_context could be freed before the context itself, causing use-after-free. Fix: cache ggml_threadpool_free fn pointer at construction time, and add explicit destructor that detaches threadpools before freeing them. Made-with: Cursor
…SEGV on exit" This reverts commit 4e66b38.
When a prefill run leaves nPast_ > 0 and the next run is a non-cached single-shot, the stale KV cache and dynamic-tools bookkeeping (nPastBeforeTools_, nConversationOnlyTokens_) caused token duplication and incorrect cache trimming. Clear state eagerly when shouldResetAfterInference is true and nPast_ is non-zero. Made-with: Cursor
…_end When tools_at_end is true and a session continues without explicit save between turns, old tool+response tokens remained in the KV cache. New tool tokens were appended, causing conflicting tool definitions. Add a guard in processPrompt() that trims from nPastBeforeTools_ to nPast_ before eval when stale tool tokens are detected. Includes new dynamic-tools integration tests covering changing tools, same tools, and single-shot regression. Made-with: Cursor
…e selection race The toolsAtEnd flag was set via setToolsAtEnd() after context creation, but getChatTemplateForModel() was called during construction — always seeing toolsAtEnd=0 and selecting the wrong Qwen3 template. Pass the flag through createContext() into TextLlmContext and MtmdLlmContext constructors so the correct template is selected from the start. Also restore the conditional template selection in ChatTemplateUtils that was previously hardcoded.
Add stripInternalBlocks() helper to testToolRemoval.js and benchToolsPlacement.js to remove <tool_call> and <think> blocks from assistant responses before including them in conversation history. Prevents model from pattern-matching on old tool calls and hallucinating removed tools. Also extend benchToolsPlacement to 20 turns and add HTML chart.
Implement all 10 reviewer requests from PR #706 (jesusmb1995, gianni-cor). | # | Reviewer | Request | Result | |---|---------|---------|--------| | R1 | @jesusmb1995 | Extract DynamicToolsState class | Done - new class in LlmContext.hpp with toolsAtEnd_, nConversationOnlyTokens_, nPastBeforeTools_, recordToolBoundary(), reset() | | R2 | @jesusmb1995 | Collapse 3 virtual methods into single dynamicToolsState() accessor | Done - removed setToolsAtEnd, getNPastBeforeTools, setNPastBeforeTools virtuals; added dynamicToolsState() non-virtual accessor on base class | | R3 | @gianni-cor | Remove redundant setToolsAtEnd() after createContext() | Done - removed the 4-line block in LlamaModel::init() | | R4 | @gianni-cor | Add assert: nConversationOnlyTokens_ <= inputTokens.size() | Done - added in TextLlmContext::tokenizeChat | | R5 | @gianni-cor | Reset nConversationOnlyTokens_ in TextLlmContext::resetState | Done - both contexts now call dynamicToolsState().reset() which resets both values | | R6 | @gianni-cor | Guard tools_at_end for non-Qwen3 models | Done - architecture check after config parsing, logs warning and disables flag | | R7 | @gianni-cor | Fix off-by-A trim error (disable add_generation_prompt) | Done - both TextLlmContext and MtmdLlmContext save/restore add_generation_prompt=false during no-tools tokenization | | R8 | @gianni-cor | Add cold-start reset in MtmdLlmContext::tokenizeChat | Done - dynamicToolsState().reset() added at cold-start path | | R9 | @gianni-cor | Cap firstMsgTokens_ after post-eval trim | Done - setFirstMsgTokens(getNPast()) if inflated after trim | | R10 | @gianni-cor | Remove duplicate toolsAtEnd_ from LlamaModel | Done - runtime code in processPromptImpl queries dynamicToolsState().toolsAtEnd() instead of state_->toolsAtEnd_ | Made-with: Cursor
…le source of truth in DynamicToolsState Made-with: Cursor
…e cleanup Made-with: Cursor
…match test" This reverts commit 181b98a.
This reverts commit 27e6a5c.
Made-with: Cursor
N3: Save/restore inputs.use_jinja around no-tools tokenization to
prevent getPrompt() Jinja fallback from corrupting the flag.
N4: Remove dead Jinja template variables (ns.multi_step_tool,
ns.last_query_index) from Qwen3ToolsDynamicTemplate.
N5: Add missing assert(conversationOnlyTokens <= totalTokens) in
MtmdLlmContext::tokenizeChat, matching TextLlmContext.
N6: Document Qwen3-only model support in tools-at-end.md.
N7: Merge duplicate if(nPast_==0 && !isCacheLoaded) blocks in
TextLlmContext::tokenizeChat.
N8: Remove unnecessary save/restore of inputs.tools and
inputs.add_generation_prompt (locals not read after).
Also: merge main into feature branch, move dynamic-tools changelog
to separate 0.13.1 entry.
Made-with: Cursor
Made-with: Cursor
The file packages/ocr-onnx/big_and_clear_watermarks.png was unintentionally staged during merge conflict resolution. Made-with: Cursor
Made-with: Cursor
Made-with: Cursor
Made-with: Cursor
Made-with: Cursor
Made-with: Cursor
Made-with: Cursor
Contributor
Author
|
closing in flavor of #1121 since this has unsigned commits |
Contributor
Tier-based Approval Status |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
asana task
Description
improve tests, as per comment