Autoparser - complete refactoring of parser architecture #1376
firecoperana wants to merge 2 commits into main
Conversation
Force-pushed from 342b06e to 8e38c07
Streaming for tool calls is enabled.

So is the PEG parser PR. Are we confident that there are no regressions?

I see some new issues with the auto parser mentioned in mainline. In the PEG parser PR, PEG is just added without being used; it is mostly the new Jinja template engine changes that matter. It has been there for a while, so most bugs should have been fixed.

These changes in mainline llama.cpp appear to work well. All models except MiroThinker (which I added) have already been tested by upstream developers. In principle, this should not introduce regressions, unless there are additional unmerged differences between mainline and ik_llama.cpp. I'll test this branch.

This breaks MiroThinker, but it seems to be an upstream bug. Log:

There are more issues showing up in mainline. I will wait until they are fully resolved.
Force-pushed from 796aa23 to 59233a7
In examples/server/server-common.cpp, can we change …? This capability is from late 2024, so all the newer models should already support it.
I could add a command line arg to enable it.
@firecoperana I'm planning to send PRs to upstream llama.cpp adding support for MiroThinker with the refactored chat template. I have tested that this patch works. You can merge it.

diff --git a/common/chat.cpp b/common/chat.cpp
index b799912a..7a76f8a9 100644
--- a/common/chat.cpp
+++ b/common/chat.cpp
@@ -1278,6 +1278,116 @@ static common_chat_params common_chat_params_init_kimi_k2(const common_chat_temp
return data;
}
+// MiroThinker - uses MCP style toolcalling
+static common_chat_params common_chat_params_init_mirothinker(const common_chat_template & tmpl,
+ const autoparser::templates_params & inputs) {
+ common_chat_params data;
+
+ data.prompt = common_chat_template_direct_apply(tmpl, inputs);
+ data.format = COMMON_CHAT_FORMAT_PEG_NATIVE;
+ data.supports_thinking = true;
+ data.thinking_start_tag = "<think>";
+ data.thinking_end_tag = "</think>";
+ data.preserved_tokens = {
+ "<think>",
+ "</think>",
+ };
+
+ auto has_tools = inputs.tools.is_array() && !inputs.tools.empty();
+ auto extract_reasoning = inputs.reasoning_format != COMMON_REASONING_FORMAT_NONE;
+ auto include_grammar = has_tools && inputs.tool_choice != COMMON_CHAT_TOOL_CHOICE_NONE;
+
+ auto parser = build_chat_peg_parser([&](common_chat_peg_builder & p) {
+ // MiroThinker Thinking format:
+ // - Reasoning: <think>{reasoning}</think>
+ // - Content: text after reasoning
+ // - Tool calls section:
+ // <use_mcp_tool>
+ // <server_name>{server_name}</server_name>
+ // <tool_name>{tool_name}</tool_name>
+ // <arguments>
+ // {json_args}
+ // </arguments>
+ // ...
+ // </use_mcp_tool>
+
+ auto reasoning = extract_reasoning ? p.optional("<think>" + p.reasoning(p.until("</think>")) + "</think>") : p.eps();
+
+ // Tool call markers
+ const std::string SECTION_BEGIN = "<use_mcp_tool>";
+ const std::string SECTION_END = "</use_mcp_tool>";
+ const std::string CALL_BEGIN = "<server_name>";
+ const std::string ARGS_BEGIN = "<arguments>";
+ const std::string CALL_END = "</arguments>";
+
+ auto end = p.end();
+
+ // Content only parser (no tools)
+ if (!has_tools || inputs.tool_choice == COMMON_CHAT_TOOL_CHOICE_NONE) {
+ return reasoning + p.content(p.rest()) + end;
+ }
+
+ // Build tool call parsers for each available function
+ // Function name format is: <tool_name>{tool_name}</tool_name>
+ // We need to match: {what_ever}</server_name>{spaces}<tool_name>{tool_name}</tool_name>
+ auto tool_choice = p.choice();
+ foreach_function(inputs.tools, [&](const json & tool) {
+ const auto & function = tool.at("function");
+ std::string name = function.at("name");
+ const auto & schema = function.at("parameters");
+
+ // Match: {what_ever}</server_name>{spaces}<tool_name>{tool_name}</tool_name>
+ auto tool_parser = p.tool(
+ p.tool_open(
+ p.until("</server_name>") +
+ p.literal("</server_name>") +
+ p.space() +
+ p.literal("<tool_name>") +
+ p.tool_name(p.literal(name)) +
+ p.literal(ARGS_BEGIN)
+ ) + p.space() +
+ p.tool_args(p.schema(p.json(), "tool-" + name + "-schema", schema)) +
+ p.space() + p.tool_close(p.literal(CALL_END))
+ );
+
+ tool_choice |= p.rule("tool-" + name, tool_parser);
+ });
+
+ // Tool calls section: <use_mcp_tool> tool_calls </use_mcp_tool>
+ auto min_calls = inputs.tool_choice == COMMON_CHAT_TOOL_CHOICE_REQUIRED ? 1 : 0;
+ auto max_calls = inputs.parallel_tool_calls ? -1 : 1;
+ auto tool_calls = p.trigger_rule("tool-calls",
+ p.literal(SECTION_BEGIN) + p.space() +
+ p.rule("tool-call", p.repeat(CALL_BEGIN + tool_choice, min_calls, max_calls) +
+ p.space() + p.literal(SECTION_END))
+ );
+
+ auto content_before_tools = p.content(p.until(SECTION_BEGIN));
+
+ return reasoning + content_before_tools + tool_calls + end;
+ });
+
+ data.parser = parser.save();
+
+ if (include_grammar) {
+ data.grammar_lazy = inputs.tool_choice == COMMON_CHAT_TOOL_CHOICE_AUTO;
+ data.grammar = build_grammar([&](const common_grammar_builder & builder) {
+ foreach_function(inputs.tools, [&](const json & tool) {
+ const auto & function = tool.at("function");
+ auto schema = function.at("parameters");
+ builder.resolve_refs(schema);
+ });
+ parser.build_grammar(builder, data.grammar_lazy);
+ });
+
+ data.grammar_triggers = {
+ { COMMON_GRAMMAR_TRIGGER_TYPE_WORD, "<use_mcp_tool>" }
+ };
+ }
+
+ return data;
+}
+
// LFM2 format:
// - Reasoning: <think>{reasoning}</think> (optional, only if enable_thinking is true)
// - Content: text after reasoning (optional)
@@ -1517,6 +1627,14 @@ static common_chat_params common_chat_templates_apply_jinja(const struct common_
return common_chat_params_init_kimi_k2(tmpl, params);
}
+ // MiroThinker - uses MCP style toolcalling <use_mcp_tool> ... </use_mcp_tool>
+ // Detection: template has "</use_mcp_tool>" and "</server_name>"
+ if (src.find("</use_mcp_tool>") != std::string::npos &&
+ src.find("</server_name>") != std::string::npos) {
+ LOG_DBG("Using specialized template: MiroThinker\n");
+ return common_chat_params_init_mirothinker(tmpl, params);
+ }
+
// LFM2 - uses <|tool_list_start|>/<|tool_list_end|> markers and <|tool_call_start|>[name(args)]<|tool_call_end|> format
// Detection: template has "<|tool_list_start|>" and "<|tool_list_end|>" markers
    if (src.find("<|tool_list_start|>") != std::string::npos &&
Force-pushed from 59233a7 to d0ea90e
Autoparser: add optional argument reshuffle capability
Autoparser: True streaming (#20177)
* Relax atomicity constraint for nicer, more pleasant True Streaming parsing
* Whitespace
* Remove redundant atomics
Revert to OAI-compatible args (#20213)
* Revert to OAI-compatible args
* Apply workaround::func_args_not_string
Fix structured outputs (#20223)
* Fix structured outputs
* Update common/chat-auto-parser-generator.cpp
Co-authored-by: Aldehir Rojas <hello@alde.dev>
---------
Co-authored-by: Aldehir Rojas <hello@alde.dev>
Fix compile bug (#20203)
* Fix compile bug
* Update common/chat-auto-parser-helpers.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
# Conflicts:
# common/chat-auto-parser-helpers.cpp
common : gracefully handle incomplete output (#20191)
* common : handle incomplete UTF-8 at end of input in PEG parser
* cont : if reached end prematurely, emit needs_more_input to propagate partial output
* cont: refactor peg parse context to add lenient flag
* cont : remove partial flag, keep lenient flag
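The incomplete-output handling above hinges on detecting when a streamed buffer ends mid-character. A minimal sketch of that check (in Python, not the actual C++ implementation; the function name is hypothetical): a streaming parser can hold back the trailing bytes of an unfinished UTF-8 sequence and signal that it needs more input instead of emitting a broken string.

```python
def incomplete_utf8_suffix(data: bytes) -> int:
    """Return the number of trailing bytes that form an incomplete
    UTF-8 sequence, or 0 if the buffer ends on a character boundary."""
    # A UTF-8 character is at most 4 bytes, so look back at most 3 bytes
    # plus the current one for a multi-byte lead byte.
    for i in range(1, min(4, len(data)) + 1):
        b = data[-i]
        if b < 0x80:
            # ASCII byte: the buffer ends on a clean boundary.
            return 0
        if b >= 0xC0:
            # Lead byte of a multi-byte sequence; how many bytes should follow?
            expected = 2 if b < 0xE0 else 3 if b < 0xF0 else 4
            # Incomplete only if fewer bytes than expected are present.
            return i if i < expected else 0
        # Otherwise a continuation byte (0x80-0xBF): keep scanning backwards.
    return 0
```

With this, a parser can emit `data[:-n]` and buffer the last `n` bytes until the next chunk arrives.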
PEG parser for LFM2 (#20251)
* PEG parser for LFM2
* Simplify using python_value()
common: map developer role to system (#20215)
* Map developer role to system
* Simplify
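The role mapping above is conceptually a one-line normalization; a sketch in Python (the helper name is hypothetical, not the actual C++ code): newer OpenAI-style clients send a "developer" role that older chat templates don't know, so it is rewritten to "system" before templating.

```python
def normalize_role(message: dict) -> dict:
    """Map the OpenAI 'developer' role to 'system' for templates
    that predate the newer role name."""
    if message.get("role") == "developer":
        return {**message, "role": "system"}
    return message
```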
common: consolidate PEG string parsers (#20263)
* common : consolidate PEG string parsers
* cont : fix json_string_content()
examples : fix empty items in json_schema_to_grammar.py [no ci] (#19968)
* Fix logic for retrieving schema items in `json_schema_to_grammar.py`
If `schema['items']` is `{}` and `prefixItems` is not in the schema, then since `{}` is falsy the original code here will raise an error.
I think if `schema['items']` is `{}`, then items should just be `{}`
* Apply suggestion from @CISC
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Add tests for arrays with empty items
Add two unit tests to `tests/test-json-schema-to-grammar.cpp` that validate handling of arrays when 'items' is an empty schema and when 'prefixItems' is present alongside an empty 'items'. Both tests expect the same generated grammar, ensuring the JSON Schema->grammar conversion treats an empty 'items' schema (and the presence of 'prefixItems') correctly and covering this edge case.
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
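The empty-`items` edge case comes down to Python truthiness: `{}` is a valid JSON Schema (it matches anything) but is falsy, so a truthiness check mistakes it for an absent key. A sketch of the fixed logic (the helper name is hypothetical, not the actual code in `json_schema_to_grammar.py`):

```python
def array_item_schemas(schema: dict):
    """Return (prefix_items, items) for an array schema, treating an
    explicit `"items": {}` as present rather than missing."""
    prefix_items = schema.get("prefixItems", [])
    # Correct: test key presence. A buggy `schema.get("items") or None`
    # would collapse {} to None and lose the permissive item schema.
    items = schema["items"] if "items" in schema else None
    return prefix_items, items
```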
Reduce level of content parser warning message to avoid log spam on non-debug verbosity (#20347)
do not return if template parse failed
add arg to enable parallel tool call
common : fix incorrect uses of stoul (#20313)
# Conflicts:
# common/arg.cpp
# src/llama-grammar.cpp
Force-pushed from d0ea90e to 4fc7a15
Add support for MiroThinker with the new template engine, as well as add …
Port the new Autoparser and optional argument reshuffle capability PR from mainline
ggml-org/llama.cpp#18675 and ggml-org/llama.cpp#20171
Continues #1369
@ikawrakow Can you merge the PEG parser PR first and then this one? This is a large PR and I don't want to squash them into one commit.