UPSTREAM PR #17136: common : implement parser combinators for chat parsing [WIP] by DajanaV · Pull Request #153 · auroralabs-loci/llama.cpp

DajanaV · 2025-11-10T03:47:43Z

Putting this out there as a proof-of-concept and to gather feedback. It is still a WIP.

Problem

Each model currently requires a custom parser to handle reasoning and tool calls. XML-based models are particularly challenging to parse. For example, Qwen3-Coder outputs:

<tool_call>
<function={name}>
<parameter={arg-name}>
{arg_value as json or string}
</parameter>
...
</function>
</tool_call>

Supporting this format requires the parser to know the type of each argument based on the provided schema.
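
To make the schema dependence concrete: the raw text 42 inside a parameter tag could be the number 42 or the string "42", and only the declared type settles it. Below is a minimal sketch, assuming a hypothetical helper to_json_value that receives the raw parameter text and the type declared in the tool's schema. Neither the helper nor its signature comes from this PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the PR): turn the raw text found inside a
// <parameter=...> tag into JSON text, using the type declared for that
// argument in the tool's JSON schema.
std::string to_json_value(const std::string & raw, const std::string & schema_type) {
    assert(!schema_type.empty()); // the schema is expected to declare a type

    if (schema_type != "string") {
        // Numbers, booleans, objects, and arrays are assumed to already be
        // valid JSON text, so they pass through unchanged.
        return raw;
    }

    // String arguments arrive unquoted, so quote and escape them.
    // Other control characters are omitted here for brevity.
    std::string out = "\"";
    for (char c : raw) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    out += '"';
    return out;
}
```

With this sketch, the same raw text yields different JSON depending on the declared type, which is why a schema-unaware parser cannot handle the format on its own.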

Proposal

I propose using parser combinators to simplify parsing. Small parsers can be composed into a PEG-style grammar, which should be expressive enough to handle model output. This PR implements a proof-of-concept.

Here's an example from tests/test-chat-parser-combinator.cpp:

auto parser = build_parser([](parser_builder & p) {
    auto space = p.add_rule("space", p.space());

    auto reasoning = p.add_rule("reasoning",
        p.literal("<think>") + space +
        p.group("reasoning-content",
            p.zero_or_more(~(space + p.literal("</think>")) + p.any())) +
        space + p.literal("</think>"));

    auto content = p.add_rule("content",
        p.group("content",
            p.zero_or_more(~(space + p.literal("<tool_call>")) + p.any())));

    auto ident_chars = p.add_rule("ident-chars", p.char_class("[a-zA-Z\\-_]"));
    auto json = p.add_json_rule("json");

    auto tool_call_name = p.add_rule("tool-call-name",
        p.literal("<name>") + space +
        p.group("tool-name", p.one_or_more(~p.literal("</name>") + ident_chars)) +
        space + p.literal("</name>"));

    auto tool_call_args = p.add_rule("tool-call-args",
        p.literal("<args>") + space +
        p.group("tool-args", json) +
        space + p.literal("</args>"));

    auto tool_call = p.add_rule("tool-call",
        p.literal("<tool_call>") + space +
        tool_call_name + space +
        tool_call_args + space +
        p.literal("</tool_call>"));

    return p.add_rule("root", reasoning + p.optional(content) + p.optional(tool_call));
});

std::string input = R"(<think>I need to call get_weather with city = New York</think><tool_call><name>get_weather</name><args>{"city": "New York"}</args></tool_call>)";
parser_context ctx{input, parse_cache()};

auto result = parser.parse(ctx);

assert_equals(true, result.is_success());
assert_equals(input.size(), result.end);
assert_equals(std::string("I need to call get_weather with city = New York"), *result.group("reasoning-content", ctx.input));
assert_equals(std::string("get_weather"), *result.group("tool-name", ctx.input));
assert_equals(std::string(R"({"city": "New York"})"), *result.group("tool-args", ctx.input));

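The parser supports partial parsing for streaming output. With is_input_complete set to false, truncated input still parses successfully and each group holds the text matched so far: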

input = R"(<think>I need to call get_weather</think><tool_call><name>get_weather</name><args>{"cit)";
ctx = parser_context{input, parse_cache(), /* .is_input_complete = */ false};
result = parser.parse(ctx);

assert_equals(true, result.is_success());
assert_equals(std::string("I need to call get_weather"), *result.group("reasoning-content", ctx.input));
assert_equals(std::string("get_weather"), *result.group("tool-name", ctx.input));
assert_equals(std::string(R"({"cit)"), *result.group("tool-args", ctx.input));
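
One way to think about partial parsing at the level of a single literal is sketched below. It assumes a hypothetical three-state result (match_status) and a free function match_literal. The PR reports partial matches through its own result type, so the names and the exact states here are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical three-state result; the PR's actual result type may differ.
enum class match_status { success, need_more_input, fail };

// Match `lit` against `input` starting at `pos`.
// If the input ends in the middle of the literal and more input may still
// arrive, report need_more_input instead of fail.
match_status match_literal(const std::string & input, size_t pos,
                           const std::string & lit, bool is_input_complete) {
    assert(pos <= input.size());
    for (size_t i = 0; i < lit.size(); ++i) {
        if (pos + i >= input.size()) {
            // Ran out of input before the literal was fully compared.
            return is_input_complete ? match_status::fail
                                     : match_status::need_more_input;
        }
        if (input[pos + i] != lit[i]) {
            return match_status::fail; // definite mismatch
        }
    }
    return match_status::success;
}
```

The distinction matters for streaming: a truncated end tag such as </thi should not be treated as content, because the next chunk may complete it.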

The generated parse tree can be used to produce a GBNF grammar. The plan is to build the parser during chat param initialization and derive grammar rules with support for lazy triggers. This should support both tool_choice = auto and tool_choice = required.
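
As a rough illustration of deriving a grammar from a parse tree, the sketch below walks a simplified tree and prints a GBNF expression. The node type, make, and to_gbnf are hypothetical and much simpler than the parser types in this PR. Negative lookahead is left out because mapping it to GBNF needs more care.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified parse-tree node; the PR's parser types are richer.
struct node {
    enum kind_t { literal, sequence, choice, optional, zero_or_more, rule_ref } kind;
    std::string text;                        // literal text or referenced rule name
    std::vector<std::shared_ptr<node>> kids; // children of composite nodes
};

using node_ptr = std::shared_ptr<node>;

node_ptr make(node::kind_t k, std::string text = "", std::vector<node_ptr> kids = {}) {
    return std::make_shared<node>(node{k, std::move(text), std::move(kids)});
}

// Render a node as a GBNF expression. Composite nodes are always
// parenthesized, which is verbose but keeps precedence unambiguous.
std::string to_gbnf(const node & n) {
    switch (n.kind) {
        case node::literal: {
            std::string out = "\"";
            for (char c : n.text) {
                if (c == '"' || c == '\\') out += '\\'; // escape quotes and backslashes
                out += c;
            }
            return out + "\"";
        }
        case node::rule_ref:
            return n.text;
        case node::optional:
            assert(!n.kids.empty());
            return "(" + to_gbnf(*n.kids[0]) + ")?";
        case node::zero_or_more:
            assert(!n.kids.empty());
            return "(" + to_gbnf(*n.kids[0]) + ")*";
        case node::sequence:
        case node::choice: {
            const char * sep = n.kind == node::sequence ? " " : " | ";
            std::string out = "(";
            for (size_t i = 0; i < n.kids.size(); ++i) {
                if (i > 0) out += sep;
                out += to_gbnf(*n.kids[i]);
            }
            return out + ")";
        }
    }
    return "";
}
```

A full implementation would also emit one named GBNF rule per add_rule call and substitute schema-derived rules for the generic JSON rule, as described in the to-do list.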

Specifics

This PR implements parser combinators for PEG grammars. It uses caching to implement packrat parsing. The following are implemented:

parser literal(const std::string & literal);
parser sequence(std::initializer_list<parser> parsers);
parser choice(std::initializer_list<parser> parsers);
parser one_or_more(const parser & p);
parser zero_or_more(const parser & p);
parser optional(const parser & p);
parser negate(const parser & p);
parser any();
parser char_class(const std::string & classes);
parser group(const std::string & name, const parser & p);
parser rule(const std::string & name);
parser space();

The operators +, |, and ~ construct sequence, choice, and negate parsers respectively.
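
For readers unfamiliar with the technique, here is a minimal, self-contained sketch of how such combinators and operators can be built. It is not the PR's implementation: the parser struct below only returns an end position, with no groups, rules, caching, or partial-input support, and the until helper is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Simplified stand-in for the PR's parser type: maps (input, position) to
// the end position on success, or std::nullopt on failure.
struct parser {
    std::function<std::optional<size_t>(const std::string &, size_t)> fn;
    std::optional<size_t> operator()(const std::string & in, size_t pos) const {
        return fn(in, pos);
    }
};

parser literal(std::string lit) {
    return {[lit](const std::string & in, size_t pos) -> std::optional<size_t> {
        assert(pos <= in.size());
        if (in.compare(pos, lit.size(), lit) == 0) return pos + lit.size();
        return std::nullopt;
    }};
}

// Match any single character.
parser any() {
    return {[](const std::string & in, size_t pos) -> std::optional<size_t> {
        if (pos < in.size()) return pos + 1;
        return std::nullopt;
    }};
}

// Sequence: run a, then b from where a stopped.
parser operator+(parser a, parser b) {
    return {[a, b](const std::string & in, size_t pos) -> std::optional<size_t> {
        auto mid = a(in, pos);
        if (!mid) return std::nullopt;
        return b(in, *mid);
    }};
}

// Ordered choice: try a, and fall back to b only if a fails.
parser operator|(parser a, parser b) {
    return {[a, b](const std::string & in, size_t pos) -> std::optional<size_t> {
        auto r = a(in, pos);
        if (r) return r;
        return b(in, pos);
    }};
}

// Negative lookahead: succeed without consuming input if p fails here.
parser operator~(parser p) {
    return {[p](const std::string & in, size_t pos) -> std::optional<size_t> {
        if (p(in, pos)) return std::nullopt;
        return pos;
    }};
}

parser zero_or_more(parser p) {
    return {[p](const std::string & in, size_t pos) -> std::optional<size_t> {
        while (true) {
            auto next = p(in, pos);
            if (!next || *next == pos) break; // stop on failure or no progress
            pos = *next;
        }
        return pos;
    }};
}

// Hypothetical helper: consume characters until `end` would match.
parser until(parser end) { return zero_or_more(~end + any()); }
```

A reasoning block can then be written as literal("<think>") + until(literal("</think>")) + literal("</think>"), which mirrors the shape of the rules in the example above.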

Drawbacks

Parsers that match content while excluding certain patterns, such as end tags, have a less obvious syntax. For example, p.zero_or_more(~(space + p.literal("</think>")) + p.any()) consumes characters one at a time until the remaining input starts with optional whitespace followed by </think>. This can be generalized through an excluding() parser.
Packrat parsing requires caching all intermediate parse results, which introduces memory overhead proportional to input size and grammar complexity.
Each model still requires a custom parser, though they share a common framework that simplifies implementation.
Parser combinators may offer less flexibility for handling malformed model output compared to hand-written parsers, though constrained decoding should prevent malformed tool calls.
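
The memory cost of packrat parsing can be seen in a small sketch of the memo table. The types memo_table and memoize below are hypothetical and independent of the PR's parse_cache. They only illustrate that results are stored per (rule, position) pair, so the table can grow to roughly rules times input positions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// End position on success, std::nullopt on failure.
using parse_result = std::optional<size_t>;
using parse_fn = std::function<parse_result(const std::string &, size_t)>;

// Hypothetical memo table keyed by (rule id, start position).
// It is only valid for a single input string.
struct memo_table {
    std::map<std::pair<int, size_t>, parse_result> entries;
    size_t hits = 0; // number of lookups answered from the cache
};

// Wrap a rule so each (rule, position) pair is evaluated at most once.
// The caller must keep `memo` alive for as long as the returned parser is used.
parse_fn memoize(int rule_id, parse_fn inner, memo_table & memo) {
    assert(inner); // the wrapped rule must be callable
    return [rule_id, inner, &memo](const std::string & in, size_t pos) -> parse_result {
        const auto key = std::make_pair(rule_id, pos);
        auto it = memo.entries.find(key);
        if (it != memo.entries.end()) {
            ++memo.hits;
            return it->second; // cached success or cached failure
        }
        parse_result r = inner(in, pos);
        memo.entries.emplace(key, r);
        return r;
    };
}
```

Failures are cached as well as successes, which is what keeps backtracking linear in the input, at the price of the extra entries.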

To do

Basic implementation.
Support parsing of partial input for streaming.
Implement a JSON parser using parser combinators to replace the current healing system.
Implement content() and reasoning() parsers to populate content/reasoning fields.
Implement tool(), tool_name(), tool_args(), as well as tool_arg_name() and tool_arg_value() for models such as Qwen3-Coder.
Construct GBNF grammar from the final parser.
Implement json-schema-to-grammar support. The JSON parser will parse any JSON, but the generated GBNF grammar should still be constructed from the user-provided schema.
Allow building of the parser during chat param initialization.

common : implement parser combinators to simplify chat parsing

c822e73

DajanaV had a problem deploying to PROD__AL_DEMO November 10, 2025 03:47 — with GitHub Actions Failure

add virtual destructor to parser_base

e6153bb

DajanaV had a problem deploying to PROD__AL_DEMO November 10, 2025 04:39 — with GitHub Actions Failure

fix memory leak from circular references of rules

4ced999

DajanaV had a problem deploying to PROD__AL_DEMO November 10, 2025 06:43 — with GitHub Actions Failure

DajanaV force-pushed the main branch 24 times, most recently from 930eefd to db9060f Compare November 12, 2025 23:09

DajanaV force-pushed the main branch 7 times, most recently from 24733fb to 4b4bb7c Compare November 13, 2025 12:15

DajanaV closed this Nov 13, 2025

DajanaV wants to merge 3 commits into main from upstream-PR17136-branch_aldehir-parser-combinators
