
UPSTREAM PR #17136: common : introduce composable PEG parser combinators for chat parsing#359

Open
loci-dev wants to merge 219 commits into main from upstream-PR17136-branch_aldehir-parser-combinators

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#17136

Supporting new models requires implementing several features:

  • Lazy grammar for tool calling (tool_choice = auto)
  • Full grammar for forced tool calls and response_format (reasoning models)
  • Parallel tool calls support
  • Parsing of reasoning and tool call outputs

For reasoning models, the grammar must include reasoning or performance degrades significantly.

The real challenge is that each model uses a different output format:

  • Harmony response output (gpt-oss)
  • XML with typed parameters (Qwen3-Coder, MiniMax M2)
    • These models expect string arguments as raw content rather than JSON, which requires type awareness at parse time.
  • Pseudo-function call (LFM2 e.g. [get_weather(location="..."), ...])

Currently, the grammar and parsing exist as separate functions, which works but feels a bit fragile. I believe we can unify the two by using parser combinators to compose a PEG parser. That way the grammar definition becomes the parser.

Proposed Solution

This PR introduces a generic PEG (Parsing Expression Grammar) parser to the common library, along with chat-specific extensions and a complete reference implementation for Qwen3-Coder.

I've noticed there's often a lag between when a model is supported by llama.cpp and when proper tool calling is fully implemented. This parser aims to close that gap by letting you define the grammar and parser at the same time, making it easier to add full tool calling support for new models.

Parsing Expression Grammars (PEG)

PEG parsers are straightforward to implement as recursive descent parsers. While recursive descent parsers are known for backtracking, the majority of model output can be parsed with minimal backtracking, making them practical for this use case.

Parser combinators allow us to compose complex parsers from simple, reusable building blocks. This creates a DSL that closely mimics the grammar itself.
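To make the combinator idea concrete, here is a minimal, self-contained sketch (not the PR's API — the function names and representation are invented for illustration): a parser is a function from an input and a position to the position after a successful match, and `lit`, `seq`, and `choice` compose such functions.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// A parser maps (input, position) to the position after a successful
// match, or nullopt on failure.
using parser = std::function<std::optional<size_t>(std::string_view, size_t)>;

// Match an exact string at the current position.
parser lit(std::string s) {
    return [s](std::string_view in, size_t pos) -> std::optional<size_t> {
        if (pos + s.size() <= in.size() && in.compare(pos, s.size(), s) == 0) {
            return pos + s.size();
        }
        return std::nullopt;
    };
}

// Match a sequence of parsers, threading the position through each one.
parser seq(std::vector<parser> ps) {
    return [ps](std::string_view in, size_t pos) -> std::optional<size_t> {
        for (const auto & p : ps) {
            auto next = p(in, pos);
            if (!next) return std::nullopt;
            pos = *next;
        }
        return pos;
    };
}

// Ordered choice: try alternatives in order, like PEG's '/' operator.
parser choice(std::vector<parser> ps) {
    return [ps](std::string_view in, size_t pos) -> std::optional<size_t> {
        for (const auto & p : ps) {
            if (auto next = p(in, pos)) return next;
        }
        return std::nullopt;
    };
}
```

A grammar fragment like `"<tool_call>" ("[" / "{")` then composes as `seq({lit("<tool_call>"), choice({lit("["), lit("{")})})`, which is exactly the sense in which the grammar definition becomes the parser.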

Rather than defining both a grammar and parsing function, we can build a PEG parser that generates a compatible GBNF grammar (with exceptions) and parses model output.

Features

  • Partial parsing for streaming input
  • Built-in JSON parsers for common patterns
  • Generation of compatible GBNF grammars
  • AST generation with semantic tags for structured extraction
  • Three common AST shapes covering most model formats:
    • simple - Content with optional reasoning
    • native - Tool arguments as JSON objects
    • constructed - Tool arguments as separate entities (XML or pseudo-functions)

Examples

Parser for models that emit tool arguments as JSON
auto parser = build_chat_peg_native_parser([&](common_chat_peg_native_builder & p) {
    // Build choice of available tools
    auto tool_choice = p.choice();
    for (const auto & tool : tools) {
        const auto & function = tool.at("function");
        std::string name = function.at("name");
        const auto & schema = function.at("parameters");

        auto tool_name = p.json_member("name", "\"" + p.literal(name) + "\"");
        auto tool_args = p.json_member("arguments", p.schema(p.json(), "tool-" + name + "-schema", schema));

        tool_choice |= p.rule("tool-" + name, "{" << tool_name << "," << tool_args << "}");
    }

    // Define tool call structure
    auto tool_call = p.trigger_rule("tool-call",
        p.sequence({
            p.literal("<tool_call>["),
            tool_choice,
            p.literal("]</tool_call>")
        })
    );

    return p.sequence({
        p.content(p.until("<tool_call>")),
        p.optional(tool_call),
        p.end()
    });
});
Parser for models that emit XML tags for each argument
auto parser = build_chat_peg_constructed_parser([&](common_chat_peg_constructed_builder & p) {
    auto location_arg = p.tool_arg(
        p.tool_arg_open("<parameter name=\"" + p.tool_arg_name(p.literal("location")) + "\">"),
        p.tool_arg_string_value(p.until("</parameter>")),
        p.tool_arg_close(p.literal("</parameter>"))
    );

    auto get_weather_tool = p.tool(p.sequence({
        p.tool_open("<function name=\"" + p.tool_name(p.literal("get_weather")) + "\">"),
        location_arg,
        p.tool_close(p.literal("</function>"))
    }));

    return p.sequence({
        p.content(p.until("<tool_call>")),
        p.literal("<tool_call>"),
        get_weather_tool,
        p.literal("</tool_call>"),
        p.end()
    });
});
Grammar generation
data.grammar = build_grammar([&](const common_grammar_builder & builder) {
    foreach_function(params.tools, [&](const json & fn) {
        builder.resolve_refs(fn.at("parameters"));
    });
    parser.build_grammar(builder, data.grammar_lazy);
});

Implementation Details

The PEG parsers are implemented using std::variant rather than traditional inheritance. This reduces boilerplate and leverages std::visit for type-safe dispatch. I initially had an OOP implementation, but it became quite cumbersome, and this seems like the lesser evil of the two.

using common_peg_parser_variant = std::variant<
    common_peg_epsilon_parser,
    common_peg_start_parser,
    common_peg_end_parser,
    common_peg_literal_parser,
    common_peg_sequence_parser,
    common_peg_choice_parser,
    common_peg_repetition_parser,
    common_peg_and_parser,
    common_peg_not_parser,
    common_peg_any_parser,
    common_peg_space_parser,
    common_peg_chars_parser,
    common_peg_json_string_parser,
    common_peg_until_parser,
    common_peg_schema_parser,
    common_peg_rule_parser,
    common_peg_ref_parser,
    common_peg_atomic_parser,
    common_peg_tag_parser
>;
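As a simplified illustration of this pattern (the real variant above has ~19 alternatives; the struct and function names here are invented), each parser kind is a plain struct and dispatch happens through std::visit rather than virtual functions:

```cpp
#include <string_view>
#include <type_traits>
#include <variant>

// Each parser kind is a plain struct with no vtable.
struct lit_parser { std::string_view text; };
struct any_parser { };  // matches exactly one character

using parser_variant = std::variant<lit_parser, any_parser>;

// Returns the number of characters consumed, or -1 on failure.
int parse(const parser_variant & v, std::string_view in) {
    return std::visit([&](const auto & p) -> int {
        using T = std::decay_t<decltype(p)>;
        if constexpr (std::is_same_v<T, lit_parser>) {
            return in.substr(0, p.text.size()) == p.text ? (int) p.text.size() : -1;
        } else {
            return in.empty() ? -1 : 1;
        }
    }, v);
}
```

Because the variant closes the set of parser kinds, adding a new one is a compile error until every visitor handles it, which is harder to guarantee with an open class hierarchy.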

Both parsers and AST nodes are allocated in arena structures to minimize memory allocations.

class common_peg_arena {
    std::vector<common_peg_parser_variant> parsers_;
    std::unordered_map<std::string, common_peg_parser_id> rules_;
    common_peg_parser_id root_ = COMMON_PEG_INVALID_PARSER_ID;
    ...

class common_peg_ast_arena {
    std::vector<common_peg_ast_node> nodes_;
    ...
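The arena idea can be sketched as follows (a hypothetical reduction, not the PR's actual classes): nodes live contiguously in a vector and refer to each other by index, so building a tree performs no per-node heap allocation beyond the vector's growth.

```cpp
#include <cstddef>
#include <vector>

using node_id = size_t;

struct ast_node {
    int                  tag;       // semantic tag used for extraction
    std::vector<node_id> children;  // indices into the same arena
};

class ast_arena {
    std::vector<ast_node> nodes_;
public:
    // Append a node and hand back its stable index.
    node_id add(int tag, std::vector<node_id> children = {}) {
        nodes_.push_back({tag, std::move(children)});
        return nodes_.size() - 1;
    }
    const ast_node & get(node_id id) const { return nodes_[id]; }
    size_t size() const { return nodes_.size(); }
};
```

Index-based references also make the structure trivially copyable and serializable, since there are no pointers to fix up.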

Each parser variant is wrapped in a common_peg_parser value type to produce a DSL for composing parser combinators.
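The examples above use operators like `|` and `<<` on parsers; one way such a DSL can work (a hypothetical sketch, not the PR's implementation) is a small value type wrapping a shared node, with overloaded operators that build sequence and choice nodes:

```cpp
#include <memory>
#include <string>
#include <vector>

struct node;

// Cheap-to-copy value type the DSL operates on.
struct expr {
    std::shared_ptr<node> n;
};

struct node {
    enum kind { LIT, SEQ, CHOICE } k;
    std::string text;            // payload for LIT
    std::vector<expr> children;  // payload for SEQ / CHOICE
};

expr lit(std::string s) {
    return { std::make_shared<node>(node{node::LIT, std::move(s), {}}) };
}

// `a << b` builds a sequence node.
expr operator<<(expr a, expr b) {
    return { std::make_shared<node>(node{node::SEQ, "", {std::move(a), std::move(b)}}) };
}

// `a | b` builds an ordered-choice node.
expr operator|(expr a, expr b) {
    return { std::make_shared<node>(node{node::CHOICE, "", {std::move(a), std::move(b)}}) };
}
```

With this shape, `lit("<tool_call>") << (lit("[") | lit("{"))` reads almost exactly like the PEG rule it denotes, while still just constructing a tree of parser nodes.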

A parser can return one of three results: FAIL, SUCCESS, or NEED_MORE_INPUT. This is how partial parsing is implemented. Unlike common/chat-parser.cpp, it does not raise an exception on a partial parse, because a partial parse is still valid during streaming.
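The three-state result can be illustrated with a literal matcher (a sketch of the assumed semantics, not the PR's code): a definite mismatch is distinguished from input that simply ended mid-match, which is what makes streaming work.

```cpp
#include <cstddef>
#include <string_view>

enum class peg_result { FAIL, SUCCESS, NEED_MORE_INPUT };

peg_result match_literal(std::string_view input, std::string_view lit) {
    size_t n = input.size() < lit.size() ? input.size() : lit.size();
    if (input.substr(0, n) != lit.substr(0, n)) {
        return peg_result::FAIL;             // definite mismatch
    }
    if (input.size() < lit.size()) {
        return peg_result::NEED_MORE_INPUT;  // prefix matches; more may stream in
    }
    return peg_result::SUCCESS;              // full literal consumed
}
```

An exception-based parser would have to treat the NEED_MORE_INPUT case as an error; modeling it as an ordinary result lets the caller keep the partial AST and resume when the next chunk arrives.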

Additional Changes

  • Added common_chat_peg_parse() to common/chat.cpp and chat formats COMMON_CHAT_FORMAT_PEG_(SIMPLE|NATIVE|CONSTRUCTED) to support models parsed by a PEG parser.
    • The parser must be passed from chat param initialization to the parse function. To do this, I currently serialize the parser to JSON and then deserialize it into common_chat_syntax.parser. I'm not a fan of this approach, but it seems the least intrusive way to integrate. I'm happy to implement an alternative mechanism if desired.
  • Added common/unicode.{cpp,h}, derived from src/unicode.{cpp,h}. As I understand it, we should not include headers from src/, so I had to copy the implementation. It deviates by returning a result rather than raising an exception.

More comprehensive documentation is available in docs/development/parsing.md, and the tests in tests/test-chat-peg-parser.cpp are fairly thorough.


I know this is a big PR. I tried to minimize the implementation, while keeping enough to demonstrate value. #15703 shows community desire for something like this, although it doesn't have to be this implementation.

@loci-review

loci-review bot commented Dec 17, 2025

Explore the complete analysis inside the Version Insights

@loci-review

loci-review bot commented Dec 18, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary

Project: llama.cpp
Versions Compared: 5b4d46ec (target) vs 56a6e1ed (base)


Analysis Scope

This PR introduces minimal code changes: one blank line in chat-parser.cpp and a 6-line validation check in server-common.cpp for the OpenAI-compatible API n parameter. The PR description references a large PEG parser implementation, but only these minor changes are visible in the current diff.

Performance Impact

Modified Function:

  • common_chat_msg_parser::str in chat-parser.cpp shows an execution-time increase of 5-7 ns across three binaries (llama-run: +7 ns, llama-cvector-generator: +6 ns, llama-tts: +5 ns)

Inference Path: No functions in the critical inference path (llama_decode, llama_encode, llama_tokenize) were modified. The chat parser function operates outside the token generation pipeline, handling post-processing of model outputs for tool calling and structured responses.

Tokens Per Second: No impact. The modified function processes chat message parsing after token generation completes, not during the inference loop. Model throughput remains unchanged.

Power Consumption: All binaries show changes within measurement noise (< 0.001%). The 5-7 ns overhead in a non-critical utility function has no measurable energy impact.

Assessment: The visible code changes (whitespace and parameter validation) introduce negligible overhead. The 5-7 ns increase in common_chat_msg_parser::str likely originates from changes not present in the current diff, possibly related to the PEG parser implementation mentioned in the PR description but not yet committed.
