UPSTREAM PR #17136: common : introduce composable PEG parser combinators for chat parsing #359
This reverts commit 98cb7a6.
Performance Analysis Summary
Project: llama.cpp

Analysis Scope
This PR introduces minimal code changes: one blank line in …

Performance Impact
Modified Function: …
Inference Path: No functions in the critical inference path (llama_decode, llama_encode, llama_tokenize) were modified. The chat parser function operates outside the token generation pipeline, handling post-processing of model outputs for tool calling and structured responses.
Tokens Per Second: No impact. The modified function processes chat message parsing after token generation completes, not during the inference loop. Model throughput remains unchanged.
Power Consumption: All binaries show changes within measurement noise (< 0.001%). The 5-7 ns overhead in a non-critical utility function has no measurable energy impact.
Assessment: The visible code changes (whitespace and parameter validation) introduce negligible overhead. The 5-7 ns increase in …
Mirrored from ggml-org/llama.cpp#17136
Supporting new models requires implementing several features:
… tool_choice = auto) … response_format … (reasoning models)

For reasoning models, the grammar must include reasoning or performance degrades significantly.
The real challenge is that each model uses a different output format:
… [get_weather(location="..."), ...])

Currently, the grammar and parsing exist as separate functions, which works but feels a bit fragile. I believe we can unify the two by using parser combinators to compose a PEG parser. That way the grammar definition becomes the parser.
Proposed Solution
This PR introduces a generic PEG (Parsing Expression Grammar) parser to the common library, along with chat-specific extensions and a complete reference implementation for Qwen3-Coder.
I've noticed there's often a lag between when a model is supported by llama.cpp and when proper tool calling is fully implemented. This parser aims to close that gap by letting you define the grammar and parser at the same time, making it easier to add full tool calling support for new models.
Parsing Expression Grammars (PEG)
PEG parsers are straightforward to implement as recursive descent parsers. While recursive descent parsers are known for backtracking, the majority of model output can be parsed with minimal backtracking, making them practical for this use case.
Parser combinators allow us to compose complex parsers from simple, reusable building blocks. This creates a DSL that closely mimics the grammar itself.
Rather than defining both a grammar and parsing function, we can build a PEG parser that generates a compatible GBNF grammar (with exceptions) and parses model output.
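As a rough illustration of that idea (not the API in this PR), the sketch below builds a tiny parser tree from three combinators and uses the same tree both to match input and to emit a GBNF-style rule body. Every name here (Node, lit, seq, choice, parse, to_gbnf) is hypothetical, and only literals, sequences, and ordered choices are covered.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for illustration; these names are not from the PR.
struct Node {
    enum Kind { LITERAL, SEQUENCE, CHOICE } kind;
    std::string text;                             // used by LITERAL
    std::vector<std::shared_ptr<Node>> children;  // used by SEQUENCE and CHOICE
};
using P = std::shared_ptr<Node>;

P lit(std::string s)       { return std::make_shared<Node>(Node{Node::LITERAL, std::move(s), {}}); }
P seq(std::vector<P> ps)    { return std::make_shared<Node>(Node{Node::SEQUENCE, "", std::move(ps)}); }
P choice(std::vector<P> ps) { return std::make_shared<Node>(Node{Node::CHOICE, "", std::move(ps)}); }

// Recursive descent: returns the end position on a match, std::nullopt otherwise.
std::optional<std::size_t> parse(const Node & n, const std::string & in, std::size_t pos) {
    assert(pos <= in.size());
    switch (n.kind) {
        case Node::LITERAL:
            if (in.compare(pos, n.text.size(), n.text) == 0) return pos + n.text.size();
            return std::nullopt;
        case Node::SEQUENCE:
            for (const auto & c : n.children) {
                auto next = parse(*c, in, pos);
                if (!next) return std::nullopt;
                pos = *next;
            }
            return pos;
        case Node::CHOICE:
            // Ordered choice: the first alternative that matches wins.
            for (const auto & c : n.children) {
                if (auto next = parse(*c, in, pos)) return next;
            }
            return std::nullopt;
    }
    return std::nullopt;
}

// Quote a literal for a grammar rule, escaping quotes and backslashes.
static std::string quote(const std::string & s) {
    std::string out = "\"";
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out + "\"";
}

// Emit a GBNF-style rule body from the same tree that drives parse().
std::string to_gbnf(const Node & n) {
    if (n.kind == Node::LITERAL) return quote(n.text);
    const bool is_choice = n.kind == Node::CHOICE;
    std::string out;
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i > 0) out += is_choice ? " | " : " ";
        const Node & c = *n.children[i];
        // A choice nested inside a sequence needs parentheses.
        const bool wrap = !is_choice && c.kind == Node::CHOICE;
        out += wrap ? "(" + to_gbnf(c) + ")" : to_gbnf(c);
    }
    return out;
}
```

For example, seq({lit("hello "), choice({lit("world"), lit("there")})}) matches "hello there" and produces the rule body "hello " ("world" | "there") from a single definition.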
Features
- simple: Content with optional reasoning
- native: Tool arguments as JSON objects
- constructed: Tool arguments as separate entities (XML or pseudo-functions)

Examples
Parser for models that emit tool arguments as JSON
Parser for models that emit XML tags for each argument
Grammar generation
Implementation Details
The PEG parsers are implemented using std::variant rather than traditional inheritance. This reduces boilerplate and leverages std::visit for type-safety. I initially had an OOP implementation, but it started becoming quite cumbersome and this seems like the lesser evil of the two.

    using common_peg_parser_variant = std::variant<
        common_peg_epsilon_parser,
        common_peg_start_parser,
        common_peg_end_parser,
        common_peg_literal_parser,
        common_peg_sequence_parser,
        common_peg_choice_parser,
        common_peg_repetition_parser,
        common_peg_and_parser,
        common_peg_not_parser,
        common_peg_any_parser,
        common_peg_space_parser,
        common_peg_chars_parser,
        common_peg_json_string_parser,
        common_peg_until_parser,
        common_peg_schema_parser,
        common_peg_rule_parser,
        common_peg_ref_parser,
        common_peg_atomic_parser,
        common_peg_tag_parser
    >;

Both parsers and AST nodes are allocated in arena structures to minimize memory allocations.
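To show the general shape of variant-based dispatch, here is a minimal sketch with three simplified parser kinds. The struct names, the visitor, and run() are assumptions made for illustration and do not mirror the actual types in this PR. If an alternative is added to the variant without a matching overload in the visitor, std::visit fails to compile, which is where the type safety comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <variant>

// Simplified stand-ins for illustration; not the types used in the PR.
struct literal_parser { std::string text; };  // matches an exact string
struct any_parser     {};                     // matches any single character
struct chars_parser   { std::string set; };   // matches one character from a set

using parser_variant = std::variant<literal_parser, any_parser, chars_parser>;

// One overload per alternative; a missing overload is a compile error.
struct parse_visitor {
    const std::string & input;
    std::size_t         pos;

    std::optional<std::size_t> operator()(const literal_parser & p) const {
        if (input.compare(pos, p.text.size(), p.text) == 0) return pos + p.text.size();
        return std::nullopt;
    }
    std::optional<std::size_t> operator()(const any_parser &) const {
        if (pos < input.size()) return pos + 1;
        return std::nullopt;
    }
    std::optional<std::size_t> operator()(const chars_parser & p) const {
        if (pos < input.size() && p.set.find(input[pos]) != std::string::npos) return pos + 1;
        return std::nullopt;
    }
};

// Dispatch on the active alternative; returns the end position on a match.
std::optional<std::size_t> run(const parser_variant & p, const std::string & input, std::size_t pos) {
    assert(pos <= input.size());
    return std::visit(parse_visitor{input, pos}, p);
}
```

Compared with a virtual parse() method on a base class, each parser kind stays a plain struct and all parsing logic for one operation lives in a single visitor.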
Each parser variant is wrapped in a common_peg_parser value type to produce a DSL for composing parser combinators.

Parsers return one of three results: FAIL, SUCCESS, or NEED_MORE_INPUT. This is how partial parsing is implemented. Unlike common/chat-parser.cpp, it does not raise an exception on a partial parse, because partial parses are still valid for streaming.

Additional Changes
- … common_chat_peg_parse() to common/chat.cpp and chat formats COMMON_CHAT_FORMAT_PEG_(SIMPLE|NATIVE|CONSTRUCTED) to support models parsed by a PEG parser.
- … common_chat_syntax.parser. I'm not a fan, but this seems the least intrusive method to integrate. I'll implement any alternative mechanisms if desired.
- … common/unicode.{cpp,h} derived from src/unicode.{cpp,h}. As I understand, we should not include headers from src/, so I had to copy the implementation. It does deviate by returning a result rather than raising an exception.

More comprehensive documentation is added in docs/development/parsing.md. The tests are also fairly comprehensive, tests/test-chat-peg-parser.cpp.

I know this is a big PR. I tried to minimize the implementation, while keeping enough to demonstrate value. #15703 shows community desire for something like this, although it doesn't have to be this implementation.
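For readers unfamiliar with the three-state result mentioned under Implementation Details, the following sketch shows how a single literal matcher might report NEED_MORE_INPUT instead of throwing when a streamed chunk ends mid-token. Only the three state names come from the description above; result_type, result, parse_literal, and the is_partial flag are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The three state names follow the PR description; everything else is hypothetical.
enum class result_type { FAIL, SUCCESS, NEED_MORE_INPUT };

struct result {
    result_type type;
    std::size_t end;  // position reached in the input
};

// Match `lit` at `pos`. When `is_partial` is true the input may still grow,
// so running out of input in the middle of the literal is not a failure.
result parse_literal(const std::string & lit, const std::string & input,
                     std::size_t pos, bool is_partial) {
    assert(pos <= input.size());
    for (std::size_t i = 0; i < lit.size(); ++i) {
        if (pos + i >= input.size()) {
            // Input ended while every character so far matched.
            return { is_partial ? result_type::NEED_MORE_INPUT : result_type::FAIL, pos + i };
        }
        if (input[pos + i] != lit[i]) {
            return { result_type::FAIL, pos };
        }
    }
    return { result_type::SUCCESS, pos + lit.size() };
}
```

With a streamed chunk such as "<tool_" and the literal "<tool_call>", the caller gets NEED_MORE_INPUT and can retry once more tokens arrive, while the same input with is_partial set to false is a plain FAIL.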