UPSTREAM PR #17136: common : implement parser combinators for chat parsing [WIP] #153
Closed
Mirrored from ggml-org/llama.cpp#17136
Putting this out there as a proof-of-concept and to gather feedback. It is still a WIP.
cc @pwilkin
Problem
Each model currently requires a custom parser to handle reasoning and tool calls. XML-based models are particularly challenging to parse. For example, Qwen3-Coder outputs:
Supporting this format requires the parser to know the type of each argument based on the provided schema.
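To illustrate why the schema matters, here is a minimal sketch, assuming argument values arrive as raw text and the schema supplies a JSON type name for each argument. The helper to_json_value is hypothetical and not part of this PR; it only shows that the same raw text yields different JSON depending on the declared type.

```cpp
#include <string>

// Hypothetical helper (not from this PR): render a raw argument value as a
// JSON fragment, using the type the tool schema declares for that argument.
// Only the schema can tell whether the text 42 means the number 42 or the string "42".
std::string to_json_value(const std::string & raw, const std::string & schema_type) {
    if (schema_type == "string") {
        std::string out = "\"";
        for (char c : raw) {
            switch (c) {
                case '"':  out += "\\\""; break; // escape embedded quotes
                case '\\': out += "\\\\"; break; // escape backslashes
                case '\n': out += "\\n";  break; // escape newlines
                default:   out += c;      break;
            }
        }
        out += '"';
        return out;
    }
    // Assumption: for integer, number, boolean, object, and array the raw
    // text is already valid JSON and can be passed through unchanged.
    return raw;
}
```

With this sketch, the raw text 42 stays 42 for an integer argument but becomes "42" for a string argument.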
Proposal
I propose using parser combinators to simplify parsing. We can compose parsers suitable for PEG grammars, which should handle model output effectively. This PR implements a proof-of-concept.
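For readers unfamiliar with the technique, the general idea can be sketched as follows. This is a standalone illustration, not the PR's API: the parser struct and the free functions literal, any, and zero_or_more are assumptions made for this example. A parser here is a function from an input and a start offset to the offset just past the match.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>

// Illustrative only: a parser returns the offset just past the match, or nullopt on failure.
struct parser {
    std::function<std::optional<std::size_t>(std::string_view, std::size_t)> run;
};

// Match an exact string at the current offset.
parser literal(std::string text) {
    return { [text](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (in.substr(pos, text.size()) == text) return pos + text.size();
        return std::nullopt;
    } };
}

// Match any single character.
parser any() {
    return { [](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (pos < in.size()) return pos + 1;
        return std::nullopt;
    } };
}

// Sequence: a must match, then b must match where a stopped.
parser operator+(parser a, parser b) {
    return { [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        auto mid = a.run(in, pos);
        if (!mid) return std::nullopt;
        return b.run(in, *mid);
    } };
}

// Ordered choice (PEG semantics): b is tried only if a fails.
parser operator|(parser a, parser b) {
    return { [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (auto end = a.run(in, pos)) return end;
        return b.run(in, pos);
    } };
}

// Negative lookahead: succeeds without consuming input when a fails.
parser operator~(parser a) {
    return { [a](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (a.run(in, pos)) return std::nullopt;
        return pos;
    } };
}

// Repeat a until it fails; always succeeds.
parser zero_or_more(parser a) {
    return { [a](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        while (auto next = a.run(in, pos)) {
            if (*next == pos) break; // stop on empty matches to avoid looping forever
            pos = *next;
        }
        return pos;
    } };
}
```

In this sketch, (literal("hello") | literal("hi")) + literal(" world") accepts both "hello world" and "hi world", and zero_or_more(~literal("</think>") + any()) consumes text up to a closing tag.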
Here's an example from test/test-chat-parser-combinator.cpp:
The parser supports partial parsing for streaming output:
The generated parse tree can be used to produce a GBNF grammar. The plan is to build the parser during chat param initialization and derive grammar rules with support for lazy triggers. This should support both tool_choice = auto and tool_choice = required.
Specifics
This PR implements parser combinators for PEG grammars. It uses caching to implement packrat parsing. The following are implemented:
The operators +, |, and ~ construct sequence, choice, and negate parsers respectively.
Drawbacks
Parsers that match content while excluding certain patterns, such as end tags, have a less obvious syntax. For example, p.zero_or_more(~(space + p.literal("</think>")) + p.any()) consumes one character at a time for as long as the input ahead does not match space followed by </think>. This can be generalized through an excluding() parser.
Packrat parsing requires caching all intermediate parse results, which introduces memory overhead proportional to input size and grammar complexity.
Each model still requires a custom parser, though they share a common framework that simplifies implementation.
Parser combinators may offer less flexibility for handling malformed model output compared to hand-written parsers, though constrained decoding should prevent malformed tool calls.
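The first drawback, matching content up to an end tag, can be sketched as a single helper. This is a hand-written illustration under stated assumptions, not the proposed excluding() parser: until_tag and until_result are hypothetical names, and the need_more flag is one possible way to handle a tag that is cut off at the end of streamed input.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical result type for this sketch.
struct until_result {
    std::size_t end;        // offset just past the matched content
    bool        found_tag;  // true if the full tag starts at `end`
    bool        need_more;  // true if the input ends with a proper prefix of the tag
};

// Consume characters from `pos` until `tag` begins. Similar in spirit to
// zero_or_more(~literal(tag) + any()), with an extra check for a tag that
// is only partially present at the end of the input.
until_result until_tag(std::string_view in, std::size_t pos, std::string_view tag) {
    if (pos > in.size()) pos = in.size(); // clamp so the arithmetic below cannot underflow

    std::size_t hit = in.find(tag, pos);
    if (hit != std::string_view::npos) {
        return { hit, true, false };
    }

    // No full tag (so the tag is non-empty). Check whether the input ends
    // with a proper prefix of the tag, longest candidate first.
    std::size_t avail   = in.size() - pos;
    std::size_t max_len = std::min(tag.size() - 1, avail);
    for (std::size_t len = max_len; len > 0; --len) {
        if (in.substr(in.size() - len) == tag.substr(0, len)) {
            // Hold the possible tag prefix back instead of emitting it as content.
            return { in.size() - len, false, true };
        }
    }
    return { in.size(), false, false };
}
```

For the streamed input abc</thi with tag </think>, this sketch reports three characters of content and sets need_more, so the trailing </thi is not emitted as content before the rest of the tag arrives.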
To do
content() and reasoning() parsers to populate content/reasoning fields.
tool(), tool_name(), tool_args(), as well as tool_arg_name() and tool_arg_value() for models such as Qwen3-Coder.
json-schema-to-grammar support. The JSON parser will parse any JSON, but the generated GBNF grammar should still be constructed from the user-provided schema.