UPSTREAM PR #17816: llama : add token matching support to llama-grammar #468
Performance Analysis Summary: PR #468 Token Matching Support
Overview
PR #468 introduces token matching support to llama-grammar, adding 327 lines across 4 files. The implementation enables grammars to match tokens by ID using [...]
Key Findings
Grammar Processing Functions: The most significant changes occur in [...]
Structural Changes: The [...]
Inference Impact: Core inference functions [...]
Power Consumption: Binary-level analysis shows [...]
Code Implementation: The changes implement legitimate functionality for token-level grammar matching. The [...]
This reverts commit 98cb7a6.
Performance Analysis Summary
PR #468: Token Matching Support in Grammar
This PR adds a single line to [...]
Performance Impact
The modification affects grammar-constrained generation, not the core inference pipeline. Analysis of the top 10 functions by response time change shows variations in STL container operations and grammar processing functions, but none are in the tokenization or inference hot path.
Core Inference Functions: No changes detected in [...]
Grammar Processing: The function [...]
Power Consumption: The binary [...]
Affected Functions: The performance variations observed are in STL accessors [...]
Tokens Per Second: No impact expected. The change does not modify [...]
Mirrored from ggml-org/llama.cpp#17816
Implementation of idea by @ngxson: ggml-org/llama.cpp#17750 (comment)
cc: @pwilkin @aviallon
Problem
The `llama-grammar` implementation doesn't have a way to accept tokens directly, which creates a few problems:
- A grammar cannot distinguish between a special token (e.g. `<|end|>`) and the tokenized form `<|`, `end`, `|>` that may occur in content.
- Grammars have to use patterns such as `( [^<] | "<" [^|] | "<|" [^e] | ... | "<|end|" [^>] )*` to match chunks of characters that don't accumulate to the desired delimiter (`<|end|>`).
Proposed Solution
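For contrast, the character-level workaround described under Problem can be generated mechanically, which shows how it grows with the delimiter length. The following is a minimal Python sketch. The helper names are hypothetical, and it assumes GBNF accepts backslash escapes for `\` and `]` inside a character class. It mirrors the quoted pattern and, like it, does not handle delimiters whose prefixes overlap.

```python
def char_class_excluding(ch: str) -> str:
    """Return a GBNF negated character class that excludes one character."""
    # Assumption: only backslash and "]" need escaping inside [...].
    escaped = "\\" + ch if ch in "\\]" else ch
    return f"[^{escaped}]"


def quote_literal(text: str) -> str:
    """Return text as a double-quoted GBNF string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def until_delimiter(delim: str) -> str:
    """Build '( [^d0] | "d0" [^d1] | ... )*' for the given delimiter."""
    alternatives = []
    for i, ch in enumerate(delim):
        prefix = delim[:i]
        negated = char_class_excluding(ch)
        # The first alternative has no literal prefix in front of the class.
        alternatives.append(
            f"{quote_literal(prefix)} {negated}" if prefix else negated
        )
    return "( " + " | ".join(alternatives) + " )*"


if __name__ == "__main__":
    print(until_delimiter("<|end|>"))
```

A seven-character delimiter already needs seven alternatives, one per prefix, which is the verbosity that direct token matching is meant to avoid.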
Borrowing some ideas from llguidance, you can define a token by id `<[id]>` or as raw token text `<token>` if encased in `<`/`>`. I'm leaving out support for token id ranges/alternates since I don't see an immediate need for it.
You can negate by prefixing the token with `!`, e.g. `!<|end|>`.
Example (gpt-oss)
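The semantics of the three forms described above (`<[id]>`, `<token>`, and the `!` prefix) can be illustrated with a toy matcher. This is a Python sketch under stated assumptions, not the llama-grammar implementation: `TokenTerm`, `parse_term`, `term_accepts`, and the four-entry vocabulary with its ids are all hypothetical, and a negated term is assumed to accept any single token other than the named one.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TokenTerm:
    negated: bool            # True when the term was written with a leading "!"
    token_id: Optional[int]  # set for the <[id]> form
    text: Optional[str]      # set for the <token> form, angle brackets included


# Optional "!", then either <[digits]> or any text wrapped in "<" and ">".
_TERM = re.compile(r"(!?)(?:<\[(\d+)\]>|(<.+>))")


def parse_term(src: str) -> TokenTerm:
    """Parse one token term such as "<[7]>", "<|end|>" or "!<|end|>"."""
    m = _TERM.fullmatch(src)
    if m is None:
        raise ValueError(f"not a token term: {src!r}")
    negated, tok_id, text = m.groups()
    return TokenTerm(bool(negated), int(tok_id) if tok_id else None, text)


def term_accepts(term: TokenTerm, token_id: int, vocab: dict[int, str]) -> bool:
    """Check a single token id against a single term."""
    if term.token_id is not None:
        hit = token_id == term.token_id
    else:
        hit = vocab.get(token_id) == term.text
    # A negated term flips the result.
    return hit != term.negated


# Hypothetical ids; the real gpt-oss ids are deliberately not assumed here.
VOCAB = {1: "<|end|>", 2: "<|", 3: "end", 4: "|>"}

if __name__ == "__main__":
    end = parse_term("<|end|>")
    print(term_accepts(end, 1, VOCAB))                       # special token
    print([term_accepts(end, t, VOCAB) for t in (2, 3, 4)])  # tokenized pieces
```

In this sketch the term accepts the single special token and rejects each of the pieces `<|`, `end`, `|>`, which is the distinction a character-level grammar cannot express.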
By token id:
That's not very readable, but useful for tokens not wrapped in `<`/`>`. If they are, you can use them directly:
Use Case: Reasoning Budget Enforcement
Assuming the model's vocab has unique tokens for its thinking tags, adopting a reasoning budget is fairly trivial via grammar:
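A minimal sketch of what such a grammar could look like, generated from Python so that the budget is a parameter. This is an illustration under assumptions rather than the grammar from the PR: the function name, the rule names, and the `<think>`/`</think>` tags in the usage line are hypothetical. It also assumes that a negated token term matches exactly one token and that GBNF `{m,n}` repetition can be applied to it.

```python
def reasoning_budget_grammar(open_tag: str, close_tag: str, max_tokens: int) -> str:
    """Return GBNF text that caps the reasoning span at max_tokens tokens."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    lines = [
        # Reasoning block first, then unconstrained content.
        "root ::= thinking .*",
        # Opening tag, up to max_tokens tokens that are not the closing tag,
        # then the closing tag itself.
        f"thinking ::= {open_tag} thinking-token{{0,{max_tokens}}} {close_tag}",
        # "!" negation: any token except the closing tag (assumed semantics).
        f"thinking-token ::= !{close_tag}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical tag names; substitute the model's own thinking tokens.
    print(reasoning_budget_grammar("<think>", "</think>", 256))
```

Because the budget counts tokens rather than characters, the cap lines up with what the sampler actually produces. Once the repetition is exhausted, the closing tag is the only token the grammar allows.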
Notes:
`gpt-oss` may be a poor example since it has `reasoning_effort`, but the budget approach works pretty well.
To Do
- [...] `llama-grammar`
- [...] `trigger_patterns` by temporarily turning all token rules to literals (char rules).
- [...] `grammars/`
AI Disclosure: I used an LLM at the start to help dissect the code, but its understanding had some holes. I didn't use an LLM to write the code.