grammar: increase MAX_REPETITION_THRESHOLD + make it configurable via envvar#21003

Open
pwilkin wants to merge 2 commits into ggml-org:master from pwilkin:config-max-repetition-threshold

Conversation

@pwilkin
Member

@pwilkin pwilkin commented Mar 25, 2026

Overview

For very large tool-calling environments (like OpenClaw), the current limit is insufficient. Even a larger limit might not suffice, so on top of increasing it I'm making it configurable.

Additional information

Together with #20961, this should help with #20879.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, told Claude to add the envvar config

@pwilkin pwilkin requested a review from ggerganov as a code owner March 25, 2026 18:08
@pwilkin pwilkin requested review from aldehir, ggerganov and ngxson and removed request for ggerganov and ngxson March 25, 2026 18:08
@pwilkin
Member Author

pwilkin commented Mar 31, 2026

@CISC @ngxson or @ggerganov maybe care to help? Need 1 more approval :)

@pwilkin
Member Author

pwilkin commented Mar 31, 2026

Fixes #20867

@ggerganov
Member

Should we wait to see if #21216 fixes the issue? AFAIU, if it works, we won't have to adjust the threshold.

@pwilkin
Member Author

pwilkin commented Mar 31, 2026

> Should we wait to see if #21216 fixes the issue? AFAIU, if it works, we won't have to adjust the threshold.

No, people requested that the restriction be modifiable even before the explosion of OpenClaw models, because some custom grammars require many repetitions.

@aldehir
Contributor

aldehir commented Apr 1, 2026

I think it's important we understand why it's exploding in the first place. Then we can make an informed decision.

Anyway, I fixed it in #21216. I need to refine the grammar a bit more; it's causing weird generations on tinyllama-function-call masquerading as Qwen3-Coder.

@pwilkin
Member Author

pwilkin commented Apr 1, 2026

> I think it's important we understand why it's exploding in the first place. Then we can make an informed decision.
>
> Anyway, I fixed it in #21216. I need to refine the grammar a bit more; it's causing weird generations on tinyllama-function-call masquerading as Qwen3-Coder.

For the exploding issue, yes. But people called for this to be configurable well before that happened; I just didn't get to it. Some grammars legitimately need more than 2k repetitions.

@NeuralNotwerk

Data point in favor: this throw is also hit by hand-authored GBNF, not just JSON-Schema-derived grammars. We hit it writing rules like `string ::= "\"" prose-char{200,2000} "\""` in `/completion` requests — silently rejected with `grammar_error: null` in the HTTP response (the server logs `parse: error parsing grammar: …` to stderr only). Reproducer + analysis at #22314.

Even with the autoparser root-cause fixes from #21216 in place, consumers writing their own grammars still bump into the cap whenever a single field needs ≥2000 chars (long extracted quotes, summary paragraphs, structured report bodies, etc.). The current workaround — splitting one field into a list of bounded chunks — works but adds a render step and isn't obvious until you've debugged it. Either raising the threshold or surfacing it as a configurable knob would unblock that without affecting the chat-parser rationale that motivated the original cap.
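The chunked workaround described above can be sketched in GBNF. The rule names and the per-chunk bound of 500 are hypothetical, chosen only to keep each repetition count below the cap:

```
# Sketch of the workaround: emit one logical field as a JSON array of
# bounded chunks, each repetition count staying under the threshold; the
# consumer concatenates the elements afterwards (the extra "render step").
root       ::= "[" chunk ("," chunk){0,9} "]"
chunk      ::= "\"" prose-char{1,500} "\""
prose-char ::= [^"\\]
```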

Would also help if the parse-time throw were surfaced in the response body (currently swallowed → looks identical to "model ignored grammar"), but that's a separate change.

Labels

testing Everything test related
