grammar: increase MAX_REPETITION_THRESHOLD + make it configurable via envvar#21003

Open
pwilkin wants to merge 2 commits into ggml-org:master from pwilkin:config-max-repetition-threshold

Conversation

@pwilkin
Member

@pwilkin pwilkin commented Mar 25, 2026

Overview

For very large tool-calling environments (like OpenClaw), the current limit is insufficient. Even a larger limit might not suffice, so on top of increasing it I'm making it configurable.

Additional information

Together with #20961, this should help with #20879.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, told Claude to add the envvar config

@pwilkin pwilkin requested a review from ggerganov as a code owner March 25, 2026 18:08
@pwilkin pwilkin requested review from aldehir, ggerganov and ngxson and removed request for ggerganov and ngxson March 25, 2026 18:08
@pwilkin
Member Author

pwilkin commented Mar 31, 2026

@CISC @ngxson or @ggerganov maybe care to help? Need 1 more approval :)

@pwilkin
Member Author

pwilkin commented Mar 31, 2026

Fixes #20867

@ggerganov
Member

Should we wait to see if #21216 fixes the issue? AFAIU, if it works, we won't have to adjust the threshold.

@pwilkin
Member Author

pwilkin commented Mar 31, 2026

> Should we wait to see if #21216 fixes the issue? AFAIU, if it works, we won't have to adjust the threshold.

No, people requested that the restriction be modifiable even before the explosion of OpenClaw models, because some custom grammars require many repetitions.

@aldehir
Contributor

aldehir commented Apr 1, 2026

I think it's important we understand why it's exploding in the first place. Then we can make an informed decision.

Anyway, I fixed it in #21216. I need to refine the grammar a bit more; it's causing weird generations on tinyllama-function-call masquerading as Qwen3-Coder.

@pwilkin
Member Author

pwilkin commented Apr 1, 2026

> I think it's important we understand why it's exploding in the first place. Then we can make an informed decision.
>
> Anyway, I fixed it in #21216. I need to refine the grammar a bit more; it's causing weird generations on tinyllama-function-call masquerading as Qwen3-Coder.

For the exploding issue, yes. But people called for this to be configurable well before that happened; I just didn't get to it. Some grammars legitimately need more than 2k repetitions.

@NeuralNotwerk

Data point in favor: this throw is also hit by hand-authored GBNF, not just JSON-Schema-derived grammars. We hit it writing rules like `string ::= "\"" prose-char{200,2000} "\""` in `/completion` requests — silently rejected with `grammar_error: null` in the HTTP response (the server logs `parse: error parsing grammar: …` to stderr only). Reproducer + analysis at #22314.

Even with the autoparser root-cause fixes from #21216 in place, consumers writing their own grammars still bump into the cap whenever a single field needs ≥2000 chars (long extracted quotes, summary paragraphs, structured report bodies, etc.). The current workaround — splitting one field into a list of bounded chunks — works but adds a render step and isn't obvious until you've debugged it. Either raising the threshold or surfacing it as a configurable knob would unblock that without affecting the chat-parser rationale that motivated the original cap.
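The chunked workaround described above can be sketched in GBNF. The rule names and the per-chunk bound of 500 are hypothetical, chosen only to keep each repetition count below the cap:

```
# Sketch of the workaround: emit one logical field as a JSON array of
# bounded chunks, each repetition count staying under the threshold; the
# consumer concatenates the elements afterwards (the extra "render step").
root       ::= "[" chunk ("," chunk){0,9} "]"
chunk      ::= "\"" prose-char{1,500} "\""
prose-char ::= [^"\\]
```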

Would also help if the parse-time throw were surfaced in the response body (currently swallowed → looks identical to "model ignored grammar"), but that's a separate change.

Labels

testing Everything test related
